Citation
The Effects of statistical information on clinical judgement

Material Information

Title:
The Effects of statistical information on clinical judgement
Creator:
Moxley, Ann Weimer, 1946-
Publication Date:
Copyright Date:
1970
Language:
English
Physical Description:
vii, 63 leaves. : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Clinical judgment ( jstor )
Clinical psychology ( jstor )
Conditional probabilities ( jstor )
Discriminants ( jstor )
Graduate students ( jstor )
Length of stay ( jstor )
Propriety ( jstor )
Psychotherapy ( jstor )
Rate bases ( jstor )
Z score ( jstor )
Clinical psychology ( lcsh )
Dissertations, Academic -- Psychology -- UF ( lcsh )
Psychology thesis Ph. D ( lcsh )
Statistical decision ( lcsh )
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis--University of Florida, 1970.
Bibliography:
Bibliography: leaf 37.
Additional Physical Form:
Also available on World Wide Web
General Note:
Manuscript copy.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
022060155 ( AlephBibNum )
13493553 ( OCLC )
ACY4899 ( NOTIS )










THE EFFECTS OF STATISTICAL INFORMATION

ON CLINICAL JUDGEMENT











By
ANN WEIMER MOXLEY













A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA
1970














ACKNOWLEDGEMENTS


The author wishes to express her gratitude to her committee chairman, Dr. Paul Satz, for his comments, patience, and inspiration throughout the initiation, implementation and completion of this research. The author is also indebted to the other members of her committee, Dr. Audrey Schumacher, Dr. Ben Barger, Dr. Marvin Shaw, and Dr. Donald Childers, for their assistance and criticisms. The author wishes to express her special thanks to her husband, Jim, whose moral support was invaluable and most appreciated.

















TABLE OF CONTENTS


ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

INTRODUCTION

METHOD

RESULTS

DISCUSSION

SUMMARY

APPENDIX A INSTRUCTIONS

APPENDIX B SUMMARY OF NEWMAN-KEULS TESTS

REFERENCES

BIOGRAPHICAL SKETCH














LIST OF TABLES


Table

1. Actual and test-predicted therapeutic outcome

2. Design schematic

3. Proportion of correct judgments

4. Summary of analysis of variance of accuracy

5. Mean confidence scores

6. Summary of analysis of variance of confidence

7. Correlation coefficients for appropriateness

8. Summary of analysis of variance of appropriateness correlations














LIST OF FIGURES


Figure

1. Accuracy by information levels

2. Confidence by information levels

3. Appropriateness correlations between accuracy and confidence by information levels








Abstract of Dissertation Presented to the Graduate Council
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
at the University of Florida


THE EFFECTS OF STATISTICAL INFORMATION
ON CLINICAL JUDGEMENT


By


Ann Weimer Moxley


December, 1970



Chairman: Dr. Paul Satz
Major Department: Psychology


The present study was designed to look at the effects of adding quantitative and qualitative data to a relevant clinical judgment task. In essence, it compared judges with varying degrees of clinical experience to actuarial prediction methods. The study also attempted to train judges to use actuarial information to improve their prediction accuracy.

Twelve judges representing three levels of clinical experience made postdictive judgments of length of stay in psychotherapy (short or long) from a sample of MMPI profiles of clients seen in a university mental health service. Judgments were made under four conditions in which qualitative and quantitative information was added incrementally at each level. The three levels of judges' experience were professional clinical psychologists, "sophisticated" third-year clinical psychology graduate students trained in statistical decision theory, and "unsophisticated" third-year clinical psychology graduate students without any training in statistical decision theory.

Accuracy increased over levels of information, but there were no differences in accuracy among the three levels of experience. A significant group by information level interaction demonstrated some group effects, due to a lower proportion of correct judgments by the less experienced judges under the conditions involving the least information.

Judges became more confident in their judgments as they received more information. Appropriateness, defined as accuracy weighted by confidence and measured by correlation coefficients between accuracy and confidence, increased substantially as increments of information were added. The group trained in statistical decision theory tended to make the most appropriate judgments, and the least experienced group of graduate students tended to make the least appropriate judgments.

The present study showed that clinicians can use quantitative data to improve their own judgmental ability and to predict more accurately than actuarial data alone. Also, since the judges with the most experience in using actuarial tasks tended to be the most appropriate in their judgments, the results imply that clinicians can also be trained to be more appropriate and to know when their judgments are more likely to be accurate.









INTRODUCTION


The objectives of the present study were twofold. The primary objective was to examine the effects of test and non-test (statistical) information on the judgmental process. The secondary objective, and the focus for implementing the primary objective, was to study the psychological attributes of individuals who stay only a short time in therapy versus those who remain a long time. That is, the study of the judgmental process was couched in a real and relevant situation, length of stay in psychotherapy, which is a meaningful and pressing problem for psychologists today. However, this secondary objective was minor in relation to the major issue of examining the clinical judgment process.


Clinical Versus Actuarial Prediction

Ever since Paul Meehl's book, Clinical Versus Statistical Prediction, clarified the issue of clinicians' predictions versus actuarial predictions, there have been numerous studies comparing these two prediction methods. As Meehl (1954) points out, however, the two methods need not be mutually exclusive, since the clinician can incorporate actuarial methods and data into his prediction process. Many studies have focused not only on comparing clinicians to statistical formulae but also on improving the clinician's ability to predict by giving him useful statistical information and training him to use this information.

In general, the studies which compared clinicians to actuarial methods found that the actuarial methods were either superior or equal in efficiency to clinicians (Meehl, 1965). With the exception of one study, the clinician has shown no superiority to purely quantitative actuarial prediction. The one study which did find clinicians superior (Lindzey, 1965) used only one or two clinicians, and its application is somewhat questionable. One reason the clinician has not been superior to actuarial methods is that he has seldom been given the opportunity to incorporate the actuarial information in formulating his final decision. Because he has been placed at this disadvantage, the demonstrated superiority of the actuarial method may be due to the experimental design rather than to an actual superiority of statistical techniques. Also, the information available to the clinician has often been based on non-quantitative data such as interview material, case history data, and projective tests.

Holtzman (1960) separates the clinician's diagnostic task into three phases: (1) collection of information; (2) preparation and translation of this information for analysis; (3) interpretation of this information. As he points out, actuarial methods, and specifically the computer, are superior to the clinician in processing information once the primary coding has been done. The clinician is still superior at collecting information and at interpreting it, because at present the computer lacks the appropriate rules and parameters for interpretation. Thus, studies which emphasize aspects of prediction suitable for actuarial methods do not use the clinician's talents to best advantage. It is when skilled clinicians use familiar methods to predict a criterion they know something about that they have the most success (Holt, 1958). This includes their having a rich body of data and systematic actuarial procedures at their disposal in addition to their own experience, intuition, and knowledge.

Recent studies suggest that as the amount of clinical experience increases, prediction accuracy decreases (Goldberg, 1959; Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969). These studies compared trained clinicians holding a professional degree with clinical psychology graduate students and even with non-professional groups, such as secretaries, and found that the trained clinicians were not superior to the other groups. One explanation of this finding is that the more experienced clinician has developed a particular way of looking at data which interferes with his making unbiased, objective decisions.

Another aspect of research in the area of clinical versus statistical prediction is the confidence clinicians place in their judgments and the appropriateness of their predictions. Appropriateness, a measure of confidence weighted by accuracy, was developed by Adams (1957). Confidence in judgments also differs between groups of graduate students and trained clinicians, with the trained psychologists being less confident in their judgments (Goldberg, 1959; Oskamp, 1962). When the appropriateness of the judgment is measured, however, the trained clinicians are more appropriate in their confidence levels than are either graduate students or non-professionals (Oskamp, 1962; Shagoury, 1969). That is, clinicians are more confident of their correct decisions and less confident of their incorrect decisions. The amount of information available to the judge does not correlate with his predictive accuracy, but increased amounts of information substantially increase confidence levels (Goldberg, 1968).
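In the present study appropriateness is measured by a correlation coefficient between accuracy and confidence (see the Abstract). The following is a minimal sketch of such an index, assuming each judgment is scored 1 (correct) or 0 (incorrect) and paired with a numeric confidence rating. The 1-to-5 rating scale, the ratings themselves, and the function name are hypothetical, and Adams's (1957) original formulation may differ in detail. A Pearson correlation between a 0/1 variable and a graded one is the point-biserial correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical judgments by one judge: 1 = correct, 0 = incorrect,
# paired with the confidence rating (hypothetical 1-5 scale) given for each.
accuracy = [1, 1, 0, 1, 0, 1, 0, 1]
confidence = [5, 4, 2, 5, 3, 4, 1, 3]

# With a 0/1 accuracy variable this is the point-biserial correlation:
# larger positive values mean confidence was higher on correct judgments.
appropriateness = pearson_r(accuracy, confidence)
print(f"appropriateness r = {appropriateness:.2f}")
```

A judge whose confidence is unrelated to whether he is right would score near zero; a judge who gives every judgment the same rating leaves the coefficient undefined, which the sketch reports as an error rather than returning a number.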

Goldberg (1968) also discusses the nature of the judgmental process. He questions whether judges use simple decision-making models, such as linear models, or complex processes, such as configural models. In an analysis of clinicians' judgments he found that a linear model usually reproduced 90 to 100 per cent of the reliable judgmental variance on most decision-making tasks, even though the clinicians generally felt that they used more complex, configural models.
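One way to see what it means for a linear model to reproduce a judge's judgments is to regress the judge's own ratings on the cues he saw and ask how much of the variance in those ratings the fitted weighted sum accounts for. The sketch below is a minimal illustration with two hypothetical cues (stand-ins for, say, two MMPI scale scores) and hypothetical ratings, fitted by ordinary least squares; the function name is illustrative. It reports plain R-squared. Goldberg's figure refers to the reliable share of judgmental variance, which would additionally require the judge's test-retest consistency and is not computed here.

```python
def fit_two_cue_linear_model(x1, x2, y):
    """Least-squares fit of y on two cues; returns (b0, b1, b2, r_squared)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    # Solve the 2x2 normal equations for the cue weights.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # Share of the variance in the judge's ratings reproduced by the linear model.
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum(c * c for c in cy)
    return b0, b1, b2, 1 - sse / sst


# Hypothetical cue values and one judge's ratings for eight profiles.
cue_1 = [1, 2, 3, 4, 5, 6, 7, 8]
cue_2 = [3, 1, 4, 1, 5, 9, 2, 6]
ratings = [5.3, 4.8, 10.1, 8.7, 15.2, 20.9, 16.2, 21.8]

b0, b1, b2, r2 = fit_two_cue_linear_model(cue_1, cue_2, ratings)
print(f"weights: {b1:.2f}, {b2:.2f}; R^2 = {r2:.3f}")
```

The hypothetical ratings were constructed as roughly twice the first cue plus the second, with small irregularities, so the recovered weights come out near 2 and 1 and R-squared is close to 1.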


Using Statistical Information to Increase Prediction Accuracy

Training in the use of statistical information has been shown to improve judgmental accuracy. In a study by Oskamp (1962), clinicians were able to improve their ability to distinguish psychiatric and medical patients on the basis of their Minnesota Multiphasic Personality Inventory (MMPI) profiles when they were provided with actuarial rules. A statistical formula predicted with 75 per cent accuracy, and the clinicians, after training, were able to reach this 75 per cent accuracy level.

Goldberg (1968) trained judges by giving them a formula and optimum cutting score for distinguishing neurotic from psychotic MMPI profiles. The judges were told that the statistical information predicted with 70 per cent accuracy, and they were encouraged to use this information along with any other information they thought would improve their prediction accuracy. Goldberg found that after eight weeks of "value training," the judges, on the average, increased their accuracy from between 52 and 65 per cent to approximately 70 per cent. This was the only type of training that substantially improved accuracy. Thus, feedback is necessary if the clinician is to learn how to improve his decision-making techniques.

Another useful type of statistical information is the incidence, or base rate, of a given trait in the population available to the clinician. Goldberg (1959), for example, had judges distinguish brain-damaged patients from functional patients on the basis of Bender-Gestalt protocols. The protocols were randomized into different groups in which the incidence of brain damage varied from high (P = .8) to low (P = .2). Goldberg found no difference in judgmental accuracy between these groups. Unfortunately, the base rate information was not provided to the judges.

The importance of base rates for evaluating predictive tests was discussed by Meehl and Rosen (1955). They cite as an example an Army adjustment test for predicting which inductees would adjust to the service. The test predicted inductee adjustment with an accuracy of 79.7 per cent. However, the overall percentage of inductees who adjusted was 95 per cent; thus, utilization of the base rates alone (i.e., predicting adjustment in all cases) would result in a hit rate of 95 per cent.

Another application of base rates is through Bayesian statistical theory, which combines the base rates with the valid and false positive rates of a particular test to give a conditional probability for the likelihood of being correct or incorrect given a certain test sign in a given base rate population.

Shagoury (1969) and Shagoury and Satz (1969) demonstrated that clinicians can substantially improve their predictive accuracy when provided with information on base rates and conditional probabilities. These studies showed that increments in statistical information, added to test data, significantly increased the accuracy of judges in a real-life clinical decision task of distinguishing brain-damaged from functional patients on the basis of a block rotation task (Satz, 1966). Their judges' accuracy approximated that obtained by a discriminant function predictor score (Z). Composite Z scores were de-emphasized by the judges in favor of the additional information, such as the base rates, differential error risks, and conditional probabilities. However, in groups with a high incidence of brain-damaged individuals (base rate = .8) the judges' overall accuracy decreased, perhaps due to a reluctance to diagnose pathology.

Meehl and Rosen (1955) point out that test development should be concentrated on populations with base rates near .50 rather than on populations with base rates approaching .00 or 1.00, since in the latter cases using the test will yield a lower hit rate than using the base rates alone.

A cutting score, or composite Z score, derived from discriminant function analysis can be manipulated for various purposes in prediction. It can be used to maximize the number of correct predictions for all cases or to maximize only the correct predictions for positives.

An excellent application of this technique of discriminant function analysis to decision theory in a clinical setting was demonstrated by Satz (1966). Discriminant function analysis is a statistical technique devised to maximally differentiate discrete criterion groups when multiple measurements are involved. It is essentially a multiple regression technique, except that the criterion variable is discontinuous. The following linear equation expresses this function:

$$Z = \lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_n X_n$$

where $Z$ is the composite predictor score based on the individual scores on each of the variables ($X_1, X_2, \ldots, X_n$) and the respective weights, or lambdas, assigned to each of the variable scores ($\lambda_1, \lambda_2, \ldots, \lambda_n$). If there are two criterion groups involving multiple measures, the discriminant function determines optimal weights (lambdas) for these variables which will maximize the difference between the composite Z scores of the two criterion groups.
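For two criterion groups, one standard way to obtain such weights is Fisher's linear discriminant, $\lambda = W^{-1}(\bar{X}_L - \bar{X}_S)$, where $W$ is the pooled within-group covariance matrix and $\bar{X}_S$, $\bar{X}_L$ are the group mean vectors. The sketch below is a minimal two-predictor illustration of that computation in plain Python, using hypothetical scores for a short-stay (S) and a long-stay (L) group; it is not the program used by Satz (1966) or in the present study, whose analysis used 13 MMPI scale variables. It also applies the midpoint rule described in the Method section, placing the cutting score halfway between the two group means of Z.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) score pairs."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def pooled_covariance(group_a, group_b):
    """Pooled within-group 2x2 covariance matrix of two groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def discriminant_weights(group_s, group_l):
    """Fisher weights lambda = W^-1 (mean_L - mean_S) for two predictors."""
    (a, b), (c, d) = pooled_covariance(group_s, group_l)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    m_s, m_l = mean_vector(group_s), mean_vector(group_l)
    diff = (m_l[0] - m_s[0], m_l[1] - m_s[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def composite_z(x, weights):
    """Composite score Z = lambda_1 * X_1 + lambda_2 * X_2."""
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical scores on two scales for short-stay (S) and long-stay (L) clients.
group_s = [(1, 2), (2, 3), (3, 3), (2, 1)]
group_l = [(6, 5), (7, 7), (8, 6), (7, 5)]

weights = discriminant_weights(group_s, group_l)
z_bar_s = composite_z(mean_vector(group_s), weights)
z_bar_l = composite_z(mean_vector(group_l), weights)
cut = (z_bar_s + z_bar_l) / 2  # midpoint cutting score

# Scores at or above the cut are predicted long stays (positives).
predictions = ["L" if composite_z(x, weights) >= cut else "S" for x in group_s + group_l]
print([round(w, 3) for w in weights], round(cut, 2), predictions)
```

Because Z is linear in the scores, each group's mean Z is simply the weighted sum of that group's mean scores, which is why the midpoint rule can be applied directly to the two group means. With many predictors the 2x2 inverse would be replaced by a general matrix solve.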


Length of Stay in Psychotherapy as a Criterion Variable

Why is length of stay in psychotherapy a meaningful problem for study? First, there is a great demand for psychological services together with a present-day shortage of trained clinicians. Most clinics that see individuals with psychological problems are understaffed, have patient waiting lists, or both. There are also differential risks involved in selecting who will be seen in therapy. It is far more serious to miss those who are severely disturbed and need long-term psychotherapy, because of the threat these individuals may pose to themselves or to society, than it is to wrongly classify persons who need only a few sessions and are experiencing minor difficulties in their lives. The first type of error, predicting a short stay in therapy on the basis of a negative test score when in fact the person stays a long time, is a false negative error. A false positive error results from predicting a long stay in therapy on the basis of a positive test score when the individual actually stays only a few sessions.

Meehl and Rosen (1955) point out that in a clinical setting external restraints are often imposed, perhaps because of a shortage of staff time, patient waiting lists, or administrative policy. If this is the case, decisions cannot always be made in accordance with known base rates. They give the following example to illustrate the use of an externally imposed selection ratio. If 80 per cent of the patients referred to a mental health clinic are recoverable with intensive psychotherapy, then everyone should be treated rather than relying on a test which predicts only 75 per cent of those who will have a favorable therapy outcome. However, if staff time is limited and only half of the referrals can be treated, following the base rates is meaningless, because it would lead to a decision that would be impossible to implement. In this case, where a selection ratio of .5 is externally imposed, the use of the test becomes worthwhile. Given the figures in Table 1 (Meehl & Rosen, 1955), the 50 cases out of the 100 referrals to be treated are selected from those individuals the test predicts will be "good" therapy risks. If this is done, there is a 92.3 per cent hit rate among those selected for therapy (60/65). Stated another way, 46 of the 50 cases selected will be successes, compared with 40 (half of the 80 cases in the good therapeutic outcome group) if the 50 were chosen without the test.

Table 1

Actual and Test-Predicted Therapeutic Outcome

                    Therapeutic Outcome
Test Prediction     Good    Poor   Total
Good                  60       5      65
Poor                  20      15      35
Total                 80      20     100
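The figures in this example follow directly from the cell counts in Table 1. The sketch below recomputes them; the variable names are illustrative, and the comparison figure assumes that, without the test, the 50 treated referrals would be chosen without regard to predicted outcome, so that on average 80 per cent of them succeed.

```python
# Cell counts from Table 1 (Meehl & Rosen, 1955), keyed by (test prediction, actual outcome).
table = {
    ("good", "good"): 60, ("good", "poor"): 5,
    ("poor", "good"): 20, ("poor", "poor"): 15,
}
total = sum(table.values())  # 100 referrals

base_rate = (table[("good", "good")] + table[("poor", "good")]) / total      # share with good outcome
test_accuracy = (table[("good", "good")] + table[("poor", "poor")]) / total  # overall test hit rate
predicted_good = table[("good", "good")] + table[("good", "poor")]           # referrals the test calls "good"
hit_rate_selected = table[("good", "good")] / predicted_good                 # 60/65

# Externally imposed selection ratio of .5: only 50 referrals can be treated.
treated = 50
successes_with_test = treated * hit_rate_selected  # expected successes among 50 test-selected cases
successes_without_test = treated * base_rate       # expected successes if chosen without the test

print(f"base rate {base_rate:.2f}, test accuracy {test_accuracy:.2f}")
print(f"hit rate among selected {hit_rate_selected:.3f}")
print(f"successes: {successes_with_test:.1f} with test vs {successes_without_test:.1f} without")
```

The test's overall accuracy (.75) is below the base rate (.80), yet once the selection ratio is fixed at .5 the test still raises the expected number of successes among those treated.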

A second reason for selecting length of stay in psychotherapy as the focus for a clinical judgment study is that the problem can be subjected to multivariate and statistical decision theory analysis in order to increase the predictive relationship between signs and criteria. This possibility thus increases its application and potential usefulness to clinical judges.

One study in this area found individual differences in behavior in psychotherapy that are predictable from an MMPI profile (Mello & Guthrie, 1958). Mello and Guthrie studied 219 individuals seen at a college psychological clinic. They used only those profiles with at least one T score greater than 70. They found that length of stay in therapy was related to high scores on various scales of the MMPI. Of those students with high scores on Scale 2 (D), 45 per cent remained only one to three sessions. Persons high on Scale 3 (Hy) tended to stay in therapy longer than the high 2's and also developed dependency on the therapists more easily. Scale 4 (Pd) individuals seldom stayed past seven counseling sessions and as a group were quite resistant to therapy, although they did not often cancel their appointments. Persons who stayed the longest in therapy were high on Scales 7 (Pt) or 8 (Sc), with some clients continuing past 60 and 21 sessions, respectively, for these two scales. Most of the high 9 (Ma) students stayed fewer than 11 sessions and cancelled therapy sessions frequently. Mello and Guthrie concluded that a therapist can get some idea of what to expect from a particular client on the basis of his MMPI profile.









The Mello and Guthrie study is interesting because it suggests that psychological data (MMPI) may be used by clinicians to select clients for psychotherapy more efficiently. Unfortunately, the authors did not examine this problem within the context of a decision-making task, nor did they subject their data to multivariate analysis.

Using length of stay in psychotherapy as the criterion to be predicted is valuable for other reasons. For the professional involved, it may clarify the services offered by his agency and help him to provide more adequate services to his clients. For example, he may decide that seeing many clients for a short period of time is of more value than giving long-term therapy to those who need it and thus seeing fewer clients. That is, prevention may be emphasized in a college mental health clinic, and such a clinic may be designed to see as many students as possible to ease their transition from high school or junior college to a college curriculum. On the other hand, a clinic may be more treatment oriented and seek to help those who are more disturbed and require longer therapy. This emphasis would require more staff time per client and would necessitate seeing fewer clients. Decisions about whom to treat could be made more adequately with test and non-test information.

Being able to predict length of stay in therapy could affect therapist expectations, which could in turn affect outcome variables. Just what effect an expectation for a particular length of stay in therapy will have on the outcome of the therapy is outside the scope of the present study but is an important research question in itself. Of course, if the clinician intends to see everyone who enters his clinic, a screening procedure is worthless, or may even be detrimental if the test predicts that an individual will not stay in therapy or will not improve in therapy, because this may lead the therapist to expect just these results, to the client's disadvantage (Meehl & Rosen, 1955).

It is often necessary for the clinician to indicate a therapy prognosis for an individual. If the clinician can predict, or learn to predict accurately, whether or not a person will stay in therapy, he is providing useful information for the person's treatment.

Thus, clinicians are constantly involved in the task of prediction and decision-making. If they can be trained to make use of relevant data and material, they may improve their predictions. Although many clinicians look with disfavor on the use of tests, tests combined with other relevant data can be shown to have practical and research applications. The clinician may use them to improve his predictions and decision-making processes.


Hypotheses Tested

The present study addressed two objectives: first, to examine the decision-making process and to determine whether prediction accuracy is influenced by independent variables such as clinical experience and varying amounts and kinds of information; second, to examine whether clinicians can be trained to improve their clinical decision processes. The first and primary objective was studied in the context of a real-life situation that is meaningful to clinicians today: the problem of length of stay in psychotherapy. If increments in levels of statistical information increase prediction accuracy and thereby improve the clinicians' decision process, this type of information may be dovetailed into the operation of a clinic and taught to the staff to identify high-risk individuals. Specific questions, or hypotheses, were raised. Does judgmental accuracy increase as more information is added to the prediction task, and what types of information are most useful in increasing judgmental accuracy? Will there be differences in accuracy dependent on experience level? That is, will graduate clinical psychology students trained in statistical decision theory be better clinical judges than experienced PhD clinical psychologists (without such training), and will less experienced psychologists be superior to more experienced clinicians? Will confidence and appropriateness increase with increments in information, and will there be differences among the three experience levels with regard to their confidence and appropriateness?
















METHOD


Subjects. Twelve judges (Js) represented three levels of experience and sophistication in statistical decision-making. A professional (P) group of four PhD clinical psychologists represented the highest level of clinical experience. A group of four clinical psychology graduate students trained (sophisticated) in statistical decision-making theory (SGS) represented the highest level of statistical sophistication. Another group of four unsophisticated (not trained in statistical decision theory) clinical psychology graduate students (UGS) represented the same experience level as the SGS group and the same level of statistical sophistication as the P group. Sophistication in decision theory was defined as participation in a graduate course in statistical decision theory for clinical psychology students at the University of Florida. Sophistication here implies only special training and by no means implies that the professionals were clinically unsophisticated.

Materials. Test materials for Js were a random sample of 100 MMPI profiles of clients seen in a university mental health service. The sample profiles were drawn from 241 profiles of all clients seen during a three-year period. Each J received 25 of the 100 profiles.

Profiles were divided into two groups based on the client's length of stay in psychotherapy at the mental health service. A short stay (S) was defined as four or fewer therapy sessions and a long stay (L) as five or more therapy sessions. The mean length of stay was 2.00 sessions for the S group and 9.27 sessions for the L group.

A discriminant function analysis which maximized the difference between the two length-of-stay groups was run on the 241 MMPI profiles. The mean discriminant composite scores for the two length-of-stay groups on the 13 MMPI scale variables were $\bar{Z}_1 = 29.74$ for the few-session group (S) and $\bar{Z}_2 = 34.26$ for the many-session group (L). An analysis of variance of the composite means showed a significant difference between the two groups (F = 4.19, df = 12, 224, p < .001). A commonly used rule,

$$Z = \frac{\bar{Z}_1 + \bar{Z}_2}{2},$$

was used to determine the optimal predictive cutting Z score.

With an emphasis on minimizing the false negative rate, the composite Z score of 32.02 predicted with an overall hit rate of 67 per cent for the original protocol pool. False negative errors represented those clients who were predicted as short stays (S), or negatives, but who remained long in therapy (L). It was felt that this predictive error was more serious than the false positive error, which included those clients who were predicted as long stays (L) but who remained a short time in therapy (S). It seemed more important to identify those clients who really needed long-term therapy than to identify those who did not. Of course, some of the individuals with high test scores who stayed only a few sessions may have been very disturbed but dropped out of therapy prematurely. There was no way to identify the cases in which a very disturbed student may have panicked or become threatened by therapy and dropped out or simply missed appointments.¹ The false negative rate for the Z score of 32.02 was .38 and the false positive rate was .31, giving a valid negative rate of .69 and a valid positive rate of .62.

Another Z score, which minimized false positive errors and predicted with an overall accuracy of 71 per cent, was not used in the present study for the reason stated above.

Conditional probabilities were calculated for the Z cutting score with the following equations:

$$P(L/+) = \frac{P(L)\,P(+/L)}{P(L)\,P(+/L) + P(S)\,P(+/S)} \qquad \text{and} \qquad P(S/-) = \frac{P(S)\,P(-/S)}{P(S)\,P(-/S) + P(L)\,P(-/L)}$$

where L = many therapy sessions, or a long stay in therapy (base rate = .34)

S = few therapy sessions, or a short stay in therapy (base rate = .66)

+ = a positive test score ($Z \geq 32.02$)

- = a negative test score ($Z < 32.02$)

For the Z score of 32.02 the conditional probabilities were P(L/+) = .51 and P(S/-) = .78. With this information it can be seen that, given a positive test score, predictions will be wrong as often as they are correct. Given a negative test score, however, predictions will be right 78 per cent of the time.
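The conditional probabilities above, and the 67 per cent overall hit rate reported for the 32.02 cutting score, can be reproduced from that score's error rates with Bayes' theorem. A minimal sketch, assuming base rates of .34 for long stays (L) and .66 for short stays (S); these match the approximate proportions reported for the profile pool in the Procedure and are the values that yield .51 and .78. The function name is illustrative.

```python
def cutting_score_summary(p_long, valid_pos, false_pos):
    """Bayes' theorem for a two-category prediction from a single test sign.

    p_long    -- base rate of long stays, P(L)
    valid_pos -- P(+/L), proportion of long stays scoring at or above the cut
    false_pos -- P(+/S), proportion of short stays scoring at or above the cut
    """
    p_short = 1 - p_long
    false_neg = 1 - valid_pos  # P(-/L)
    valid_neg = 1 - false_pos  # P(-/S)
    p_long_given_pos = p_long * valid_pos / (p_long * valid_pos + p_short * false_pos)
    p_short_given_neg = p_short * valid_neg / (p_short * valid_neg + p_long * false_neg)
    hit_rate = p_long * valid_pos + p_short * valid_neg
    return p_long_given_pos, p_short_given_neg, hit_rate


p_l_pos, p_s_neg, hits = cutting_score_summary(p_long=0.34, valid_pos=0.62, false_pos=0.31)
print(f"P(L/+) = {p_l_pos:.2f}, P(S/-) = {p_s_neg:.2f}, hit rate = {hits:.2f}")
```

Running it prints `P(L/+) = 0.51, P(S/-) = 0.78, hit rate = 0.67`. Keeping the base rate as an explicit argument makes it easy to see how the same cutting score would perform in a clinic whose proportion of long stays differs from this sample's.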

Finally, a random sample of 100 profiles was drawn from the total protocol pool of 241 cases. This was done so that the Js would have fewer protocols to judge, making their task more economical with regard to time.



¹Failure to control this factor undoubtedly lowered the predictive accuracy of the discriminant function equation (and perhaps of clinical judgment), in that some of the disturbed profiles in the (S) criterion group may well have remained (L) if they had not dropped out.









A second reason for drawing a random sample was to make the situation more clinically relevant in terms of the base rates. That is, the sample had only approximate base rates, and the judges did not know the exact proportions of clients in their sample who remained a long or short time in therapy. However, for the sample, the Z score predicted with the same accuracy that it did for the total protocol pool.

Procedure. Refer to Table 2 for a schematic of the design. Js were asked to predict a client's length of stay in psychotherapy from the 25 MMPI profiles. These profiles, the sample of 100 profiles, and the original profile pool all had approximately the same base rates: 35 per cent of the clients stayed many sessions (L) and 65 per cent stayed a few sessions (S). The Js predicted length of stay in therapy (S or L) during four sessions, with additional information added incrementally at each session. These sessions, or levels of information, represented one class of independent variables. Groups, or experience levels, represented the other class of independent variables.

Each J made his predictions on the same 25 protocols, those he received at the first level, throughout all four levels. Level I: Js were first given MMPI profiles with no other information. Level II: Js were again presented the same 25 protocols for the same judgment, but with the additional information of biographical data such as age, sex, marital status, religious preference, parents' marital status, previous counseling experience, and subsequent counseling experience. Level III: For the third decision task, Js were given the




























profiles and biographical data, together with the additional statistical information of the cutting score based on discriminant function analysis. Valid positive and false positive percentages were also provided with the cut-off Z score. Level IV: Conditional probabilities and the base rates were added to the previous information for the fourth presentation of profiles for prediction. (For a copy of the instructions for each information level see Appendix A.) For each judgment, Js also indicated their confidence in its accuracy.

To rule out a practice effect from repeated presentation of the same profiles, two control judges predicted length of stay in psychotherapy on four separate occasions using the profiles only, with no additional information.

Judges were presented the profiles for judgment on four consecutive days, with only one information level given each day.

Hypotheses. I--Information Level: It was hypothesized that increments of information would increase overall judgmental accuracy and group accuracy. (A) Level I accuracy would be at approximately the level of chance. (B) At Level II, accuracy would decrease or remain the same. (C) Level III accuracy would be approximately that of the actuarial prediction accuracy of the discriminant function analysis. (D) Level IV accuracy would increase slightly over Level III accuracy.

II--Experience Level: It was hypothesized that the statistically sophisticated graduate students would be the most accurate, the statistically unsophisticated graduate students next most accurate,








and the professionals least accurate.

III--Confidence and Appropriateness: It was hypothesized that

confidence ratings would increase with increments of information and

that appropriateness would also increase with more information.












RESULTS


Accuracy: The Effects of Information and Experience

Accuracy was defined as the proportion of correct judgments per

presentation of 25 MMPI profiles. The two control Js showed no prac-

tice effects. Judge A's accuracy was 52 per cent on the first

presentation and 48 per cent on each of the three subsequent presen-

tations. Judge B's accuracy was distributed across sessions as

follows: 76 per cent, 48 per cent, 68 per cent, and 68 per cent.

Table 3 presents accuracy by information level and experience

level. Two analyses of variance were conducted to determine the

effects of information level, experience level, judges, profile set,

and profile. The analysis of variance for profile set effects was non-significant (F=.59, df=3,91). An Fmax test for homogeneity of variances between groups was also non-significant (Fmax=3.43, k=3, df=16). A summary of the analysis of variance for information level and experience level effects is shown in Table 4.

Information. Mean judgmental accuracy increased consistently with increments of information from Level I to Level IV (Level I = .55, Level II = .61, Level III = .67, Level IV = .69). These differences were significant (F=10.82, df=3,27, p<.01). A graphic presentation of this trend is shown in Figure 1. Inspection of Figure 1 shows an approximately linear increase in accuracy for the three groups by information level. Both the P and UGS groups increased their accuracy at each level, while the SGS group showed increases at Levels II and III













Table 3

Proportion of Correct Judgments

Experience        Information Level
Level       J     I     II    III   IV    Totals

SGS         1     .64   .68   .76   .76   .71
            2     .56   .68   .72   .64   .65
            3     .64   .68   .72   .72   .69
            4     .60   .60   .72   .68   .65
            Total .61   .66   .73   .70   .68

UGS         5     .72   .64   .64   .76   .69
            6     .40   .64   .72   .68   .61
            7     .36   .52   .48   .56   .48
            8     .36   .48   .68   .64   .54
            Total .46   .57   .63   .66   .58

P           9     .44   .48   .68   .72   .58
            10    .68   .68   .64   .76   .69
            11    .64   .68   .68   .72   .68
            12    .56   .60   .64   .60   .60
            Total .58   .61   .66   .70   .64

Totals            .55   .61   .67   .69

Control A         .52   .48   .48   .48   .49
Control B         .76   .48   .68   .68   .65
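The level and group means in Table 3 are unweighted averages of the judges' proportions correct. The short Python sketch below recomputes them from the judge rows. The dictionary layout and function names are illustrative, not part of the study, and Judge 4's Level I entry is taken as .60, the value consistent with that judge's row total and the SGS column total (each proportion must be a multiple of 1/25).

```python
# Proportion correct for each judge at Levels I-IV, transcribed from Table 3.
# Judge 4's Level I value is taken as .60 (see the lead-in).
TABLE_3 = {
    "SGS": {1: [.64, .68, .76, .76], 2: [.56, .68, .72, .64],
            3: [.64, .68, .72, .72], 4: [.60, .60, .72, .68]},
    "UGS": {5: [.72, .64, .64, .76], 6: [.40, .64, .72, .68],
            7: [.36, .52, .48, .56], 8: [.36, .48, .68, .64]},
    "P": {9: [.44, .48, .68, .72], 10: [.68, .68, .64, .76],
          11: [.64, .68, .68, .72], 12: [.56, .60, .64, .60]},
}


def column_means(rows):
    """Unweighted mean of each column (information level), rounded to 2 places."""
    rows = list(rows)
    return [round(sum(col) / len(rows), 2) for col in zip(*rows)]


def level_means(table):
    """Mean accuracy per information level across all twelve judges."""
    return column_means(row for judges in table.values() for row in judges.values())


def group_means(judges):
    """Mean accuracy per information level within one experience group."""
    return column_means(judges.values())


print(level_means(TABLE_3))  # [0.55, 0.61, 0.67, 0.69]
for group, judges in TABLE_3.items():
    print(group, group_means(judges))
```

The recomputed group rows match the Total lines of Table 3 for SGS, UGS, and P.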















Table 4

Summary of Analysis of Variance of Accuracy

Source of Variation           df     MS       F

Mean                          1      478.80
Information Level             3      1.21     10.82**
Experience Level              2      0.92     2.31
Information X Experience      6      0.81     7.23**
Judges                        9      0.39     0.69
Information X Judges          27     0.11     0.95
Profile                       288    0.57
Information X Profile         864    0.11

** Significant at the .01 level.



















--- SGS

A-- UGS

70





0




5 0
0






II III IV

Information Level

Fig. i. Accuracy by information levels.








but a slight decrease in accuracy from Level III to Level IV.

Experience. There were no significant differences in accuracy due to experience level, although there was a trend toward group differences (F=2.34, df=2,9, p<.20). The SGS group was the most accurate and the UGS group the least accurate (SGS = .68, P = .64, UGS = .58). Only the SGS group's overall accuracy reached the level of the discriminant function, which predicted with 67 per cent accuracy.

Information and experience level interaction. The only other significant source of variance was the group by information level interaction (F=7.23, df=6,27, p<.01). The Newman-Keuls test of differences between means was used (Kirk, 1968), and the results of this analysis are given in Appendix B. The interaction was based largely on a significantly lower proportion of correct judgments by the UGS group at Level I. The UGS group not only started with the lowest proportion of correct judgments but also showed the largest increase in accuracy as information was added. Their final degree of accuracy, however, was approximately the same as the SGS accuracy at Level II. The UGS group significantly increased their accuracy at Levels III and IV over Level I when the composite Z score, conditional probabilities, and base rates were added (p<.01).

The only significant increase in accuracy for the P group was between the first level, with the profile only, and the final level, with all information (p<.05). For all groups combined, increases in accuracy across information levels were significant except the increase from Level III to Level IV, where conditional probabilities and base rates were added. Adding conditional probabilities and base rates to the previous information did not result in a significant increase in Js' accuracy over Level III, which included the composite Z score. For the SGS group there were no significant differences in accuracy across information levels. The only significant group differences within information levels were between the SGS group and the UGS group (p<.01) and between the SGS group and the P group (p<.05).


Confidence

Mean confidence scores by information level and experience level are shown in Table 5. Table 6 shows the summary of the analysis of variance for confidence scores. The Js' confidence increased significantly as successive items of information were added to the protocols for all groups (F=5.38, df=2,28, p<.05). Although there were no significant differences between groups in confidence, the P group tended to be the most confident and the SGS group the least confident (P = 76.86, UGS = 72.72, SGS = 69.96). These trends are shown in Figure 2.


Appropriateness

Appropriateness (confidence weighted by accuracy) was measured by Pearson product-moment correlations between confidence scores and accuracy scores for each J at each level of information. There were no significant group effects or interactions, but the SGS group tended to be the most appropriate (SGS = .26, P = .20, UGS = .17). The higher the correlation, the more appropriate the judgment. Correlation coefficients for appropriateness are











Table 5

Mean Confidence Scores

Experience        Information Level
Level       J     I     II    III   IV    Totals

SGS         1     61.2  64.0  65.6  65.6  64.1
            2     61.2  60.8  60.8  72.0  63.7
            3     75.6  74.0  73.6  74.4  74.9
            4     72.0  76.2  80.4  82.0  77.7
            Total 67.5  68.8  70.1  73.5  70.0

UGS         5     59.0  58.0  66.0  69.2  63.1
            6     85.4  86.8  84.0  92.2  87.1
            7     78.2  79.8  76.2  82.2  79.1
            8     64.2  61.6  61.6  62.0  62.4
            Total 71.7  71.6  71.5  76.4  72.7

P           9     87.6  87.2  88.4  90.0  88.3
            10    64.8  56.0  65.8  56.0  60.7
            11    85.2  87.8  86.4  86.6  86.5
            12    68.0  70.4  76.8  72.8  72.0
            Total 75.2  75.4  79.4  76.4  76.9

Totals            71.4  71.9  73.8  75.4














Table 6

Summary of Analysis of Variance of Confidence

Source of Variation           df     MS       F

Information Level             2      52.67    5.38*
Experience Level              2      191.84   .39
Information X Experience      6      12.73    1.30
Judges within Groups          9      493.76
Information X Judges          28     9.79

* Significant at the .05 level.
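Each F in Table 6 is a ratio of mean squares. In the sketch below, the pairing of each effect with its error term is an assumption read from the layout of the table (the between-judge effect tested against Judges within Groups, within-judge effects against Information X Judges); with that pairing the printed F values are reproduced. The dictionary keys are illustrative names.

```python
# Mean squares from Table 6 (confidence ratings).
MS = {
    "information": 52.67,
    "experience": 191.84,
    "info_x_experience": 12.73,
    "judges_within_groups": 493.76,
    "info_x_judges": 9.79,
}

# Assumed error term for each effect (inferred from the table layout).
ERROR_TERM = {
    "information": "info_x_judges",
    "experience": "judges_within_groups",
    "info_x_experience": "info_x_judges",
}


def f_ratio(effect):
    """F = MS(effect) / MS(error term for that effect)."""
    return MS[effect] / MS[ERROR_TERM[effect]]


for effect in ERROR_TERM:
    print(effect, round(f_ratio(effect), 2))
```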

























[Figure: graph of mean confidence (50 to 80) for the SGS, P, and UGS groups across Information Levels I, II, III, and IV]

Fig. 2. Confidence by information levels.









given in Table 7, with the analysis of variance summary for appropriateness in Table 8. The analysis of variance was based on Z transformations of the correlation coefficients. Mean appropriateness scores were significantly higher at each successive level of information (F=22.03, df=2,28, p<.01).
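The appropriateness index can be sketched as follows, assuming accuracy is coded 1 (correct) or 0 (incorrect) for each of a judge's 25 profiles, which makes the Pearson correlation with the 50 to 100 confidence ratings a point-biserial correlation. The ratings below are invented for illustration, and the Z transformation is taken to be Fisher's r-to-Z (z = atanh r), the usual choice for analysing correlations. Averaging on the Z scale and back-transforming is shown as one option, not as the study's documented procedure.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation (point-biserial when x is coded 0/1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher r-to-Z transformation, z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


# Invented data for one judge on 25 profiles (NOT data from the study):
# 1 = judgment correct, 0 = incorrect; confidence on the 50-100 scale.
correct = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1,
           1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
confidence = [80, 90, 60, 70, 90, 50, 80, 70, 100, 60, 70, 70, 80,
              90, 50, 60, 80, 90, 60, 70, 80, 60, 90, 70, 80]

r = pearson_r(correct, confidence)
print("appropriateness r =", round(r, 2), " Z =", round(fisher_z(r), 2))

# Averaging several judges' correlations on the Z scale, then back-transforming.
# Values are the SGS judges' Level I entries from Table 7.
sgs_level_1 = [0.35, 0.25, 0.34, 0.11]
mean_r = math.tanh(sum(fisher_z(v) for v in sgs_level_1) / len(sgs_level_1))
print("Z-averaged r =", round(mean_r, 2))
```

For the four SGS Level I values the Z-averaged correlation rounds to .26, the same as the simple mean printed in Table 7.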

The SGS group tended to be the most appropriate because they were the most accurate and were not overconfident; that is, their confidence was consistent with their accuracy. The P group was overconfident and the UGS group was less accurate, making these two groups' confidence inconsistent with their accuracy. These trends can be seen in Figure 3.


Judges versus the discriminant function

The discriminant function correctly classified 67 per cent of the profiles. This information was given to the Js at Level III. At Level III only the SGS judges, with a hit rate of 73 per cent, were more accurate than the discriminant function. The P group had a hit rate of 66 per cent and the UGS group a hit rate of 63 per cent at Level III. The accuracy for all judges combined was 67 per cent. Five judges (two in the SGS group, two in the P group, and one in the UGS group) were more accurate overall than the linear regression Z score, and only one J (in the UGS group) operated below the chance level overall.
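The counts in this paragraph can be checked against the Totals column of Table 3. The sketch treats "more accurate than the discriminant function" as an overall proportion above .67 and "below chance" as a proportion below .50; both thresholds are read from the text, and the variable names are illustrative.

```python
# Overall accuracy (Totals column of Table 3) for each judge.
OVERALL = {1: .71, 2: .65, 3: .69, 4: .65,     # SGS
           5: .69, 6: .61, 7: .48, 8: .54,     # UGS
           9: .58, 10: .69, 11: .68, 12: .60}  # P
GROUP = {j: g for g, js in (("SGS", range(1, 5)),
                            ("UGS", range(5, 9)),
                            ("P", range(9, 13))) for j in js}

DISCRIMINANT_HIT_RATE = 0.67   # reported hit rate of the discriminant function
CHANCE = 0.50                  # chance level as used in the text

above = [j for j, acc in OVERALL.items() if acc > DISCRIMINANT_HIT_RATE]
below = [j for j, acc in OVERALL.items() if acc < CHANCE]
print("above .67:", [(j, GROUP[j]) for j in above])
print("below chance:", [(j, GROUP[j]) for j in below])
```

This yields Judges 1 and 3 (SGS), 5 (UGS), and 10 and 11 (P) above .67, and Judge 7 (UGS) below chance.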

At Level I, four Js had accuracy scores below the level of chance and two others were only slightly above chance. However, none of the four Js who were below chance was in the SGS group. At Level II there were two Js below chance and one slightly above; again,












Table 7

Correlation Coefficients for Appropriateness

Experience        Information Level
Level       J     I     II    III   IV    Totals

SGS         1     .35   .35   .34   .28   .33
            2     .25   .20   .18   .65   .32
            3     .34   .31   .41   .28   .34
            4     .11   .06   .01   .02   .05
            Total .26   .23   .24   .31   .26

UGS         5     .13   -.06  .24   -.03  .07
            6     -.03  -.15  .10   .47   .10
            7     -.05  -.03  .30   .47   .17
            8     .29   .37   .41   .27   .34
            Total .09   .04   .26   .30   .17

P           9     .27   .20   .17   .33   .24
            10    .06   .23   .33   .31   .23
            11    .11   .19   .29   .28   .22
            12    -.02  -.07  .08   .41   .10
            Total .11   .14   .22   .33   .20

Totals            .12   .13   .24   .31















Table 8

Summary of Analysis of Variance of Appropriateness Correlations

Source of Variation           df     MS       F

Information Level             2      .0705    22.03**
Experience Level              2      .0366    2.26
Information X Experience      6      .0032    2.25
Judges within Groups          9      .0162
Information X Judges          28     .0072

** Significant at the .01 level.





[Figure: graph of appropriateness correlations (.00 to .30) for the SGS, P, and UGS groups across Information Levels I, II, III, and IV]

Fig. 3. Appropriateness correlations between accuracy and confidence by information levels.







none of these Js was in the SGS group. With the addition of the

statistical information, only one J (in the UGS group) remained near

the chance level of accuracy and he was the least accurate of all

the Js.















DISCUSSION


The present study demonstrated that judges can substantially

improve their decision accuracy when provided with increments of

information, particularly statistical information. This finding

extends the earlier findings reported for a different clinical judgment task (Shagoury & Satz, 1969) and contrasts with previous studies

which have used non-quantitative data. These findings also suggest

that if the clinician is able to incorporate quantitative information,

he may improve his own decision-making ability and equal or surpass

the accuracy of actuarial methods.

The findings of the present study also showed that accuracy

increased directly as a function of the amount of information avail-

able to the judges. Two conclusions that can be drawn from this

finding are that the information was relevant to the judgmental task

and that the judges used this information in formulating their

decisions.


Information

A post-testing interview revealed that the type of information

used varied between groups, among judges, and between information

levels. However, the interview was not structured enough to deter-

mine the actual decision rules used by the judges.

At Level I, with the MMPI profile only, most judges relied on their own intuition about how the pattern and extent of scale elevations related to the length-of-stay criterion. There was a great deal of individual variation in approach, since each judge had had different training experiences with the MMPI. The judges of the SGS group had the most similar training in the use of the MMPI, since some training in the rationale and use of this instrument was given in the statistical decision theory course. The SGS group also showed the least individual variation in accuracy at Level I. The other group of graduate students (UGS) had the least exposure to the MMPI. The UGS group was barely familiar with this test instrument, and none of these judges had had any formal training in its use. It is interesting to note that the group of unsophisticated graduate students showed the lowest accuracy throughout and was the only group whose accuracy was ever below the level of chance. It seems, then, that the more familiar a judge is with a test instrument, the more accurate he will be in using it for prediction.

The SGS group was not familiar with the specific type of task

used in the present study. That is, they had not been trained in

correlating MMPI data to length of stay in psychotherapy. This

aspect of the study was novel to each of the three groups.

At Level II again each judge approached the data differently

and selected certain measures to use in making his decisions. The

variation within groups decreased and there was less difference in

this variation between groups. Accuracy increased or stayed the

same for all but one judge whose accuracy dropped. Thus, the









hypothesis that accuracy would decrease at Level II was not supported.

It was originally felt that all of the biographical data provided

would make the task more complex and more difficult and would thus

confuse the judges. However, the judges were able to relate some of

the information to the task and thereby improve their judgments.

Most judges used some combination of factors. The factors used most often were the subject's age and whether he had had previous or subsequent counseling. Some of the judges also considered marital status when the subject was married. This finding (Level II)

contrasts with other studies which indicate lowering of accuracy

when data are combined (Golden, 1964).

In support of the hypothesis, overall accuracy for Level III

was the same as the discriminant function's accuracy of 67 per cent.

With the addition of Z scores at Level III, only one judge, who was

in the P group, used the cut-off score exclusively. In this same

group one judge changed none of his judgments from Level II and the

other two judges used primarily their own subjective inferences. The

UGS group essentially ignored the Z scores and relied on their own

intuition and thus did not reach the level of accuracy of the cut-

off score. All judges in the SGS group combined the cut-off score

data with their own intuition to improve upon the accuracy of the

discriminant function. These findings imply that the clinician can

make use of his intuition and experience but not at the expense of

ignoring available data, particularly when they include quantitative

information. The findings also imply that the most accurate judges

are the ones who are able to utilize statistical data.









The fact that there was an increase in accuracy from Level III

to Level IV, but that this increase was non-significant, supported the

hypothesis that Level IV accuracy would increase slightly over Level

III accuracy. For the SGS group there was a slight decrease in

accuracy. One reason for this decrease might have been the informa-

tion itself. These judges had been trained to use more powerful

statistical information, that is, data that discriminated groups and

sub-groups more than did the data of the present study. The base

rates of .65 and .35 were not sufficiently different from base rates

of .5 to be of much help. Also the conditional probabilities were

not high enough to provide maximum discrimination. All of the

statistical data given were in approximately a 2/3 to 1/3 ratio.

Because all of the statistical data had approximately the same pre-

dictive power, it may have been difficult to know which kind of

information would be most useful. Instead, judges may have tried to

combine two or more kinds of data and as a result were less accurate

than they would have been using one type exclusively. Quantitative

information is most useful when it represents higher ratios, such as

base rates of .2/.8 or .1/.9; conditional probabilities of .85/.15;

and cut-off scores with hit rates of 75 per cent or higher.
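A minimal way to see why a .65/.35 split offered little: always predicting the more common outcome already achieves a hit rate equal to the larger base rate, so any sign or cut-off has to beat that figure to add anything. The sketch compares that majority-rule hit rate with the 67 per cent cut-off for the base rates mentioned above. It ignores conditional probabilities and assumes the cut-off's hit rate stays fixed as base rates change, which in practice it need not; the function name is illustrative.

```python
CUTOFF_HIT_RATE = 0.67  # discriminant function cut-off hit rate in this study


def majority_hit_rate(p_long):
    """Hit rate from always predicting the more frequent criterion group."""
    return max(p_long, 1.0 - p_long)


for p_long in (0.35, 0.20, 0.10):
    majority = majority_hit_rate(p_long)
    gain = CUTOFF_HIT_RATE - majority
    print(f"base rates {p_long:.2f}/{1 - p_long:.2f}: "
          f"majority rule {majority:.2f}, cut-off gain {gain:+.2f}")
```

Under the study's base rates the cut-off improves on the majority rule by only two percentage points; with .2/.8 or .1/.9 base rates the majority rule alone would outperform a 67 per cent cut-off.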

Even though five of the twelve judges showed decreases in accuracy at Level IV in comparison with Level III, these decreases were slight, amounting to only one or two more incorrect judgments out of 25 for each of the five judges.
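The per-judge changes from Level III to Level IV can be read directly from Table 3. The sketch converts each decrease in proportion correct into a number of judgments out of 25; the names are illustrative.

```python
# Level III and Level IV proportions correct for each judge, from Table 3.
LEVEL_III = {1: .76, 2: .72, 3: .72, 4: .72, 5: .64, 6: .72,
             7: .48, 8: .68, 9: .68, 10: .64, 11: .68, 12: .64}
LEVEL_IV = {1: .76, 2: .64, 3: .72, 4: .68, 5: .76, 6: .68,
            7: .56, 8: .64, 9: .72, 10: .76, 11: .72, 12: .60}
N_PROFILES = 25


def extra_errors(judge):
    """Additional incorrect judgments at Level IV relative to Level III."""
    return round((LEVEL_III[judge] - LEVEL_IV[judge]) * N_PROFILES)


drops = {j: extra_errors(j) for j in LEVEL_III if extra_errors(j) > 0}
print(drops)  # {2: 2, 4: 1, 6: 1, 8: 1, 12: 1}
```

By this count the five decreases amount to one additional error for Judges 4, 6, 8, and 12 and two for Judge 2.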

At Level IV, the UGS group ignored the base rates and used the

conditional probabilities. The SGS group and the P group both used









a combination of conditional probabilities and base rates and both

of these groups had the same degree of accuracy at Level IV. Also,

both the SGS and P groups were more accurate than the UGS group.

It seems probable that statistical information is more impor-

tant than biographical information about the subjects since there

was a greater increase in accuracy with the addition of statistical

information. Other studies have shown that biographical data are of

minimal value to judges. Golden (1964) found that judges agreed

less in their description of protocols when they were given identi-

fying data alone than when they were given a single psychological

test or a combination of tests. Kostlan (1954) found that judges

were more accurate in their psychodiagnoses when they received both

social case histories and the more quantitative MMPI protocol than

when they received social case histories alone.

One may ask if the judges would have been more accurate had they

been given some feedback on their accuracy at each level of infor-

mation. This is possible but then the task would not have been as

life-like in the sense that clinicians in actual situations must

usually wait some time before learning the accuracy of their predic-

tions. However, this does emphasize the point that clinicians should

check the accuracy of their predictions when possible and learn what

helps them to predict most accurately.


Experience and training

The lack of overall differences between groups was not antici-

pated. It was assumed that the SGS group would have benefited

from their training in statistical decision theory. However,









artifacts in the design tended to wash out group effects by providing

a guaranteed hit rate if the Z scores were used at Levels III and IV.

The convergence of judgmental accuracy for each group at Level IV

lends some support for this argument.

The hypothesis that the SGS group would be the most accurate

was thus only tentatively supported since there was not a signifi-

cant group effect. However, the SGS group tended to be the most

accurate in their judgments. This finding implies that clinicians

can be trained to improve their own subjective inferences with

statistical information. These judges trained in statistical de-

cision theory were able to add their own intuitive judgments to the

statistical data and thereby predict more accurately than did the

discriminant function alone or than they had done without the statis-

tical data. This special training taught them not only how to use

statistical information but also how to use their clinical intuition

to its best advantage. The SGS group also tended to be the most

appropriate, that is, to know when their judgments were most accurate

and when they were most inaccurate.

The fact that the P group tended to be more accurate than the

UGS group was also unexpected in light of previous findings con-

cerning amount of clinical training and accuracy of prediction.

This finding does not support the previous evidence (Goldberg, 1959;

Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969) that as the

amount of clinical experience increases, prediction accuracy

decreases. In the present study, judges in the SGS and P groups

used familiar methods, the MMPI profiles or statistical data; the








UGS group, by contrast, was presented with essentially unfamiliar

prediction tools. One reason the present data were at variance with previous findings may be that previous studies required clinicians to predict an unknown criterion or to use unfamiliar methods, so that any previous "set" of the clinician was not advantageous. In the present study, the judges' familiarity with either the

MMPI or statistical types of data helped them in their predictions.


Interaction effects

The significant group by information interaction effect showed

at least indirect support for the experience level hypothesis that

the SGS group would be most accurate. This interaction showed that

the unsophisticated graduate students started off predicting below

chance and, finally at Level IV, reached the level of accuracy that

the sophisticated graduate students attained at Level I (MMPI profiles alone). It was the UGS group, with the least experience, the least familiarity with the MMPI, and the least sophistication in statistical decision theory, that accounted for most of the group differences and much of the interaction effect.

The rest of the interaction effect was due to the changes in

accuracy across information levels with the UGS group showing the

greatest change and the SGS group showing the least amount of

change in accuracy. The latter group started out predicting fairly

accurately and had less room for improvement while the former group

started out so poorly that their improvement was marked. The SGS

group predicted almost as well as the discriminant function with








the profiles only. The UGS group improved from below chance to the

level of accuracy achieved by the discriminant function.


Confidence

Previous studies would suggest that the trained clinicians

should have had less confidence in their judgments than the two pre-

professional groups. Although group differences for confidence were

non-significant, the professionals in the present study tended to be

the most confident. Again, this may have been because they were

using the MMPI, with which they were more familiar than were the other two groups. Also, the professional group was predicting a criterion about which they knew something, that is, length of stay in psychotherapy. This again suggests that previous studies have placed the

clinician at a disadvantage so that he is less accurate and less con-

fident than he would be predicting in a familiar setting.

In general, adding information significantly increased the

judges' confidence. Judges became more confident as well as more

accurate with increments in information. However, the UGS group's

confidence did not increase until they had all the available infor-

mation.

One problem with asking judges to assign a confidence rating to

each judgment was that each judge had a different standard or set

for measuring how confident he was. The range of confidence scores

used also varied between judges and within groups: one judge used all six possible levels of confidence rating (50, 60, 70, 80, 90, and 100), while other judges used only two (60, 70) or three (80, 90, 100).









Appropriateness

The most meaningful measure to express appropriateness, defined

as the relationship between accuracy and confidence, was the corre-

lation coefficient. Just as accuracy and confidence increased with

each level of information, so did appropriateness. As judges became

more accurate they also became appropriately more confident.

The increases in appropriateness followed the same pattern as

the increases in accuracy and confidence. That is, appropriateness

increased significantly across levels of information but there was

only a tendency for one group to be more appropriate than the other

groups. As with accuracy, the SGS group tended to be the most appro-

priate and the UGS group tended to be the least appropriate. This

contrasts with earlier findings that trained clinicians are more appropriate in their confidence levels than are graduate students in psychology (Oskamp, 1962; Shagoury, 1969). The findings of the present study, however, do not contradict earlier findings, since the present differences between groups on the measure of appropriateness were non-significant.


Applications

It appears that actuarial data and training in their use can be

applied to situations in which clinicians must predict and make decisions. In the present study, judges were able to postdict length of stay in psychotherapy fairly well. The next step would be to apply these techniques in the same setting and predict a client's

length of stay in psychotherapy. This could then be followed up at









the end of treatment as a check of prediction accuracy. This would

enable the clinician to determine which short stays were "no shows"

and which were treated. Thus the accuracy of both the discriminant function Z score and the judges' predictions might be much higher, and more useful for practical application to the clinician's population of clients. This type of

procedure is most useful in a clinic situation which must limit the

number of clients seen or must screen those that will be seen.

Statistical methods of prediction can be particularly applicable to

the screening of patients to determine what type of treatment is most

appropriate and would be most useful for each client.

To use actuarial data in a clinic situation, they must first be

collected and analyzed. Too many clinical situations today fail to

make use of the data they have available. They do not even know the

base rates for various classifications of the clients they see.

Collecting and analyzing statistical data is another way to more

fully understand a particular clinical setting by learning what type

of patients are seen, how long they stay in treatment, and hopefully,

which ones are most likely to improve.

If a clinic decides to see everyone who comes in for help, tests

and statistical data are not of benefit in selecting whom to see.

However, these data might be used for prediction and research in a

setting which sees all clients. It is in situations where not everyone can be treated that improving tests and collecting base rate

information is most needed. Where decisions and predictions must be

made, actuarial methods are most needed to improve the clinician's

decision-making ability.








A further study which would be a fair and optimal test of clini-

cal versus statistical prediction would be to give judges an oppor-

tunity to see the relationships of test variables with a criterion

on a standardization sample. Then, the judges would be compared

with a discriminant function on a cross-validation sample. However,

this was not the purpose of the present study.
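The actuarial half of such a design, choosing a cut-off on a standardization sample and then scoring it on a cross-validation sample, can be sketched as below. The composite scores and outcomes are invented, scores at or above the cut-off are assumed to predict a long stay (L), and the cut-off is chosen by simply maximizing hits among the observed scores; all of these are illustrative assumptions rather than a procedure taken from the study.

```python
def hit_rate(scores, labels, cutoff):
    """Proportion correct when scores >= cutoff are predicted 'L', others 'S'."""
    hits = sum((s >= cutoff) == (lab == "L") for s, lab in zip(scores, labels))
    return hits / len(labels)


def best_cutoff(scores, labels):
    """Cut-off (among observed scores) that maximizes hits in this sample."""
    return max(sorted(set(scores)), key=lambda c: hit_rate(scores, labels, c))


# Hypothetical composite scores and outcomes (NOT data from the study).
std_scores = [1.2, 0.4, -0.3, 2.1, -1.0, 0.9, -0.6, 1.5, 0.1, -1.4]
std_labels = ["L", "S", "S", "L", "S", "L", "S", "S", "L", "S"]
cv_scores = [0.8, -0.2, 1.7, -0.9, 0.3, -1.1, 1.1, 0.0]
cv_labels = ["L", "S", "L", "S", "S", "S", "S", "L"]

cut = best_cutoff(std_scores, std_labels)
print("cut-off:", cut)
print("standardization hit rate:", hit_rate(std_scores, std_labels, cut))
print("cross-validation hit rate:", hit_rate(cv_scores, cv_labels, cut))
```

With these made-up numbers the hit rate falls from .80 in the standardization sample to .625 in the cross-validation sample, the kind of shrinkage a cross-validation comparison is meant to expose.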















SUMMARY


The present study was designed to look at the effects of adding

quantitative and qualitative data to a relevant clinical judgment

task. In essence, it compared judges with varying degrees of clini-

cal experience to actuarial prediction methods. The study also

attempted to train judges to use actuarial information to improve

their prediction accuracy.

Twelve judges representing three levels of clinical experience

made post-dictive judgments on the length of stay in psychotherapy

(short or long) from a sample of MMPI profiles of clients seen in a

university mental health service. Judgments were made under four

conditions in which qualitative and quantitative information was

added incrementally at each level. The three levels of judges' experience were professional clinical psychologists, "sophisticated"

third year clinical psychology graduate students trained in statis-

tical decision theory, and "unsophisticated" third year clinical

psychology graduate students without any training in statistical

decision theory.

Accuracy increased over levels of information, but there were no significant differences in accuracy among the three levels of experience. A significant group by information level interaction demonstrated some

group effects due to a lower proportion of correct judgments for the

less experienced judges under conditions involving the least amount










of information.

Judges became more confident in their judgments as they received

more information. Appropriateness, defined as accuracy weighted by

confidence and measured by correlation coefficients between accuracy

and confidence, increased substantially as increments of information

were added. The group trained in statistical decision theory tended

to make the most appropriate judgments and the least experienced

group of graduate students tended to make the least appropriate judg-

ments.

The present study showed that clinicians can use quantitative

data to improve their own judgmental ability and to predict more

accurately than actuarial data alone. Also, since the judges with the most experience in using actuarial information tended to be the most appropriate in their judgments, clinicians may also be trainable to be more appropriate and to know when their judgments are more likely to be accurate.































APPENDIX A

INSTRUCTIONS















APPENDIX A-I


INSTRUCTIONS PART I


This study is designed to examine the decision process when only

limited information is available. You will be presented with 25

Minnesota Multiphasic Personality Inventory (MMPI) profiles of students seen at the University of Florida Infirmary Mental Health Service. Some of these students stayed a long time in therapy (5 or more sessions, mean = 9) and some stayed only a short time (4 or fewer sessions, mean = 2). Your task will be to decide which students stayed

a long time (L) and which stayed only a short time (S) on the basis

of the test profile alone.
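The criterion described in these instructions amounts to a simple coding rule. The sketch below applies it and computes base rates from a list of session counts; the function names and the session counts are hypothetical.

```python
def criterion_group(sessions):
    """'L' for 5 or more sessions, 'S' for 4 or fewer (rule stated above)."""
    if sessions < 0:
        raise ValueError("session count cannot be negative")
    return "L" if sessions >= 5 else "S"


def base_rates(session_counts):
    """Proportion of clients in each criterion group."""
    groups = [criterion_group(n) for n in session_counts]
    return {g: groups.count(g) / len(groups) for g in ("L", "S")}


# Hypothetical session counts (NOT data from the study).
print(base_rates([1, 2, 9, 3, 12, 2, 6, 1, 4, 2]))  # {'L': 0.3, 'S': 0.7}
```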

Your task is to try to make the best estimate of probable length

of stay in psychotherapy given only limited information. It is pos-

sible to correctly classify all the profiles. It is hoped that your

predictions will in some way help us to understand one aspect of the

decision-making process as it is applied by psychologists in clinical

settings.

You will also be asked to rate your confidence for each subject

on a scale from 50 per cent to 100 per cent. If you are positive of

your decision, you should mark 100 per cent; if you are only guessing

you should mark 50 per cent. That is, the more certain of your de-

cision, the higher percentage you should mark.
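The 50 to 100 per cent scale can be read in the sense of Adams (1957), cited in the references, where a confidence rating is the expected percentage of correct judgments. Under that assumption, one simple check is to group a judge's decisions by rating and compare each rating with the proportion actually correct. The function and the ratings below are hypothetical and are not part of the original instructions.

```python
from collections import defaultdict


def calibration_table(confidence, correct):
    """Proportion correct at each confidence rating (50-100 per cent)."""
    if len(confidence) != len(correct):
        raise ValueError("one accuracy score per confidence rating is required")
    by_rating = defaultdict(list)
    for rating, ok in zip(confidence, correct):
        if not 50 <= rating <= 100:
            raise ValueError(f"rating {rating} is outside the 50-100 scale")
        by_rating[rating].append(ok)
    # e.g. ratings of 90 should be correct about 90 per cent of the time
    return {rating: sum(oks) / len(oks) for rating, oks in sorted(by_rating.items())}


print(calibration_table([50, 50, 90, 90], [1, 0, 1, 1]))  # {50: 0.5, 90: 1.0}
```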















APPENDIX A-II


INSTRUCTIONS PART II


Your task on Part II is identical to that on Part I. You will

be presented the same 25 profiles and asked to predict (S) or (L).

However, this time more information will be available to you. That

is, you will also have biographical data. You may use this infor-

mation in any way you wish. You may choose to disregard the infor-

mation altogether and make your predictions as you did in Part I.

Your task is to try to make the best estimate of probable length

of stay in therapy given only limited information. It is possible to

correctly classify all the profiles. It is hoped that your predic-

tions will in some way help us to understand one aspect of the

decision-making process as it is applied by psychologists in clinical

settings.

Again, please indicate your confidence in your judgment for each

subject from 50 per cent to 100 per cent.















APPENDIX A-III


INSTRUCTIONS PART III


Your task on Part III is identical to that of Parts I and II.

You will be presented the same 25 profiles and asked to predict as

accurately as possible, on the basis of the information given,

whether the student is (S) or (L). Again, more information will be

made available to you. The following statistical information will

be added.

Discriminant function analysis provided weights for each of the
13 MMPI scale variables in order to obtain maximal differentiation
between long stayers (L) and short stayers (S). A composite score
(Z) was obtained which best estimates the combined relative effects
of all the scale variables.

This Z score is used to make the best prediction of the criterion
group to which a particular profile belongs. This can be summarized
as follows:

1. Z ≥ 32.02 is a positive test sign (+) and indicates a probable
long stay in therapy (L).

2. Z < 32.02 is a negative test sign (-) and indicates a probable
short stay in therapy (S).

No test, however, classifies without some errors. This derived
composite cut-off (Z = 32.02) yields the following percentages of
classification:

                              Criterion
    Composite test sign       S        L
    Z < 32.02                 69%      38%
    Z ≥ 32.02                 31%      62%

In other words:

1. A (+) test sign (Z ≥ 32.02) correctly classified 62 per cent
of the long stayers (L). This is known as the valid positive
rate. Also, a (+) test sign incorrectly classified 31 per
cent of the short stayers (S), and this is the false positive
rate.

2. A (-) test sign (Z < 32.02) correctly classified 69 per cent
of (S), the valid negative rate, and incorrectly classified
38 per cent of (L), the false negative rate.

This means that 38 per cent of the (L)'s scored below 32.02 and were

incorrectly classified (S), and 31 per cent of the (S)'s scored above

32.02 and were incorrectly classified as (L). The total percentage

correctly classified was 67 per cent.
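Each criterion column of the table sums to 100 per cent. For the L column, the valid positive rate plus the false negative rate equals 1. For the S column, the valid negative rate plus the false positive rate equals 1. The overall 67 per cent can be reproduced by weighting each column's correct rate by the size of its group. The sketch below assumes the base rates stated in the Part IV instructions (S = .66, L = .34). The function names are hypothetical.

```python
def error_rates(valid_pos, valid_neg):
    """Return (false_positive, false_negative) rates.

    L column: valid positive + false negative = 1.
    S column: valid negative + false positive = 1.
    """
    return 1 - valid_neg, 1 - valid_pos


def overall_hit_rate(valid_pos, valid_neg, base_rate_long):
    """Proportion of all profiles classified correctly by the cut-off."""
    return base_rate_long * valid_pos + (1 - base_rate_long) * valid_neg


fp, fn = error_rates(valid_pos=0.62, valid_neg=0.69)
print(round(fp, 2), round(fn, 2))  # 0.31 0.38
print(round(overall_hit_rate(0.62, 0.69, base_rate_long=0.34), 2))  # 0.67
```

For comparison, predicting (S) for every profile would be correct for the 66 per cent who were short stayers. This is the base-rate comparison that Meehl and Rosen (1955) recommend making for any cutting score.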

You will be required to predict as accurately as possible
whether the student belongs to (S) or (L), short or long stay. It is
possible to classify every profile correctly, scoring 100 per cent.

You may predict (S) or (L) by using (1) the composite Z score
cut-off, (2) the biographical data, (3) the profile alone, or (4) any
combination of (1), (2), and (3). The composite cut-off score was
applied to yield the best overall classification rate, but no test
is perfect and errors may be made with any procedure. It is quite
possible that the clinician may be able to improve upon the linear
statistical method (Z score) by utilizing combinations of both
"intuitive" and statistical data.

Your task is to try to make the best estimate of probable length

of stay in therapy given additional, but limited, information. It is

possible to correctly classify all the profiles. It is hoped that

your predictions will in some way help us to understand one aspect

of the decision-making process as it is applied by psychologists in

clinical settings.














APPENDIX A-IV


INSTRUCTIONS PART IV


Your task on Part IV is identical to that of Parts I, II, and

III, utilizing the same 25 profiles. You are to predict as accur-

ately as possible on the basis of the information given, whether the

student is (S) or (L). Again, more information will be made avail-

able to you. In addition to the composite Z score, biographical

data, and test data, you will also be told the conditional probabili-

ties and base rates for the groups and test signs.

Conditional probabilities combine test signs, (+) or (-), and
base rates to yield a quantitative index of the probability of
correct classification when Z ≥ 32.02 (+) or when Z < 32.02 (-).

For example, some of the subjects will be (L) when Z ≥ 32.02 (+)
and some will be (S) when Z < 32.02 (-). The problem is to determine
how confident we can be with each test sign under the base rates of
the population. The base rates for the two groups are: Short (S) =
66 per cent and Long (L) = 34 per cent. In other words, 34 per cent
of the subjects stayed a long time in therapy and 66 per cent stayed
only a short time. The majority, therefore, were shorts (S).

Based on this information, the conditional probabilities are:
for a (+) test sign, P(L/+) = .51, and for a (-) test sign, P(S/-) = .78.
This means that the probability of a person staying a long time in
therapy (L), given a positive test sign, is .51, and the probability
of a person staying a short time (S), given a negative test sign, is
.78.

A conditional probability of .51 for a (+) test sign means that
you would be wrong about as often as you would be correct in predicting
(L) for a (+) sign. A conditional probability of .78 for a (-) test
sign means that you would be correct more often than you would be
wrong in predicting (S) for a (-) test sign.
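The instructions do not show how .51 and .78 were obtained. Assuming they follow from Bayes' theorem applied to the base rates (L = .34, S = .66) and the Part III classification rates (as in Meehl and Rosen, 1955), both figures can be reproduced. The function names are hypothetical.

```python
def p_long_given_pos(base_long, valid_pos, false_pos):
    """P(L/+) by Bayes' theorem."""
    hits = base_long * valid_pos                # long stayers with a (+) sign
    false_alarms = (1 - base_long) * false_pos  # short stayers with a (+) sign
    return hits / (hits + false_alarms)


def p_short_given_neg(base_long, valid_neg, false_neg):
    """P(S/-) by Bayes' theorem."""
    hits = (1 - base_long) * valid_neg  # short stayers with a (-) sign
    misses = base_long * false_neg      # long stayers with a (-) sign
    return hits / (hits + misses)


print(round(p_long_given_pos(0.34, valid_pos=0.62, false_pos=0.31), 2))  # 0.51
print(round(p_short_given_neg(0.34, valid_neg=0.69, false_neg=0.38), 2))  # 0.78
```

The (+) sign comes out near .51 because the long stayers are the minority. Although the cut-off catches 62 per cent of them, the 31 per cent of the larger short-stay group that also score above 32.02 contribute almost as many (+) signs.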

Your task is to try to make the best estimate of probable length

of stay in psychotherapy given additional, but limited, information.

It is possible to correctly classify all the profiles. It is hoped

that your predictions will in some way help us to understand one

aspect of the decision-making process as it is applied by psycholo-

gists in clinical settings.

Again, please indicate your confidence in your judgment for each
subject from 50 per cent to 100 per cent.





























APPENDIX B

SUMMARY OF NEWMAN-KEULS TESTS













APPENDIX B-I

SUMMARY OF NEWMAN-KEULS TEST FOR GROUP MEAN DIFFERENCES



Differences among Level I means

               Mean     UGS      P        SGS
    UGS        .46      ---      .12*     .15**
    P          .58               ---      .03
    SGS        .61                        ---

    * p < .05    ** p < .01




Differences among Level II means

               Mean     UGS      P        SGS
    UGS        .57      ---      .04      .11
    P          .61               ---      .05
    SGS        .66                        ---














Differences among Level III means

               Mean     UGS      P        SGS
    UGS        .63      ---      .03      .10
    P          .66               ---      .07
    SGS        .73                        ---










Differences among Level IV means

               Mean     UGS      P        SGS
    UGS        .66      ---      .04      .05
    P          .70               ---      .01
    SGS        .71                        ---
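Each entry in these tables is the difference between two means after the means have been ordered from smallest to largest. For example, .58 - .46 = .12 in the Level I table. The sketch below reproduces only those differences. The Newman-Keuls test then compares each difference with a studentized-range critical value, which depends on how many ordered means the comparison spans and on the analysis-of-variance error term (see Kirk, 1968). Those critical values are not given in this appendix and are not computed here. The function name is hypothetical.

```python
from itertools import combinations


def ordered_differences(means):
    """Differences between every pair of means after sorting them ascending.

    Returns {(lower_label, higher_label): difference}, rounded to two places
    as in the printed tables.
    """
    ordered = sorted(means.items(), key=lambda item: item[1])
    return {
        (low, high): round(m_high - m_low, 2)
        for (low, m_low), (high, m_high) in combinations(ordered, 2)
    }


level_one = {"UGS": 0.46, "P": 0.58, "SGS": 0.61}
print(ordered_differences(level_one))
# {('UGS', 'P'): 0.12, ('UGS', 'SGS'): 0.15, ('P', 'SGS'): 0.03}
```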











APPENDIX B-II

SUMMARY OF NEWMAN-KEULS TEST FOR INFORMATION MEAN DIFFERENCES




Differences among SGS means

               Mean     I        II       IV       III
    I          .61      ---      .05      .09      .12
    II         .66               ---      .04      .07
    IV         .70                        ---      .03
    III        .73                                 ---


Differences among P means

               Mean     I        II       III      IV
    I          .58      ---      .03      .08      .13
    II         .61               ---      .05      .10
    III        .66                        ---      .05
    IV         .71                                 ---















Differences among UGS means

               Mean     I        II       III      IV
    I          .46      ---      .11*     .17      .20
    II         .57               ---      .06      .09
    III        .63                        ---      .03
    IV         .66                                 ---

    * p < .05    ** p < .01




Differences among group means

               Mean     I        II       III      IV
    I          .55      ---      .06*     .12**    .14**
    II         .61               ---      .06*     .08*
    III        .67                        ---      .02
    IV         .69                                 ---















REFERENCES


Adams, J. K. A confidence scale defined in terms of expected percen-

tages. Amer. J. Psychol., 1957, 70, 432-436.

Dahlstrom, W. G., & Welsh, G. S. An MMPI handbook: a guide to use

in clinical practice and research. Minneapolis, Minn.: The

University of Minnesota Press, 1968.

Edwards, A. L. Experimental design in psychological research. New

York: Rinehart & Company, Inc., 1950.

Goldberg, L. R. The effectiveness of clinicians' judgements: the

diagnosis of brain damage from the Bender-Gestalt Test. J.

consult. Psychol., 1959, 23, 25-33.

Goldberg, L. R. Simple models or simple processes? Some research on

clinical judgements. Amer. Psychologist, 1968, 23, 483-496.

Golden, M. Some effects of combining psychological tests on clinical

inferences. J. consult. Psychol., 1964, 28, 440-446.

Holt, R. R. Clinical and statistical prediction: a reformulation and

some new data. J. abnorm. soc. Psychol., 1958, 56, 1-12.

Holtzman, W. H. Can the computer supplant the clinician? J. clin.

Psychol., 1960, 16, 119-122.

Kirk, R. E. Experimental design: procedures for the behavioral

sciences. Belmont, Calif.: Brooks/Cole Publishing Co., 1968.

Kostlan, A. A method for the empirical study of psychodiagnosis. J.

consult. Psychol., 1954, 18, 83-88.








Lindzey, G. Seer versus sign. J. exp. Res. Pers., 1965, 1, 17-26.

Meehl, P. E. Clinical versus statistical prediction. Minneapolis,

Minn.: University of Minnesota Press, 1954.

Meehl, P. E. Seer versus signs: the first good example. J. exp.

Res. Pers., 1965, 1, 27-33.

Meehl, P. E., & Rosen, A. Antecedent probability and the efficiency

of signs, patterns, or cutting scores. Psychol. Bull., 1955,

52, 194-216.

Megargee, E. I. (ed.). Research in clinical assessment. New York:

Harper & Row Publishers, 1966.

Mello, Nancy K., & Guthrie, G. M. MMPI profiles and behavior in

counseling. J. counsel. Psychol., 1958, 5, 125-129.

Oskamp, S. The relationship of clinical experience and training

methods to several criteria of clinical prediction. Psychol.

Monogr., 1962, 76, No. 547.

Satz, P. A block rotation task: the application of multivariate and

decision theory analysis for the prediction of organic brain

disorder. Psychol. Monogr., 1966, 80, No. 629.

Shagoury, P. The influence of statistical information on clinical

decisions. Unpublished master's thesis, University of

Florida, 1969.

Shagoury, P., & Satz, P. The effect of statistical information on

clinical prediction. Proceedings: 77th Annual Convention,

APA, 1969, 310-311.















BIOGRAPHICAL SKETCH


Ann Weimer Moxley was born March 14, 1946, in New York City,

New York. She attended public schools in Gainesville, Florida, and

graduated from Gainesville High School in 1964. She enrolled in the

University of Florida in September, 1964, and received her Bachelor

of Arts degree, magna cum laude, in December, 1967. She received

her Master of Science degree at the University of Florida in December,

1968. She is currently engaged in her clinical psychology internship

at the University of Rochester's Strong Memorial Hospital in Rochester,

New York.

She is married to James Edward Moxley, who received his Master's

in Business Administration at the University of Florida and is now

employed by Eastman Kodak in Rochester, New York. Ann is a member of

Phi Beta Kappa, Phi Kappa Phi, Psi Chi, and Mortar Board.














This dissertation was prepared under the direction of the

chairman of the candidate's supervisory committee and has been

approved by all members of that committee. It was submitted to

the Dean of the College of Arts and Sciences and to the Graduate

Council, and was approved as partial fulfillment of the require-

ments for the degree of Doctor of Philosophy.


December 1970



Dean, College of Arts and Sciences




Dean, Graduate School






Supervisory Committee:




Chairman







PAGE 2

ACKNOWLEDGEMENTS The author v-jishes to express her gratitude to her committee chairman, Dr. Paul Satz, For his comments, patience, and inspiration throughout the initiation, impl emente t ion and completion of this research. The author is also indebted to the other members of her committe, Dr. Audrey Schum::cher, Dr. Ben Barger, Dr. Marvin Shav-.', and Dr. Donald Childers, for their assistance and criticisms. The author wishes to express her special thanks to her husband, J ini, whose mora! support vsias invaluable and most appreciated.

PAGE 3

TABLE OF CONTENTS Page ACKNOWLEDGEMENTS ii LIST OF TABLES ?v LIST OF FIGURES v ABSTRACT vi INTRODUCTION 1 METHOD 14 RESULTS 21 DISCUSSION 35 SUMMARY LiG APPENDIX A INSTRUCTIONS k8 APPENDIX B SUMMARY OF NEWMAN-KEULS TESTS 56 REFERENCES 61 BIOGRAPHICAL SKETCH 63 iii

PAGE 4

LIST OF TABLES Table Page Actual and test-predicted therapeutic outcome 9 Design scfiematic 18 Proportion of correct judgments 22 Summary of analysis of variance of accuracy 23 Mean confidence scores 27 Summary of analysis of variances of confidence 25 Correlation coefficients for appropriateness 31 Summary of analysis of variance of appropriateness correlations 32

PAGE 5

LIST or FIGURES Figure Page 1. Accuracy by information levels 2k 2. Confidence by information levels 29 3. Appropr ia Leness correlations betv.'een accuracy and confidence by information levels 33

PAGE 6

Abstract of Dissertation Presented to the Graduate Council in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy at the University of Florida THE EFFECTS OF STATISTICAL INFORMATION ON CLINICAL JUDGEMENT By Ann Weimer Moxley December, 1970 Chairman: Dr, Paul Satz Major Department: Psychology The present study was designed to look at the effects of adding quantitative and qualitative data to a relevant clinical judgment task. In essence, it compared judges with varying degrees of clinical experience to actuarial prediction methods. The study also attempted to train judges to use actuarial information to improve their prediction accuracy. Twelve judges representing three levels of clinical experience made post-dictive judgments on the length of stay in psychotherapy (short or long) from a sample of MMPI profiles of clients seen in a university mental health service. Judgments were made under four conditions in which qualitative and quantitative information was added incrementally at each level. The three levels of judges' experience were professional clinical psychologists, "sophisticated" third year clinical psychology graduate students trained in

PAGE 7

statistical decision theory, and "unsophisticated" third year clinical psychology graduate students without any training in statistical decision theory. Accuracy increased over levels of information but there were no differences in accuracy for the three levels of experience. A significant group by information level interaction demonstrated some group effects due to a lower proportion of correct judgments for the less experienced judges under conditions involving the least amount of information. Judges became more confident in their judgments as they received more information. Appropriateness, defined as accuracy weighted by confidence and measured by correlation coefficients between accuracy and confidence, increased substantially as increments of information were added. The group trained in statistical decision theory tended to make the most appropriate judgments and the least experienced group of graduate students tended to make the least appropriate judgments. The present study showed that clinicians can use quantitative data to improve their own judgmental ability and to predict more accurately than actuarial data alone. Also, since those judges with the most experience in using actuarial tasks tend to be the most appropriate in their judgments, this implies that clinicians can also be trained to be more appropriate and to know when their judgments are more likely to be accurate.

PAGE 8

INTRODUCTION The objectives of the present study were two-fold. The primary objective was to examine the effects of test and non-test (statistical) information on the judgmental process. The secondary objective, end the focus for implementing the primary objective, was to study the psychological attributes of individuals who stay only a short time in therapy versus those wh,o remain a long time. That is, the objective of studying the judgmental process was couched in a real end reievant situation, length of stay in psychotherapy, which is a meaningful and pressing problem for psychologists today. However this secondary objective v.£:5 minor in relation to the major issue of e;
PAGE 9

methods found that the actunrial methods were either superior to clinicians or equal in efficiency to clinicians (Meehl, 1965). V/ith the exception of one study, the clinician has shown no superiority to purely quantitative actuarial prediction. The one study v.'hich did find clinicians superior (Lindzey, ly65) used one to two clinicians and its application is sonewhat questionable. One reason the clinician has not been superior to actuarial methods is that he has seldom been given the opportunity to incorporate the actuarial information in formulating his final decision. He has been at a disadvantage so that the demonstrated superiority of the actuarial method may be due to the experimental design rather tl'an to an actual superiority of statistical techniques. Also, the information available to the clinician has often been based on non-quantitative data such as interview material, case history data, and projective tests. Holtzman (1S6C) separates the clinician's diagnostic task into three phases: ( I ) col 1 ect i on of information; (2) preparat i on and translation of this information for analysis; (3) i nterpretat ion of this information. As he points out, actuarial methods, and specifically the computer, ore superior to the clinician in processing information once the primary coding has been done. The clinician is still superior at collecting information and at interpreting it because at present the computer lacks the appropriate rules and parameters for interpretation. Thus, studies which emphasize aspects of prediction suitable for actuarial methods do not use the clinician's talents to best advantage. It is when skilled clinicians use familiar methods to predict a criterion they know something about that they have the

PAGE 10

most success (h'olt, 1958). This includes their having a rich body of data and systematic actuarial procedures at their disposal in addition to their own experience, intuition, and knowledge. Recent studies suggest that as the amount of clinical experience increases, prediction accuracy decreases (Goldberg, 1959; Oskamp, 1962; Shaqoury, 19^9; Shagoury & Satz, 1969). These studies compared trained clinicians with a professional degree to clinical psychology graduate students end even to non-professional groups, such as secretaries, and have found that the trained clinicians were not superior to the other groups. An explanation of this finding is that the more experienced clinician har, developed a particular way of looking at data which interferes with his making unbiased, objective decisions. Another aspect of research in the area of clinical versus statistical predictions is the confidence clinicians place in their judgments and the appropriateness of their predictions. Appropriateness Is a measure of confidence weighted by accuracy which was developed by Adams (1957). Confidence in judgments also differs between groups of graduate students and trained cl i nic ians, wi th the trained psychologists being less confident in their judgments (Goldberg, 1959; Oskamp, 1962). When the measurement of appropriateness of the judgment is introduced, however, the trained clinicians are more appropriate in their confidence levels than are either graduate students or nonprofessionals (Oskamp, 19^2; Shagoury, I969). That is, clinicians are more confident of their correct decisions and less confident of their incorrect decisions. The amount of information available to the judge does not correlate with his predictive acc-.iracy but

PAGE 11

increased amounts of information substantially increase confidence levels (Goldberg, 1968). Goldberg (I968) also discusses the nature of the judgmental process. He questions vjhether judges use simple decision-making models such as linear models, or complex processes such as configural models. In an analysis of clinician's judgments he found that a linear model usually reproduced 90 to 100 per cent of the reliable judgmental variance on most decision-making tasks even though the clinicians generally felt that they used more complex, configural model s. Using Statist ical Information to Increase Predi ction Accuracy Training in the use of statistical information has been shown to improve judgmental accuracy. in a study by Oskamp (I962), clinicians 'vicre able to improve their ability to distinguish psychiatric and medical patients on the basis of their Minnesota Multiphasic Personality Inventory (MMPI) profiles vjhen' they were provided with actuarial rules. Statistical formula predicted with 75 per cent accuracy and the clinicians, after training, were able to reach this 75 per cent accuracy level. Goldberg {ISGS) trained judges by giving them a formula and optimum cutting score for distinguishing neurotic form psychotic MMPi profiles. The judges were told that the statistical information predicted v;ith 70 per cent accuracy and they \-iere encouraged to use this information along v/ith any other information they thought would improve their prediction accuracy. Goldberg found that after eight

PAGE 12

weeks of "value training," the judges, on the average, increased theii accuracy from between 52 per cent to 65 per cent to approximately 70 per cent. This was the only type of training that substantially improved accuracy. Thus, feedback is necessary if the clinician is to learn how to ir.iprove his decision-making techniques. Another useful type of statistical information is the incidence, or base rate, of a given trait in the population available to the clinician. Goldberg (1959), for example, had judges predict braindamaged patients from functional patients on the basis of BenderGestalt protocols. The protocols were randomized into different groups in which the incidence of b ra i n-da'nage varied from high (£=.8) to low (£=.2). Goldberg found no difference in judgmental accuracy between these groups. Unfortunately, the base rate information was not provided to the judges. The importance of base rates for evaluating predictive te-jts was discussed by Meehl and Rosen (1955). They cite as an example an Army adjustment test for predicting vjhich inductees would adjust to the service. The test predicted inductee adjustment with an accuracy of 79.7 per cent. However, the overall percentage of inductees who adjusted was 95 per cent; thus, utilization of the base rates alone (i.e., predicting adjustment in all cases) would result in a hit rate of 95 per cent. Another application of base rates is through Bayesian statistical theory which combines the base rates with the valid and false positive rates of a particular test to give a conditional probability for the likelihood of being correct or incorrect given a certain test

PAGE 13

sign in a given base rate population. Shagoury (1963) and Shagoury and Satz (I969) demonstrated that clinicians can substantially improve their predictive accuracy when provided vjith information on base rates and conditional probabilities. These studies sho\-;ed that increments in statistical i nformat ion, added to test data, significantly increased the accuracy of judges in a real-life clinical decision task of predicting brain-damaged end functional patients on the basis of a block rotation task (Satz, I966). Their judges' accuracy approximated that obtained by a discriminant function predictor score (Z.) . Composite Z. scores were de-emphasized by the judges in favor of using the additional information such as the base rates, differential error risks, end conditional probabilities. Hovjever, in groups with a high incidence of brain-damaged individuals (base rate=.8) the judges' overall accuracy decreased, perhaps due to a reluctance to diagnose pathology. Mechl and Rosen (1955) point out that test development should be concentrated on populations with base rates near .50 rather than on populations with base rates approaching .00 or I. 00 since the use of a test in the latter cases will lower the hit rate of using the base rates alone. A cutting score, or composite Z_ score, derived from discriminant function analysis can be manipulated for various purposes in prediction. It can be used to maximize the number of correct predictions for all cases or for maximizing only correct predictions for positives. An excellent application of this teclinique of discriminant function analysis to decision theory in a clinical setting was demonstrated

PAGE 14

by Satz (I966). Discriminant function analysis is a statistical technique devised to maximally differentiate discrete criterion groups when multiple measurements are involved. This is essentially e multiple regression technique occept for a discontinuous distribution on the criterion variables. The follovjing linear equation expresses this funct ion: i-^\\ ^ \h^ w where Z_ is the composite predictor score based on the individual scores on each of the variables (Xj , X2i..-iXp) and the respective weights, or lambdas, assigned to each of the variable scores (X), \j ^n^ ' If there arc two criterion groups involving multiple measures, the discriminant function determines optimal v^jeights (lambdas) for these variables which v;ill maximize the difference betv^een the composite Z scores on botti criterion groups. Length _pf S tay in Psychotherapy as a Criteri on Va riable Why is length of stay in psychotherapy a meaningful problem for study? First, there is the great demand for psychological services with a present-day manpower shortage of trained clinicians. Host clinics that see individuals with psychological problems are understaffed, have pat ient wa i t ing-1 ists , or both. There are also differential risks involved in selecting who will be seen in therapy. It is far more serious to miss those who are severely disturDeo and need long-term psychotherapy because of the threat these individuals may pose to themselves or to society, than it is to wrongly classify persons who need only a fevv sessions and are experiencing minor

PAGE 15

difficulties in their lives. The first type of error, that of predicting a short stay in therapy based on a negative test score when in fact the person stays a long time, is a false negative error. A false positive error results from the prediction of therapy sessions based on a positive test score when the individual actually stays only a few sessions. Meehl and Rosen (1955) point out that often in a clinical setting external restraints are imposed, perhaps due to a shortage of staff time, patient vja i t i ng1 i sts , or administrative policy. If this is the case, decisions cannot always be made in accordance with known base rates. They give the following example to illustrate the use of an externally imposed selection ratio. if 80 per cent of the patients referred to a mental health clinic are recoverable v;ith intensive psychotherapy, then everyone should be treated rather than relying on a test which predicts only 75 per cent of those who will have a favorable therapy outcome. However, if staff time is limited and only half of the referrals can be treated, following the base rates is meaningless because this would lead to a decision that would be impossible to implement. In this case, where a selection ratio of .5 is externally imposed, the use of the test becomes worthwhile. Given the figures in Table 1 (Keehl & Rosen, 1955). those 50 cases out of the 100 referrals to be treated are selected from those individuals the test predicts will be "good" therapy risks. If this is done there is a 92.3 per cent hit rate among those selected for therapy (6O/65) . Stated another way, the test will be correct in ^6 out of the 50 cases which v.ill be successes (half of the 80 good therapeutic

PAGE 16

Table 1 Actual and Test-Predicted Therapeutic Outcome Test Predict ion Therapeutic Outcome Good Poor Total Good Poor 60 20 65 35 Total 80 20 100

PAGE 17

10 outcome group) . A second reason for selecting length of stay in psychotherapy as the focus for a clinical judgment study is that the probler.i can be subjected to multivariate and statistical decision theory analysis in order to increase the predictive relationship bet'ween signs and criteria. This possibility thus increases its application and potential usefulness to clinical judges. One study in this area found that there are differences in behavior in psychotherapy betv.een individuals which are predictable from an HHPI profile (Mello & Guthrie, 1958). Kello and Guthrie studied 219 individuals seen at a college psychological clinic. They used only those profiles with at least one T score greater than 70. They found that length of stay in therapy was related to high scores on various scales of the HNPI. Of those students with high scores on Scale 2(D), kS per cent remained only one to three sessions. Persons high en Scale 3(Hy) tended to stay in therapy longer than the high 2's and also developed dependency on tiie therapists more easily. Scale ^(Pd) individuals seldom stayed past seven counseling sessions and as a group iMere quite resistant to therapy although they did not often cancel their appointments. Persons who stayed the longest in therapy were high on Scales 7(Pt) or 8(Sc) with some clients continuing past 60 and 21 sessions respectively for these two scales. Most of the high S(Ma) students stayed fewer than 11 sessions and cancelled therapy sessions frequently. Mello and Guthrie concluded thet a therapist can get seme idea of what to expect from a particular client on ti.e basis of h^s MMPI profile.

PAGE 18

n The flello and Guthrie study is interesting because it suggests that psychological data (MMPl) may be used by clirticians to more efficiently select clients for psychotherapy. Unfortunately, the authors did not examine this problem within the context of a decisionmaking task nor did they subject their data to multivariate analysis. Using length of stay in psychotherapy as the predictor criterion is valuable for other reasons. For the professional involved, it may clarify the services offered by his agency and help hir,i to provide more adequate services to his clients. For example,. he may decide that seeing many clients for a short period of time is of more value tiian giving those who need long-term therapy this service and thus seeing fevJer clients. That is, prevention may be emphasized in a college mentai health clinic and such a clinic may be designed to see as many students as possible to ease their transition from high school or junior college to a college curriculum. On the other hand, a clinic nay be more treatment oriented and seek to help those vjho are more disturbed and require longer therapy. This emphasis would require more staff time per individual client and would necessitate seeing fev.'cr clients. Decisions of whom to treat could be more adequately made with test and non-test information. To be able to predict length of stay in therapy could affect therapist expectations which could in turn affect outcome variables. Just what effect an expectation for a particular length of stay in therapy will have on the outcome of the therapy is outside the scope of the present study but is an important research question in itself. Of couf-se, if the clinician intends to see ever/one who enters his
clinic, a screening procedure is worthless, or may even be detrimental if the test predicts that an individual will not stay in therapy or will not improve in therapy, because this may lead the therapist to expect just these results, to the client's disadvantage (Meehl & Rosen, 1955). It is often necessary for the clinician to indicate a therapy prognosis for an individual. If the clinician can predict, or learn to predict accurately, whether or not a person will stay in therapy, he is providing useful information for the person's treatment. Thus clinicians are constantly involved in the task of prediction and decision-making. If they can be trained to make use of relevant data, they may improve their predictions. Although many clinicians look with disfavor on the use of tests, tests combined with other relevant data can be shown to have practical and research applications. The clinician may use them to improve his predictions and decision-making processes.

Hypotheses Tested

The present study was addressed to two objectives. The first was to examine the decision-making process and to determine whether prediction accuracy is influenced by independent variables such as clinical experience and varying amounts and kinds of information. The second was to examine whether clinicians can be trained to improve their clinical decision processes. The first and primary objective was studied in terms of the second objective and of a real-life situation that is meaningful to clinicians today: the problem of
length of stay in psychotherapy. If increments in levels of statistical information increase prediction accuracy and thereby improve the clinicians' decision process, this type of information may be dovetailed into the operation of a clinic and taught to the staff to identify high-risk individuals. Specific questions, or hypotheses, were raised. Does judgmental accuracy increase as more information is added to the prediction task, and what types of information are most useful in increasing judgmental accuracy? Will there be differences in accuracy dependent on experience level? That is, will graduate clinical psychology students trained in statistical decision theory be better clinical judges than experienced PhD clinical psychologists (without such training), and will less experienced psychologists be superior to more experienced clinicians? Will confidence and appropriateness increase with increments in information, and will there be differences between the three experience levels with regard to confidence and appropriateness?

METHOD

Subjects. Twelve judges (Js) represented three levels of experience and sophistication in statistical decision-making. A professional (P) group of four PhD clinical psychologists represented the highest level of clinical experience. A group of four clinical psychology graduate students trained (sophisticated) in statistical decision-making theory (SGS) represented the highest level of statistical sophistication. Another group of four unsophisticated (not trained in statistical decision theory) clinical psychology graduate students (UGS) represented the same experience level as the SGS group and the same level of statistical sophistication as the P group. Sophistication in decision theory was defined as participation in a graduate course in statistical decision theory for clinical psychology students at the University of Florida. Sophistication here implies only special training and by no means implies that the professionals were clinically unsophisticated.

Materials. Test materials for Js were a random sample of 100 MMPI profiles of clients seen in a university mental health service. The sample profiles were drawn from the 241 profiles of all clients seen during a three-year period. Each J received 25 of the 100 profiles. Profiles were divided into two groups based on the client's length of stay in psychotherapy at the mental health service. A short
stay (S) was defined as four or fewer therapy sessions and a long stay (L) as five or more therapy sessions. The mean length of stay for the S group was 2.00 sessions and for the L group 9.27 sessions. A discriminant function analysis, which maximized the difference between the two length-of-stay groups, was run on the 241 MMPI profiles. The mean discriminant composite scores for the two groups on the 13 MMPI scale variables were $\bar{Z}_S = 29.74$ for the few-session group (S) and $\bar{Z}_L = 34.26$ for the many-session group (L). An analysis of variance of the composite means showed a significant difference between the two groups (F = [illegible], df = 12, 224, p < .001). A commonly used rule,

$$Z_c = \frac{\bar{Z}_S + \bar{Z}_L}{2},$$

was used to determine the optimal predictive cutting Z score. With an emphasis on minimizing the false negative rate, the composite Z score of 32.02 predicted with an overall hit rate of 67 per cent for the original protocol pool. False negative errors represented those clients who were predicted as short stays (S), or negatives, but who remained long in therapy (L). This predictive error was judged more serious than the false positive error, which included those clients who were predicted as long stays (L) but who remained a short time in therapy (S). It seemed more important to identify those clients who really needed long-term therapy than to identify those who did not. Of course, some of the individuals with high test scores who stayed only a few sessions may have been very disturbed but dropped out of therapy prematurely. There was no way to identify the cases in which a very disturbed student may have panicked or felt threatened by therapy and dropped out or
simply missed appointments. Failure to control this factor undoubtedly lowered the predictive accuracy of the discriminant function equation (and perhaps of clinical judgment), in that some of the disturbed profiles in the (S) criterion group may well have remained (L) if they had not dropped out. The false negative rate for the Z score of 32.02 was .38 and the false positive rate was .31, giving a valid negative rate of .69 and a valid positive rate of .62. Another Z score, which minimized false positive errors and predicted with an overall accuracy of 71 per cent, was not used in the present study for the reason stated above. Conditional probabilities were computed for the Z cutting score with the following equations:

$$P(L/+) = \frac{P(L)\,P(+/L)}{P(L)\,P(+/L) + P(S)\,P(+/S)}, \qquad P(S/-) = \frac{P(S)\,P(-/S)}{P(S)\,P(-/S) + P(L)\,P(-/L)}$$

where L = many therapy sessions, or a long stay in therapy (base rate = .34); S = few therapy sessions, or a short stay in therapy (base rate = .66); + = a positive test score ($Z \ge 32.02$); and - = a negative test score ($Z < 32.02$).

For the Z score of 32.02 the conditional probabilities were P(L/+) = .51 and P(S/-) = .78. It can be seen from this information that, given a positive test score, predictions will be wrong about as often as they are correct; given a negative test score, however, predictions will be right 78 per cent of the time.

Finally, a random sample of 100 profiles was drawn from the total protocol pool of 241 cases. This was done so that the Js would have fewer protocols to judge, making their task more economical with regard to time.
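
The arithmetic behind these conditional probabilities, and behind the 67 per cent overall hit rate, can be checked with the minimal Python sketch below. It assumes base rates of P(L) = .34 and P(S) = .66 (the values given under Procedure) and treats the reported valid positive rate (.62) and false positive rate (.31) as estimates of P(+/L) and P(+/S). The function name `conditional_probabilities` is illustrative only and is not part of the study.

```python
# Sketch: Bayes' theorem applied to the cutting-score statistics reported above.
# L = long stay (five or more sessions), S = short stay (four or fewer);
# "+" means Z >= 32.02 (predicted long), "-" means Z < 32.02 (predicted short).

def conditional_probabilities(p_long, valid_pos, false_pos):
    """Return P(L/+), P(S/-) and the overall hit rate.

    p_long    -- base rate of long stays, P(L)
    valid_pos -- P(+/L), proportion of long stays at or above the cut
    false_pos -- P(+/S), proportion of short stays at or above the cut
    """
    p_short = 1.0 - p_long
    false_neg = 1.0 - valid_pos   # P(-/L)
    valid_neg = 1.0 - false_pos   # P(-/S)

    p_long_given_pos = (p_long * valid_pos) / (
        p_long * valid_pos + p_short * false_pos)
    p_short_given_neg = (p_short * valid_neg) / (
        p_short * valid_neg + p_long * false_neg)
    # Overall proportion of correct classifications at this cutting score.
    hit_rate = p_long * valid_pos + p_short * valid_neg
    return p_long_given_pos, p_short_given_neg, hit_rate


pl, ps, hits = conditional_probabilities(p_long=0.34, valid_pos=0.62, false_pos=0.31)
print(f"P(L/+) = {pl:.2f}, P(S/-) = {ps:.2f}, hit rate = {hits:.2f}")
```

Rounded to two places, this reproduces P(L/+) = .51, P(S/-) = .78, and the 67 per cent hit rate reported for the cutting score.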

A second reason for drawing a random sample was to make the situation more relevant clinically in terms of the base rates. That is, the sample had only approximate base rates, and the judges did not know the exact proportions, for their sample, of clients who remained a long or a short time in therapy. However, for the sample, the Z score predicted with the same accuracy that it did for the total protocol pool.

Procedure. Refer to Table 2 for a schematic of the design. Js were asked to predict a client's length of stay in psychotherapy from the 25 MMPI profiles. These profiles, the sample of 100 profiles, and the original profile pool all had approximately the same base rates: 34 per cent of the clients stayed many sessions (L) and 66 per cent stayed a few sessions (S). The Js predicted length of stay in therapy (S or L) during four sessions, with additional information added incrementally at each session. These sessions, or levels of information, represented one class of independent variables; groups, or experience levels, represented the other. Each J made his predictions throughout the training on the same 25 protocols that he received at the first level.

Level I: Js were first given MMPI profiles with no other information.

Level II: Js were again presented the same 25 protocols for the same judgment, but with the additional information of biographical data such as age, sex, marital status, religious preference, parents' marital status, previous counseling experience, and subsequent counseling experience.

Level III: For the third decision task, Js were given the
profiles and biographical data, with the additional statistical information of the cutting score based on the discriminant function analysis. Valid positive and false positive percentages were also provided with the cut-off Z score.

Level IV: Conditional probabilities and the base rates were added to the previous information for the fourth presentation of profiles for prediction. (For a copy of the instructions for each information level see Appendix A.)

For each judgment Js also indicated their confidence in the accuracy of their judgment. To rule out a practice effect from repeated presentation of the same profiles, two control judges predicted length of stay in psychotherapy from the profiles only, with no additional information, on four separate occasions. Judges were presented the profiles for judgment on four consecutive days, with only one information level given each day.

Hypotheses.

I. Information Level: It was hypothesized that increments of information would increase overall judgmental accuracy and group accuracy.
(A) Level I accuracy would be at approximately the level of chance.
(B) At Level II, accuracy would decrease or remain the same.
(C) Level III accuracy would approximate the actuarial prediction accuracy of the discriminant function analysis.
(D) Level IV accuracy would increase slightly over Level III accuracy.

II. Experience Level: It was hypothesized that the statistically sophisticated graduate students would be the most accurate, the statistically unsophisticated graduate students next most accurate,
and the professionals least accurate.

III. Confidence and Appropriateness: It was hypothesized that confidence ratings would increase with increments of information and that appropriateness would also increase with more information.
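
Because each judge made only 25 predictions per information level, "chance" and "above chance" are loose terms for a single judge. The Python sketch below is an illustration added here, not an analysis from the study: it assumes independent judgments and a chance rate of .5, and finds the smallest number of correct predictions out of 25 whose one-tailed binomial probability is at most .05. The helper names are hypothetical.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def smallest_significant_count(n, alpha=0.05, p=0.5):
    """Smallest number correct whose one-tailed P(X >= k) is at most alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p) <= alpha:
            return k
    return None

n = 25  # profiles judged per information level
k = smallest_significant_count(n)
print(k, k / n, round(upper_tail(n, k), 4))
```

Under these assumptions, 18 of 25 (72 per cent) is the smallest count with a one-tailed probability at or below .05, while 17 of 25 (68 per cent) gives a probability of about .054.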

RESULTS

Accuracy: The Effects of Information and Experience

Accuracy was defined as the proportion of correct judgments per presentation of 25 MMPI profiles. The two control Js showed no practice effects. Judge A's accuracy was 52 per cent on the first presentation and 48 per cent on each of the three subsequent presentations. Judge B's accuracy was distributed across sessions as follows: 76 per cent, [illegible] per cent, 68 per cent, and 68 per cent. Table 3 presents accuracy by information level and experience level. Two analyses of variance were conducted to determine the effects of information level, experience level, judges, profile set, and profile. The analysis of variance for profile set effects was non-significant (F = .53, df = 3, 91). An $F_{max}$ test for homogeneity of variances between groups was also non-significant ($F_{max}$ = [illegible], df = 16). A summary of the analysis of variance for information level and experience level effects is shown in Table 4.

Information. Mean judgmental accuracy increased consistently with increments of information from Level I to Level IV ($\bar{X}_{I} = .55$, $\bar{X}_{II} = .61$, $\bar{X}_{III} = .67$, $\bar{X}_{IV} = .68$). These differences were significant (F = 10.82, df = 3, 27, p < .01). A graphic presentation of this trend is shown in Figure 1. Inspection of Figure 1 shows an approximately linear increase in accuracy for the three groups by information level. Both the P and UGS groups increased their accuracy at each level, while the SGS group showed increases at Levels II and III

Table 3
Proportion of Correct Judgments (by Experience Level and Information Level)

Table 4
Summary of Analysis of Variance of Accuracy

Source of Variation          df       MS
Mean                                478.80
Information Level

Fig. 1. Accuracy by information levels (SGS, P, and UGS groups; Information Levels I to IV).

but a slight decrease in accuracy from Level III to Level IV.

Experience. There were no differences in accuracy due to experience level, apart from a trend toward group differences (F = 2.34, df = 2, 9, p < .20). The SGS group was the most accurate and the UGS group the least accurate ($\bar{X}_{SGS}$ = [illegible], $\bar{X}_{P} = .64$, $\bar{X}_{UGS} = .58$). Only the SGS group's overall accuracy was at the level of the discriminant function, which predicted with 67 per cent accuracy.

Information and experience level interaction. The only other significant source of variance was the group by information level interaction (F = 7.23, df = 6, 27, p < .01). The Newman-Keuls test of differences between means was used (Kirk, 1968), and the results of this analysis are given in Appendix B. The interaction was based largely on a significantly lower proportion of correct judgments by the UGS group at Level I. The UGS group not only started with the lowest proportion of correct judgments, but also showed the largest increase in accuracy as information was added. Their final degree of accuracy, however, was approximately the same as the SGS accuracy at Level I. The UGS group significantly increased their accuracy at Levels III and IV over Level I when the composite Z score, conditional probabilities, and base rates were added (p < .01). The only significant increase in accuracy for the P group was between the first level, with the profile only, and the final level, with all information (p < .05). Increases in accuracy for all groups across information levels were significant except the increase from Level III to Level IV, where conditional probabilities and base rates
were added. Adding conditional probabilities and base rates to the previous information did not result in a significant increase in the Js' accuracy over Level III, which included the composite Z score. For the SGS group there were no significant differences in accuracy across information levels. The only significant group differences within information levels were between the SGS group and the UGS group (p < .01) and between the SGS group and the P group (p < .05).

Confidence

Mean confidence scores by information level and experience level are shown in Table 5.

Table 5
Mean Confidence Scores (by Experience Level and Information Level I to IV, with Totals)

Table 6
Summary of Analysis of Variance of Confidence

Source of Variation          df       MS       F
Information Level             2     52.6?    5.38*
Experience Level              2    191.84     .39
Information X Experience      6     12.73    1.30
Judges within Groups          9    493.76
Information X Judges         28      9.79

*Significant at the .05 level.
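
Table 6 can be checked by recomputing each F ratio as an effect mean square divided by its error mean square. In the Python sketch below, the pairing of effects with error terms is an assumption (Experience, a between-judges effect, tested against Judges within Groups; the within-judges effects tested against Information X Judges), chosen because it reproduces the printed F values. The last digit of the Information Level mean square is illegible in the scan, so 52.65 is used as a placeholder.

```python
# Mean squares read from Table 6; 52.65 is a placeholder for the partly
# illegible Information Level value (52.6?).
ms = {
    "information": 52.65,
    "experience": 191.84,
    "info_x_exp": 12.73,
    "judges_within_groups": 493.76,
    "info_x_judges": 9.79,
}

# Assumed error terms for a design with judges nested in experience groups:
# the between-judges effect uses Judges within Groups, and the
# within-judges effects use the Information X Judges mean square.
f_ratios = {
    "information": ms["information"] / ms["info_x_judges"],
    "experience": ms["experience"] / ms["judges_within_groups"],
    "info_x_exp": ms["info_x_exp"] / ms["info_x_judges"],
}

for source, f in f_ratios.items():
    print(f"{source}: F = {f:.2f}")
```

Any Information Level mean square between about 52.63 and 52.69 gives F = 5.38 to two places, so the placeholder does not affect the check.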

Fig. 2. Confidence by information levels.

given in Table 7, with the analysis of variance summary for appropriateness in Table 8. The analysis of variance was based on z transformations of the correlation coefficients. Mean appropriateness scores were significantly higher at each level of information (F = 22.03, df = 2, 28, p < .01). The SGS group was the most appropriate because they were the most accurate and not overconfident; that is, their confidence was consistent with their accuracy. The P group was overly confident and the UGS group was less accurate, making these two groups' confidence inconsistent with their accuracy. These trends can be seen in Figure 3.

Judges versus the discriminant function

The discriminant function correctly classified 67 per cent of the profiles. This information was given to the Js at Level III. At Level III only the SGS judges were more accurate than the discriminant function, with a hit rate of 73 per cent. The P group had a hit rate of 66 per cent and the UGS group a hit rate of 63 per cent at Level III. The accuracy for all judges combined was 67 per cent. Five judges (two in the SGS group, two in the P group, and one in the UGS group) were more accurate overall than the linear regression Z score, and only one J (in the UGS group) operated below the chance level overall. At Level I, four Js had accuracy scores below the level of chance and two others were only slightly above chance. However, none of the four Js who were below chance was in the SGS group. At Level II there were two Js below chance and one slightly above; again,

Table 7
Correlation Coefficients for Appropriateness

Table 8
Summary of Analysis of Variance of Appropriateness Correlations

Source of Variation          df       MS
Information Level             2
Experience Level              2
Information X Experience      6
Judges within Groups          9
Information X Judges         28

Fig. 3. Appropriateness correlations between accuracy and confidence by information levels.

none of these Js was in the SGS group. With the addition of the statistical information, only one J (in the UGS group) remained near the chance level of accuracy, and he was the least accurate of all the Js.

DISCUSSION

The present study demonstrated that judges can substantially improve their decision accuracy when provided with increments of information, particularly statistical information. This finding extends the earlier findings reported for a different clinical judgment task (Shagoury & Satz, 1969) and contrasts with previous studies which have used non-quantitative data. These findings also suggest that if the clinician is able to incorporate quantitative information, he may improve his own decision-making ability and equal or surpass the accuracy of actuarial methods. The findings of the present study also showed that accuracy increased directly as a function of the amount of information available to the judges. Two conclusions can be drawn from this finding: the information was relevant to the judgmental task, and the judges used this information in formulating their decisions.

Information

A post-testing interview revealed that the type of information used varied between groups, among judges, and between information levels. However, the interview was not structured enough to determine the actual decision rules used by the judges. At Level I, with the MMPI profile only, most judges used their
own intuition about how the particular scales that were elevated, and the extent of those elevations, related to the length-of-stay criterion. There was a great deal of individual variation in approach, since each judge had different training experiences with the MMPI. The judges of the SGS group had the most similar training experience in the use of the MMPI, since some training in the rationale and use of this instrument was given in the statistical decision theory course. The SGS group also showed the least individual variation in accuracy at Level I. The other group of graduate students (UGS) had the least exposure to the MMPI. The UGS group was barely familiar with this test instrument, and none of these judges had had any formal training in its use. It is interesting to note that the group of unsophisticated graduate students showed the lowest accuracy throughout and was the only group whose accuracy fell below the level of chance. It seems, then, that the more familiar a judge is with a test instrument, the more accurate he will be in using it for prediction. The SGS group was not familiar with the specific type of task used in the present study; that is, they had not been trained in relating MMPI data to length of stay in psychotherapy. This aspect of the study was novel to each of the three groups. At Level II, again, each judge approached the data differently and selected certain measures to use in making his judgments.

The hypothesis that accuracy would decrease at Level II was not supported. It was originally expected that all of the biographical data provided would make the task more complex and more difficult and would thus confuse the judges. However, the judges were able to relate some of the information to the task and thereby improve their judgments. Most judges used some combination of factors. Whether the profile subject had had previous or subsequent counseling, and his age, were the factors used most often. Some of the judges also considered marital status when the subject was married. This finding (Level II) contrasts with other studies which indicate a lowering of accuracy when data are combined (Golden, 1964).

In support of the hypothesis, overall accuracy at Level III was the same as the discriminant function's accuracy of 67 per cent. With the addition of Z scores at Level III, only one judge, who was in the P group, used the cut-off score exclusively. In this same group one judge changed none of his judgments from Level II, and the other two judges used primarily their own subjective inferences. The UGS group essentially ignored the Z scores and relied on their own intuition, and thus did not reach the level of accuracy of the cut-off score. All judges in the SGS group combined the cut-off score data with their own intuition to improve upon the accuracy of the discriminant function. These findings imply that the clinician can make use of his intuition and experience, but not at the expense of ignoring available data, particularly when they include quantitative information. The findings also imply that the most accurate judges are the ones who are able to utilize statistical data.

The fact that there was an increase in accuracy from Level III to Level IV, but that this increase was non-significant, supported the hypothesis that Level IV accuracy would increase slightly over Level III accuracy. For the SGS group there was a slight decrease in accuracy. One reason for this decrease might have been the information itself. These judges had been trained to use more powerful statistical information, that is, data that discriminated groups and sub-groups better than did the data of the present study. The base rates of .66 and .34 were not sufficiently different from base rates of .5 to be of much help. Also, the conditional probabilities were not high enough to provide maximum discrimination. All of the statistical data given were in approximately a 2/3 to 1/3 ratio. Because all of the statistical data had approximately the same predictive power, it may have been difficult to know which kind of information would be most useful. Instead, judges may have tried to combine two or more kinds of data and as a result were less accurate than they would have been using one type exclusively. Quantitative information is most useful when it represents more extreme ratios, such as base rates of .2/.8 or .1/.9, conditional probabilities of .85/.15, and cut-off scores that predict with 75 per cent accuracy or higher. Even though five of the twelve judges showed decreases in accuracy at Level IV in comparison with Level III, these decreases were slight and represented only one more incorrect judgment out of 25 for each of the five judges. At Level IV, the UGS group ignored the base rates and used the conditional probabilities. The SGS group and the P group both used
a combination of conditional probabilities and base rates, and both of these groups had the same degree of accuracy at Level IV. Both the SGS and P groups were also more accurate than the UGS group. It seems probable that statistical information is more important than biographical information about the subjects, since there was a greater increase in accuracy with the addition of statistical information. Other studies have shown that biographical data are of minimal value to judges. Golden (1964) found that judges agreed less in their description of protocols when they were given identifying data alone than when they were given a single psychological test or a combination of tests. Kostlan (1954) found that judges were more accurate in their psychodiagnoses when they received both social case histories and the more quantitative MMPI protocol than when they received social case histories alone.

One may ask whether the judges would have been more accurate had they been given some feedback on their accuracy at each level of information. This is possible, but the task would then have been less life-like, since clinicians in actual situations must usually wait some time before learning the accuracy of their predictions. However, this does emphasize the point that clinicians should check the accuracy of their predictions when possible and learn what helps them to predict most accurately.

Experience and training

The lack of overall differences between groups was not anticipated. It was assumed that the SGS group would have benefited from their training in statistical decision theory. However,
artifacts in the design tended to wash out group effects by providing a guaranteed hit rate if the Z scores were used at Levels III and IV. The convergence of judgmental accuracy for each group at Level IV lends some support to this argument. The hypothesis that the SGS group would be the most accurate was thus only tentatively supported, since there was not a significant group effect. However, the SGS group tended to be the most accurate in their judgments. This finding suggests that clinicians can be trained to improve their own subjective inferences with statistical information. The judges trained in statistical decision theory were able to add their own intuitive judgments to the statistical data and thereby predict more accurately than did the discriminant function alone, or than they had done without the statistical data. This special training taught them not only how to use statistical information but also how to use their clinical intuition to its best advantage. The SGS group also tended to be the most appropriate, that is, to know when their judgments were most accurate and when they were most inaccurate.

The fact that the P group tended to be more accurate than the UGS group was also unexpected in light of previous findings concerning amount of clinical training and accuracy of prediction. This finding does not support the previous evidence (Goldberg, 1959; Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969) that as the amount of clinical experience increases, prediction accuracy decreases. In the present study, judges in the SGS and P groups used familiar methods, the MMPI profiles or statistical data; the
UGS group, by contrast, was presented with essentially unfamiliar prediction tools. One reason the data in the present study were at variance with previous findings is that previous studies required clinicians to predict an unknown criterion or to use unfamiliar methods, so that any previous "set" of the clinician was not advantageous. In the present study, the judges' familiarity with either the MMPI or statistical types of data helped them in their predictions.

Interaction effects

The significant group by information interaction effect showed at least indirect support for the experience level hypothesis that the SGS group would be most accurate. This interaction showed that the unsophisticated graduate students started off predicting below chance and finally, at Level IV, reached the level of accuracy that the sophisticated graduate students attained at Level I (MMPI profiles alone). It was the former group, with the least experience, the least familiarity with the MMPI, and no sophistication in statistical decision theory, which accounted for most of the group differences and much of the interaction effect. The rest of the interaction effect was due to the changes in accuracy across information levels, with the UGS group showing the greatest change and the SGS group showing the least change in accuracy. The latter group started out predicting fairly accurately and had less room for improvement, while the former group started out so poorly that their improvement was marked. The SGS group predicted almost as well as the discriminant function with
the profiles only. The UGS group improved from below chance to the level of accuracy achieved by the discriminant function.

Confidence

Previous studies would suggest that the trained clinicians should have had less confidence in their judgments than the two preprofessional groups. Although group differences for confidence were non-significant, the professionals in the present study tended to be the most confident. Again, this may have been because they were using the MMPI, with which they were more familiar than were the other two groups. Also, the professional group was predicting a criterion about which they knew something, that is, length of stay in psychotherapy. This again suggests that previous studies have placed the clinician at a disadvantage, so that he is less accurate and less confident than he would be predicting in a familiar setting.

In general, adding information substantially increased the judges' confidence. Judges became more confident as well as more accurate with increments in information. However, the UGS group's confidence did not increase until they had all the available information.

One problem with asking judges to assign a confidence rating to each judgment was that each judge had a different standard or set for measuring how confident he was. The range of confidence scores used also varied between judges and within groups, so that one judge used all six possible levels of confidence (50, 60, 70, 80, 90, and 100) while another judge used only two (60, 70) or three (80, 90, 100) levels.

Appropriateness

The most meaningful measure of appropriateness, defined as the relationship between accuracy and confidence, was the correlation coefficient. Just as accuracy and confidence increased with each level of information, so did appropriateness: as judges became more accurate they also became appropriately more confident. The increases in appropriateness followed the same pattern as the increases in accuracy and confidence. That is, appropriateness increased significantly across levels of information, but there was only a tendency for one group to be more appropriate than the other groups. As with accuracy, the SGS group tended to be the most appropriate and the UGS group the least appropriate. This contrasts with earlier findings that trained clinicians are more appropriate in their confidence levels than are graduate students in psychology (Oskamp, 1962; Shagoury, 1969). The findings of the present study, however, do not contradict the earlier findings, since the present differences between groups on the measure of appropriateness were non-significant.

Applications

It appears that actuarial data, and training in their use, can be applied to situations in which clinicians must predict and make decisions. In the present study, judges were able to postdict length of stay in psychotherapy fairly well. The next step would be to apply these techniques in the same setting and predict a client's length of stay in psychotherapy. This could then be followed up at

PAGE 51

kh the end of treatment as a check of prediction accuracy. This would enable the clinician to determine vjhich short stays were "no shovjs'' and which were treated. Thus the discriminant function I score and judges predictions could be much higher and more useful for practical application to the clinician's population of clients. This type of procedure is most useful in a clinic situation vjhich must limit the number of clients seen or must screen those that will be seen. Statistical methods of prediction can be particularly applicable to the screening of patients to determine what type of treatment is most appropriate and vjould be most useful for each client. To use actuarial data in a clinic s i tuat i on , they must first be collected and analyzed. Too many clinical situations today fail to make use of the data they have available. They do not even know the base retes for vario'JS classifications of the clients they see. Collecting and analyzing statistical data is another way to more fully understand a particular clinical setting by learning what type of patients are seen, how long they stay in treatment, and hopefully, which ones are most likely to improve. If a clinic decides to see everyone v.'ho comes in for help, tests and statistical data are not of benefit in sel ect i ng whom to see. However, these data might be used for prediction and research in a setting which sees all clients. It is in situations where everyone cannot be treated that improving tests and collecting base rate information is most needed. V/here decisions and predictions must be made, actuarial methods are most needed to improve the clinician's dec i s ion-.mski ng ability.
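
The follow-up check proposed under Applications can be sketched in a few lines. This is a minimal illustration, not the study's procedure: it assumes each closed case is stored as a session count paired with the (S) or (L) prediction made at intake, and it borrows the criterion from the Part I instructions (5 or more sessions = long stay). The function names and the toy records are hypothetical.

```python
from collections import Counter

# Criterion borrowed from the Part I instructions: 5 or more sessions = long stay (L).
LONG_STAY_MIN_SESSIONS = 5


def stay_class(sessions: int) -> str:
    """Label a closed case 'L' (long stay) or 'S' (short stay) by its session count."""
    return "L" if sessions >= LONG_STAY_MIN_SESSIONS else "S"


def base_rates(session_counts: list[int]) -> dict[str, float]:
    """Proportion of short (S) and long (L) stayers among closed cases."""
    if not session_counts:
        raise ValueError("no closed cases to tabulate")
    counts = Counter(stay_class(n) for n in session_counts)
    total = len(session_counts)
    return {label: counts[label] / total for label in ("S", "L")}


def follow_up_accuracy(predicted: list[str], session_counts: list[int]) -> float:
    """Share of intake predictions ('S' or 'L') confirmed at the end of treatment."""
    if not predicted or len(predicted) != len(session_counts):
        raise ValueError("need exactly one intake prediction per closed case")
    hits = sum(p == stay_class(n) for p, n in zip(predicted, session_counts))
    return hits / len(predicted)


# Hypothetical clinic records: sessions attended and the intake prediction.
sessions = [2, 9, 1, 5, 3, 12]
predictions = ["S", "L", "S", "S", "S", "L"]
print(base_rates(sessions))                       # {'S': 0.5, 'L': 0.5}
print(follow_up_accuracy(predictions, sessions))  # 0.8333333333333334
```

Run over a clinic's closed files, the same tally yields the local base rates that the text says many settings lack, and the accuracy figure shows whether an intake rule holds up in that particular population.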

A fair and optimal test of clinical versus statistical prediction would be to give judges an opportunity to see the relationships of the test variables with the criterion in a standardization sample, and then to compare the judges with a discriminant function on a cross-validation sample. However, this was not the purpose of the present study.

SUMMARY

The present study was designed to look at the effects of adding quantitative and qualitative data to a relevant clinical judgment task. In essence, it compared judges with varying degrees of clinical experience to actuarial prediction methods. The study also attempted to train judges to use actuarial information to improve their prediction accuracy. Twelve judges representing three levels of clinical experience made post-dictive judgments of length of stay in psychotherapy (short or long) from a sample of MMPI profiles of clients seen in a university mental health service. Judgments were made under four conditions, with qualitative and quantitative information added incrementally at each level. The three levels of judges' experience were professional clinical psychologists, "sophisticated" third-year clinical psychology graduate students trained in statistical decision theory, and "unsophisticated" third-year clinical psychology graduate students without any training in statistical decision theory. Accuracy increased over levels of information, but there were no differences in accuracy among the three levels of experience. A significant group by information level interaction demonstrated some group effects due to a lower proportion of correct judgments for the less experienced judges under the conditions involving the least amount of information. Judges became more confident in their judgments as they received more information. Appropriateness, defined as accuracy weighted by confidence and measured by correlation coefficients between accuracy and confidence, increased substantially as increments of information were added. The group trained in statistical decision theory tended to make the most appropriate judgments, and the least experienced group of graduate students tended to make the least appropriate judgments. The present study showed that clinicians can use quantitative data to improve their own judgmental ability and to predict more accurately than actuarial data alone. Also, since those judges with the most experience in using actuarial tasks tended to be the most appropriate in their judgments, this implies that clinicians can also be trained to be more appropriate and to know when their judgments are more likely to be accurate.
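
The appropriateness index can be illustrated as a correlation computed over one judge's set of judgments. This is a minimal sketch, assuming accuracy is coded 1 for a correct and 0 for an incorrect judgment and correlated with the 50 to 100 per cent confidence ratings (a point-biserial r); the text does not give the exact computing formula, so this coding and the function name are assumptions.

```python
from math import sqrt


def appropriateness(correct: list[bool], confidence: list[float]) -> float:
    """Pearson r between accuracy (1 = correct, 0 = wrong) and confidence ratings.

    Because accuracy is dichotomous, this is the point-biserial correlation.
    """
    if len(correct) != len(confidence) or len(correct) < 2:
        raise ValueError("need paired accuracy and confidence for at least two judgments")
    x = [1.0 if c else 0.0 for c in correct]
    y = [float(v) for v in confidence]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        # All judgments right (or all wrong), or one confidence level used throughout.
        raise ValueError("r is undefined when accuracy or confidence does not vary")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical judge: higher confidence on the judgments that turned out correct.
print(round(appropriateness([True, True, False, False], [90, 80, 60, 50]), 3))  # 0.949
# Same accuracy, confidence reversed: an inappropriately confident judge.
print(round(appropriateness([True, True, False, False], [50, 60, 80, 90]), 3))  # -0.949
```

The zero-variance guard bears on the point raised under Confidence: a judge who marks the same confidence level on every profile gives nothing to correlate, so an index of this kind cannot be computed for that judge.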

APPENDIX A
INSTRUCTIONS

APPENDIX A-I
INSTRUCTIONS PART I

This study is designed to examine the decision process when only limited information is available. You will be presented with 25 Minnesota Multiphasic Personality Inventory (MMPI) profiles of students seen at the University of Florida Infirmary Mental Health Service. Some of these students stayed a long time in therapy (5 or more sessions, X̄ = 9) and some stayed only a short time (4 or fewer sessions, X̄ = 2). Your task will be to decide which students stayed a long time (L) and which stayed only a short time (S) on the basis of the test profile alone. Your task is to try to make the best estimate of probable length of stay in psychotherapy given only limited information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings.

You will also be asked to rate your confidence for each subject on a scale from 50 per cent to 100 per cent. If you are positive of your decision, you should mark 100 per cent; if you are only guessing, you should mark 50 per cent. That is, the more certain you are of your decision, the higher the percentage you should mark.

APPENDIX A-II
INSTRUCTIONS PART II

Your task on Part II is identical to that on Part I. You will be presented the same 25 profiles and asked to predict (S) or (L). However, this time more information will be available to you. That is, you will also have biographical data. You may use this information in any way you wish. You may choose to disregard the information altogether and make your predictions as you did in Part I. Your task is to try to make the best estimate of probable length of stay in therapy given only limited information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings. Again, please indicate your confidence in your judgment for each subject from 50 per cent to 100 per cent.

APPENDIX A-III
INSTRUCTIONS PART III

Your task on Part III is identical to that of Parts I and II. You will be presented the same 25 profiles and asked to predict as accurately as possible, on the basis of the information given, whether the student is (S) or (L). Again, more information will be made available to you. The following statistical information will be added. Discriminant function analysis provided weights for each of the 13 MMPI scale variables in order to obtain maximal differentiation between long stayers (L) and short stayers (S). A composite score (Z) was obtained which best estimates the combined relative effects of all the scale variables. This Z score is used to make the best prediction as to which criterion group a particular profile belongs. This can be summarized as follows:

1. Z ≥ 32.02 is a positive test sign (+) and indicates a probable long stay in therapy (L).
2. Z < 32.02 is a negative test sign (-) and indicates a probable short stay in therapy (S).

No test, however, classifies without some errors. This derived composite cut-off (Z = 32.02) yields the following percentages of classification:

                   Composite test sign
Criterion    Z < 32.02 (-)    Z ≥ 32.02 (+)
    S             69%              31%
    L             38%              62%

In other words:

1. A (+) test sign (Z ≥ 32.02) correctly classified 62 per cent of the long stayers (L). This is known as the valid positive rate. Also, a (+) test sign incorrectly classified 31 per cent of the short stayers (S); this is the false positive rate.
2. A (-) test sign (Z < 32.02) correctly classified 69 per cent of (S), the valid negative rate, and incorrectly classified 38 per cent of (L), the false negative rate.

This means that 38 per cent of the (L)'s scored below 32.02 and were incorrectly classified as (S), and 31 per cent of the (S)'s scored above 32.02 and were incorrectly classified as (L). The total percentage correctly classified was 67 per cent. You will be required to predict as accurately as possible whether the student belongs to (S) or (L), short or long stay. It is possible to score every profile correctly, scoring 100 per cent. You may predict (S) or (L) by using (1) the composite Z score cut-off, (2) the biographical data, (3) the profile alone, or (4) any combination of (1), (2), and (3). The composite cut-off score was applied to yield the best overall classification rate, but no test is perfect and errors may be made with any procedure. It is quite possible that the clinician may be able to improve upon the linear statistical method (Z score) by utilizing combinations of both "intuitive" and statistical data. Your task is to try to make the best estimate of probable length of stay in therapy given additional, but limited, information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings.
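
The cut-off rule and the 67 per cent total can be checked with a short sketch. It assumes the sign rule stated above (Z at or above 32.02 is a (+) sign) and weights each row of the table by its group size, using the base rates given in the Part IV instructions (S = .66, L = .34). Because the table percentages are rounded, the result is approximate; the names are illustrative only.

```python
# Cut-off and rates taken from the Part III instructions.
CUTOFF = 32.02
VALID_POSITIVE = 0.62  # P(+ | L): long stayers at or above the cut-off
FALSE_POSITIVE = 0.31  # P(+ | S): short stayers at or above the cut-off
VALID_NEGATIVE = 0.69  # P(- | S): short stayers below the cut-off
FALSE_NEGATIVE = 0.38  # P(- | L): long stayers below the cut-off


def z_sign(z: float) -> str:
    """'+' (probable long stay, L) when Z >= 32.02, '-' (probable short stay, S) otherwise."""
    return "+" if z >= CUTOFF else "-"


def overall_hit_rate(p_long: float) -> float:
    """Expected proportion correctly classified by the cut-off, given the base rate of L."""
    if not 0.0 <= p_long <= 1.0:
        raise ValueError("base rate must lie between 0 and 1")
    return p_long * VALID_POSITIVE + (1.0 - p_long) * VALID_NEGATIVE


print(z_sign(35.10), z_sign(30.47))      # + -
print(round(overall_hit_rate(0.34), 4))  # 0.6662
```

Weighting .69 by .66 and .62 by .34 gives about .666, which rounds to the 67 per cent reported. Because the valid positive and valid negative rates differ, the same cut-off would yield a different total in a population with different base rates.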

APPENDIX A-IV
INSTRUCTIONS PART IV

Your task on Part IV is identical to that of Parts I, II, and III, utilizing the same 25 profiles. You are to predict as accurately as possible, on the basis of the information given, whether the student is (S) or (L). Again, more information will be made available to you. In addition to the composite Z score, biographical data, and test data, you will also be told the conditional probabilities and base rates for the groups and test signs. Conditional probabilities combine test signs, (+) or (-), and base rates to yield a quantitative index of the probability of correct classification when Z ≥ 32.02 (+) or when Z < 32.02 (-). For example, some of the subjects will be (L) when Z ≥ 32.02 (+) and some will be (S) when Z < 32.02 (-). The problem is to determine how confident we can be with each test sign under the base rates of the population. The base rates for the two groups are: Short (S) = 66 per cent and Long (L) = 34 per cent. In other words, 34 per cent of the subjects stayed a long time in therapy and 66 per cent stayed only a short time. The majority, therefore, were shorts (S). Based on this information, the conditional probabilities are: for a (+) test sign, P(L/+) = .51, and for a (-) test sign, P(S/-) = .78. This means that the probability of a person staying a long time in therapy (L), given a positive test sign, is .51, and the probability of a person staying a short time (S), given a negative test sign, is .78. A conditional probability of .51 for a (+) test sign means that you would be wrong about as often as you were correct in predicting (L) for a (+) sign. A conditional probability of .78 for a (-) test sign means that you would be correct more often than you would be wrong in predicting (S) for a (-) test sign. Your task is to try to make the best estimate of probable length of stay in psychotherapy given additional, but limited, information. It is possible to correctly classify all the profiles. It is hoped that your predictions will in some way help us to understand one aspect of the decision-making process as it is applied by psychologists in clinical settings. Again, please indicate your confidence in your judgment for each subject from 50 per cent to 100 per cent.
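
The conditional probabilities quoted in Part IV appear to follow from Bayes' theorem applied to the Part III rates and the stated base rates: P(L/+) = P(+/L)P(L) / [P(+/L)P(L) + P(+/S)P(S)], and likewise for P(S/-). The sketch below, with illustrative names, reproduces the quoted .51 and .78 under that assumption.

```python
def conditional_probabilities(p_long: float,
                              valid_pos: float, false_pos: float,
                              valid_neg: float, false_neg: float) -> tuple[float, float]:
    """Return (P(L/+), P(S/-)) by Bayes' theorem from the base rate and test-sign rates.

    valid_pos = P(+|L), false_pos = P(+|S), valid_neg = P(-|S), false_neg = P(-|L).
    """
    p_short = 1.0 - p_long
    p_plus = valid_pos * p_long + false_pos * p_short    # overall rate of (+) signs
    p_minus = false_neg * p_long + valid_neg * p_short   # overall rate of (-) signs
    return valid_pos * p_long / p_plus, valid_neg * p_short / p_minus


# Base rate of L and the Part III rates quoted in the instructions.
p_l_plus, p_s_minus = conditional_probabilities(0.34, 0.62, 0.31, 0.69, 0.38)
print(round(p_l_plus, 2), round(p_s_minus, 2))  # 0.51 0.78
```

Re-running with p_long = 0.5 gives P(L/+) of about .67, which shows why the instructions stress base rates: the same (+) sign carries less weight when long stayers are the minority.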

APPENDIX B
SUMMARY OF NEWMAN-KEULS TESTS

APPENDIX B-I
SUMMARY OF NEWMAN-KEULS TEST FOR GROUP MEAN DIFFERENCES

Differences among Level I means
(table illegible in the scanned copy)

Differences among Level III means

               UGS     P      SGS
UGS = .63       -     .03     .10
P   = .66              -      .07
SGS = .73                      -

Differences among Level IV means
(table illegible in the scanned copy)

APPENDIX B-II
SUMMARY OF NEWMAN-KEULS TEST FOR INFORMATION MEAN DIFFERENCES

Differences among SGS means
(table only partly legible in the scanned copy; legible means: .61, .70, .73)

Differences among UGS means
(table only partly legible in the scanned copy)

REFERENCES

Adams, J. K. A confidence scale defined in terms of expected percentages. Amer. J. Psychol., 1957, 70, 432-436.

Dahlstrom, W. G., & Welsh, G. S. An MMPI handbook: a guide to use in clinical practice and research. Minneapolis, Minn.: The University of Minnesota Press, 1960.

Edwards, A. L. Experimental design in psychological research. New York: Rinehart & Company, Inc., 1950.

Goldberg, L. R. The effectiveness of clinicians' judgements: the diagnosis of brain damage from the Bender-Gestalt Test. J. consult. Psychol., 1959, 23, 25-33.

Goldberg, L. R. Simple models or simple processes? Some research on clinical judgements. Amer. Psychologist, 1968, 23, 483-496.

Golden, M. Some effects of combining psychological tests on clinical inferences. J. consult. Psychol., 1964, 28, 440-446.

Holt, R. R. Clinical and statistical prediction: a reformation and some new data. J. abnorm. soc. Psychol., 1958, 56, 1-12.

Holtzman, W. H. Can the computer supplant the clinician? J. clin. Psychol., 1960, 16, 119-122.

Kirk, R. E. Experimental design: procedures for the behavioral sciences. Belmont, Calif.: Brooks/Cole Publishing Co., 1968.

Kostlan, A. A method for the empirical study of psychodiagnosis. J. consult. Psychol., 1954, 18, 83-88.

Lindzey, G. Seer versus sign. J. exp. Res. Pers., 1965, 1, 17-26.

Meehl, P. E. Clinical versus statistical prediction. Minneapolis, Minn.: University of Minnesota Press, 1954.

Meehl, P. E. Seer versus signs: the first good example. J. exp. Res. Pers., 1965, 1, 27-33.

Meehl, P. E., & Rosen, A. Antecedent probability and the efficiency of signs, patterns, or cutting scores. Psychol. Bull., 1955, 52, 194-216.

Megargee, E. I. (Ed.). Research in clinical assessment. New York: Harper & Row Publishers, 1966.

Mello, Nancy K., & Guthrie, G. M. MMPI profiles and behavior in counseling. J. counsel. Psychol., 1958, 5, 125-129.

Oskamp, S. The relationship of clinical experience and training methods to several criteria of clinical prediction. Psychol. Monogr., 1962, 76, No. 547.

Satz, P. A block rotation task: the application of multivariate and decision theory analysis for the prediction of organic brain disorder. Psychol. Monogr., 1966, 80, No. 629.

Shagoury, P. The influence of statistical information on clinical decisions. Unpublished master's thesis, University of Florida, 1969.

Shagoury, P., & Satz, P. The effect of statistical information on clinical prediction. Proceedings, 77th Annual Convention, APA, 1969, 310-311.

BIOGRAPHICAL SKETCH

Ann Weimer Moxley was born in March 1946 in New York City, New York. She attended public schools in Gainesville, Florida, and graduated from Gainesville High School in 1964. She enrolled in the University of Florida in September 1964 and received her Bachelor of Arts degree, magna cum laude, in December 1967. She received her Master of Science degree at the University of Florida in December 1968. She is currently engaged in her clinical psychology internship at the University of Rochester's Strong Memorial Hospital in Rochester, New York. She is married to James Edward Moxley, who received his Master's in Business Administration at the University of Florida and is now employed by Eastman Kodak in Rochester, New York. Ann is a member of Phi Beta Kappa, Phi Kappa Phi, Psi Chi, and Mortar Board.

This dissertation was prepared under the direction of the chairman of the candidate's supervisory committee and has been approved by all members of that committee. It was submitted to the Dean of the College of Arts and Sciences and to the Graduate Council, and was approved as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December 1970

Dean, College of Arts and Sciences

Dean, Graduate School

Supervisory Committee:

Chairman
