Group Title: effects of statistical information on clinical judgement
Title: The Effects of statistical information on clinical judgement
Permanent Link: http://ufdc.ufl.edu/UF00097732/00001
 Material Information
Title: The Effects of statistical information on clinical judgement
Physical Description: vii, 63 leaves. : ill. ; 28 cm.
Language: English
Creator: Moxley, Ann Weimer, 1946-
Publication Date: 1970
Copyright Date: 1970
 Subjects
Subject: Clinical psychology   ( lcsh )
Statistical decision   ( lcsh )
Psychology thesis Ph. D   ( lcsh )
Dissertations, Academic -- Psychology -- UF   ( lcsh )
Genre: bibliography   ( marcgt )
non-fiction   ( marcgt )
 Notes
Thesis: Thesis--University of Florida, 1970.
Bibliography: Bibliography: leaf 37.
Additional Physical Form: Also available on World Wide Web
General Note: Manuscript copy.
General Note: Vita.
 Record Information
Bibliographic ID: UF00097732
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: alephbibnum - 000559443
oclc - 13493553
notis - ACY4899










THE EFFECTS OF STATISTICAL INFORMATION

ON CLINICAL JUDGEMENT











By
ANN WEIMER MOXLEY













A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA
1970














ACKNOWLEDGEMENTS


The author wishes to express her gratitude to her committee

chairman, Dr. Paul Satz, for his comments, patience, and inspiration

throughout the initiation, implementation and completion of this re-

search. The author is also indebted to the other members of her

committee, Dr. Audrey Schumacher, Dr. Ben Barger, Dr. Marvin Shaw, and

Dr. Donald Childers, for their assistance and criticisms. The author

wishes to express her special thanks to her husband, Jim, whose moral

support was invaluable and most appreciated.

















TABLE OF CONTENTS


ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

INTRODUCTION

METHOD

RESULTS

DISCUSSION

SUMMARY

APPENDIX A  INSTRUCTIONS

APPENDIX B  SUMMARY OF NEWMAN-KEULS TESTS

REFERENCES

BIOGRAPHICAL SKETCH














LIST OF TABLES



Table

1. Actual and test-predicted therapeutic outcome

2. Design schematic

3. Proportion of correct judgments

4. Summary of analysis of variance of accuracy

5. Mean confidence scores

6. Summary of analysis of variance of confidence

7. Correlation coefficients for appropriateness

8. Summary of analysis of variance of appropriateness correlations














LIST OF FIGURES



Figure

1. Accuracy by information levels

2. Confidence by information levels

3. Appropriateness correlations between accuracy and confidence by information levels








Abstract of Dissertation Presented to the Graduate Council
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
at the University of Florida


THE EFFECTS OF STATISTICAL INFORMATION
ON CLINICAL JUDGEMENT


By


Ann Weimer Moxley


December, 1970



Chairman: Dr. Paul Satz
Major Department: Psychology


The present study was designed to look at the effects of adding

quantitative and qualitative data to a relevant clinical judgment

task. In essence, it compared judges with varying degrees of clini-

cal experience to actuarial prediction methods. The study also

attempted to train judges to use actuarial information to improve

their prediction accuracy.

Twelve judges representing three levels of clinical experience

made post-dictive judgments on the length of stay in psychotherapy

(short or long) from a sample of MMPI profiles of clients seen in a

university mental health service. Judgments were made under four

conditions in which qualitative and quantitative information was

added incrementally at each level. The three levels of judges' ex-

perience were professional clinical psychologists, "sophisticated"

third year clinical psychology graduate students trained in








statistical decision theory, and "unsophisticated" third year clini-

cal psychology graduate students without any training in statistical

decision theory.

Accuracy increased over levels of information but there were no

differences in accuracy for the three levels of experience. A sig-

nificant group by information level interaction demonstrated some

group effects due to a lower proportion of correct judgments for the

less experienced judges under conditions involving the least amount

of information.

Judges became more confident in their judgments as they received

more information. Appropriateness, defined as accuracy weighted by

confidence and measured by correlation coefficients between accuracy

and confidence, increased substantially as increments of information

were added. The group trained in statistical decision theory tended

to make the most appropriate judgments and the least experienced

group of graduate students tended to make the least appropriate judg-

ments.

The present study showed that clinicians can use quantitative

data to improve their own judgmental ability and to predict more

accurately than actuarial data alone. Also, since those judges with

the most experience in using actuarial tasks tend to be the most

appropriate in their judgments, this implies that clinicians can also

be trained to be more appropriate and to know when their judgments

are more likely to be accurate.









INTRODUCTION


The objectives of the present study were two-fold. The primary

objective was to examine the effects of test and non-test (statistical)

information on the judgmental process. The secondary objective, and

the focus for implementing the primary objective, was to study the

psychological attributes of individuals who stay only a short time in

therapy versus those who remain a long time. That is, the objective

of studying the judgmental process was couched in a real and relevant

situation, length of stay in psychotherapy, which is a meaningful and

pressing problem for psychologists today. However this secondary

objective was minor in relation to the major issue of examining the

clinical judgment process.


Clinical Versus Actuarial Prediction

Ever since Paul Meehl's book, Clinical Versus Statistical Prediction,

clarified the issue of clinicians' predictions versus

actuarial predictions, there have been numerous studies comparing

these two prediction methods. As Meehl (1954) points out, however,

the two methods need not be mutually exclusive since the clinician can

incorporate actuarial methods and data into his prediction process.

Many studies have focused not only on comparing clinicians to statis-

tical formulae but also on improving the clinician's ability to

predict by giving him useful statistical information and training him

to use this information.

In general, the studies which compared clinicians to actuarial








methods found that the actuarial methods were either superior to clin-

icians or equal in efficiency to clinicians (Meehl, 1965). With the

exception of one study, the clinician has shown no superiority to

purely quantitative actuarial prediction. The one study which did

find clinicians superior (Lindzey, 1965) used one to two clinicians

and its application is somewhat questionable. One reason the clin-

ician has not been superior to actuarial methods is that he has seldom

been given the opportunity to incorporate the actuarial information in

formulating his final decision. He has been at a disadvantage so that

the demonstrated superiority of the actuarial method may be due to the

experimental design rather than to an actual superiority of statistical

techniques. Also, the information available to the clinician has often

been based on non-quantitative data such as interview material, case

history data, and projective tests.

Holtzman (1960) separates the clinician's diagnostic task into

three phases: (1) collection of information; (2) preparation and translation

of this information for analysis; (3) interpretation of this

information. As he points out, actuarial methods, and specifically

the computer, are superior to the clinician in processing information

once the primary coding has been done. The clinician is still

superior at collecting information and at interpreting it because at

present the computer lacks the appropriate rules and parameters for

interpretation. Thus, studies which emphasize aspects of prediction

suitable for actuarial methods do not use the clinician's talents to

best advantage. It is when skilled clinicians use familiar methods

to predict a criterion they know something about that they have the








most success (Holt, 1958). This includes their having a rich body of

data and systematic actuarial procedures at their disposal in addition

to their own experience, intuition, and knowledge.

Recent studies suggest that as the amount of clinical experience

increases, prediction accuracy decreases (Goldberg, 1959; Oskamp,

1962; Shagoury, 1969; Shagoury & Satz, 1969). These studies compared

trained clinicians with a professional degree to clinical psychology

graduate students and even to non-professional groups, such as secre-

taries, and have found that the trained clinicians were not superior

to the other groups. An explanation of this finding is that the more

experienced clinician has developed a particular way of looking at

data which interferes with his making unbiased, objective decisions.

Another aspect of research in the area of clinical versus statis-

tical predictions is the confidence clinicians place in their judg-

ments and the appropriateness of their predictions. Appropriateness

is a measure of confidence weighted by accuracy which was developed

by Adams (1957). Confidence in judgments also differs between groups

of graduate students and trained clinicians, with the trained psychologists

being less confident in their judgments (Goldberg, 1959; Oskamp,

1962). When the measurement of appropriateness of the judgment is

introduced, however, the trained clinicians are more appropriate in

their confidence levels than are either graduate students or non-

professionals (Oskamp, 1962; Shagoury, 1969). That is, clinicians

are more confident of their correct decisions and less confident of

their incorrect decisions. The amount of information available to

the judge does not correlate with his predictive accuracy but








increased amounts of information substantially increase confidence
levels (Goldberg, 1968).
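
The abstract describes appropriateness as measured by correlation coefficients between accuracy and confidence. A minimal sketch of that idea follows, assuming a plain Pearson (point-biserial) correlation computed over one judge's judgments. The data, the 1-to-5 confidence scale, and the function name are hypothetical and are not taken from Adams (1957) or from the present study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical judge: 1 = correct judgment, 0 = incorrect judgment.
correct = [1, 1, 0, 1, 0, 1, 0, 1]
# Hypothetical confidence ratings on an assumed 1 (low) to 5 (high) scale.
confidence = [5, 4, 2, 5, 1, 4, 3, 5]

# A judge who is more confident when right than when wrong gets a
# positive coefficient, i.e. "appropriate" confidence in this sketch.
appropriateness = pearson_r(correct, confidence)
print(f"appropriateness r = {appropriateness:.2f}")
```

A coefficient near zero would indicate confidence unrelated to accuracy, and a negative one would indicate a judge who is more confident when wrong.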

Goldberg (1968) also discusses the nature of the judgmental

process. He questions whether judges use simple decision-making

models such as linear models, or complex processes such as configural

models. In an analysis of clinicians' judgments he found that a

linear model usually reproduced 90 to 100 per cent of the reliable

judgmental variance on most decision-making tasks even though the

clinicians generally felt that they used more complex, configural

models.


Using Statistical Information to Increase Prediction Accuracy

Training in the use of statistical information has been shown

to improve judgmental accuracy. In a study by Oskamp (1962), clini-

cians were able to improve their ability to distinguish psychiatric

and medical patients on the basis of their Minnesota Multiphasic

Personality Inventory (MMPI) profiles when they were provided with

actuarial rules. The statistical formula predicted with 75 per cent accuracy,

and the clinicians, after training, were able to reach this 75

per cent accuracy level.

Goldberg (1968) trained judges by giving them a formula and

optimum cutting score for distinguishing neurotic from psychotic MMPI

profiles. The judges were told that the statistical information

predicted with 70 per cent accuracy and they were encouraged to use

this information along with any other information they thought would

improve their prediction accuracy. Goldberg found that after eight








weeks of "value training," the judges, on the average, increased their

accuracy from between 52 and 65 per cent to approximately 70

per cent. This was the only type of training that substantially im-

proved accuracy. Thus, feedback is necessary if the clinician is to

learn how to improve his decision-making techniques.

Another useful type of statistical information is the incidence,

or base rate, of a given trait in the population available to the

clinician. Goldberg (1959), for example, had judges predict brain-

damaged patients from functional patients on the basis of Bender-

Gestalt protocols. The protocols were randomized into different

groups in which the incidence of brain-damage varied from high (P=.8)

to low (P=.2). Goldberg found no difference in judgmental accuracy

between these groups. Unfortunately, the base rate information was

not provided to the judges.

The importance of base rates for evaluating predictive tests was

discussed by Meehl and Rosen (1955). They cite as an example an Army

adjustment test for predicting which inductees would adjust to the

service. The test predicted inductee adjustment with an accuracy of

79.7 per cent. However, the overall percentage of inductees who

adjusted was 95 per cent; thus, utilization of the base rates alone

(i.e., predicting adjustment in all cases) would result in a hit rate

of 95 per cent.
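
The arithmetic behind this example can be sketched in a few lines. The figures are those quoted above from Meehl and Rosen (1955); the function name is illustrative only.

```python
def hit_rate_from_base_rate(base_rate: float) -> float:
    """Hit rate obtained by predicting the more frequent outcome for every case."""
    return max(base_rate, 1.0 - base_rate)


test_hit_rate = 0.797        # accuracy of the Army adjustment test
base_rate_adjusted = 0.95    # proportion of inductees who adjusted

base_rate_hit_rate = hit_rate_from_base_rate(base_rate_adjusted)

print(f"test alone:      {test_hit_rate:.3f}")
print(f"base rate alone: {base_rate_hit_rate:.3f}")
print(f"loss from using the test: {base_rate_hit_rate - test_hit_rate:.3f}")
```

The same function shows why extreme base rates are hard to beat: at a base rate of .50 the "predict the majority" strategy is right only half the time, which leaves far more room for a test to add information.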

Another application of base rates is through Bayesian statisti-

cal theory which combines the base rates with the valid and false

positive rates of a particular test to give a conditional probability

for the likelihood of being correct or incorrect given a certain test








sign in a given base rate population.

Shagoury (1969) and Shagoury and Satz (1969) demonstrated that

clinicians can substantially improve their predictive accuracy when

provided with information on base rates and conditional probabilities.

These studies showed that increments in statistical information,added

to test data, significantly increased the accuracy of judges in a

real-life clinical decision task of predicting brain-damaged and

functional patients on the basis of a block rotation task (Satz, 1966).

Their judges' accuracy approximated that obtained by a discriminant

function predictor score (Z). Composite Z scores were de-emphasized

by the judges in favor of using the additional information such as

the base rates, differential error risks, and conditional probabil-

ities. However, in groups with a high incidence of brain-damaged

individuals (base rate=.8) the judges' overall accuracy decreased,

perhaps due to a reluctance to diagnose pathology.

Meehl and Rosen (1955) point out that test development should be

concentrated on populations with base rates near .50 rather than on

populations with base rates approaching .00 or 1.00 since the use of

a test in the latter cases will yield a lower hit rate than using the base

rates alone.

A cutting score, or composite Z score, derived from discriminant

function analysis can be manipulated for various purposes in predic-

tion. It can be used to maximize the number of correct predictions

for all cases or to maximize only the correct predictions for positives.

An excellent application of this technique of discriminant func-

tion analysis to decision theory in a clinical setting was demonstrated









by Satz (1966). Discriminant function analysis is a statistical tech-

nique devised to maximally differentiate discrete criterion groups

when multiple measurements are involved. This is essentially a multi-

ple regression technique except for a discontinuous distribution on

the criterion variables. The following linear equation expresses this

function:

$$Z = \lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_n X_n$$

where Z is the composite predictor score based on the individual scores

on each of the variables ($X_1, X_2, \ldots, X_n$) and the respective weights,

or lambdas, assigned to each of the variable scores ($\lambda_1, \lambda_2, \ldots, \lambda_n$).

If there are two criterion groups involving multiple measures, the

discriminant function determines optimal weights (lambdas) for these

variables which will maximize the difference between the composite Z

scores on both criterion groups.


Length of Stay in Psychotherapy as a Criterion Variable

Why is length of stay in psychotherapy a meaningful problem for

study? First, there is the great demand for psychological services

with a present-day manpower shortage of trained clinicians. Most

clinics that see individuals with psychological problems are under-

staffed, have patient waiting-lists, or both. There are also differ-

ential risks involved in selecting who will be seen in therapy. It

is far more serious to miss those who are severely disturbed and need

long-term psychotherapy because of the threat these individuals may

pose to themselves or to society, than it is to wrongly classify

persons who need only a few sessions and are experiencing minor








difficulties in their lives. The first type of error, that of pre-

dicting a short stay in therapy based on a negative test score when

in fact the person stays a long time, is a false negative error. A

false positive error results from the prediction of many therapy sessions

based on a positive test score when the individual actually stays

only a few sessions.

Meehl and Rosen (1955) point out that often in a clinical set-

ting external restraints are imposed, perhaps due to a shortage of

staff time, patient waiting-lists, or administrative policy. If this

is the case, decisions cannot always be made in accordance with known

base rates. They give the following example to illustrate the use of

an externally imposed selection ratio. If 80 per cent of the patients

referred to a mental health clinic are recoverable with intensive

psychotherapy, then everyone should be treated rather than relying on

a test which predicts only 75 per cent of those who will have a favor-

able therapy outcome. However, if staff time is limited and only half

of the referrals can be treated, following the base rates is meaning-

less because this would lead to a decision that would be impossible

to implement. In this case, where a selection ratio of .5 is exter-

nally imposed, the use of the test becomes worthwhile. Given the

figures in Table 1 (Meehl & Rosen, 1955), those 50 cases out of the

100 referrals to be treated are selected from those individuals the

test predicts will be "good" therapy risks. If this is done there is

a 92.3 per cent hit rate among those selected for therapy (60/65).

Stated another way, the test will be correct in 46 out of the 50

cases which will be successes (half of the 80 good therapeutic














Table 1

Actual and Test-Predicted Therapeutic Outcome


                      Therapeutic Outcome
Test Prediction      Good      Poor      Total

Good                  60         5         65

Poor                  20        15         35

Total                 80        20        100









outcome group).
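
The figures in this example can be checked directly from the cell counts of Table 1. The sketch below is only a worked verification of the arithmetic; the variable names are illustrative.

```python
# Cell counts from Table 1: (test prediction, actual outcome) -> cases.
table = {
    ("good", "good"): 60,
    ("good", "poor"): 5,
    ("poor", "good"): 20,
    ("poor", "poor"): 15,
}
total = sum(table.values())                                   # 100 referrals

# Base rate of good outcome: treat everyone and 80 per cent succeed.
base_rate_good = (table[("good", "good")] + table[("poor", "good")]) / total

# Overall accuracy of the test across all referrals.
test_accuracy = (table[("good", "good")] + table[("poor", "poor")]) / total

# Hit rate among cases the test calls "good" (60 of 65).
predicted_good = table[("good", "good")] + table[("good", "poor")]
hit_rate_predicted_good = table[("good", "good")] / predicted_good

# With a selection ratio of .5, treat 50 of the predicted-good cases.
selected = 50
expected_successes_with_test = selected * hit_rate_predicted_good
expected_successes_at_random = selected * base_rate_good

print(f"base rate of good outcome: {base_rate_good:.2f}")
print(f"overall test accuracy:     {test_accuracy:.2f}")
print(f"hit rate, predicted good:  {hit_rate_predicted_good:.3f}")
print(f"expected successes of 50:  {expected_successes_with_test:.1f} with the test, "
      f"{expected_successes_at_random:.1f} choosing at random")
```

Choosing the 50 cases at random would be expected to yield about 40 successes, so under the imposed selection ratio the test adds roughly six successful cases.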

A second reason for selecting length of stay in psychotherapy as

the focus for a clinical judgment study is that the problem can be

subjected to multivariate and statistical decision theory analysis in

order to increase the predictive relationship between signs and cri-

teria. This possibility thus increases its application and potential

usefulness to clinical judges.

One study in this area found that there are differences in be-

havior in psychotherapy between individuals which are predictable

from an MMPI profile (Mello & Guthrie, 1958). Mello and Guthrie

studied 219 individuals seen at a college psychological clinic. They

used only those profiles with at least one T score greater than 70.

They found that length of stay in therapy was related to high scores

on various scales of the MMPI. Of those students with high scores

on Scale 2(D), 45 per cent remained only one to three sessions. Per-

sons high on Scale 3(Hy) tended to stay in therapy longer than the

high 2's and also developed dependency on the therapists more easily.

Scale 4(Pd) individuals seldom stayed past seven counseling sessions

and as a group were quite resistant to therapy although they did not

often cancel their appointments. Persons who stayed the longest in

therapy were high on Scales 7(Pt) or 8(Sc) with some clients contin-

uing past 60 and 21 sessions respectively for these two scales. Most

of the high 9(Ma) students stayed fewer than 11 sessions and cancelled

therapy sessions frequently. Mello and Guthrie concluded that a

therapist can get some idea of what to expect from a particular client

on the basis of his MMPI profile.









The Mello and Guthrie study is interesting because it suggests

that psychological data (MMPI) may be used by clinicians to more

efficiently select clients for psychotherapy. Unfortunately, the

authors did not examine this problem within the context of a decision-

making task nor did they subject their data to multivariate analysis.

Using length of stay in psychotherapy as the predictor criterion

is valuable for other reasons. For the professional involved, it may

clarify the services offered by his agency and help him to provide

more adequate services to his clients. For example, he may decide

that seeing many clients for a short period of time is of more value

than giving those who need long-term therapy this service and thus

seeing fewer clients. That is, prevention may be emphasized in a

college mental health clinic and such a clinic may be designed to see

as many students as possible to ease their transition from high school

or junior college to a college curriculum. On the other hand, a

clinic may be more treatment oriented and seek to help those who are

more disturbed and require longer therapy. This emphasis would re-

quire more staff time per individual client and would necessitate

seeing fewer clients. Decisions of whom to treat could be more ade-

quately made with test and non-test information.

To be able to predict length of stay in therapy could affect

therapist expectations which could in turn affect outcome variables.

Just what effect an expectation for a particular length of stay in

therapy will have on the outcome of the therapy is outside the scope

of the present study but is an important research question in itself.

Of course, if the clinician intends to see everyone who enters his








clinic, a screening procedure is worthless or may even be detrimental

if the test predicts that an individual will not stay in therapy or

will not improve in therapy, because this may lead the therapist to

expect just these results to the client's disadvantage (Meehl & Rosen,

1955).

It is often necessary for the clinician to indicate a therapy

prognosis for an individual. If the clinician can predict or learn

to accurately predict whether or not a person will stay in therapy,

he is providing useful information for the person's treatment.

Thus it can be seen that clinicians are constantly involved in

the task of prediction and decision-making. If they can be trained

to make use of relevant data and material, they may improve their

predictions. Although many clinicians look with disfavor on the use

of tests, tests combined with other relevant data can be shown to

have practical and research applications. The clinician may use them

to better his predictions and decision-making processes.


Hypotheses Tested

The present study was addressed to two objectives. First, to

examine the decision-making process and to determine whether predic-

tion accuracy is influenced by independent variables such as clinical

experience and varying amounts and kinds of information. Second, the

question of whether clinicians can be trained to improve their clin-

ical decision processes was also examined. The first and primary

objective was studied in terms of the second objective, a real-life

situation that is meaningful to clinicians today--the problem of









length of stay in psychotherapy. If increments in levels of statis-

tical information increase prediction accuracy and thereby improve

the clinicians' decision process, this type of information may be

dovetailed into the operation of a clinic and taught to the staff to

identify high-risk individuals. Specific questions, or hypotheses,

were raised. Does judgmental accuracy increase as more information

is added to the prediction task and what types of information are

most useful in increasing judgmental accuracy? Will there be differ-

ences in accuracy dependent on experience level? That is, will grad-

uate clinical psychology students trained in statistical decision

theory be better clinical judges than experienced PhD clinical

psychologists (without such training) and will less experienced

psychologists be superior to more experienced clinicians? Will con-

fidence and appropriateness increase with increments in information

and will there be differences between the three experience levels

with regard to their confidence and appropriateness?
















METHOD


Subjects. Twelve judges (Js) represented three levels of

experience and sophistication in statistical decision-making. A pro-

fessional (P) group of four PhD clinical psychologists represented

the highest level of clinical experience. A group of four clinical

psychology graduate students trained (sophisticated) in statistical

decision-making theory (SGS) represented the highest level of statistical

sophistication. Another group of four unsophisticated (not

trained in statistical decision theory) clinical psychology graduate

students (UGS) represented the same experience level as the SGS group

and the same level of statistical sophistication as the P group.

Sophistication in decision theory was defined as participation in a

graduate course in statistical decision theory for clinical psychology

students at the University of Florida. Sophistication here only im-

plies special training and by no means implies that the professionals

were clinically unsophisticated.

Materials. Test materials for Js were a random sample of 100

MMPI profiles of clients seen in a university mental health service.

The sample profiles were drawn from 241 profiles of all clients seen

during a three-year period. Each J received 25 of the 100 profiles.

Profiles were divided into two groups based on the client's

length of stay in psychotherapy at the mental health service. A short









stay (S) was defined as four or fewer therapy sessions and a long stay

(L) as five or more therapy sessions. The mean length of stay for

the S group was 2.00 sessions and for the L group 9.27 sessions.

A discriminant function analysis which maximized the difference

between the two length of stay in psychotherapy groups was run on

the 241 MMPI profiles. The mean discriminant composite scores for

the two length of stay in therapy groups on the 13 MMPI scale variables

were $\bar{Z}_1$=29.74 for the few-session group (S) and $\bar{Z}_2$=34.26 for

the many-session group (L). An analysis of variance of the composite

means showed a significant difference between the two groups (F=4.19,

df=12,224, p<.001). A commonly used rule of $Z = (\bar{Z}_1 + \bar{Z}_2)/2$ was used to

determine the optimal predictive cutting Z score.

With an emphasis on minimizing the false negative rate, the com-

posite Z score of 32.02 predicted with an overall hit rate of 67 per

cent for the original protocol pool. False negative errors represented

those clients who were predicted as short-stays (S), or

negatives, but who remained long in therapy (L). It was felt that

this predictive error was more serious than the false positive error

which included those clients who were predicted as long-stays (L)

but who remained a short time in therapy (S). It seemed more impor-

tant to identify those clients who really needed long-term therapy

than to identify those who did not. Of course, some of the individ-

uals with high test scores who stayed only a few sessions may have

been very disturbed but dropped out of therapy prematurely. There

was no way to identify these cases¹ when a very disturbed student may

have panicked or become threatened by therapy and dropped out or









simply missed appointments. The false negative rate for the Z score

of 32.02 was .38, the false positive rate was .31, giving a valid

negative rate of .69 and a valid positive rate of .62.

Another Z score, which minimized the false positive error and predicted

with an overall accuracy of 71 per cent, was not used in

the present study for the reason stated above.

Conditional probabilities were calculated for the Z cutting

score. Conditional probabilities were computed with the following

equations:

$$P(L/+) = \frac{P(L)P(+/L)}{P(L)P(+/L) + P(S)P(+/S)} \quad \text{and} \quad P(S/-) = \frac{P(S)P(-/S)}{P(S)P(-/S) + P(L)P(-/L)}$$

where L=many therapy sessions or a long stay in therapy (base rate=.34)

S=few therapy sessions or a short stay in therapy (base rate=.66)

+=a positive test score (Z ≥ 32.02)

-=a negative test score (Z < 32.02)

For the Z score of 32.02 the conditional probabilities were:

P(L/+)=.51 and P(S/-)=.78. With this new information it can be seen

that with a positive test score, predictions will be wrong as often

as they are correct. But given a negative test score, predictions

will be right 78 per cent, or most of the time.
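
The reported conditional probabilities can be reproduced from the error rates given above. The sketch below assumes base rates of about .34 for long stays and .66 for short stays, which are the values consistent with the reported results and with the base rates described in the Procedure; the function and variable names are illustrative.

```python
def posterior(prior, likelihood, prior_alt, likelihood_alt):
    """Bayes' rule for two mutually exclusive, exhaustive groups."""
    numerator = prior * likelihood
    return numerator / (numerator + prior_alt * likelihood_alt)


# Assumed base rates (see lead-in): long stay (L) and short stay (S).
p_long, p_short = 0.34, 0.66

# Rates reported for the cutting score of 32.02.
valid_positive = 0.62   # P(+/L)
false_negative = 0.38   # P(-/L)
valid_negative = 0.69   # P(-/S)
false_positive = 0.31   # P(+/S)

p_long_given_pos = posterior(p_long, valid_positive, p_short, false_positive)
p_short_given_neg = posterior(p_short, valid_negative, p_long, false_negative)

# Overall hit rate: correct positives plus correct negatives.
overall_hit_rate = p_long * valid_positive + p_short * valid_negative

print(f"P(L/+) = {p_long_given_pos:.2f}")
print(f"P(S/-) = {p_short_given_neg:.2f}")
print(f"overall hit rate = {overall_hit_rate:.2f}")
```

Under these assumptions the sketch gives P(L/+) of about .51, P(S/-) of about .78, and an overall hit rate of about .67, in line with the figures reported in the text.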

Finally, a random sample of 100 profiles from the total protocol

pool of 241 cases was drawn. This was done so that the Js would have

fewer protocols to judge, making their task more economical with

regard to time.



¹Failure to control this factor undoubtedly lowered the predictive
accuracy of the discriminant function equation (and perhaps
clinical judgment) in that some of the disturbed profiles in the (S)
criterion group may well have remained (L) if they had not dropped out.









A second reason for drawing a random sample was to make the

situation more relevant clinically in terms of the base rates. That

is, the sample had only approximate base rates and the judges did not

know the exact probabilities for their sample of those who remained a

long or short time in therapy. However, for the sample, the Z score

predicted with the same accuracy that it did for the total protocol

pool.

Procedure. Refer to Table 2 for a schematic of the design. Js

were asked to predict a client's length of stay in psychotherapy

from the 25 MMPI profiles. These profiles, the sample of 100 profiles,

and the original profile pool all had approximately the same

base rates: 35 per cent of the clients stayed many sessions (L) and

66 per cent stayed a few sessions (S). The Js predicted length of

stay in therapy (S or L) during four sessions, with additional information

added incrementally at each session. These sessions, or

levels of information, represented one class of independent variables.

Groups, or experience level, represented the other class of independent

variables.

Each J made his predictions on the same 25 protocols that he

received at the first level throughout the training. Level I: Js

were first given MMPI profiles with no other information. Level II:

Js were again presented the same 25 protocols for the same judgment

but with the additional information of biographical data such as age,

sex, marital status, religious preference, parents' marital status,

previous counseling experience, and subsequent counseling experience.

Level III: For the third decision task, Js were given the




























profiles, biographical data, with the additional statistical infor-

mation of the cutting score based on discriminant function analysis.

Valid positive and false positive percentages were also provided

with the cut-off Z score. Level IV: Conditional probabilities and

the base rates were added to the previous information for the fourth

presentation of profiles for prediction. (For a copy of the instruc-

tions for each information level see Appendix A.) For each judgment

Js also indicated their confidence in the accuracy of their judg-

ment.
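The statistical information added at Levels III and IV (a test sign, its valid positive and false positive rates, and the base rates) can be combined with Bayes' rule. The sketch below is an illustration only: the base rate of 35 per cent long stays comes from the text, while the valid positive and false positive rates are hypothetical placeholders, and the function name is an assumption made for this example.

```python
def posterior_long_stay(base_rate_long, valid_positive, false_positive):
    """P(long stay | positive test sign) by Bayes' rule.

    valid_positive: P(positive sign | long stay)
    false_positive: P(positive sign | short stay)
    """
    p_positive = (valid_positive * base_rate_long
                  + false_positive * (1 - base_rate_long))
    return valid_positive * base_rate_long / p_positive


# Base rate from the text; the two test rates are HYPOTHETICAL values.
p_long_given_positive = posterior_long_stay(
    base_rate_long=0.35, valid_positive=0.70, false_positive=0.35
)
print(f"P(L | positive sign) = {p_long_given_positive:.2f}")
```

With a base rate this far below .5, even a fairly good test sign leaves the posterior probability of a long stay well short of certainty, which is the kind of reasoning the Level IV information invites.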

To rule out a practice effect from repeated presentation of the

same profiles, two control judges were used who predicted length of

stay in psychotherapy using profiles only, with no additional

information on four separate occasions.

Judges were presented the profiles for judgments on four days

in a row with only one information level given each day.

Hypotheses. I--Information Level: It was hypothesized that

increments of information would increase overall judgmental accuracy

and group accuracy. (A) Level I accuracy would be at approximately

the level of chance. (B) At Level II, accuracy would decrease or

remain the same. (C) Level III accuracy would be approximately that

of the actuarial prediction accuracy of the discriminant function

analysis. (D) Level IV accuracy would increase slightly over Level

III accuracy.

II--Experience Level: It was hypothesized that the statistically

sophisticated graduate students would be the most accurate, the

statistically unsophisticated graduate students next most accurate,








and the professionals least accurate.

III--Confidence and Appropriateness: It was hypothesized that

confidence ratings would increase with increments of information and

that appropriateness would also increase with more information.












RESULTS


Accuracy: The Effects of Information and Experience

Accuracy was defined as the proportion of correct judgments per

presentation of 25 MMPI profiles. The two control Js showed no prac-

tice effects. Judge A's accuracy was 52 per cent on the first

presentation and 48 per cent on each of the three subsequent presen-

tations. Judge B's accuracy was distributed across sessions as

follows: 76 per cent, 48 per cent, 68 per cent, and 68 per cent.
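As a rough illustration of what "chance" means for a set of 25 two-choice judgments, the sketch below computes accuracy as a proportion and the binomial probability of reaching a given number of hits by guessing. It assumes chance is a hit probability of .5 per profile, which is how the text appears to use the term; the function names are hypothetical and this is not an analysis reported in the study.

```python
from math import comb


def accuracy(n_correct, n_profiles=25):
    """Proportion of correct judgments for one presentation."""
    return n_correct / n_profiles


def prob_at_least(k_correct, n_profiles=25, p_chance=0.5):
    """Probability of k_correct or more hits out of n_profiles by guessing."""
    return sum(
        comb(n_profiles, k) * p_chance**k * (1 - p_chance)**(n_profiles - k)
        for k in range(k_correct, n_profiles + 1)
    )


# 19 of 25 correct corresponds to the 76 per cent reported for Judge B.
print(accuracy(19), prob_at_least(19))
```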

Table 3 presents accuracy by information level and experience

level. Two analyses of variance were conducted to determine the

effects of information level, experience level, judges, profile set,

and profile. The analysis of variance for profile set effects was

non-significant (F=.59, df=3,91). An Fmax test for homogeneity of

variances between groups was also non-significant (Fmax=3.43, k=3,

df=16). A summary of the analysis of variance for information level

and experience level effects is shown in Table 4.
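The F ratios in Table 4 can be approximately recovered as ratios of mean squares. The sketch below assumes the error terms implied by the degrees of freedom reported in the text (Information X Judges for the two effects tested on 27 df, Judges for the effect tested on 9 df); that pairing is an inference, not something the source states outright. Because the printed mean squares are rounded to two decimals, the recomputed ratios only come close to the reported values.

```python
# Mean squares as printed in Table 4 (rounded in the source).
mean_squares = {
    "information": 1.21,
    "experience": 0.92,
    "information_x_experience": 0.81,
    "judges": 0.39,
    "information_x_judges": 0.11,
}

# Error terms inferred from the reported df (3,27; 6,27; 2,9): an assumption.
f_information = mean_squares["information"] / mean_squares["information_x_judges"]
f_interaction = (mean_squares["information_x_experience"]
                 / mean_squares["information_x_judges"])
f_experience = mean_squares["experience"] / mean_squares["judges"]

# F values as reported in the text, for comparison.
reported = {"information": 10.82, "interaction": 7.23, "experience": 2.34}
for name, value in [("information", f_information),
                    ("interaction", f_interaction),
                    ("experience", f_experience)]:
    print(f"{name}: recomputed {value:.2f}, reported {reported[name]:.2f}")
```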

Information. Mean judgmental accuracy increased consistently

with increments of information from Level I to Level IV (X̄_I=.55,

X̄_II=.61, X̄_III=.67, X̄_IV=.69). These differences were significant

(F=10.82, df=3,27, p<.01). A graphic presentation of this trend

is shown in Figure 1. Inspection of Figure 1 shows approximately a

linear increase in accuracy for the three groups by information

level. Both the P and UGS groups increased their accuracy at each

level while the SGS group showed increases at Levels II and III













Table 3

Proportion of Correct Judgments


Experience              Information Level
Level          J      I      II     III    IV     Totals

SGS            1     .64    .68    .76    .76     .71
               2     .56    .68    .72    .64     .65
               3     .64    .68    .72    .72     .69
               4     .63    .60    .72    .68     .65

           Total     .61    .66    .73    .70     .68

UGS            5     .72    .64    .64    .76     .69
               6     .40    .64    .72    .68     .61
               7     .36    .52    .48    .56     .48
               8     .36    .48    .68    .64     .54

           Total     .46    .57    .63    .66     .58

P              9     .44    .48    .68    .72     .58
              10     .68    .68    .64    .76     .69
              11     .64    .68    .68    .72     .68
              12     .56    .60    .64    .60     .60

           Total     .58    .61    .66    .70     .64

Totals               .55    .61    .67    .69

Control A            .52    .48    .48    .48     .49
Control B            .76    .48    .68    .68     .65















Table 4

Summary of Analysis of Variance of Accuracy


Source of Variation            df       MS        F

Mean                            1     478.80
Information Level               3       1.21    10.82**
Experience Level                2       0.92     2.31
Information X Experience        6       0.81     7.23**
Judges                          9       0.39     0.69
Information X Judges           27       0.11     0.95
Profile                       288       0.57
Information X Profile         864       0.11

** Significant at the .01 level.



















Fig. 1. Accuracy by information levels.








but a slight decrease in accuracy from Level III to Level IV.
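The level means quoted above can be checked against the group totals in Table 3. The sketch below, which assumes equal group sizes of four judges as in the table, averages the three group totals at each level and also computes each group's gain from Level I to Level IV. The variable names are illustrative only.

```python
LEVELS = ["I", "II", "III", "IV"]

# Group totals (mean proportion correct) by information level, from Table 3.
group_totals = {
    "SGS": [0.61, 0.66, 0.73, 0.70],
    "UGS": [0.46, 0.57, 0.63, 0.66],
    "P":   [0.58, 0.61, 0.66, 0.70],
}

# Overall mean at each level: average of the three equal-sized groups.
level_means = {
    level: round(sum(t[i] for t in group_totals.values()) / len(group_totals), 2)
    for i, level in enumerate(LEVELS)
}

# Change in accuracy from Level I to Level IV for each group.
gains = {group: round(t[-1] - t[0], 2) for group, t in group_totals.items()}

print(level_means)
print(gains)
```

The recomputed level means agree with the Totals row of Table 3, and the largest gain belongs to the UGS group.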

Experience. There were no differences in accuracy due to

experience level except for a trend toward group differences (F=2.34,

df=2,9, p<.20). The SGS group was the most accurate and the UGS group

the least accurate (X̄_SGS=.68, X̄_P=.64, X̄_UGS=.58). Only the SGS

group's overall accuracy was at the level of the discriminant func-

tion which predicted with 67 per cent accuracy.

Information and experience level interaction. The only other

significant source of variance was the group by information level

interaction (F=7.23, df=6,27, p<.01). The Newman-Keuls test of

differences between means was used (Kirk, 1968) and the results of

this analysis are given in Appendix B. The interaction was based

largely on a significantly lower proportion of correct judgments of

the UGS group at Level I. The UGS group not only started with the

lowest proportion of correct judgments, but also showed the most

significant increase in accuracy as information was added. Their

final degree of accuracy, however, was approximately the same as the

SGS accuracy at Level I. The UGS group significantly increased

their level of accuracy at Levels III and IV from Level I when the

composite Z score, conditional probabilities, and base rates were

added (p<.01).

The only significant increase in accuracy for the P group was

between the first level, with the profile only, and the final level

with all information (p<.05). Increases in accuracy for all groups

across information levels were significant except the increase from

Level III to Level IV where conditional probabilities and base rates









were added. Adding conditional probabilities and base rates to the

previous information did not result in a significant increase in

Js' accuracy over Level III, which included the composite Z score.

For the SGS group there were no significant differences in accuracy

across information levels. The only significant group differences

within information levels were between the SGS group and the UGS

group (p<.01) and between the SGS group and the P group (p<.05).


Confidence

Mean confidence scores by information level and experience

level are shown in Table 5. Table 6 shows the summary of the

analysis of variance for confidence scores. The Js' confidence

increased significantly as subsequent items of information were added

to the protocols for all groups (F=5.38, df=2,28, p<.05). Although

there were no differences between confidence scores for groups, the

P group tended to be the most confident and the SGS group tended to

be the least confident (X̄_P=76.86, X̄_UGS=72.72, X̄_SGS=69.96). These

trends are shown in Figure 2.


Appropriateness

Appropriateness (confidence weighted by accuracy) was measured

by Pearson product-moment correlations between confidence

scores and accuracy scores for each J at each level of

information. There were no significant group effects or

interactions, but the SGS group tended to be the most appropriate

(r_SGS=.26, r_P=.20, r_UGS=.17). The higher the correlation, the more appropriate

the judgment. Correlation coefficients for appropriateness are











Table 5

Mean Confidence Scores


Experience              Information Level
Level          J      I       II      III     IV     Totals

SGS            1     61.2    64.0    65.6    65.6     64.1
               2     61.2    60.8    60.8    72.0     63.7
               3     75.6    74.0    73.6    74.4     74.9
               4     72.0    76.2    80.4    82.0     77.7

           Total     67.5    68.8    70.1    73.5     70.0

UGS            5     59.0    58.0    66.0    69.2     63.1
               6     85.4    86.8    84.0    92.2     87.1
               7     78.2    79.8    76.2    82.2     79.1
               8     64.2    61.6    61.6    62.0     62.4

           Total     71.7    71.6    71.5    76.4     72.7

P              9     87.6    87.2    88.4    90.0     88.3
              10     64.8    56.0    65.8    56.0     60.7
              11     85.2    87.8    86.4    86.6     86.5
              12     68.0    70.4    76.8    72.8     72.0

           Total     75.2    75.4    79.4    76.4     76.9

Totals               71.4    71.9    73.8    75.4














Table 6

Summary of Analysis of Variance of Confidence


Source of Variation            df       MS        F

Information Level               2      52.67     5.38*
Experience Level                2     191.84      .39
Information X Experience        6      12.73     1.30
Judges within Groups            9     493.76
Information X Judges           28       9.79

* Significant at the .05 level.

























Fig. 2. Confidence by information levels.









given in Table 7 with the analysis of variance summary for appro-

priateness in Table 8. The analysis of variance was based on Z

transformations of the correlation coefficients. Mean appropriate-

ness scores were significantly higher at each level of information

(F=22.03, df=2,28, p<.01).
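The appropriateness measure described above can be sketched as a Pearson correlation between a judge's confidence ratings and the correctness of the same judgments, followed by a Fisher Z transformation of the coefficient. The data below are hypothetical (eight judgments rather than 25), and the helper names are assumptions made for illustration; the source does not give the raw ratings.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher Z transformation, used before averaging or testing correlations."""
    return math.atanh(r)


# HYPOTHETICAL judge: confidence (50 to 100) and correctness (1 = hit, 0 = miss).
confidence = [50, 60, 60, 70, 80, 80, 90, 100]
correct = [0, 1, 0, 0, 1, 1, 1, 1]

r = pearson_r(confidence, correct)
z = fisher_z(r)
print(f"appropriateness r = {r:.2f}, Fisher Z = {z:.2f}")
```

A positive coefficient means the judge tended to be more confident on the judgments that turned out to be correct.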

The SGS group was most appropriate because they were most

accurate and not overconfident, that is, their confidence was con-

sistent with their accuracy. The P group was overly confident and

the UGS group was less accurate, making these two groups' confidence

inconsistent with their accuracy. These trends can be seen in

Figure 3.
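One simple way to see the overconfidence described above is to subtract each group's overall accuracy (Table 3, as a percentage) from its mean confidence (Table 5). This gap is not the author's appropriateness measure, which is the correlation; it is only an illustrative check that treats a confidence rating as a stated probability of being correct, as the instructions to the judges suggest.

```python
# Group totals from Table 5 (mean confidence) and Table 3 (accuracy, per cent).
mean_confidence = {"SGS": 70.0, "UGS": 72.7, "P": 76.9}
accuracy_pct = {"SGS": 68.0, "UGS": 58.0, "P": 64.0}

# Positive gap: the group was more confident than its hit rate warranted.
confidence_gap = {
    group: round(mean_confidence[group] - accuracy_pct[group], 1)
    for group in mean_confidence
}
best_matched = min(confidence_gap, key=confidence_gap.get)

print(confidence_gap, best_matched)
```

On these totals the SGS group shows the smallest gap, which agrees with the description of that group as not overconfident.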


Judges versus the discriminant function

The discriminant function correctly classified 67 per cent of

the profiles. This information was given to the Js at Level III.

At Level III only the SGS judges were more accurate than the

discriminant function, with a hit rate of 73 per cent. The P group had a

hit rate of 66 per cent and the UGS group had a hit rate of 63 per

cent at Level III. The accuracy for all judges combined was 67 per

cent. Five judges (two in the SGS group, two in the P group and one

in the UGS group) were more accurate overall than the linear

regression Z score and only one J (in the UGS group) operated below the

chance level overall.

At Level I, four Js had accuracy scores below the level of

chance and two others were only slightly above chance. However,

none of the four Js who were below chance was in the SGS group. At

Level II there were two Js below chance and one slightly above; again,












Table 7

Correlation Coefficients for Appropriateness


Experience              Information Level
Level          J      I      II     III    IV     Totals

SGS            1     .35    .35    .34    .28     .33
               2     .25    .20    .18    .65     .32
               3     .34    .31    .41    .28     .34
               4     .11    .06    .01    .02     .05

           Total     .26    .23    .24    .31     .26

UGS            5     .13   -.06    .24   -.03     .07
               6    -.03   -.15    .10    .47     .10
               7    -.05   -.03    .30    .47     .17
               8     .29    .37    .41    .27     .34

           Total     .09    .04    .26    .30     .17

P              9     .27    .20    .17    .33     .24
              10     .06    .23    .33    .31     .23
              11     .11    .19    .29    .28     .22
              12    -.02   -.07    .08    .41     .10

           Total     .11    .14    .22    .33     .20

Totals               .12    .13    .24    .31















Table 8

Summary of Analysis of Variance of Appropriateness Correlations


Source of Variation            df       MS        F

Information Level               2      .0705    22.03**
Experience Level                2      .0366     2.26
Information X Experience        6      .0032     2.25
Judges within Groups            9      .0162
Information X Judges           28      .0072

** Significant at the .01 level.



















Fig. 3. Appropriateness correlations between accuracy and confidence by information levels.







none of these Js was in the SGS group. With the addition of the

statistical information, only one J (in the UGS group) remained near

the chance level of accuracy and he was the least accurate of all

the Js.
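The counts in this comparison can be reproduced from the Totals column of Table 3. The sketch below lists each judge's overall accuracy, then finds who exceeded the discriminant function's 67 per cent and who fell below 50 per cent. Treating 50 per cent as the chance level is an assumption carried over from the way the text uses the term.

```python
# Overall proportion correct for judges 1 to 12 (Totals column of Table 3).
overall_accuracy = {
    1: 0.71, 2: 0.65, 3: 0.69, 4: 0.65,     # SGS
    5: 0.69, 6: 0.61, 7: 0.48, 8: 0.54,     # UGS
    9: 0.58, 10: 0.69, 11: 0.68, 12: 0.60,  # P
}
GROUPS = {"SGS": range(1, 5), "UGS": range(5, 9), "P": range(9, 13)}

DISCRIMINANT_ACCURACY = 0.67  # hit rate of the discriminant function
CHANCE = 0.50                 # assumed chance level for a two-choice judgment

# Judges in each group whose overall accuracy exceeded the discriminant function.
beat_function = {
    group: [j for j in judges if overall_accuracy[j] > DISCRIMINANT_ACCURACY]
    for group, judges in GROUPS.items()
}
below_chance = [j for j, acc in overall_accuracy.items() if acc < CHANCE]

print(beat_function)
print(below_chance)
```

This gives two SGS judges, one UGS judge, and two P judges above the discriminant function, and a single UGS judge below chance, matching the counts in the text.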















DISCUSSION


The present study demonstrated that judges can substantially

improve their decision accuracy when provided with increments of

information, particularly statistical information. This finding

extends the earlier findings reported for a different clinical

judgment task (Shagoury & Satz, 1969) and contrasts with previous studies

which have used non-quantitative data. These findings also suggest

that if the clinician is able to incorporate quantitative information,

he may improve his own decision-making ability and equal or surpass

the accuracy of actuarial methods.

The findings of the present study also showed that accuracy

increased directly as a function of the amount of information avail-

able to the judges. Two conclusions that can be drawn from this

finding are that the information was relevant to the judgmental task

and that the judges used this information in formulating their

decisions.


Information

A post-testing interview revealed that the type of information

used varied between groups, among judges, and between information

levels. However, the interview was not structured enough to deter-

mine the actual decision rules used by the judges.

At Level I, with the MMPI profile only, most judges used their










own intuition about the relationships of which scales were elevated

and the extent of these elevations to the length of stay in treat-

ment criterion. There was a great deal of individual variation in

approach since each judge had different training experiences with

the MMPI. The judges of the SGS group had the most similar training

experience in the use of the MMPI since some training in the ration-

ale and use of this instrument was given in the statistical decision

theory course. The SGS group also showed the least amount of indi-

vidual variation in accuracy at Level I. The other group of graduate

students (UGS) had the least amount of exposure to the MMPI. The

UGS group was barely familiar with this test instrument and none of

these judges had had any formal training in its use. It is inter-

esting to note that the group of unsophisticated graduate students

showed the lowest accuracy throughout and was the only group whose

accuracy was ever below the level of chance. It seems, then, that

the more familiar a judge is with a test instrument, the more accur-

ate he will be in using it for prediction.

The SGS group was not familiar with the specific type of task

used in the present study. That is, they had not been trained in

correlating MMPI data to length of stay in psychotherapy. This

aspect of the study was novel to each of the three groups.

At Level II again each judge approached the data differently

and selected certain measures to use in making his decisions. The

variation within groups decreased and there was less difference in

this variation between groups. Accuracy increased or stayed the

same for all but one judge whose accuracy dropped. Thus, the









hypothesis that accuracy would decrease at Level II was not supported.

It was originally felt that all of the biographical data provided

would make the task more complex and more difficult and would thus

confuse the judges. However, the judges were able to relate some of

the information to the task and thereby improve their judgments.

Most judges used some combination of factors. Whether the profile

subject had previous counseling or subsequent counseling and his age

were the factors used most often. Some of the judges also considered

marital status when the subject was married. This finding (Level II)

contrasts with other studies which indicate lowering of accuracy

when data are combined (Golden, 1964).

In support of the hypothesis, overall accuracy for Level III

was the same as the discriminant function's accuracy of 67 per cent.

With the addition of Z scores at Level III, only one judge, who was

in the P group, used the cut-off score exclusively. In this same

group one judge changed none of his judgments from Level II and the

other two judges used primarily their own subjective inferences. The

UGS group essentially ignored the Z scores and relied on their own

intuition and thus did not reach the level of accuracy of the cut-

off score. All judges in the SGS group combined the cut-off score

data with their own intuition to improve upon the accuracy of the

discriminant function. These findings imply that the clinician can

make use of his intuition and experience but not at the expense of

ignoring available data, particularly when they include quantitative

information. The findings also imply that the most accurate judges

are the ones who are able to utilize statistical data.









The fact that there was an increase in accuracy from Level III

to Level IV, but that this increase was non-significant, supported the

hypothesis that Level IV accuracy would increase slightly over Level

III accuracy. For the SGS group there was a slight decrease in

accuracy. One reason for this decrease might have been the informa-

tion itself. These judges had been trained to use more powerful

statistical information, that is, data that discriminated groups and

sub-groups more than did the data of the present study. The base

rates of .65 and .35 were not sufficiently different from base rates

of .5 to be of much help. Also the conditional probabilities were

not high enough to provide maximum discrimination. All of the

statistical data given were in approximately a 2/3 to 1/3 ratio.

Because all of the statistical data had approximately the same pre-

dictive power, it may have been difficult to know which kind of

information would be most useful. Instead, judges may have tried to

combine two or more kinds of data and as a result were less accurate

than they would have been using one type exclusively. Quantitative

information is most useful when it represents higher ratios, such as

base rates of .2/.8 or .1/.9; conditional probabilities of .85/.15;

and cut-off scores of 75 per cent or higher.
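The point about base rates can be illustrated with the accuracy obtained by always predicting the more frequent outcome. The sketch below is a minimal illustration under that single assumption; the function name is hypothetical, and the base rates are the ones mentioned in the paragraph above.

```python
def majority_baseline(base_rate_long):
    """Hit rate from always predicting whichever outcome is more common."""
    return max(base_rate_long, 1 - base_rate_long)


# Base rates of long stays discussed in the text: .5, .35, .2, and .1.
for p_long in (0.50, 0.35, 0.20, 0.10):
    print(f"P(L) = {p_long:.2f} -> baseline accuracy {majority_baseline(p_long):.2f}")
```

Base rates near .5 give a judge almost nothing to lean on, while rates such as .2/.8 or .1/.9 already yield 80 or 90 per cent correct from the base rate alone.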

Even though five out of the twelve judges showed decreases in

accuracy at Level IV in comparison with Level III, these decreases

were slight and represented only one more incorrect judgment out of

the 25 judgments for all five judges.

At Level IV, the UGS group ignored the base rates and used the

conditional probabilities. The SGS group and the P group both used









a combination of conditional probabilities and base rates and both

of these groups had the same degree of accuracy at Level IV. Also,

both the SGS and P groups were more accurate than the UGS group.

It seems probable that statistical information is more impor-

tant than biographical information about the subjects since there

was a greater increase in accuracy with the addition of statistical

information. Other studies have shown that biographical data are of

minimal value to judges. Golden (1964) found that judges agreed

less in their description of protocols when they were given identi-

fying data alone than when they were given a single psychological

test or a combination of tests. Kostlan (1954) found that judges

were more accurate in their psychodiagnoses when they received both

social case histories and the more quantitative MMPI protocol than

when they received social case histories alone.

One may ask if the judges would have been more accurate had they

been given some feedback on their accuracy at each level of infor-

mation. This is possible but then the task would not have been as

life-like in the sense that clinicians in actual situations must

usually wait some time before learning the accuracy of their predic-

tions. However, this does emphasize the point that clinicians should

check the accuracy of their predictions when possible and learn what

helps them to predict most accurately.


Experience and training

The lack of overall differences between groups was not antici-

pated. It was assumed that the SGS group would have benefited

from their training in statistical decision theory. However,









artifacts in the design tended to wash out group effects by providing

a guaranteed hit rate if the Z scores were used at Levels III and IV.

The convergence of judgmental accuracy for each group at Level IV

lends some support for this argument.

The hypothesis that the SGS group would be the most accurate

was thus only tentatively supported since there was not a signifi-

cant group effect. However, the SGS group tended to be the most

accurate in their judgments. This finding implies that clinicians

can be trained to improve their own subjective inferences with

statistical information. These judges trained in statistical de-

cision theory were able to add their own intuitive judgments to the

statistical data and thereby predict more accurately than did the

discriminant function alone or than they had done without the statis-

tical data. This special training taught them not only how to use

statistical information but also how to use their clinical intuition

to its best advantage. The SGS group also tended to be the most

appropriate, that is, to know when their judgments were most accurate

and when they were most inaccurate.

The fact that the P group tended to be more accurate than the

UGS group was also unexpected in light of previous findings con-

cerning amount of clinical training and accuracy of prediction.

This finding does not support the previous evidence (Goldberg, 1959;

Oskamp, 1962; Shagoury, 1969; Shagoury & Satz, 1969) that as the

amount of clinical experience increases, prediction accuracy

decreases. In the present study, judges in the SGS and P groups

used familiar methods, the MMPI profiles or statistical data; the








UGS group, by contrast, was presented with essentially unfamiliar

prediction tools. One reason data in the present study were at

variance with previous findings is that previous studies required

clinicians to predict an unknown criterion or to use unfamiliar

methods so that any previous "set" of the clinician was not advanta-

geous. In the present study, the judges' familiarity with either the

MMPI or statistical types of data helped them in their predictions.


Interaction effects

The significant group by information interaction effect showed

at least indirect support for the experience level hypothesis that

the SGS group would be most accurate. This interaction showed that

the unsophisticated graduate students started off predicting below

chance and, finally at Level IV, reached the level of accuracy that

the sophisticated graduate students attained at Level I (MMPI pro-

files alone). It was the former group with the least amount of

experience, familiarity with the MMPI, and sophistication with the

statistical decision theory which accounted for most of the group

differences and much of the interaction effect.

The rest of the interaction effect was due to the changes in

accuracy across information levels with the UGS group showing the

greatest change and the SGS group showing the least amount of

change in accuracy. The latter group started out predicting fairly

accurately and had less room for improvement while the former group

started out so poorly that their improvement was marked. The SGS

group predicted almost as well as the discriminant function with








the profiles only. The UGS group improved from below chance to the

level of accuracy achieved by the discriminant function.


Confidence

Previous studies would suggest that the trained clinicians

should have had less confidence in their judgments than the two pre-

professional groups. Although group differences for confidence were

non-significant, the professionals in the present study tended to be

the most confident. Again, this may have been because they were

using the MMPI with which they were more familiar than were the other

two groups. Also the professional group was predicting a criterion

about which they knew something, that is, length of stay in

psychotherapy. This again suggests that previous studies have placed the

clinician at a disadvantage so that he is less accurate and less con-

fident than he would be predicting in a familiar setting.

In general, adding information substantially increased the

judges' confidence. Judges became more confident as well as more

accurate with increments in information. However, the UGS group's

confidence did not increase until they had all the available infor-

mation.

One problem with asking judges to assign a confidence rating to

each judgment was that each judge had a different standard or set

for measuring how confident he was. The range of confidence scores

used also varied between judges and within groups so that one judge

used all six possible levels of confidence ranking (50, 60, 70, 80,

90, and 100) while another judge only used two (60, 70) or three

(80, 90, 100) rankings.









Appropriateness

The most meaningful measure to express appropriateness, defined

as the relationship between accuracy and confidence, was the corre-

lation coefficient. Just as accuracy and confidence increased with

each level of information, so did appropriateness. As judges became

more accurate they also became appropriately more confident.

The increases in appropriateness followed the same pattern as

the increases in accuracy and confidence. That is, appropriateness

increased significantly across levels of information but there was

only a tendency for one group to be more appropriate than the other

groups. As with accuracy, the SGS group tended to be the most appro-

priate and the UGS group tended to be the least appropriate. This

contrasts with earlier findings that trained clinicians are more

appropriate in their confidence levels than are graduate students in

psychology (Oskamp, 1962; Shagoury, 1969). The findings of the

present study, however, do not contradict earlier findings since the

present differences between groups on the measure of appropriateness

were non-significant.


Applications

It appears that actuarial data and training in their use can be

applied to situations in which clinicians must predict and make

decisions. In the present study, judges were able to post-dict length

of stay in psychotherapy fairly well. The next step would be to

apply these techniques to the same setting and predict a client's

length of stay in psychotherapy. This could then be followed up at









the end of treatment as a check of prediction accuracy. This would

enable the clinician to determine which short stays were "no shows"

and which were treated. Thus the discriminant function Z score and

judges' predictions could be much higher and more useful for practical

application to the clinician's population of clients. This type of

procedure is most useful in a clinic situation which must limit the

number of clients seen or must screen those that will be seen.

Statistical methods of prediction can be particularly applicable to

the screening of patients to determine what type of treatment is most

appropriate and would be most useful for each client.

To use actuarial data in a clinic situation, they must first be

collected and analyzed. Too many clinical situations today fail to

make use of the data they have available. They do not even know the

base rates for various classifications of the clients they see.

Collecting and analyzing statistical data is another way to more

fully understand a particular clinical setting by learning what type

of patients are seen, how long they stay in treatment, and hopefully,

which ones are most likely to improve.
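Computing base rates from a clinic's own records is straightforward. The sketch below assumes the cut-off given in the instructions to the judges (five or more sessions counts as a long stay) and uses a hypothetical list of session counts; the function name and the data are illustrative only.

```python
def base_rates(session_counts, long_cutoff=5):
    """Proportion of long (L) and short (S) stays in a set of client records."""
    n_clients = len(session_counts)
    n_long = sum(1 for sessions in session_counts if sessions >= long_cutoff)
    return {"L": n_long / n_clients, "S": (n_clients - n_long) / n_clients}


# HYPOTHETICAL number of sessions attended by each of 20 clients.
records = [1, 2, 2, 3, 4, 4, 1, 9, 12, 6, 2, 3, 5, 1, 2, 8, 3, 4, 2, 15]

rates = base_rates(records)
print(rates)
```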

If a clinic decides to see everyone who comes in for help, tests

and statistical data are not of benefit in selecting whom to see.

However, these data might be used for prediction and research in a

setting which sees all clients. It is in situations where everyone

cannot be treated that improving tests and collecting base rate

information is most needed. Where decisions and predictions must be

made, actuarial methods are most needed to improve the clinician's

decision-making ability.








A further study which would be a fair and optimal test of clini-

cal versus statistical prediction would be to give judges an oppor-

tunity to see the relationships of test variables with a criterion

on a standardization sample. Then, the judges would be compared

with a discriminant function on a cross validation sample. However,

this was not the purpose of the present study.















SUMMARY


The present study was designed to look at the effects of adding

quantitative and qualitative data to a relevant clinical judgment

task. In essence, it compared judges with varying degrees of clini-

cal experience to actuarial prediction methods. The study also

attempted to train judges to use actuarial information to improve

their prediction accuracy.

Twelve judges representing three levels of clinical experience

made post-dictive judgments on the length of stay in psychotherapy

(short or long) from a sample of MMPI profiles of clients seen in a

university mental health service. Judgments were made under four

conditions in which qualitative and quantitative information was

added incrementally at each level. The three levels of judges'

experience were professional clinical psychologists, "sophisticated"

third year clinical psychology graduate students trained in statis-

tical decision theory, and "unsophisticated" third year clinical

psychology graduate students without any training in statistical

decision theory.

Accuracy increased over levels of information but there were no

differences in accuracy for the three levels of experience. A sig-

nificant group by information level interaction demonstrated some

group effects due to a lower proportion of correct judgments for the

less experienced judges under conditions involving the least amount










of information.

Judges became more confident in their judgments as they received

more information. Appropriateness, defined as accuracy weighted by

confidence and measured by correlation coefficients between accuracy

and confidence, increased substantially as increments of information

were added. The group trained in statistical decision theory tended

to make the most appropriate judgments and the least experienced

group of graduate students tended to make the least appropriate judg-

ments.

The present study showed that clinicians can use quantitative

data to improve their own judgmental ability and to predict more

accurately than actuarial data alone. Also, since those judges with

the most experience in using actuarial tasks tend to be the most

appropriate in their judgments, this implies that clinicians can also

be trained to be more appropriate and to know when their judgments

are more likely to be accurate.































APPENDIX A

INSTRUCTIONS















APPENDIX A-I


INSTRUCTIONS PART I


This study is designed to examine the decision process when only

limited information is available. You will be presented with 25

Minnesota Multiphasic Personality Inventory (MMPI) profiles of

students seen at the University of Florida Infirmary Mental Health

Service. Some of these students stayed a long time in therapy (5 or

more sessions, X=9) and some stayed only a short time (4 or less

sessions, X=2). Your task will be to decide which students stayed

a long time (L) and which stayed only a short time (S) on the basis

of the test profile alone.

Your task is to try to make the best estimate of probable length

of stay in psychotherapy given only limited information. It is pos-

sible to correctly classify all the profiles. It is hoped that your

predictions will in some way help us to understand one aspect of the

decision-making process as it is applied by psychologists in clinical

settings.

You will also be asked to rate your confidence for each subject

on a scale from 50 per cent to 100 per cent. If you are positive of

your decision, you should mark 100 per cent; if you are only guessing

you should mark 50 per cent. That is, the more certain of your de-

cision, the higher percentage you should mark.















APPENDIX A-II


INSTRUCTIONS PART II


Your task on Part II is identical to that on Part I. You will

be presented the same 25 profiles and asked to predict (S) or (L).

However, this time more information will be available to you. That

is, you will also have biographical data. You may use this infor-

mation in any way you wish. You may choose to disregard the infor-

mation altogether and make your predictions as you did in Part I.

Your task is to try to make the best estimate of probable length

of stay in therapy given only limited information. It is possible to

correctly classify all the profiles. It is hoped that your predic-

tions will in some way help us to understand one aspect of the

decision-making process as it is applied by psychologists in clinical

settings.

Again, please indicate your confidence in your judgment for each

subject from 50 per cent to 100 per cent.















APPENDIX A-III


INSTRUCTIONS PART III


Your task on Part III is identical to that of Parts I and II.

You will be presented the same 25 profiles and asked to predict as

accurately as possible, on the basis of the information given,

whether the student is (S) or (L). Again, more information will be

made available to you. The following statistical information will

be added.

Discriminant function analysis provided weights for each of the

13 MMPI scale variables in order to obtain maximal differentiation

between long stayers (L) and short stayers (S). A composite score

(Z) was obtained which best estimates the combined relative effects

of all the scale variables.

This Z score is used to make the best prediction as to which

criterion group a particular profile belongs. This can be summarized

as follows:

1. Z≥32.02 is a positive test sign (+) and indicates a probable

long stay in therapy (L).

2. Z<32.02 is a negative test sign (-) and indicates a probable

short stay in therapy (S).

No test, however, classifies without some errors. This derived com-

posite cut-off (Z=32.02) yields the following percentages of








classification:



Composite test sign        Criterion

                           S        L

Z<32.02 (-)                69%      38%

Z≥32.02 (+)                31%      62%

In other words:

1. A (+) test sign (Z≥32.02) correctly classified 62 per cent

of the long stayers (L). This is known as the valid positive

rate. Also, a (+) test sign incorrectly classified 31 per

cent of the short stayers (S) and this is the false positive

rate.

2. A (-) test sign (Z<32.02) correctly classified 69 per cent

of (S), the valid negative rate, and incorrectly classified

38 per cent of (L), the false negative rate.

This means that 38 per cent of the (L)'s scored below 32.02 and were

incorrectly classified (S), and 31 per cent of the (S)'s scored above

32.02 and were incorrectly classified as (L). The total percentage

correctly classified was 67 per cent.
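The 67 per cent figure can be checked with a short calculation. The sketch below is an illustration only and was not part of the instructions given to the judges. It assumes the base rates stated in Part IV (Short = .66, Long = .34), and the function names `sign_for` and `overall_correct` are hypothetical.

```python
# Illustrative check of the overall classification rate.
# Rates are taken from the classification table; base rates are those
# stated in Part IV of the instructions. All names here are hypothetical.

CUTOFF = 32.02  # composite Z cut-off score


def sign_for(z: float) -> str:
    """Return '+' (predict long stay) when Z >= cut-off, else '-' (predict short stay)."""
    return "+" if z >= CUTOFF else "-"


def overall_correct(valid_pos: float, valid_neg: float,
                    base_long: float, base_short: float) -> float:
    """Weight each group's correct-classification rate by its base rate."""
    return valid_pos * base_long + valid_neg * base_short


rate = overall_correct(valid_pos=0.62, valid_neg=0.69,
                       base_long=0.34, base_short=0.66)
print(round(rate, 2))  # 0.67
```

Weighting by base rate matters here because the two groups are of unequal size; a simple average of 62 and 69 per cent would not give the total percentage correctly classified.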

You will be required to predict as accurately as possible

whether the student belongs to (S) or (L), short or long stay. It is

possible to score every profile correctly, scoring 100 per cent.

You may predict (S) or (L) by using (1) the composite Z score

cut-off, (2) the biographical data, (3) the profile alone, or (4) any

combination of (1), (2), and (3). The composite cut-off score was

applied to yield the best overall classification rate but no test




is perfect and errors may be made with any procedure. It is quite

possible that the clinician may be able to improve upon the linear

statistical method (Z score) by utilizing combinations of both

"intuitive" and statistical data.

Your task is to try to make the best estimate of probable length

of stay in therapy given additional, but limited, information. It is

possible to correctly classify all the profiles. It is hoped that

your predictions will in some way help us to understand one aspect

of the decision-making process as it is applied by psychologists in

clinical settings.














APPENDIX A-IV


INSTRUCTIONS PART IV


Your task on Part IV is identical to that of Parts I, II, and

III, utilizing the same 25 profiles. You are to predict as accur-

ately as possible on the basis of the information given, whether the

student is (S) or (L). Again, more information will be made avail-

able to you. In addition to the composite Z score, biographical

data, and test data, you will also be told the conditional probabili-

ties and base rates for the groups and test signs.

Conditional probabilities combine test signs, (+) or (-), and

base rates to yield a quantitative index of the probability of

correct classification when Z≥32.02 (+) or when Z<32.02 (-).

For example, some of the subjects will be (L) when Z≥32.02 (+)

and some will be (S) when Z<32.02 (-). The problem is to determine

how confident we can be with each test sign under the base rates of

the population. The base rates for the two groups are: Short (S)=

66 per cent and Long (L)=34 per cent. In other words, 34 per cent

of the subjects stayed a long time in therapy and 66 per cent stayed

only a short time. The majority, therefore, were shorts (S).

Based on this information, the conditional probabilities are:

for a (+) test sign, P(L/+)=.51 and for a (-) test sign, P(S/-)=.78.

This means that the probability of a person staying a long time in








therapy (L), given a positive test sign, is .51, and the probability

of a person staying a short time (S), given a negative test sign, is

.78.

A conditional probability of .51 for a (+) test sign means that

you would be wrong about as often as you were correct in predicting (L) for

a (+) sign. A conditional probability of .78 for a (-) test sign

means that you would be correct more often than you would be wrong

in predicting (S) for a (-) test sign.
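The two conditional probabilities follow from Bayes' rule applied to the classification rates given in Part III and the base rates given above. The sketch below is an illustration only and was not part of the instructions given to the judges; the function name `posterior` and its parameter names are hypothetical.

```python
# Illustrative reconstruction of P(L/+) and P(S/-) by Bayes' rule.
# Inputs are the classification rates from Part III and the base rates
# from Part IV; the function and parameter names are hypothetical.


def posterior(hit_rate: float, false_alarm_rate: float, base_rate: float) -> float:
    """Probability of the target group given its test sign.

    hit_rate:          proportion of the target group showing the sign
    false_alarm_rate:  proportion of the other group showing the same sign
    base_rate:         proportion of the population in the target group
    """
    true_part = hit_rate * base_rate
    false_part = false_alarm_rate * (1.0 - base_rate)
    return true_part / (true_part + false_part)


# (+) sign: 62% of longs and 31% of shorts score at or above the cut-off.
p_long_given_plus = posterior(0.62, 0.31, 0.34)

# (-) sign: 69% of shorts and 38% of longs score below the cut-off.
p_short_given_minus = posterior(0.69, 0.38, 0.66)

print(round(p_long_given_plus, 2), round(p_short_given_minus, 2))  # 0.51 0.78
```

The (+) sign is weak mainly because long stayers are the minority: the 31 per cent of the larger short group who score above the cut-off are nearly as numerous as the 62 per cent of the smaller long group.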

Your task is to try to make the best estimate of probable length

of stay in psychotherapy given additional, but limited, information.

It is possible to correctly classify all the profiles. It is hoped

that your predictions will in some way help us to understand one

aspect of the decision-making process as it is applied by psycholo-

gists in clinical settings.

Again, please indicate your confidence in your judgment for each

subject from 50 per cent to 100 per cent.





























APPENDIX B

SUMMARY OF NEWMAN-KEULS TESTS













APPENDIX B-I

SUMMARY OF NEWMAN-KEULS TEST FOR GROUP MEAN DIFFERENCES



Differences among Level I means


                  X̄_UGS     X̄_P      X̄_SGS

X̄_UGS = .46        ---      .12*     .15**

X̄_P   = .58                 ---      .03

X̄_SGS = .61                          ---



 *p<.05
**p<.01




Differences among Level II means


                  X̄_UGS     X̄_P      X̄_SGS

X̄_UGS = .57        ---      .04      .11

X̄_P   = .61                 ---      .05

X̄_SGS = .66                          ---














Differences among Level III means


                  X̄_UGS     X̄_P      X̄_SGS

X̄_UGS = .63        ---      .03      .10

X̄_P   = .66                 ---      .07

X̄_SGS = .73                          ---










Differences among Level IV means



                  X̄_UGS     X̄_P      X̄_SGS

X̄_UGS = .66        ---      .04      .05

X̄_P   = .70                 ---      .01

X̄_SGS = .71                          ---











APPENDIX B-II

SUMMARY OF NEWMAN-KEULS TEST FOR INFORMATION MEAN DIFFERENCES




Differences among SGS means



                  X̄_I      X̄_II     X̄_IV     X̄_III

X̄_I   = .61        ---      .05      .09      .12

X̄_II  = .66                 ---      .04      .07

X̄_IV  = .70                          ---      .03

X̄_III = .73                                   ---


Differences among P means


                  X̄_I      X̄_II     X̄_III    X̄_IV

X̄_I   = .58        ---      .03      .08      .13**

X̄_II  = .61                 ---      .05      .10

X̄_III = .66                          ---      .05

X̄_IV  = .71                                   ---















Differences among UGS means


                  X̄_I      X̄_II     X̄_III    X̄_IV

X̄_I   = .46        ---      .11*     .17      .20**

X̄_II  = .57                 ---      .06      .09

X̄_III = .63                          ---      .03

X̄_IV  = .66                                   ---


**p<.01

 *p<.05




Differences among group means


                  X̄_I      X̄_II     X̄_III    X̄_IV


X̄_I   = .55        ---      .06*     .12**    .14**

X̄_II  = .61                 ---      .06*     .08*

X̄_III = .67                          ---      .02

X̄_IV  = .69                                   ---
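Each entry in the tables of this appendix is the difference between two means taken in order of magnitude. The sketch below is an illustration only: it rebuilds the entries for the information means combined over groups and does not perform the Newman-Keuls significance test itself, which also needs the error mean square and the studentized range values. The function name `pairwise_differences` is hypothetical.

```python
# Illustrative rebuild of one difference table from its means.
# This computes the tabled differences only; it does not test significance.
from itertools import combinations


def pairwise_differences(means: dict) -> dict:
    """Return {(lower_label, higher_label): difference} for every pair of means."""
    ordered = sorted(means.items(), key=lambda item: item[1])
    return {
        (low_label, high_label): round(high - low, 2)
        for (low_label, low), (high_label, high) in combinations(ordered, 2)
    }


# Information-level means combined over groups, as tabled above.
information_means = {"I": 0.55, "II": 0.61, "III": 0.67, "IV": 0.69}

for pair, difference in pairwise_differences(information_means).items():
    print(pair, difference)
```

Sorting the means first mirrors the layout of the tables, where rows and columns run from the smallest mean to the largest.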















REFERENCES


Adams, J. K. A confidence scale defined in terms of expected percen-

tages. Amer. J. Psychol., 1957, 70, 432-436.

Dahlstrom, W. G., & Welsh, G. S. An MMPI handbook: a guide to use

in clinical practice and research. Minneapolis, Minn.: The

University of Minnesota Press, 1968.

Edwards, A. L. Experimental design in psychological research. New

York: Rinehart & Company, Inc., 1950.

Goldberg, L. R. The effectiveness of clinicians' judgements: the

diagnosis of brain damage from the Bender-Gestalt Test. J.

consult. Psychol., 1959, 23, 25-33.

Goldberg, L. R. Simple models or simple processes? Some research on

clinical judgements. Amer. Psychologist, 1968, 23, 483-496.

Golden, M. Some effects of combining psychological tests on clinical

inferences. J. consult. Psychol., 1964, 28, 440-446.

Holt, R. R. Clinical and statistical prediction: a reformulation and

some new data. J. abnorm. soc. Psychol., 1958, 56, 1-12.

Holtzman, W. H. Can the computer supplant the clinician? J. clin.

Psychol., 1960, 16, 119-122.

Kirk, R. E. Experimental design: procedures for the behavioral

sciences. Belmont, Calif.: Brooks/Cole Publishing Co., 1968.

Kostlan, A. A method for the empirical study of psychodiagnosis. J.

consult. Psychol., 1954, 18, 83-88.


Lindzey, G. Seer versus sign. J. exp. Res. Pers., 1965, 1, 17-26.

Meehl, P. E. Clinical versus statistical prediction. Minneapolis,

Minn.: University of Minnesota Press, 1954.

Meehl, P. E. Seer versus signs: the first good example. J. exp.

Res. Pers., 1965, 1, 27-33.

Meehl, P. E., & Rosen, A. Antecedent probability and the efficiency

of signs, patterns, or cutting scores. Psychol. Bull., 1955,

52, 194-216.

Megargee, E. I. (ed.). Research in clinical assessment. New York:

Harper & Row Publishers, 1966.

Mello, Nancy K., & Guthrie, G. M. MMPI profiles and behavior in

counseling. J. counsel. Psychol., 1958, 5, 125-129.

Oskamp, S. The relationship of clinical experience and training

methods to several criteria of clinical prediction. Psychol.

Monogr., 1962, 76, No. 547.

Satz, P. A block rotation task: the application of multivariate and

decision theory analysis for the prediction of organic brain

disorder. Psychol. Monogr., 1966, 80, No. 629.

Shagoury, P. The influence of statistical information on clinical

decisions. Unpublished master's thesis, University of

Florida, 1969.

Shagoury, P., & Satz, P. The effect of statistical information on

clinical prediction. Proceedings: 77th Annual Convention,

APA, 1969, 310-311.















BIOGRAPHICAL SKETCH


Ann Weimer Moxley was born March 14, 1946, in New York City,

New York. She attended public schools in Gainesville, Florida, and

graduated from Gainesville High School in 1964. She enrolled in the

University of Florida in September, 1964, and received her Bachelor

of Arts degree, magna cum laude, in December, 1967. She received

her Master of Science degree at the University of Florida in December,

1968. She is currently engaged in her clinical psychology internship

at the University of Rochester's Strong Memorial Hospital in Rochester,

New York.

She is married to James Edward Moxley who received his Master's

in Business Administration at the University of Florida and is now

employed by Eastman Kodak in Rochester, New York. Ann is a member of

Phi Beta Kappa, Phi Kappa Phi, Psi Chi, and Mortar Board.














This dissertation was prepared under the direction of the

chairman of the candidate's supervisory committee and has been

approved by all members of that committee. It was submitted to

the Dean of the College of Arts and Sciences and to the Graduate

Council, and was approved as partial fulfillment of the require-

ments for the degree of Doctor of Philosophy.


December 1970



Dean, College of Arts and Sciences




Dean, Graduate School






Supervisory Committee:




Chairman





