Contextual influences on the rating process


Material Information

Title:
Contextual influences on the rating process the effects of accountability and rating outcomes on performance rating quality
Physical Description:
ix, 208 leaves : ill. ; 29 cm.
Language:
English
Creator:
Mero, Neal P., 1955-
Publication Date:

Subjects

Subjects / Keywords:
Employees -- Rating of -- Research   ( lcsh )
Performance standards -- Research   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1994.
Bibliography:
Includes bibliographical references (leaves 114-121).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Neal P. Mero.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001981101
notis - AKF7988
oclc - 31913459
System ID:
AA00003219:00001

Full Text








CONTEXTUAL INFLUENCES ON THE RATING PROCESS:
THE EFFECTS OF ACCOUNTABILITY AND RATING
OUTCOMES ON PERFORMANCE RATING QUALITY


NEAL P. MERO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
































To my parents, James and Inez Mero, who taught me the value of faith, family, and friends.












ACKNOWLEDGMENTS


With a doctoral program, as well as with most important things in life, there are always many people who help you along the way. That was certainly the case in the completion of this work.


There is no question that any success I have had is due to the patient, selfless, and unshakable love and support of my wife, Cheryl. A constant companion and friend, regardless of the path I have chosen to follow, she has always been with me. I thank her for the many times when it was her confidence in me that kept me going.


I would also like to thank my children, Samantha and Brent, for their patience, support, and help. Their willingness to collate and staple material, move away from the computer, and skip a school activity in support of their Dad was remarkable.


I am also blessed with a wonderful family who provided tremendous support throughout this program. Thanks to my parents, Jim and Inez Mero, and to my brothers and sisters, Mary, Carole, Donna, Pam, Jim, and Mark, for always being there for me. They form the core of the most "functional" family I know.


I also count my dear friends Phil and Karen Doucet as part of my family and my support group. Their friendship has been one of the true joys of my life for many years.


I owe special thanks to Professor Stephan Motowidlo. His patient intellectual nurturing, encouragement, and occasional careful redirection have contributed greatly to this project and to my intellectual development.


I am indebted to my committee members, Professors Henry Tosi, Jerald Young, Linda Crocker, and Barry Schlenker. They were always willing to listen to ideas and point out directions that greatly increased the usefulness of this research.


I would also like to thank Professor John Hall for his constant encouragement and support of my research. I am also grateful to my Air Force mentors, Colonel James Woody, Colonel Chuck Yoos, and Mike Wenger, for their confidence and support.


There were many family members, friends, and colleagues who contributed to this research as reviewers, expert judges, or actors. I am indeed thankful for the help of Mike Wenger, Rita Campbell, Bill Paul, Tony Wolusky, Deborah Hall, Thornton Burgess, Maureen Hornyak, Marty Hornyak, Julie Boit, Phil Doucet, Karen Doucet, [...] Vaughn, Brian Vaughn, Kevin Banning, Allison Banning, Jennifer Burnett, Steve Werner, Mary Theresa Van Noy, Steve Green, Kurt Heppard, Kevin Wolfe, Jeannie Wolfe, Tammi Jackson, Ron Jackson, Cheryl Mero, Jeff Katz, Shelly Katz, and Robin Sammons.


Additionally, this project would not have been completed without the technical support of Olen and Doris McCullough. I am also grateful to Dennis and Carrie Lee for their gracious willingness to allow the use of the Florida Woodland facilities.


Finally, I would like to thank three special friends, Steve Werner, Jeff Katz, and Kevin Banning, for their encouragement, intellectual support, and friendship during this endeavor. They were always willing to help me maintain my perspective even when I didn't realize I had lost it.












TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

INTRODUCTION, LITERATURE REVIEW AND HYPOTHESES
    Introduction
    Literature Review
    Hypotheses

METHODOLOGY
    Methodological Issues
    Managerial Simulation Laboratory Studies
    Overview of This Study
    Subjects and Procedures
    Manipulation of Accountability and Rating Outcomes
    Subordinate Performance Presented in This Study
    Instruments
    Dependent Variables

RESULTS
    Manipulation Checks
    Methods of Analysis
    Tests of Hypotheses
    Additional Analyses

DISCUSSION
    Findings
    Implications for Future Research
    Summary

REFERENCES

APPENDICES
    SAMPLE IN-BASKET ITEMS
    BEHAVIORALLY ANCHORED RATING SCALES
    EXPERT JUDGES' INSTRUCTIONS
    STUDY PROTOCOL
    ACCOUNTABILITY MANIPULATION
    RATING OUTCOME MANIPULATION
    RATING FORMAT
    DEMOGRAPHIC QUESTIONNAIRE
    MANIPULATION CHECK
    OBSERVATIONAL ACCURACY MEASURE
    POST EXPERIMENT QUESTIONNAIRE
    ATTENTIVENESS RATING SCALE
    NOTE TAKING RATING SCALE

BIOGRAPHICAL SKETCH











Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

CONTEXTUAL INFLUENCES ON THE RATING PROCESS:
THE EFFECTS OF ACCOUNTABILITY AND RATING
OUTCOMES ON PERFORMANCE RATING QUALITY

Neal P. Mero

April, 1994

Chairperson: Stephan Motowidlo
Major Department: Management


Recent research on factors that affect the quality of performance ratings has made little progress in actually improving rater accuracy. This has led researchers to call for increased consideration of the context within which rating decisions are made. This research considered two factors of the organizational context.


Rater accountability considers how raters behave when they are required to justify their ratings to others. Rating outcome considers how differences in information about the potential outcomes of ratings affect the decision process and the subsequent ratings.


Accountability, where subjects were told either that they would have to justify their ratings or that their ratings would remain anonymous, was crossed with four different rating outcomes. A unique laboratory design was used that incorporated the use of in-baskets and videotaped presentations of ratee performance to increase the complexity of the rating task. Two hundred forty-seven subjects observed and rated the performance of four ratees on three performance dimensions based on videotaped samples of subordinate performance presented in two sessions over a two-week period.


Accountable subjects were significantly more accurate than subjects who were not held accountable for their ratings regardless of rating outcome. Accountable subjects in two of the rating outcome conditions also had higher recall of specific performance events.


Accountability interacted with rating outcome in two conditions as raters complied with the pressure of the situation to alter their ratings. However, these subjects' recall of performance information was not affected, suggesting that differences in rating outcome did not affect information processing even when the rating was affected. Additional structural analyses suggested that accountable raters were more accurate because they more carefully observed, recorded, and considered the implications of ratee performance. Those three factors, which mediated the effect of accountability on accuracy, accounted for over [...] of the variance in the observational accuracy and differential accuracy of subjects in a model corrected for attenuation. Implications of the findings and suggestions for future research are discussed.










CHAPTER 1
INTRODUCTION, LITERATURE REVIEW, AND HYPOTHESES

Introduction


Despite an extensive body of research on factors that affect the accuracy of performance ratings, little progress has been made in actually improving rater accuracy (Bernardin & Beatty, 1984; DeNisi & Williams, 1988; Landy & Farr, 1980; Murphy & Cleveland, 1991).


Most of the research done has focused on the rater. It has typically involved training programs for raters and new rating methods or formats designed to help improve the rater's accuracy. The limited progress achieved by this approach, which focused on raters, led to a call for research that would focus on the context surrounding the rating process (Ilgen & Feldman, 1983; Murphy & Cleveland, 1991). By including contextual variables in their studies of performance appraisal, researchers should be able to consider how the organization's social system influences rater information processing and decision making.


Organizational contexts place many pressures on raters, pressures that can affect both the accuracy and validity of performance ratings. Partly for this reason, Murphy and Cleveland (1991) suggested that very little can be accomplished by changing the rater or the rater's task if the context influences raters to be inaccurate.


The research reported here considers two aspects of the rating context, specific rating outcomes and rater accountability. The focus on rating outcomes extends previous work on the effects of rating purpose on the rater's information processing. Cleveland and Murphy (1992) argued that rating behavior is goal directed and suggested that the social context where rating takes place influences the goals pursued by participants. As a result, performance appraisal should be viewed as a social communication process. Previous research suggests that what raters know about the purposes or outcomes of their ratings can affect their strategies for gathering information, leniency, accuracy, and the ultimate quality of their ratings.


The possibility that various features of the context of performance appraisal may encourage the rater to rate more leniently, more harshly, or less accurately to achieve specific purposes is a primary focus of this research. This research also considers the effects of rater accountability, a contextual variable only rarely considered in the performance appraisal literature.


Accountability is the link between individual decision makers and the social system to which they belong (Tetlock, 1985). Accountability influences decision makers by causing them to behave in a [...] (Schlenker, 1980; 1982; Tetlock, 1985).


Decision makers respond to the pressures of accountability in at least two ways. First, when they know the audience's views, decision makers make decisions they believe will be acceptable to others (Tetlock, 1985, p. 311). Second, when they do not know the audience's views, decision makers will use more cognitively complex information processing strategies (Tetlock, 1985) that lead to impressions based on a more careful consideration of relevant data. Only Klimoski and Inks (1990) have considered the effects of accountability on performance evaluation.


Literature Review


This review first considers research on the rating process, beginning with Wherry's (1952) model of the rating process, an important forerunner of subsequent performance appraisal research. Then, other models of rater cognitive processes are considered. Next, theoretical and empirical research which considered the effect of rating purpose on rating quality is discussed. Finally, research on the effects of accountability on decision makers is reviewed. This latter section focuses on how contextual variables such as accountability may lead raters to choose different information processing strategies.








Rating Process Research


Recent research attention on performance appraisal has focused on the cognitive processes used by raters to make rating decisions. This process emphasis stems from Landy and Farr's (1980) concern that despite research efforts into other aspects of performance appraisals, little progress had been made in understanding variables that lead to accurate or inaccurate appraisals.


As a result of this work, several models of rating processes were proposed (Bernardin & Beatty, 1984; DeNisi & Williams, 1988; DeNisi, Cafferty, & Meglino, 1984; Feldman, 1981; Ilgen & Feldman, 1983; Landy & Farr, 1980, 1983; Landy, Zedeck, & Cleveland, 1983; Motowidlo, 1986; and Murphy & Cleveland, 1991). This literature has been criticized, however, because of its lack of practical application (Banks & Murphy, 1985) and concerns about the generalizability of laboratory studies of the rating process to field settings (Bernardin & Villanova, 1986; Locke, 1986).


This review will show that despite the efforts of cognitive process researchers, there is very little that can be told to practitioners that would make them more accurate raters. However, process research has improved our understanding of conditions that influence rater accuracy. This can subsequently lead to the development of more accurate appraisal systems.








Wherry's model of ratings (1952)


Wherry (1952) provided a model of ratings that was the impetus for performance appraisal process research. This work was subsequently published in an edited version (Wherry & Bartlett, 1982) and proposed factors that influence performance appraisal accuracy. These propositions stem from classical measurement theory and emphasized minimizing the error component of performance ratings.


To summarize, Wherry (1952) proposed that improved accuracy will come from increasing ratee control over the work situation; using rating scales based on ratee controlled behaviors; increasing relevant rater contact with the ratee; using scale items anchored by easily observed behaviors; informing raters of the behaviors to be rated; encouraging raters to avoid bias; keeping written records of critical incidents; reducing time between observation and rating; increasing frequency of observations to improve recall; using behaviors that are easily classified; and increasing the number and diversity of raters or the number of items on a dimension to reduce the random error component.
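
The last proposition, that adding items (or raters) to a dimension reduces the random error component, follows from classical measurement theory. As a minimal illustrative sketch that is not part of Wherry's text, the Spearman-Brown formula gives the reliability of a composite of parallel items. The function name and the single-item reliability of 0.30 are assumptions chosen only for illustration; they are not values from this study.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Reliability of a composite of n parallel items (Spearman-Brown formula)."""
    if not 0.0 <= single_item_reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    r = single_item_reliability
    # The composite reliability rises toward 1 as items are added,
    # because random errors in parallel items tend to cancel.
    return n_items * r / (1.0 + (n_items - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-item reliability; not a value reported in the study.
    r = 0.30
    for k in (1, 2, 4, 8):
        print(f"{k} item(s): composite reliability = {spearman_brown(r, k):.3f}")
```

Under these assumptions the composite reliability increases with each added item, which is the sense in which a longer scale or a larger pool of raters reduces the share of random error in the final rating.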


Wherry's model set the stage for much of the work that was to come years later (DeNisi & Williams, 1988). Of direct interest to this study are the proposed linkages between rating context and rater information processing. First, rating purpose will affect rating accuracy (Wherry & Bartlett, 1982).


In appraisal settings where raters know their rating will affect the ratee, they [...] seen as important to the organization, raters will be more accurate (Wherry & Bartlett, 1982). Wherry (1952) predicted that ratings obtained for experimental purposes will be more accurate than ratings collected to support administrative decisions. The proposition that different rating purposes may lead to biased ratings suggests the effect rating context can have on the rating process and, as a result, rating accuracy.


A second proposition suggested that having to justify ratings will also affect raters and the type of information gathered or recalled by that rater (Wherry & Bartlett, 1982). They suggested that ratings justified to the subordinate may be more lenient while ratings justified to the supervisor may lead the rater to recall performance information known to be of interest to that supervisor (Wherry & Bartlett, 1982).


Models of the rating process


Subsequent to the work of Wherry (1952), three models of rater information processes were proposed (DeNisi, Cafferty, & Meglino, 1984; Feldman, 1981; Ilgen & Feldman, 1983). These models focused on the steps involved in the rating process with special emphasis on rater cognitive processes. Almost every step in this process is subject to errors that could lead to inaccurate performance appraisals (Cooper, 1981). This review considers research on rating context and on the information processing steps of information gathering, storage, recall, and judgement.


Context of the rating process. The performance appraisal environment has been described as the "noisy" context in which the cognitive processes of the rater are performed (Feldman, 1981). Three contextual factors which influence rater information processing have been suggested. These factors include the amount of contact raters have with subordinates, how well they understand their job, and how much time they can devote to the process (Feldman, 1981).


Ilgen and Feldman (1983) proposed that appraisal purpose or specific outcomes linked to the appraisal could lead raters to alter their evaluations. This is consistent with Wherry's (1952) proposition that raters will evaluate differently if appraisals will be used for rewards instead of feedback. Murphy and Cleveland (1991) proposed that rating context influences the entire rating process including the rater's judgement, rating, and evaluation. They define context as the "heterogeneous mix of factors, ranging from the social and legal system in which the organization exists to the climate and culture within the organization" (p. 25).


Several researchers (Cleveland & Landy, 1983; Eder, 1989; Eder & Buckley, 1988; Murphy & Cleveland, 1991) have suggested that the interactionist perspective (Endler, [...]) [...] to view the interaction between the situation or context and individual behavior and judgement. This perspective suggests that behavior is a function of feedback between the individual and the situation. Within the performance evaluation context, this perspective suggests that situational variables, as interpreted by the rater, are critical factors in determining the outcome of the rating (Murphy & Cleveland, 1989).


It is perhaps not specifically rating purpose that affects ratings but, instead, the rater's beliefs about the outcomes of those ratings.

Other researchers have stressed the importance of organizational context on personnel decision making. A review of the selection interview research reported that researchers had not considered how situational factors influence interviewers (Arvey & Campion, 1982). As in performance appraisal studies, interview researchers have relied on cognitive information processing theory in an attempt to understand how perceptual processes interact during an interview (Eder, 1989). Eder (1989) encouraged researchers to consider situational factors in decision making. His theory of the influence of interview context on interviewers proposed four dimensions of context including task clarity, interview purpose, decision risk, and accountability.


Information gathering. Within the performance [...] to make the eventual judgments about performance. DeNisi, Cafferty, and Meglino (1984) viewed the rater as an active seeker of information. They suggested that "preconceived notions" of the rater, such as previous impressions about the ratee or information about the rating purpose, could determine the type of information sought. Rating purpose could influence both the sample of behaviors gathered and the amount of time devoted to gathering performance information. This suggests that raters may do more than just alter their evaluations as a result of the rating purpose; they may also vary how they search for information. As a result, their rating may be based on a different behavioral sample. If rating purpose leads the rater to gather more relevant information, rating accuracy should improve. However, if rating purpose leads the rater to gather less relevant behaviors or an otherwise skewed behavioral sample, then the result should be less accuracy.


This process of sampling behaviors from a true score domain was called content sampling (Cooper, 1981). Cooper felt that a large contributor to halo error could be rater judgments based on a small sample of ratee behaviors. This would force raters to rely on "global impressions" of the ratee. Motowidlo (1986) proposed a sampling model of information processing and suggested that there is a true score domain which includes the population [...] through observation or experience, becomes aware of a sample of that information. This sample may be skewed by factors that bias the rater's attention and evaluation of that information (Motowidlo, 1986).
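
The sampling idea above can be sketched numerically. The following is a minimal illustration, not Motowidlo's (1986) formal model: the effectiveness values, the 1 to 7 scale, and the function name `rating_from_sample` are all hypothetical, and the rating is assumed to be a simple mean of the behaviors the rater happened to observe.

```python
from statistics import mean


def rating_from_sample(domain, observed_indices):
    """Rating a rater would form from the behaviors actually observed."""
    sample = [domain[i] for i in observed_indices]
    if not sample:
        raise ValueError("rater observed no behaviors")
    return mean(sample)


if __name__ == "__main__":
    # Hypothetical effectiveness values (1-7) for every behavior in the domain.
    domain = [2, 3, 4, 4, 5, 5, 6, 7]
    true_score = mean(domain)

    # A sample spread across the domain versus one skewed toward
    # the ratee's most effective behaviors.
    representative = rating_from_sample(domain, [0, 2, 5, 7])
    skewed = rating_from_sample(domain, [4, 5, 6, 7])

    print("true score:", true_score)
    print("representative sample rating:", representative)
    print("skewed sample rating:", skewed)
```

In this toy case the representative sample reproduces the domain mean while the skewed sample overstates it, which is the mechanism by which biased attention could lower rating accuracy even when each observed behavior is evaluated correctly.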


Feldman (1981) provided a theory of how raters attend to and recognize performance information through both automatic and controlled attention processes (Shiffrin & Schneider, 1977). He proposed that raters automatically react to certain cues from the ratee (e.g., dress, sex, race, etc.) through the use of cognitive maps or well learned scripts (Abelson, 1976). As a result, behavior that is consistent with the supervisor's expectations is noted and stored automatically (Feldman, 1981). These automatic processes may lead to generalizations which may result in inaccuracy. Controlled processes occur when inconsistent information is received and becomes salient. At this point, conscious attention is required.


Murphy and Cleveland (1991) suggested that raters attend to specific behaviors as a function of three variables. First, characteristics of the behavior may draw attention to that behavior. Second, behavior which is unique to a particular context may become salient. Third, the purpose of the observation may cause the rater to seek specific information related to achieving that purpose. Specific rating purposes may make certain behaviors salient to the rater because they provide information that is important relative to the purpose of the performance evaluation.

Two points should be highlighted about rater information gathering processes. First, context may impact ratings by affecting the type of behaviors that raters attend to. Variations in the sample of performance information can contribute to variations in the ultimate rating. Second, certain behaviors may become salient because they provide information that is important based on rating purpose. This may trigger the use of controlled instead of automatic information gathering processes. The next section will consider how differences in the types or frequency of rating errors may be due to the use of automatic or controlled processes.


Information storage and organization. The next proposed step in the performance rating process considers how information is organized and stored. Feldman (1981) proposed that individuals use categorization schemes to facilitate the storage of information gathered by automatic processes. These categories or schemata can be based on stereotypes or trait labels which carry some judgmental aspect (e.g., good, bad, lazy, energetic, etc.) and carry a set of related characteristics. Cantor and Mischel (1977; 1979) proposed the use of prototypes, a representative example of the category, to facilitate [...] rater's description of "good" workers while another may represent a composite of behaviors or cues that represents the category of "bad" workers (Murphy & Cleveland, 1991). Information that is attended to is automatically placed in a category. Since these categories may be based on stereotypes or traits, and since they vary among people, automatic processing may lead to inaccuracies in subsequent judgements.


Ilgen and Feldman (1983) suggested that many important information processing activities are done using automatic processes which are beyond the rater's conscious control. Automatic processing should result in more bias than controlled processes because of the use of faulty schemata (Lord, 1985). Controlled processes do not necessarily imply more rationality (Bargh, 1984). They could also introduce errors into rater information processing.


Some aspect of the rating situation, such as the purpose of the observation, may cause the rater to process performance information in a controlled manner. At this point, raters first attribute the cause of the behavior (Feldman, 1971). Attribution theory (Kelley, 1971; Jones, 1979; Weiner, 1972) is concerned with whether individuals attribute behavior to internal or external factors. Research suggests that people tend to make incorrect attributions about the causes of performance. This occurs because people tend to explain the behavior of others in terms of internal causes but explain their own behavior in terms of external causes. As a result, when raters using controlled processes attempt to determine reasons for ratee behavior they may introduce error into the rating process. Differences in attributions about ratee performance may be directly related to the overall favorability, and perhaps accuracy, of the rating.


It is also possible that when raters attribute responsibility for behavior to either the subordinate or the situation, their decision is influenced by the implications of that attribution. If a desired rating outcome is achieved by attributing ratee behavior to external causes, which protects the ratee from responsibility, the rater may tend to make that attribution. If the rater's purposes are better served by attributing the performance to the ratee, say to justify a lower rating, then that attribution may be made. This would explain how rating purpose can affect rating quality. Although the rater may not be conscious of this process (Bargh, 1984), it is possible that these controlled processes have a significant influence on the accuracy of information stored by the rater.


Feldman (1981) suggested that after raters attribute the cause of the behaviors to a source they make a second judgement about which specific category will be used to [...] ([...] Wyer, 1979). Since these categories include judgmental value, the assignment of a behavior to a category may mean that the behavior is judged by the rater at the time of observation (Feldman, 1981).

The research reviewed above suggested that raters, using either automatic or controlled processes, make some judgement about the behavior before it is stored and that either method is subject to the introduction of errors. It was also suggested that factors such as rating purpose may influence judgements made about ratee behavior. Errors made in the storage process should influence the quality of information recalled by the rater at the time of the rating and, as a result, affect rating quality.


Information recall and judgement. Feldman's (1981) process model described the process of seeking and recalling stored information. He suggested information is stored on a temporary "work space" during which both categories and observed behaviors are available. Over time the work space is cleared and only the categories are retained in "storage bins." When information is needed, categories are recalled, not the behaviors. If several categories exist, factors such as the recency of storage or the supervisor's emotional state may influence the recall process (Feldman, 1981). This categorization scheme may not only bias recall, it may also prevent subsequent information from changing the [...]





Finally, the rater makes judgements using the recalled information (DeNisi, Cafferty, & Meglino, 1984; Feldman, 1981). This retrieved information is biased towards the prototypes that formed the basis for the rater's categorization schemes. Additionally, since people tend to look for information that supports a particular decision, the entire process may be biased (Feldman, 1981). The rater's search of stored information may be guided by factors such as appraisal purpose or desired outcomes. While the rater may have observed information contrary to that purpose, it may not be recalled at the time of judgement.


Because stored information is already integrated in the form of categories, perhaps no further judgement is needed. When integration is required, recalled beliefs about the individual are "thought to be described best by a weighted-average combination of components or subsidiary evaluations also affected by salience-causing factors" (Feldman, 1981, p. 138).
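
The weighted-average integration described in the quotation can be illustrated with a small sketch. The component evaluations and salience weights below are hypothetical values chosen for illustration; Feldman (1981) does not specify numeric weights, and the function name `weighted_average_judgement` is an assumption.

```python
def weighted_average_judgement(evaluations, weights):
    """Combine component evaluations into one judgement using salience weights."""
    if len(evaluations) != len(weights) or not evaluations:
        raise ValueError("need one weight per evaluation, and at least one")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(e * w for e, w in zip(evaluations, weights)) / total


if __name__ == "__main__":
    # Hypothetical 1-7 evaluations recalled for three performance categories.
    evaluations = [6.0, 4.0, 2.0]

    equal = weighted_average_judgement(evaluations, [1, 1, 1])
    # Making the negative category more salient pulls the judgement downward.
    salient_negative = weighted_average_judgement(evaluations, [1, 1, 3])

    print("equal salience:", equal)
    print("negative category salient:", salient_negative)
```

The same recalled evaluations yield a lower overall judgement once the negative category carries more weight, which is one way a salience-causing factor such as rating purpose could shift a rating without any change in what was observed.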


Summary of the information processing model. Raters are subject to biases at each step of the rating process. However, the chance of biased ratings increases when raters must also deal with a rating context where the ultimate goal may not be to provide accurate and reliable ratings. Unfortunately, research which has focused on just the rater [...]

This research will focus on the interaction between rater and rating context. Two contextual factors, rating purpose or outcomes and rater accountability, are considered in the next sections.


Rating Purpose Research


Recently, Murphy and Cleveland (1991) suggested that rating behavior should be considered a goal directed process used by the rater to achieve his or her own interests. They suggested that there are wide variances in the "instrumentality" which specific performance evaluation outcomes may have for raters (see Porter & Lawler, 1968; Schwab, Olian-Gottlieb, & Heneman, 1979; Vroom, 1964). This suggestion is consistent with one study where executives reported consciously manipulating ratings to meet their specific objectives (Longenecker, Gioia, & Sims, 1987).


Murphy and Cleveland (1991) also pointed out that outcomes valued by the rater may not be the same as those valued by the organization. The suggestion that raters intentionally manipulate their ratings to achieve their own objectives is not new. Conditions within the context of performance appraisal may even lead raters to determine ratings that should be given at the beginning of the rating process (Longenecker, Gioia & Sims, 1987; Murphy & Cleveland, 1991). [...] performance appraisal environment may lead to objective evaluation of subordinate performance. It has been argued that raters do not fail to give accurate ratings because they are incapable of accuracy but because they are unwilling to rate accurately (Banks & Murphy, 1985).


Within the performance appraisal context, differences in purpose or expected outcomes of ratings may explain variations in rating quality. Purpose has been described as "the mechanism by which the context of rating interacts with the capabilities and cognitive and judgmental processes of the rater" (Murphy & Cleveland, 1991, p.73).


Organizations use performance appraisal information for a wide variety of purposes. One study reported four commonly used purposes of performance appraisal (Murphy and Cleveland, 1991). The purposes they reported were between-person decisions (promotions), within-person decisions (feedback), system maintenance (manpower planning), and documentation (legal requirements). Each of these purposes should have different effects in terms of rater anxiety about the consequences of those ratings.


Murphy, Balzer, Kellam, and Armstrong (1984) proposed that "purpose affects ratings indirectly by affecting basic cognitive processes such as the observation, encoding, and recall of the rated person's behavior" (p.46). Citing Landy and Farr's (1980) model, they argued that "raters who [...] decisions may be more careful in attending to the behavior of the individual being rated" (Murphy et al., 1984, p.46).


Studies of purpose effects on rating quality


Several researchers have compared results of ratings used for experimental purposes with those used for administrative purposes such as selection, assignment, or promotion. Taylor and Wherry (1951) found ratings made for administrative purposes were significantly more lenient and differentiated less between ratees than ratings provided for experimental purposes. In another study, Driscoll and Goodwin (1979) found students were more lenient when they believed that ratings would be used for administrative decisions. In a subsequent study, Meier and Feldhusen (1979) found no effect of purpose on leniency for student raters. They suggested, however, that their results were tentative since a simple statement of purpose may not have the desired effect.


Murphy et al. (1984) also found purpose had no effect on accuracy in evaluations of teacher performance. However, they reported their belief that rating accuracy might be affected by raters' beliefs about specific outcomes rather than the rating purpose. They also suggested their results may be due to the lack of interpersonal consequences for the raters when they gave ratings.







McIntyre, Smith, and Hassett (1984) considered the effects of purpose on accuracy and found only small support for the proposition that purpose affects accuracy. They considered subject ratings for the same videotaped lectures presented in the Murphy et al. (1984) study. Subjects were told the purpose of their evaluation was for either research, course-improvement instructions, or for hiring decisions. The authors suggested that the small effect of purpose was the result of how the purpose variable was manipulated. They suggested that future research provide more explicit detail about the effects of rating decisions on the ratees.


Studies of the impact of purpose on performance ratings have yielded inconsistent results. Some researchers have suggested that the purpose manipulation may not have had personal consequences for the raters or ratees and as a result the subjects may not have adequately understood the effect of the outcomes of their decisions on subordinates. Murphy et al. (1984) suggested that it may be the rater's perception of the outcome of ratings rather than the stated purpose that influences rating leniency. They also contended that raters attempt to avoid "interpersonal consequences" of rating decisions and since their subjects were guaranteed anonymity, there were no interpersonal consequences (Murphy et al., 1984). These differences [...] inconsistencies in research on leniency as a function of purpose.


The above discussion is consistent with theoretical discussions of this issue. First, several researchers have posited that leniency in ratings is more likely to occur when ratings have important consequences for the ratee (Ilgen & Feldman, 1983; Lawler, 1976; McCall & DeVries, 1976; Murphy et al., 1984) or the rater (Cleveland & Murphy, 1992; Mohrman & Lawler, 1983; Murphy & Cleveland, 1991). Second, the rater's desire to maintain positive working relationships may be another reason for rating inflation (Longenecker, Gioia, & Sims, 1986). Rating purpose may then affect rating quality only when there are personal consequences either directly to the rater or indirectly through effects on the ratee.


Other research has supported the proposition that rating distortion occurs when raters are required to share their ratings with subordinates (DeNisi, Cafferty & Meglino, 1984; Klimoski & Inks, 1990; Longenecker, Gioia & Sims, 1986; Waldman & Thornton, 1988). In one study, ratings used for administrative purposes were more lenient than ratings used for research purposes. Within this study a third group was told their ratings would be used for research purposes; they were also told that their ratings would be seen by subordinates. Ratings for this group were lenient [...] ratings provided solely for administrative purposes (Waldman & Thornton, 1988).


Empirical evidence suggests that the effects of rating purpose on performance ratings may be moderated by whether or not there are personal implications for the rater or the ratee. Rather than providing raters vague purposes for their ratings, this study will provide raters specific information about how rating decisions affect both the organization and the ratee. This is similar to actual appraisal situations where raters see how their rating decisions directly impact their subordinates, their unit, and perhaps more importantly themselves. The contextual cues provided by information about the outcomes of ratings may simply lead raters to provide ratings that will achieve preferred outcomes.


Effects of rating purpose on rater cognitive processes


There is empirical support for the possibility that the effect of purpose on performance is through its influence on rater cognitive processes. One study reported that raters searched for different types of information under different conditions of appraisal purpose. Raters required to make specific decisions, such as giving an employee a raise and determining the amount of that raise, searched for the normative data necessary to support such a decision (Williams, DeNisi, Blencoe & Cafferty, 1985). Williams et al. [...] do more than just motivate raters to provide a desirable rating. It may also influence rater cognitive processing of performance information.


Eder's (1989) proposed effects of context on interviewers also suggested that purpose may influence the type of information that is sought and stored. Raters may determine in advance what kind of information is needed and then they search for that information (Crocker, 1981).


There have been several investigations into the proposition that rating purpose influences information processing. Using a policy capturing technique, Zedeck and Cascio (1982) found that raters combined information differently when ratings were to be used for different purposes. Here, raters were given one of three purposes: merit raise, developmental, or retention. Each purpose accounted for differences in how raters weighed, combined, and integrated performance dimensions. A study of the effects of purpose on rating accuracy (Murphy, Philbin, & Adams, 1989) found that subjects told to evaluate a lecturer's performance evaluated that performance more accurately than subjects told to rate lecture content. However, when time between performance observation and evaluation was delayed ([...] days), the evaluation group ratings were poorer than those of the content group. Murphy et al. attributed this to the difference between on-line [...] rating purpose affects the encoding and retrieval of information about ratee behavior.


A related line of research considered how raters process information when performance appraisal is not their main concern. Balzer (1986) found that when the appraisal task was salient, it interacted with the rater's impression of ratees to influence what information was attended to and encoded. In two other studies (Hastie & Park, 1986; Williams, 1984), researchers found that when appraising individual performance is a salient purpose of observation, raters used person categories to organize information.


Summary of the purpose literature


A review of research on the effects of rating purpose suggests several observations. First, while a theoretical basis for purpose affecting ratings and rating processes has been established, empirical research supporting that linkage is equivocal. It seems that purpose may affect ratings because it causes the rater to be concerned about specific rating outcomes. In the research cited above, subjects were typically given generic purposes (i.e., research, administrative decisions, etc.) rather than exposed to more specific outcomes of their ratings. As a result, it was left to the subject to determine the specific outcomes of their decisions.

Second, [...] for the rater. Subjects were usually told their ratings would remain anonymous, a condition that is unlikely to occur in organizations where both the rater's supervisors and subordinates generally have access to performance ratings. One way that raters are personally linked to their ratings is when they have to justify those decisions to others.


Third, researchers should consider whether contextual factors influence rating quality through their effects on rater cognitive processes. Just because the ultimate rating is affected by rating purpose does not mean that raters are processing information differently. Perhaps raters process information consistently regardless of rating purpose or outcome. In this case, the effect of purpose may only be on which rating is assigned. More research is needed to determine the relationship between rating outcomes, rater information processing, and the ultimate rating decision.


Accountability Research


Wherry (1952) proposed that requiring raters to justify their ratings will affect the ratings given and the type of information gathered or recalled. This proposition has received some theoretical but little empirical support in the performance appraisal literature. However, in recent years an extensive body of literature has considered this [...]








Researchers have considered the effects of accountability on the information processing strategy of subjects (Tetlock, 1985). Results of this literature provide different pictures of the effects of accountability on decision makers. When decision makers know in advance the type of decision they can best justify, they will reach that decision. When there are no clues about which decision is easiest to justify, decision makers are thought to use a more cognitively complex decision process. While the information gathering process may be more thorough, decision makers under this condition may consider irrelevant information. The next two sections review this research.


Decision makers as cognitive misers


When their audience's views are known, decision makers make decisions consistent with those views (Klimoski & Inks, 1990; Tetlock, 1985). Tetlock (1985) contended accountable decision makers who are aware of the views of their audience employ an "acceptability heuristic." This is based on the assumption that people are basically lazy "cognitive misers" who prefer solutions that involve the least effort (Taylor & Fiske, 1978; Tetlock, 1985). This view suggests that accountable decision makers who know the views of the audience will make decisions they feel will be acceptable (Tetlock, 1985). As a result, these decisions are based on how justifiable they are to the audience.







There is considerable empirical support for this position. Several studies have supported the influence of the need for approval (Jones & Wortman, 1977; Wortman and Linsenmeier, 1977) and the motivation of individuals to present themselves as positively as possible to those to whom they are accountable (Baumeister, 1982; Schlenker, 1980). Tetlock (1983) showed that accountable subjects who knew the views of their audience shifted their opinions to be consistent with those views. Similar results were found in another study where subjects who were aware of the views of their audience relied on a low-effort acceptability heuristic and again shifted their views towards those of the audience (Tetlock, Skitka and Boettger, 1989).


Decision makers using complex decision strategies


The acceptability heuristic suggests a second view of information processing which occurs when decision makers are unaware of their audience's views. In this condition they are thought to use more complex information processing strategies. Tetlock proposed three effects on decision makers held accountable to audiences with unknown views. First, they will utilize more cognitively complex decision making strategies. Second, they will be more sure of their own cognitive processes. Third, decisions will be based on a more data driven process of impression forming (Tetlock, 1985).








Schlenker (1986) expanded on the effect of accountability on information processing. He suggested accountability leads to a more thorough search for relevant information, greater learning and recall of information, more complex judgement and decision strategies, more data-driven processing of information in which judgements are influenced by details of the information, and a greater awareness of decision making strategies. He also proposed that the effects of accountability are attenuated when tasks are unchallenging or tedious and when the goal is merely to convince a superior that the task is completed rather than some more important goal. Consistent with ideas considered within the performance appraisal literature, the effect of accountability will be attenuated when the task has few personal implications (Schlenker, 1986).


There is empirical support for the use of more complex decision strategies. Tetlock (1983) reported that decision makers, unaware of the views of their audience, used "pre-emptive self-criticism" by developing counter-arguments to potential critics of their decisions. Simonson and Nye (1992) found that although accountable subjects used a more multi-dimensional and self-critical information processing strategy, they still selected the response expected to be more positively evaluated by the audience.








Ashton (1992) found that auditors required to justify their ratings were more accurate and consistent on a task predicting bond ratings. Tetlock and Kim (1987) found that accountable decision makers formed more integratively complex impressions and made more accurate predictions. Accountable decision makers do not always make more accurate decisions. Tetlock and Boettger (1989) found that in some environments, accountability exacerbated judgmental biases through a "dilution effect." Accountable subjects used a wide range of information, including irrelevant information, when making judgements.


A second study found similar results. Gordon, Rozelle, and Baxter (1988) reported that accountable interviewers considered less relevant information (in this case, the applicant's age) when making hiring decisions.


In contexts where only relevant information is presented, the increased attentiveness of accountable decision makers should improve decision quality. In environments where decision makers must distinguish relevant from irrelevant information, accountability may result in biased decisions. Within most organizational decision making contexts, including performance appraisal, decision makers are inundated with information, much of which has no bearing on the decision. As a result, the possibility that a dilution effect may influence the quality of ratings [...]








Accountability and rater motivation


The accountability research suggests an interesting and important relationship between accountability and the subsequent motivation of the decision maker. It supports the theory that accountable decision makers adopt a decision strategy that facilitates finding an acceptable solution with a minimum of effort. This view of accountability as a motivator of certain information processing behaviors is consistent with several theories on human motivation.


One theoretical connection between accountability and motivation theory is through the behavioral effects of the evaluative nature of accountability. The requirement to account for their decisions leads decision makers to consider the criteria which they must use when they justify their decision.


This evaluative aspect of accountability may, other things being equal, motivate individuals to exert greater effort (Shepperd, 1993). This is consistent with social loafing research which has shown that when individuals feel their inputs are identifiable, they are less likely to loaf (Williams, Harkins & Latane, 1981). When inputs are not identifiable, it has been proposed that motivational pressure is diminished since individuals are not held accountable for their efforts or results (Latane, Williams & Harkins, 1979; Shepperd, 1993). [...] rewards that exist in most organizational contexts.


For example, suppose raters had to account to their supervisors for the quality of their performance ratings. Those raters would be well aware that supervisors often hold considerable power in the form of organizational rewards. In situations where rating decisions must be justified to audiences other than supervisors, raters may feel anxious about the effect negative ratings would have on social relationships within the organization (Klimoski & Inks, 1990).


The relationship between organizational rewards (either social or economic) and performance has been established elsewhere (Mitchell, 1974; Vroom, 1964). The core of this relationship is that when rewards are associated with performance there is an underlying sense of evaluation that exists prior to achieving those rewards. Schlenker (1986) argued that goal setting is related to accountability because accountability suggests evaluation and goals provide a clear standard of measurement. Questions considered in this research are developed from this relationship between accountability, motivation, and evaluation.


Accountability, within the context of performance appraisal, can be viewed as a motivating force on the rater. Other research suggests that raters held accountable for their ratings will consider the views of those to whom they are accountable when choosing a strategy [...] direction, raters may be sensitive to other cues from the organizational context when they determine acceptable rating behavior. Those contextual cues may be in the form of specific rating outcomes that are desired by the audience of the ratings.


Summary of the accountability literature


In summary, there is considerable support for the idea that accountability affects the way decision makers process information. When the audience's views are known, decision makers make decisions consistent with those views. In doing so they may rely on heuristics or stereotypes to communicate more easily with the audience. When the views of the audience are unknown, the decision makers use a more cognitively complex information processing strategy. It has been shown that this strategy improves the accuracy and consistency of decisions, but may also lead to decision biases when decision makers consider irrelevant information.


The effects of accountability are consistent with motivation research that has demonstrated a strong and consistent relationship between the sense of evaluation and subsequent behavior.


While Klimoski and Inks (1990) are the only researchers to consider the effects of accountability in performance evaluations, Bernardin forcefully underscored the importance of accountability when he wrote, "Perhaps this single factor, [...] are held accountable for other expensive organizational resources, will do more to improve the effectiveness of a PA system than any other technique or intervention that could be recommended" (Bernardin, 1986, p.30).


Hypotheses


Although previous research has considered how generic statements of rating purpose affect rating quality, one objective of this research is to consider the effect of more specific rating outcomes on measures of rating quality.


The term rating outcome as used in this study refers to the specific result of a rating. Rating outcomes are the specific implications of a rating for organizational constituents. For example, if promotion decisions are based on performance ratings, the outcome of a favorable rating would be a promotion for the ratee. The outcome of an unfavorable rating would be that the ratee is not promoted. In this example, a desirable rating outcome for the rater may be to see his or her subordinates promoted. To achieve this outcome the rater may feel pressure to provide a highly favorable rating. This may lead the rater to search for only positive examples of performance. In another situation raters may be placed in a context where a desirable rating outcome is to provide performance ratings that are useful for organizational decision making. This outcome can be [...] attend to a wider sample of behavior and carefully consider the impact of each behavior on the ultimate rating.


Likewise, if a desirable rating outcome can be achieved by providing inflated ratings, raters may give greater weight to positive performance information.


The following predictions summarize the expected relationship between the independent and dependent variables used in this study.


Accountable raters will provide more accurate performance ratings and remember behavioral episodes better than non-accountable raters except in conditions where desirable rating outcomes can be achieved by providing inaccurate ratings.


Different rating outcomes will only affect the quality of performance ratings and rater information processing when raters are personally linked to their performance ratings by the accountability manipulation.


Before presenting specific hypotheses it is appropriate to provide an overview of the manipulations used in this study. Detailed information on study design is provided in Chapter [...]. In this study rating context was manipulated by creating two accountability conditions, accountable or not accountable, crossed with four rating outcome conditions.


The rating outcome manipulation provided raters information about negative outcomes that had occurred because of previous rating decisions. Raters were then encouraged to make sure their ratings led to more desirable [...] ratings an outcome where the unit's subordinates were not competitive for promotion. Subjects were made aware of this outcome and were told that a desirable rating outcome was for their subordinates to be more competitive for promotions.


In a second condition subjects were told that an outcome of previous performance ratings was that female subordinates were rated lower than male subordinates. In this condition subjects were told that a desirable rating outcome was for female subordinates to receive ratings equivalent to their male colleagues.


In a third condition subjects were shown that past performance ratings had been inflated to the extent that almost all subordinates received the highest ratings. They were told that the outcome of that situation was that the organization could not use performance ratings to make important personnel decisions. In this condition subjects were told that a desirable outcome was to have performance ratings that accurately identified individual strengths and weaknesses.


Subjects in the fourth outcome condition, which was used for comparison purposes, differed on the accountability manipulation but did not receive information about rating outcomes.
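The crossed design described above (two accountability conditions by four rating outcome conditions) can be laid out as a short sketch. The condition labels below are shorthand invented here for readability and are assumptions, not the labels used in the study.

```python
from itertools import product

# Two accountability conditions, as described in the text.
ACCOUNTABILITY = ("accountable", "not accountable")

# Four rating outcome conditions; the label wording is hypothetical shorthand.
RATING_OUTCOME = (
    "promotion competitiveness",
    "female-male rating equivalence",
    "accurate strengths and weaknesses",
    "comparison (no outcome information)",
)

# Fully crossing the two factors gives every cell of the design.
conditions = list(product(ACCOUNTABILITY, RATING_OUTCOME))

for number, (accountability, outcome) in enumerate(conditions, start=1):
    print(f"{number}. {accountability} / {outcome}")
```

Crossing the factors this way makes it explicit that the comparison outcome condition still varies on accountability, which is what allows accountability effects to be examined without any outcome information.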


Rating Accuracy


The first measure of rating quality considered is rating accuracy. While several different accuracy measures [...] researchers have typically used one or more of the four accuracy measures introduced by Cronbach (1955). Several researchers have pointed out that because these accuracy measures are not correlated, different measures can lead to different research conclusions (Murphy & Balzer, 1981; Murphy & Cleveland, 1991; Sulsky & Balzer, 1988). As a result, the conclusions suggested by a study of accuracy may depend more on the researcher's choice of an accuracy measure than on the effects of the manipulations being studied (Becker & Cardy, 1986; Murphy & Cleveland, 1991). In this study, Cronbach's (1955) measure of differential accuracy was used.


Differential accuracy measures the difference between subject ratings and true score ratings of strengths and weaknesses of each ratee on each performance dimension. This measure is appropriate because the focus of this study is to determine the effect of accountability on the subject's ability to process complex performance information. The most rigorous test of this ability to make detailed distinctions in performance variations is measured by differential accuracy. If accountability improves rater information processing, then it should be evident in the complex task of evaluating individual performance.
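As an illustration only, differential accuracy can be sketched in the form in which it is commonly computed in the rating accuracy literature: the ratee and dimension main effects are removed from both the ratings and the true scores, and the remaining ratee-by-dimension patterns are compared. Whether this study used exactly this computation is an assumption; the function names and the toy data are hypothetical.

```python
from math import sqrt


def interaction_residuals(matrix):
    """Remove ratee (row) and dimension (column) main effects.

    matrix[i][j] is the score for ratee i on performance dimension j.
    """
    n_ratees = len(matrix)
    n_dims = len(matrix[0])
    grand_mean = sum(sum(row) for row in matrix) / (n_ratees * n_dims)
    ratee_means = [sum(row) / n_dims for row in matrix]
    dim_means = [
        sum(matrix[i][j] for i in range(n_ratees)) / n_ratees
        for j in range(n_dims)
    ]
    return [
        [matrix[i][j] - ratee_means[i] - dim_means[j] + grand_mean
         for j in range(n_dims)]
        for i in range(n_ratees)
    ]


def differential_accuracy(ratings, true_scores):
    """Root mean squared gap between rated and true ratee-by-dimension patterns.

    Lower values mean the rater reproduced each ratee's pattern of
    strengths and weaknesses more closely.
    """
    rated = interaction_residuals(ratings)
    true = interaction_residuals(true_scores)
    n_cells = len(ratings) * len(ratings[0])
    squared_gap = sum(
        (r - t) ** 2
        for rated_row, true_row in zip(rated, true)
        for r, t in zip(rated_row, true_row)
    )
    return sqrt(squared_gap / n_cells)


# Toy data (hypothetical): two ratees rated on two dimensions.
true_scores = [[1, 2], [3, 4]]
uniformly_inflated = [[2, 3], [4, 5]]   # every rating one point too high
pattern_reversed = [[2, 1], [3, 4]]     # first ratee's profile is reversed

inflated_da = differential_accuracy(uniformly_inflated, true_scores)
reversed_da = differential_accuracy(pattern_reversed, true_scores)
```

In this toy case uniform inflation leaves differential accuracy at zero while the reversed profile does not, which is one reason different accuracy measures can lead to different conclusions about the same ratings.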


Improved information processing by accountable raters should lead to more accurate performance ratings. There are three reasons why [...] were provided several cognitively complex tasks, the justification requirement will make the performance appraisal task salient. Second, subjects who have to justify their decisions should process performance information in a way that focuses their attention on the most relevant information (Tetlock, 1985). Third, within the setting of performance appraisal, accountability should improve rating judgement by increasing the consistency with which raters process multiple pieces of performance information (Ashton, 1992).


HYPOTHESIS: Accountable raters will rate more accurately than non-accountable raters except when placed in a rating context where desirable rating outcomes can be achieved by providing inaccurate ratings.

As suggested in this hypothesis, accountable raters may rate more accurately only in some conditions. If the rating context suggests that specific ratings will lead to a desirable rating outcome, accountable raters should provide those ratings since those are easiest to justify. To achieve this desirable outcome raters should attend to, store, and recall performance information in a way that makes it easier to justify the desired ratings. In other words, accountability should motivate raters to process performance information in a way that supports ratings consistent with desired outcomes.









The expectation that accountability will cause the rater to choose a decision strategy that allows them to adequately account for their decisions is consistent with research that suggests that rating outcomes will only affect raters when there are personal implications, such as when they are held accountable. As a result, when accountable raters are placed in a situation where a desirable outcome can be achieved by providing ratings that disregard actual differences in subordinate performance, they should provide inaccurate ratings. Raters who are not held accountable for their ratings should not feel pressured to provide ratings that achieve desirable rating outcomes. In this study a situation was created in which accountable raters are encouraged to provide equivalent ratings to both male and female subordinates.


HYPOTHESIS 2: Accountable raters placed in a rating context where a desirable rating outcome can be achieved by rating female subordinates higher than their true level of performance warrants will be less accurate than non-accountable raters in the comparison group.

One group of subjects in this study was placed in a rating context where desirable rating outcomes can be achieved only when raters make accurate ratings. The interaction between the pressures of accountability and the desire to achieve outcomes facilitated by accurate ratings should result in improved rating accuracy.







HYPOTHESIS 3: Accountable raters placed in a rating context where a desirable rating outcome can be achieved by rating accurately will be more accurate than raters in the comparison group regardless of accountability status.

Within the current study, subjects in another condition were given a desired outcome that could be achieved by artificially inflating performance ratings. While this manipulation should affect rating inflation, it should not affect rating accuracy. This is based on previous research which has shown that rating inflation is not correlated with rating accuracy (see Murphy & Cleveland, 1991).


Information Processing and Recall Accuracy

This study also considered whether raters in various rating contexts processed performance information differently, as indicated by how well subjects recalled specific instances of observed performance. Focusing on how accurately subjects recall performance information will provide insight into possible differences in raters' information processing. The hypotheses above predict that accountable raters in certain conditions will be more accurate raters. This result should occur because accountable subjects more carefully attend to behavioral episodes and consider how they will use information from those episodes to justify their performance ratings. If accountability leads to these effects, subjects in these conditions should more accurately recall specific performance information.


HYPOTHESIS 4: Accountable raters will more accurately recall performance episodes than non-accountable raters except when placed in a rating context where a desirable outcome can be achieved by providing inaccurate ratings.

Previous research has found some effects of rating purpose on rating quality. What is not understood is why this effect occurs. Perhaps when raters face a situation where accurate ratings would lead to an undesirable outcome they respond to the pressure by giving an inaccurate rating that they know does not reflect true subordinate performance. Another possibility, consistent with research cited above, is that raters selectively attend to information that facilitates the desired outcome. In this case, raters may be unaware that the ratings they are assigning are based on a biased sample of performance information. For example, if raters are in a context where they want to provide a positive rating, automatic information processing may lead them to attend to only positive information about subordinate performance. If inconsistent performance information is presented (such as negative performance information), raters may react using controlled processes and may choose to attribute the poor performance to situational factors beyond the control of the ratee. In either case, when raters recall information to facilitate their ratings, they may draw on a flawed sample of behaviors that have been distorted to meet the desired purpose of the rater.


In this study, subjects in one condition were placed in a situation where a desired outcome could be achieved by rating female subordinates higher than their true performance level. If, as a result of the outcome manipulation, subjects in this condition search for positive performance information and attribute any negative performance information to external causes, this should affect their ability to accurately recall performance information. Therefore, hypothesis 5 predicts:

HYPOTHESIS 5: Accountable raters placed in a rating context where a desirable outcome can be achieved by rating female subordinates higher than their true level of performance warrants will have poorer recall of performance episodes than non-accountable raters in the comparison group.


Rating Favorability

Some subjects were placed in a context where a desirable outcome could be achieved by inflating the performance ratings of subordinates. In this case there should again be an interaction between the inflationary outcome and accountability conditions. Raters who are not required to justify their ratings should not be affected because they have no personal involvement in the rating outcome. However, accountable raters who have to justify their ratings should be affected.







HYPOTHESIS 6: Accountable raters who are placed in a rating context where a desirable rating outcome can be achieved by assigning inflated ratings will rate subordinates more favorably than non-accountable raters in the comparison group.

When subjects are presented with a situation where a desirable rating outcome can be achieved by artificially inflating the ratings of female subordinates, rating inflation is again of interest.

HYPOTHESIS 7: Accountable raters who are placed in a rating context where a desirable rating outcome can be achieved by assigning inflated ratings to female subordinates will rate females more favorably than non-accountable raters in the comparison group.


Dilution Effect

A final research question considers the possibility of a dilution effect as reported by Tetlock and Boettger (1989). This effect is the possibility that accountable raters will use irrelevant information when making rating decisions. No specific hypothesis is made concerning this dilution effect. However, if accountable subjects give weight to irrelevant information, they should be less accurate. As a further test for this effect, subjects were asked to recall the number of positive and negative scenes they remembered seeing for each subordinate. From this measure, variables were created for each subject based on the total number of positive and negative scenes recalled. Group means on these variables were compared.











CHAPTER


METHODOLOGY


This chapter details the methods and procedures used for this study. It begins with a review of methodological criticisms that have been directed at previous laboratory investigations of performance appraisal processes and effects. In response to these criticisms, this research incorporates a unique "technology" designed to allow careful consideration of the observation and evaluation of individual performance within a controlled laboratory setting. The development of this technology is also discussed in this chapter. Next, information about the study's participants, as well as details about the study's design and procedures, is presented.


Methodological Issues in Laboratory Studies

The methods used in this study address several criticisms of previous laboratory studies which investigated the performance appraisal process. Researchers criticize laboratory studies of performance appraisal because those studies failed to consider the complexity of actual performance evaluation environments (Ebbesen & Konecni, 1980; Funder, 1987; Ilgen & Favero, 1985).


Funder (1987) suggested that "research must [have] subjects judge real people in real social contexts, and use realistic external criteria for determining when judgements are right or wrong." In another criticism, Ebbesen and Konecni (1980) argued that people may not make judgements in the real world the same way they do in the lab. Many of these criticisms were directed at earlier studies that created "paper people" and provided limited presentations of relevant ratee performance. Ilgen and Favero (1985) criticized the paper people paradigm and suggested that procedures which incorporated actual observation of performance, directly or through videotape vignettes, were a promising method.


The methods used in this study address the above concerns in several ways. First, subjects were exposed to performance information on their subordinates during two 2-hour sessions over a 2-week period. This is significantly more involved than previous studies, and the extended period of time better reflects an actual rating environment. Second, participants were assigned a series of complicated administrative tasks which placed raters in a more realistic supervisory position where rating subordinates is one of many management tasks that require their attention. Third, subordinate behavior was depicted on videotape, which provided raters the opportunity to observe subordinates in a wide range of diverse contexts common in modern organizations, including both rating-relevant and irrelevant behavior. This technique also provided the opportunity to introduce several different forms of feedback (i.e., direct observation, viewing written work samples, observations from others, etc.) on actual ratee performance. The result is a rating context that reflects an actual work environment.


Critical to any study of accuracy is the need to establish a "true score" of ratee performance. The design of this study, while limited by the constraints of a laboratory environment, has the advantage of being able to present carefully controlled information on subordinate behavior (for forming true scores). This is necessary to establish a measure from which to evaluate rating accuracy.


Managerial


Simulation


This study used a managerial simulation developed to support laboratory investigations of performance appraisal settings. The simulation consists of in-basket exercises to simulate administrative aspects of a manager's job (see Frederiksen, Jensen, & Beaton, 1972) and a series of videotaped performance episodes of subordinates introduced within the in-basket.


In-basket Description


The in-baskets place subjects into the role of Leslie [...]. In this position subjects managed the activities of five subordinate managers: Alice Garcia, Bill Jensen, Carol Donaldson, David Fredericks, and Ed Montage. These subordinate managers are responsible for division sections, and each section has its own staff of employees.

The in-baskets included administrative items designed to provide challenging tasks which required considerable cognitive effort and to acquaint subjects with their management responsibilities within the organization and with their subordinates.


Administrative items in the in-basket were representative of the type of paperwork that typically confronts managers on a day-to-day basis. Items such as memos from their supervisor, fellow division chiefs, subordinates, or customers were a significant part of the in-basket. These memos would ask raters to resolve or provide input on organizational problems, develop policies, participate in special projects, handle human resource issues, or schedule activities. Subjects were told to take the actions they thought were necessary to resolve each problem. These actions might include drafting a letter to an unhappy customer, keeping subordinates abreast of important information, reviewing trip reports, developing plans to resolve personnel problems, or advising their supervisor on important policy matters. While in-basket information was used to engage subjects in a complex task, it provided only generic information about each of the subordinates whose behavior was to be rated. Copies of the in-baskets are provided in Appendix A.


Videotape Description


While subjects were working on the in-basket, they were shown several short videotaped vignettes of activities occurring within the organization. Because the in-basket scenario typically had subjects working at their desks, videotapes would show someone coming into their office to present information related to problems presented in the in-basket. Each of these vignettes was developed to present performance information about one of the subordinate managers. For example, one scene showed Alice missing a deadline by turning in a written report late. Another scene showed David dealing positively with one of his subordinates.


Videotape Development


Separate scenes were used to present the total sample of performance information on the subordinate managers. Each scene was performed from a script written to reflect on the performance of one subordinate on one of the four performance dimensions of administration, coordination and negotiation, supervision, [...]

To guide development of the scripts, a behaviorally anchored rating scale was developed for each performance dimension. Each dimension was scaled from 1 to 7, with 7 representing the most effective performance. High, medium, and low points on the scale have corresponding behavioral anchors. This scale served two purposes. First, it provided behavioral anchors to describe examples of effective and ineffective subordinate performance that were related to values on the rating scale. Second, these behaviors formed the basis for the behaviors that would be depicted in the vignettes. Rating scales used for scene development were also used by subjects to provide performance ratings. These scales are shown in Appendix B.


The performance scenes used in this study were chosen from a larger sample of scenes developed to provide a comprehensive sample of subordinate performance that could be drawn from to support studies like this one. As stated previously, each scene demonstrated effective or ineffective performance for one subordinate on one performance dimension. While the in-basket presented five subordinates, performance situations were developed to reflect on only 4 of them. The fifth subordinate was used to facilitate personnel problems presented in the in-baskets. Performance for each subordinate was presented in 16 situations, with 4 different situations being used to reflect on each of the four performance dimensions. This resulted in the preparation of 64 situations. Each situation placed the "target subordinate" in a situation where his or her actions could realistically be viewed by his or her supervisor, the role taken by the study participants. For each of the 64 situations, two scripts were written: one showed the subordinate performing effectively in the situation and the other showed ineffective performance. In all, a total of 128 scripts were prepared.


Scripts were then given to 4 graduate students who were asked to read each script and then identify which subordinate(s) in the scene was behaving in a way that reflected on his or her job performance. They were also asked to identify which performance dimension(s) was reflected on by the subordinate behavior. For realism purposes several scenes showed more than one subordinate even though the intent of each scene was to reflect on one subordinate's performance on one dimension. The results of this review were used to modify the scripts to insure clarity of both target subordinate and dimension.

Permission was obtained from a local business to use their facilities to film each scene. Actors were chosen from available volunteers and consisted of fellow graduate students and a faculty member as well as family and friends of this researcher. In all, over [...] actors and actresses were involved.








True


Score


Development


Because the intent of each scene was to communicate either effective or ineffective subordinate performance, the behaviors depicted in the scene were chosen from effective or ineffective behavioral anchors associated with the appropriate performance dimension. As a result, the intended true score of each scene was either a 6 to depict effective behavior or a 2 for ineffective behavior. Since each videotaped scene was to be used as a component of each subordinate's overall performance level, two studies were conducted to evaluate whether the intended value of each scene could be determined by independent samples.


Study A

Study purpose. This study's purpose was to use the experience of independent expert judges to provide feedback about the subordinate performance depicted in each of the videotaped scenes. There were two objectives. The first objective was to insure that each scene depicted performance information only on the one target subordinate and the one target dimension intended. The second objective was to obtain an expert rater assessment of the scale value of the behavior, based on the BARS scale prepared for this effort.

Participants. Twenty-two judges were used. Each judge had prior management experience and most judges had previous [...]








Procedures. Completed videotaped scenes were divided into four sets and each judge was asked to evaluate 32 of the scenes. Videotapes were distributed to expert judges who were asked to rate each subordinate in each scene whose performance reflected on one of the four performance dimensions. Judges were allowed to observe the videotape as many times as they desired and were asked to treat and rate each scene as an independent source of performance information. Each scene was observed by several expert judges. A copy of the instructions sent to expert judges is provided in an appendix.


Results. The first consideration when reviewing Study A's results was to determine whether judges identified the intended target subordinate and target dimension in each scene. In 108 scenes all judges identified and rated the intended target dimension and subordinate. In the remaining scenes, all but one of the judges rated the intended target subordinate and dimension.

The next step was to see if judge ratings of the performance information in each scene were consistent with the intended rating value of a 6 for effective scenes and a 2 for ineffective scenes. Table 2-1 provides the results of the judges' ratings of the scenes. Each scene was rated by several judges, and the mean rating and standard deviation of the ratings are shown. For each of the situations there is an A scene which showed the subordinate performing effectively on the target dimension and a B scene which presented ineffective performance.


TABLE 2-1


EXPERT JUDGES' MEAN RATINGS AND STANDARD DEVIATIONS


SCENE RTG SD SCENE RTG SD SCENE RTG SD SCENE RTG SD


1.67


23b
24a
24b


.50
.55


29b 1.6 .55
30a 5.6 .55
30b 2.3 1.89
31a 6.4 .55
31b 1.2 .45


43a
43b
44a


45a


45b 1.8 .50
46a 5.6 1.14
46b 1.5 .58
47a 6.8 .45
47b 1.4 .55
48a 6.4 .89
48b 1.8 .50


The mean of the judges' ratings of high performance scenes (A scenes) was 6.18, with a mean standard deviation of [...] and a range from a low of [...] to a high of [...]. The mean of the judges' ratings of low performance scenes [...]




1.4
6.3
1.4
6.4
2.00









Conclusions. The results of this study suggest that each scene depicted performance information about the intended subordinate on the intended performance dimension. These results also support the contention that each scene presents the intended level of effective or ineffective performance information about the simulation's subordinates.


Study B


Study purpose. The purpose of this study was to determine whether two versions of the entire simulation, each of which consisted of a different set of videotaped scenes, manipulated the performance level of the subordinates as intended. A second purpose was to statistically consider alternative definitions of a true score of the subordinate performance presented in the simulation.

Participants. Eighty-seven MBA students enrolled in a course on organizational behavior participated in Study B. Participants were volunteers and received course credit for their participation.


Procedures. Participants were divided into two groups. Two separate versions of videotaped subordinate performance were prepared. Both versions showed subordinates in one scene from all 64 situations. For a given situation, one group saw the effective scene and the other group saw the corresponding ineffective scene. The result was that each group saw different examples of performance for each of the subordinates. Over both groups, all 128 scenes were shown.

Both groups of participants participated in the full simulation, which included in-basket exercises and exposure to the videotaped presentation of subordinate performance. The in-basket material and videotapes were shown over four separate sessions, one per week for 4 weeks. This allowed evaluation of the entire simulation under the most stringent experimental conditions. After the fourth session, subjects used the BARS scales to rate the performance of the 4 subordinates on each of the performance dimensions.
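The two-version design can be sketched as follows. The text states only that each group saw one scene from each of the 64 situations and that the other group saw the opposite scene; the alternating allocation rule, the function name, and the `"12a"`/`"12b"` label format used here are assumptions for illustration, not the allocation actually used in the study.

```python
def build_versions(n_situations=64):
    """Return two lists of scene labels, one per simulation version.

    Labels follow an assumed '<situation><a|b>' format, where 'a' marks the
    effective scene and 'b' the ineffective scene for that situation.
    """
    version_1, version_2 = [], []
    for situation in range(1, n_situations + 1):
        # Hypothetical rule: alternate which version gets the effective scene.
        if situation % 2 == 1:
            first, second = "a", "b"
        else:
            first, second = "b", "a"
        version_1.append(f"{situation}{first}")
        version_2.append(f"{situation}{second}")
    return version_1, version_2


version_1, version_2 = build_versions()
all_scenes = set(version_1) | set(version_2)
```

Whatever rule is used, the design guarantees that the two versions never share a scene and that together they cover every scene that was filmed.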


Three alternative versions of true scores were prepared for comparison with the MBA student ratings. The first true score was based on the ratings intended when each scene was prepared. In this case, for each scene showing effective performance, the contribution of the scene to the overall subordinate true score was considered a 6. For each scene showing ineffective performance, the contribution to the true score was a 2. To develop a true score for the subordinate on a performance dimension, the average of the four scenes shown which reflected on that dimension was used. For example, if a group saw three effective scenes with a value of 6 and one ineffective scene with a value of 2, the true score for that dimension would be 5. This version of the true score is referred to as the "intended" true score.







The second true score was developed the same way but substituting the actual mean ratings from expert judges for the intended true score component score. Referring back to Table 2-1, if the performance shown to the group included the effective scene from a given situation (i.e., its A scene), the true score contribution to the performance of that subordinate was the judges' mean rating for that scene rather than the value from the intended true score version. Again, the true score for a subordinate on a performance dimension was the average of the expert judges' ratings for the four scenes. This version of the true score is referred to as the "expert" true score.
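A minimal sketch of the "intended" and "expert" true score calculations, assuming the scene labelling used in Table 2-1 (a trailing "a" for an effective scene, "b" for an ineffective one). The judge means are taken from legible rows of that table, but grouping these four scenes under one subordinate and one dimension is purely hypothetical, as are the function and variable names.

```python
from statistics import mean

# Intended contribution of each scene type to the true score.
INTENDED_VALUE = {"a": 6, "b": 2}


def intended_true_score(scenes):
    """Average the intended values (6 or 2) of the scenes shown."""
    return mean(INTENDED_VALUE[scene[-1]] for scene in scenes)


def expert_true_score(scenes, judge_means):
    """Average the expert judges' mean ratings of the scenes shown."""
    return mean(judge_means[scene] for scene in scenes)


# Hypothetical set of four scenes reflecting on one subordinate and dimension.
scenes_shown = ["30a", "31a", "46a", "29b"]
judge_means = {"30a": 5.6, "31a": 6.4, "46a": 5.6, "29b": 1.6}

intended = intended_true_score(scenes_shown)
expert = expert_true_score(scenes_shown, judge_means)
```

With three effective scenes and one ineffective scene the intended true score is 5, while the expert version comes out slightly lower because two of the effective scenes were rated below 6 by the judges.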


Finally, a third true score was created that used all ratings provided by expert judges, including any ratings that were not on the target dimension or for the target subordinate. Occasionally one or more judges' ratings indicated that a particular scene reflected on more than one performance dimension or on a subordinate who happened to be in the scene but was not the target subordinate. To determine whether these extra ratings provided important information about true differences in subordinate performance, a true score version was developed that included any rating provided by at least [...] judges. This true score considered the possibility that additional information was being provided on rating performance beyond that intended. All applicable scores were averaged to provide a true score for the subordinate on the dimension. This version of the true score is referred to as the "all ratings" true score.

Results. Mean ratings for participants who observed a particular set of scenes were compared to the three alternative versions of true scores for that combination of scenes. Each of the two groups had seen a separate set of scenes that contributed to different overall true scores (ratings of 4 subordinates on 4 performance dimensions). The overall mean ratings of both groups were then correlated with the three alternative versions of the true score. In other words, both groups' mean ratings were correlated with the three alternative versions of true scores for the scenes they observed. The results are presented in the following table.


TABLE

CORRELATIONS BETWEEN ALTERNATIVE TRUE SCORES VERSIONS
AND STUDY B GROUP RATINGS

Variable
Group Ratings
Intended Score       .68**
Expert Score         .71**    .97**    1.00
All Rating Score     .68**    .97**    .90**

Note: Two-tail significance
** p<.01







The alternative true score definitions were highly correlated with one another. Additionally, each version of the true score was closely related to the mean ratings provided by those subjects who had observed those versions of subordinate performance (intended true score = .68, expert rating true score = .71, all rating true score = .68).
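The comparison reported here is a set of correlations between group mean ratings and each version of the true score. A minimal sketch of that computation follows, assuming a Pearson product-moment correlation (the text says "correlated" without naming the coefficient). The function name and the sample values are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: one entry per subordinate-by-dimension cell.
intended_true = [5, 3, 4, 6, 2, 4, 5, 3]
group_mean_ratings = [4.6, 3.9, 4.1, 5.2, 3.0, 4.4, 4.0, 3.5]

r = pearson_r(intended_true, group_mean_ratings)
```

Each of the two groups contributes its own cells, paired with the true scores for the scenes that group actually saw.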


Conclusions. The above analysis supports two important conclusions. First, the managerial simulation is an effective instrument for meaningfully manipulating subordinate performance in a realistic yet carefully controlled laboratory setting. Study B participants were exposed to 2 different versions of 4 subordinates' performance across 4 different performance dimensions, and the average correlation between the intended true score and subjects' ratings was .70. This suggests that using different videotaped scenes to create different profiles of performance across subordinates and performance dimensions is an effective means of manipulating subordinate performance in meaningful ways.

A second important conclusion is that all three alternative versions of the true score carry very similar information about the subordinate performance depicted in the videotape and, as a result, any of the three versions is acceptable. The intended true score version provides the simplest definition of true score, and a review of the correlations with the other versions of the true score suggests there is little to be added by using a more complex measure of the true score. Use of the intended true score version is also supported by information from Study A, which showed that independent expert judges consistently rated the target individual and dimension in the intended direction and that the mean value of their ratings closely approximated the intended value for effective scenes and ineffective scenes.


Relationship to the current study. The methods described above were used in the current study to develop a true score for the subordinate performance information presented to subjects. Each scene which showed a subordinate performing effectively on a performance dimension was assigned a true score value of 6, and each scene shown that presented ineffective performance information was assigned a value of 2. The overall subordinate true score on a performance dimension was calculated by taking the mean value of the scenes used to present that subordinate's performance. This technique assumes that raters should give equal weight to each piece of performance information. This technique is justified when using the scenes developed above for the following reasons. First, each scene was carefully constructed to depict a subordinate behaving in a way that closely resembled one of the behavioral anchors used on each rating scale for making performance ratings. Since the scales provide several behaviors that reflect on a given level of performance, it is difficult to determine why one behavior should be assigned more weight than another. Second, judges in Study A described above evaluated each specific behavior using the BARS scale. Since the scale value of their ratings closely approximated the intended values, this can be taken as the true contribution of each behavior, that is, each behavioral episode, to the overall level of subordinate performance.

The remainder of this chapter describes the procedures and methods used in this study.

Overview of Procedures Used in This Study


This section provides an overview of the procedures followed in conducting this study. Special versions of the managerial simulation consisting of in-baskets and videotaped presentation of subordinate performance were developed from the material described above.


Procedures


The experimental portion of the study was conducted over a 2-week period. Participants were required to attend two 2-hour sessions, one during each of the two weeks. During both of these sessions students were given an in-basket exercise. The first week's in-basket familiarized [...] a series of challenging administrative tasks. During the second week's session a different in-basket was provided which placed participants in the same situation but at a later time ([illegible] weeks later). Subjects were encouraged to provide detailed responses to in-basket items.

The videotaped scenes used to present subordinate performance information were interspersed throughout both weeks of the simulation, with approximately half of the scenes shown the first week and the other half shown the second week.


Subjects were allowed to take notes on performance information presented through the videotaped scenes. They were unaware that they would not be allowed to use these notes when making their performance evaluations.

Following the second session students completed performance ratings, an observational accuracy check, a manipulation check, and a simulation reaction check. These forms are described below in the instruments section.

Participants were required to complete a written assignment based on the treatment condition to which they were assigned. This assignment was due one week after the second session. The study protocol is located in Appendix [illegible].


Analysis


Tests of hypotheses were conducted using t-tests to test [...] variables. Additionally, a series of analyses of variance (ANOVA) considered differences between treatment conditions, testing specific differences in group means. This type of focused F-test (one degree of freedom in the numerator) allows for precise consideration of the effects of manipulated variables on dependent variables and follows reasoning outlined by Rosenthal and Rosnow (1985).
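The link between a focused F-test and a t-test can be illustrated with a small sketch. It assumes a simple two-group comparison and uses made-up error scores; the function names and the data are hypothetical and are not taken from the study. With one degree of freedom in the numerator, the F statistic equals the square of the pooled-variance t statistic.

```python
from statistics import mean


def pooled_t(a, b):
    """Independent-samples t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (ma - mb) / se


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1          # 1 when only two groups are compared
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical error scores for two cells (illustration only).
accountable = [1.0, 0.8, 1.3, 0.9, 1.1]
not_accountable = [1.4, 1.2, 1.6, 1.1, 1.5]

t_value = pooled_t(accountable, not_accountable)
f_value = one_way_f([accountable, not_accountable])
print(round(t_value ** 2, 6), round(f_value, 6))  # the two values agree
```

Because the two statistics carry the same information in the two-group case, a focused comparison can be reported either way.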


Subjects


Two hundred forty-seven undergraduates enrolled in an introductory course in management participated in this research and received course credit for their participation. The sample consisted of [illegible] men and [illegible] women with a mean age of 21.06 years. Most subjects were either in their third or fourth year of undergraduate work, with a mean full time work experience of 19.78 months. Male and female subjects were randomly distributed into the different treatment conditions to control for possible effects due to the sex of the participants. As a result, there were 16-18 men and 13-15 women in each condition.
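One way to distribute men and women evenly across the eight cells of the design is to shuffle each sex separately and deal participants into cells in turn. The sketch below is only an illustration of that idea: the roster sizes (136 men, 111 women), the identifiers, the seed, and the function name are all assumptions, since the source does not describe the assignment mechanics or report the exact counts.

```python
import random
from collections import Counter
from itertools import product

ACCOUNTABILITY = ["accountable", "not accountable"]
OUTCOMES = ["inflationary", "equitable treatment", "accuracy", "comparison"]
CELLS = list(product(ACCOUNTABILITY, OUTCOMES))  # 2 x 4 = 8 treatment cells


def assign_by_sex(participants, seed=0):
    """Randomly assign (id, sex) pairs to cells, balancing each sex separately."""
    rng = random.Random(seed)
    assignment = {}
    for sex in sorted({s for _, s in participants}):
        ids = [pid for pid, s in participants if s == sex]
        rng.shuffle(ids)                     # random order within each sex
        for i, pid in enumerate(ids):
            assignment[pid] = CELLS[i % len(CELLS)]  # deal into cells in turn
    return assignment


# Hypothetical roster; the real counts of men and women are not legible.
roster = [(f"M{i:03d}", "male") for i in range(136)]
roster += [(f"F{i:03d}", "female") for i in range(111)]

assignment = assign_by_sex(roster, seed=1994)
men_per_cell = Counter(assignment[pid] for pid, sex in roster if sex == "male")
women_per_cell = Counter(assignment[pid] for pid, sex in roster if sex == "female")
print(sorted(men_per_cell.values()), sorted(women_per_cell.values()))
```

With these assumed counts every cell receives 17 men and 13 or 14 women, which falls inside the ranges reported above.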


Manipulations of Accountability and Rating Outcomes


In this study both the participant's level of accountability and the available information about possible outcomes that could result from their ratings were manipulated. Each participant was assigned to one of two accountability conditions:

1. ACCOUNTABLE for justifying their performance evaluations

2. NOT ACCOUNTABLE for justifying their performance evaluations

Subjects in the accountable condition were told that they were required to provide a written justification of their performance ratings. Subjects in the not accountable condition were told their ratings would remain anonymous and that they would be required to complete a written critique of the overall simulation. Subjects in both conditions were told that they would receive [illegible] points of extra credit for their participation and that the amount of extra credit would depend on their performance on the simulation and on their written assignment. Copies of the accountability manipulation are included in Appendix E.


Subjects were also assigned to one of four performance appraisal outcome conditions. In these conditions, subjects were provided information about the outcomes or effects of previous performance rating decisions on the organization. These outcome manipulations were presented to subjects in each treatment group through a letter from their "supervisor" presented to them just prior to the presentation of the simulation. Additionally, subjects in the first three [...] described manipulations. These materials are located in Appendix F. A summary of each outcome situation is provided below.


1. INFLATIONARY OUTCOME. The supervisor's letter said that subordinates in the subject's division have consistently been rated lower than peers in other divisions and, as a result, there have been no promotions from the subject's division in the last [illegible] years.

2. EQUITABLE TREATMENT OUTCOME. The supervisor's letter expressed concern over the ratings given women in the organization. The supervisor also discussed a pending lawsuit over the ratings.

3. ACCURACY OUTCOME. The supervisor's letter expressed concern over how the consistent inflation of performance ratings within the organization has made it impossible to use evaluations for administrative purposes (promotion, selection, etc.). As a result, subjects in this condition were encouraged to provide better ratings that accurately reflected true differences of subordinates within each performance dimension.

4. COMPARISON GROUP. The supervisor's letter provided no information about the effects of previous performance evaluations. It described a generic list of the major challenges facing the organization.


Subordinate Performance Presented in This Study


The videotaped scenes of subordinate performance described earlier provided a large number of subordinate true score performance profiles that could be presented. Profiles for this study were constructed to facilitate tests of the hypotheses discussed in Chapter [illegible] and the rating outcomes described above.


As a result, there were three [...] on a dimension was developed to approximate the average score of subordinates in other divisions as presented in the manipulation given to subjects in the inflationary outcome condition (i.e., a rating of [illegible]). The true score of the other subordinates was lower than the average shown to people in the "inflationary condition". This difference was designed to create pressure on participants to inflate ratings of their subordinates so they would be similar to ratings assigned to subordinates in other divisions.


Second, the overall true score for women was created lower than the true score for males on some performance dimensions. This difference was designed to create pressure on participants in the "equitable treatment" condition to artificially inflate ratings of women to comply with pressure to provide equivalent ratings to male and female subordinates.


The third criterion was to provide variability in overall strengths and weaknesses in each subordinate. This situation emphasizes the need for subjects to pay careful attention to within-subordinate performance variation on individual performance dimensions. Table [illegible] presents the true scores of simulation subordinates based on scenes selected for use in this study to meet the criteria described above.








TABLE [illegible]
SUBORDINATE TRUE PERFORMANCE SCORES

          ADMIN   WORK EFFORT   COORD   SUPERVIS   OVERALL
ALICE       6          3          5         4        4
BILL        6          4          6         6        5.33
CAROLE      6          5          3         3        3.66
DAVID     [row values illegible]


Since equivalent performance information is presented for all subordinates on the administrative dimension, the overall true score was based on the average true score values for the work effort, coordination and supervision dimensions.
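The averaging rule just described can be checked against the three rows of dimension scores that survive in the true score table. The sketch below is illustrative only; the row labels and the function name are placeholders, because the pairing of rows with subordinates cannot be read with certainty.

```python
from statistics import mean

# Work effort, coordination, and supervision true scores for the three
# rows that survive in the table (row labels are placeholders).
dimension_scores = {
    "row_1": {"work_effort": 3, "coordination": 5, "supervision": 4},
    "row_2": {"work_effort": 4, "coordination": 6, "supervision": 6},
    "row_3": {"work_effort": 5, "coordination": 3, "supervision": 3},
}


def overall_true_score(scores):
    """Mean of the three dimensions that varied between subordinates."""
    return mean(scores[d] for d in ("work_effort", "coordination", "supervision"))


overall = {row: overall_true_score(s) for row, s in dimension_scores.items()}
for row, value in overall.items():
    print(row, round(value, 2))
```

The administrative dimension is left out of the average because every subordinate received the same information on it.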


Instruments


Rating Format


Individuals used a BARS scale specially designed for the jobs depicted in the in-basket/videotapes. This scale was discussed earlier and can be found in Appendix [illegible]. Subject ratings were collected using the rating form found in Appendix [illegible].


Demographic Questionnaire


Subjects answered demographic questions prior to beginning the experiment. The questions pertained [...] experience. A copy of this questionnaire is included in Appendix [illegible].


Manipulation Check


After the final session, subjects answered questions dealing with the context of the rating, including specific questions about the information they received in the simulation. Subjects were provided with a list describing each of the materials prepared to manipulate the different outcome conditions and were asked to identify which materials they recalled receiving. These results were used to test that each group recalled receiving material unique to their condition. Coding of this variable is discussed in Chapter [illegible]. Copies of the manipulation check are included in Appendix [illegible].


Observational Accuracy Check


This form assessed participant recall of specific behavioral events presented, including whether the event presented positive or negative information about subordinate performance. It listed statements that were either accurate or inaccurate descriptions of the scenes shown during the simulation. Subjects were asked to identify which scenes actually occurred. Copies of the observational accuracy instrument are included in Appendix [illegible].








Post-experiment Questionnaire


This form assessed subject reactions to the entire simulation. It included a variety of questions to assess whether subjects' reactions differed by treatment condition. Question responses were used as self-report information about subjects' attitudes and reactions towards the tasks presented in the simulation. A copy of this questionnaire is located in Appendix K.


Dependent Variables


Variables for Testing Hypotheses


Performance ratings


Performance ratings were collected after the second session. Subjects rated four subordinates, Alice, Bill, Carole, and David, on four performance dimensions. Since only behavior on three of the dimensions varied between subordinates, those ratings were used in the computation of the differential accuracy measure described below. The mean ratings were also used to assess the overall favorability of ratings assigned by subjects in different treatment conditions. Subjects were told to record a rating from [illegible] for each subordinate, with [illegible] being the most favorable rating.


Differential Accuracy


[...] in accuracy between treatment conditions. Differential accuracy scores indicate how well the rater can differentiate between different ratee performances over different performance dimensions. This measure was computed using the equation presented by Cronbach (1955).
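The source does not reproduce the equation, so the following is only a sketch of the differential accuracy component as it is commonly written after Cronbach (1955): the root mean square difference between ratings and true scores once the grand mean, ratee (row) effects, and dimension (column) effects have been removed from each. The function names and the example ratings are assumptions; the true scores are the three rows that survive in the true score table.

```python
from math import sqrt


def interaction_residuals(matrix):
    """Remove the grand mean, row (ratee) and column (dimension) effects."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    grand = sum(sum(row) for row in matrix) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return [[matrix[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)] for i in range(n_rows)]


def differential_accuracy(ratings, true_scores):
    """Root mean square gap between rating and true score interaction terms.

    Higher values mean more error, i.e. a less accurate rater.
    """
    r = interaction_residuals(ratings)
    t = interaction_residuals(true_scores)
    cells = len(r) * len(r[0])
    squared = sum((r[i][j] - t[i][j]) ** 2
                  for i in range(len(r)) for j in range(len(r[0])))
    return sqrt(squared / cells)


# Rows are ratees, columns are work effort, coordination, supervision.
true_scores = [[3, 5, 4], [4, 6, 6], [5, 3, 3]]
# Hypothetical rater who confuses which ratee is strong on which dimension.
confused_ratings = [[5, 3, 4], [4, 6, 6], [3, 5, 3]]

da_perfect = differential_accuracy(true_scores, true_scores)
da_confused = differential_accuracy(confused_ratings, true_scores)
print(da_perfect, round(da_confused, 3))
```

A rater who reproduces the true pattern scores zero, while a rater who misplaces strengths and weaknesses across ratees and dimensions receives a larger error score.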


Observational Accuracy


This measure indicates how accurately participants recalled specific performance information. It is the sum of the number of performance episodes the participant correctly identifies as having observed from the 32 episodes described on the instrument. If a subject correctly identified whether or not he or she saw all 32 of the episodes described, they would receive a score of 32. Internal reliability for this instrument was calculated using Cronbach's alpha. This resulted in a reliability estimate of .82.
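For readers unfamiliar with the statistic, Cronbach's alpha can be sketched in a few lines. The example assumes each of the recall items is scored 1 (correct) or 0 (incorrect); the tiny data set and the function name are hypothetical and do not reproduce the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                                   # number of items
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical item scores for five respondents on three items.
mixed = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
# Perfectly consistent responding gives the maximum value of 1.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]

alpha_mixed = cronbach_alpha(mixed)
alpha_consistent = cronbach_alpha(consistent)
print(round(alpha_mixed, 3), round(alpha_consistent, 3))
```

Alpha rises as the items vary together, which is why it is read as an index of internal consistency.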


Variables for Additional Analyses


Attentiveness


This measure indicates how attentive participants were to the presentation of performance information. Two judges, blind to the subject's treatment condition, rated subjects' attentiveness using the attentiveness rating scale located in Appendix [illegible]. One judge made ratings during the first session and a second judge made ratings during [...] worked on their in-baskets and watched videotaped scenes of subordinate performance. Judges were strategically positioned within the room so they could view participant actions during the presentation of performance information. Judges assigned each participant a rating from [illegible]. Inter-rater reliability was determined by calculating the intra-class correlation between the two sets of ratings. This correlation was adjusted using the Spearman-Brown correction formula to determine that the reliability of the composite of the two judges' ratings was .67. The ratings of the two judges were summed to form the participant's overall attentiveness score.
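The Spearman-Brown adjustment used here can be written out as a short sketch. The function names are illustrative. The single-judge correlation is not reported in the source; the value printed below is only what the formula implies for a two-judge composite of .67.

```python
def spearman_brown(single_rater_r, k=2):
    """Reliability of a composite of k raters given the single-rater value."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


def implied_single_rater_r(composite_r, k=2):
    """Invert the Spearman-Brown formula to recover the single-rater value."""
    return composite_r / (k - (k - 1) * composite_r)


single = implied_single_rater_r(0.67)       # implied, not reported
composite = spearman_brown(single)
print(round(single, 3), round(composite, 2))
```

Summing two judges' ratings therefore yields a more reliable score than either judge alone, which is the rationale for using the composite.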


Note Taking


This measure provides a judgement of the quality and quantity of a subject's notes taken voluntarily in response to performance relevant behavior presented in the simulation. While subjects were provided paper for taking performance related notes, they were neither encouraged nor discouraged from taking them. Subjects were unaware that they would not be able to use these notes when making their performance evaluations. The performance related notes from both sessions were combined and provided to independent judges blind to subject treatment condition. These judges independently rated both the quality and quantity of notes taken using the note taking rating scale located in Appendix [illegible].








notes


Inter-rater reliability was determined by calculating the intra-class correlation between the two sets of ratings. This correlation was adjusted using the Spearman-Brown correction formula to determine that the reliability of the composite of the two judges' ratings was .92. The ratings of the judges were summed to form the participant's note taking score.


Self-report measures


Two scales were developed from subject reactions to the post-experiment questionnaire located in Appendix K. These scales were used to support additional analyses found in Chapter [illegible]. Cronbach's alpha is reported as a measure of the internal consistency of each variable.


Engagement. One variable measured the level of engagement felt by subjects toward their participation in the simulation. It was felt that subjects who were more cognitively engaged in the simulation would report spending more time thinking about the simulation and the specific challenges presented in the simulation. This variable was based on subject responses to items [illegible], m, and [illegible] (alpha = .69) of the "Post-experiment Questionnaire".

Concern. Finally, a single item scale, based on item [illegible] of the "Post-experiment Questionnaire", was used to assess how concerned the subject was that they might not receive full credit for participation in the simulation.











CHAPTER 3
RESULTS


This chapter presents the results of manipulation checks, tests of hypotheses, and additional analyses.


Manipulation Checks


Accountability Manipulation


The manipulation check for the accountability manipulation was simply whether subjects demonstrated that they understood the requirements by turning in the proper final assignment. Subjects in the accountable condition justified their performance ratings. Subjects not held accountable for their ratings prepared a written critique of the simulation. Subjects submitted the correct final assignment based on the accountability manipulation.


Rating Outcome Situational Manipulation


After the second week's session subjects were shown a list which described every letter or memo used to manipulate the rating outcome conditions. Each letter described was seen by subjects in only one of the rating outcome conditions. Subjects were asked to identify which of the [...] material they recalled receiving, indicating for each letter whether they did or did not recall it. Seven letters were described.


Subjects in the inflationary, equitable treatment and accuracy outcome conditions had been exposed to two of the letters, while subjects in the comparison group had seen only one of the letters. Four variables were constructed which consisted of the sum of each subject's score on the letters that were related to each situation. For example, if a subject reported that he or she had seen both letters that were presented in one of the outcome conditions, they would receive a 2 for that set of material. If they reported they hadn't seen either of those letters, they would receive a 0 for that condition. Ideally, a subject assigned to the inflationary outcome condition would receive a 2 on material presented in that condition and a 0 on material presented in other conditions.
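The scoring rule described above can be sketched as follows. The letter identifiers and the function name are hypothetical labels introduced for illustration; only the counts (two letters for each of three outcome conditions, one for the comparison group, seven in all) come from the text.

```python
# Hypothetical letter labels; the source gives only the counts per condition.
LETTERS_BY_CONDITION = {
    "inflationary": ["letter_1", "letter_2"],
    "equitable_treatment": ["letter_3", "letter_4"],
    "accuracy": ["letter_5", "letter_6"],
    "comparison": ["letter_7"],
}


def manipulation_scores(recalled_letters):
    """Count, per condition, how many of its letters a subject reports seeing."""
    recalled = set(recalled_letters)
    return {condition: sum(1 for letter in letters if letter in recalled)
            for condition, letters in LETTERS_BY_CONDITION.items()}


# An ideal subject from the inflationary condition recalls exactly its letters.
ideal_inflationary = manipulation_scores(["letter_1", "letter_2"])
# A subject who also (wrongly) reports one accuracy letter.
mistaken = manipulation_scores(["letter_1", "letter_2", "letter_5"])
print(ideal_inflationary)
print(mistaken)
```

Group means on each of the four resulting variables can then be compared between the groups that were and were not shown the relevant letters.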


Table 3-1 shows the group means for each relevant outcome condition for the variable comprised of responses to the description of material provided in that condition. Means for subjects assigned to other groups who did not receive those materials are also provided. The average score of subjects assigned to the inflationary outcome condition on materials they had been shown was [illegible]. If each member of the group had correctly identified that they received the letters that [...] would have been a 2. The mean score on inflationary outcome materials for subjects assigned to the other conditions was .47. If members of those conditions had correctly recalled that they had not seen the inflationary outcome letters, the group mean score would be 0. Because only one letter was shown to subjects in the comparison group, the score for subjects in that condition should be 1 and the score for subjects assigned to the other conditions should be a 0.

TABLE 3-1
MANIPULATION CHECK RESULTS
Number of Items Reported by Experimental Groups

[Table values largely illegible in the source. Rows: Inflationary, Equitable Treatment, Accuracy, Comparison Group. Columns: Mean and SD for groups shown the relevant material and for other groups. Surviving entries: 0.47, .60, 0.50.]


One-tailed t-tests were used to determine if subjects in groups that had been shown letters designed to present a specific manipulation answered yes to seeing those letters more consistently than subjects in groups who had not been shown the letters. The t-tests showed significant [...] recalled receiving. Therefore, despite receiving over [illegible] pages of memos and letters within the two in-baskets, this analysis shows that subjects who were provided material designed to present a certain manipulation recalled receiving those materials more consistently than subjects in other conditions.


Methods of Analysis


Independent sample t-tests are used to consider hypothesized differences between groups. Additional analyses included a series of 2 x 2 analyses of variance (ANOVA) to support a focused look at the effects of each rating outcome condition when compared to the comparison group. Finally, LISREL was used for structural equation modeling analysis.


Tests of Hypotheses


Independent sample t-tests were used to test the hypotheses presented in Chapter [illegible]. Each test made one of two comparisons depending on the hypothesis being tested. One group of t-tests compared subjects in selected cells with non-accountable subjects in the comparison group. These latter subjects were not held accountable for their ratings and were not provided a desirable rating outcome manipulation. As a result, they can be considered a control [...] contrasted differences between accountable subjects and non-accountable subjects within a specific rating outcome condition.


Consistent with the directional predictions stated in each hypothesis, t-tests are one-tailed and considered significant when the probability of a Type I error for the resulting t-value was less than or equal to .05. The tests used to test specific hypotheses are considered "protected" against the possibility of capitalizing on chance because they were either planned before the data was collected or they are supported by the overall F computed as part of the ANOVAs discussed later in this section. This method is consistent with arguments made by Rosenthal and Rosnow (1984). For additional discussion see Ballam (1963).
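The one-tailed decision rule can be sketched from summary statistics. Everything numeric below is hypothetical except the tabled critical value of 1.671 for a one-tailed test at the .05 level with 60 degrees of freedom; the cell sizes, means, and standard deviations are invented for illustration and the function names are not from the source.

```python
from math import sqrt

T_CRIT_ONE_TAILED_05_DF60 = 1.671  # tabled critical t, one-tailed, df = 60


def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from cell summaries."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def supports_directional_hypothesis(t_value, critical_value):
    """One-tailed rule: only a difference in the predicted direction counts."""
    return t_value >= critical_value


# Hypothetical cells: group 1 is predicted to show MORE error than group 2.
t_value, df = t_from_summary(1.40, 0.60, 31, 1.00, 0.70, 31)
# The same gap in the unpredicted direction produces a negative t.
t_reversed, _ = t_from_summary(1.00, 0.70, 31, 1.40, 0.60, 31)

print(round(t_value, 2), df,
      supports_directional_hypothesis(t_value, T_CRIT_ONE_TAILED_05_DF60))
print(round(t_reversed, 2),
      supports_directional_hypothesis(t_reversed, T_CRIT_ONE_TAILED_05_DF60))
```

Under this rule a difference opposite to the prediction can never reach significance, however large it is.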


Rating accuracy


The first set of hypotheses considered differences in rating accuracy that result from different treatment conditions. Table 3-2 presents cell means for each condition based on the dependent variable of differential accuracy. Because differential accuracy is a measure of rating error, subjects with a higher score on the measure had more error and were therefore less accurate.








TABLE 3-2
MEAN VALUES AND STANDARD DEVIATIONS FOR DIFFERENTIAL ACCURACY BY TREATMENT CONDITION

[Table values illegible in the source. Rows: Accountability (Yes, No, Overall). Columns: Inflation, Equitable Treatment, Accuracy, Compare Group, Overall Mean.]


Hypothesis 1

Hypothesis 1 predicted that accountable raters would be more accurate than non-accountable raters except when desirable rating outcomes could be achieved by providing inaccurate ratings. The differential accuracy obtained by accountable subjects in the comparison group was compared to that of non-accountable subjects. As shown in Table 3-2, accountable subjects were more accurate (M = 1.00, SD = .74) than non-accountable subjects (M = 1.34, SD = .65) (t = 1.84, p < .05).

Hypothesis 1 was also tested by comparing the differential accuracy obtained by accountable subjects in the inflationary and accuracy outcome conditions with their non-accountable counterparts. In the inflationary group, accountable subjects were significantly more accurate (M = [?].03, SD = .55) than non-accountable subjects (M = [?].38, [...]). [...] accuracy was also found in the accuracy condition. Accountable subjects were significantly more accurate (M = .91, SD = .53) than non-accountable subjects (M = 1.23, SD = .23) (t(61) = 2.24, p < .05). Hypothesis 1 was supported in all three conditions where accountable subjects were predicted to be more accurate than non-accountable subjects.


Hypothesis 2

Hypothesis 2 predicts that accountable subjects in the equitable treatment outcome condition, who were encouraged to rate male and female subordinates equally, would be less accurate than non-accountable subjects in the comparison group. This analysis does not support the hypothesis. In fact, the pattern of means shown in Table 3-2 suggests that accountable subjects in the equitable treatment condition were more accurate (M = 1.20, SD = .51) than non-accountable subjects in the comparison group (M = 1.34, [illegible] = 1.52). Because one-tailed tests are used in this analysis, this difference in means is not interpretable.


Hypothesis 3

Hypothesis 3 predicts that accountable subjects presented with a rating context where accurate ratings are desired would be more accurate than subjects in the comparison group regardless of accountability status. This hypothesis was partially supported. Accountable subjects in the accuracy treatment group (M = .91, SD = .53) were [...] the comparison group (M = 1.34, SD = .65) (t(60) = [?].88, p < .01). However, the pattern of means shown in Table 3-2 suggests that while their mean differential accuracy was lower (less error) than that of accountable subjects in the comparison group (M = 1.00, SD = .74), that difference was not significant (t(60) = [illegible]).


Observational accuracy

Table 3-3 presents scores for observational accuracy separately for each condition.


TABLE 3-3
MEAN VALUES AND STANDARD DEVIATIONS FOR OBSERVATIONAL ACCURACY BY TREATMENT CONDITION

[Table layout illegible in the source. Rows: Accountability (Yes, No, Overall). Columns: Inflation, Equitable Treatment, Accuracy, Compare Group, Overall Mean. Surviving entries, in the order they appear: 24.4, 24.1, 24.3, 24.1, 23.1, 24.1, 24.1, 23.8, 23.5, 24.0.]


Hypothesis 4

Hypothesis 4 predicts that accountable subjects would have more accurate recall of performance episodes than non-accountable subjects. Within the comparison group, the observational accuracy of accountable raters (M = 24, SD = [illegible]) was not significantly greater than that of non-[...]

However, in the inflationary outcome and accuracy outcome conditions accountable subjects were significantly more accurate than subjects not held accountable for their performance ratings. Within the inflationary condition, accountable subjects more accurately recalled performance episodes (M = 25.7, SD = [illegible]) than non-accountable subjects (M = 23.1, SD = 4.1) (t(61) = 2.74, p < .01). Within the accuracy treatment condition, accountable subjects also more accurately recalled performance episodes (M = 24.1, SD = 3.1) than non-accountable subjects (M = 23, SD = [illegible]) (t(61) = 2.54, p < .01).


Hypothesis 4 was supported in two of three conditions where subjects were placed in a rating context where desirable outcomes could be achieved by making inaccurate ratings.


Hypothesis 5

Hypothesis 5 predicts that accountable subjects placed in a rating context where a desirable outcome could be achieved by rating female subordinates higher than their true level of performance warrants would have lower observational accuracy than non-accountable subjects. It was predicted that these raters would be susceptible to contextual pressures and adopt an information search strategy in which they attended to only positive information. However, because analysis of hypothesis 2 [...] than raters in the comparison group, that suggests that these raters should have a better recall of performance episodes. Consistent with results found for hypothesis 2, hypothesis 5 was not supported either. In fact, the pattern of means shown in Table 3-3 suggests that accountable subjects in the equitable treatment condition scored higher on the measure of observational accuracy (M = 24.4, SD = [illegible]) than non-accountable subjects in the comparison group (M = 23, SD = [illegible]). Again, because one-tailed tests are used in this analysis, this difference in means is not interpretable.


Favorability of Mean Ratings


Table 3-4 presents the overall means for each separate condition based on the overall mean ratings assigned by subjects.


TABLE 3-4
MEAN VALUES AND STANDARD DEVIATIONS FOR MEAN RATINGS BY TREATMENT CONDITION

[Table values illegible in the source. Rows: Accountability (Yes, No, Overall). Columns: Inflation, Equitable Treatment, Accuracy, Compare Group, Overall.]







Hypothesis 6

Hypothesis 6 considered the effect of the inflationary outcome manipulation and predicted that accountable subjects in this situation would provide more favorable ratings of subordinate performance in response to contextual pressure to make their subordinates' ratings more competitive with subordinates in other divisions. As predicted, accountable subjects in this condition made more favorable ratings (M = [illegible], SD = .40) when compared to non-accountable subjects in the comparison group (M = 4.7, SD = .41) (t(61) = 2.73, p < .01). These accountable subjects also rated subordinates more favorably than subjects in the same condition who were not held accountable for their ratings (M = 4.7) (t(61) = 2.73, p < .01).


As mentioned during the discussion of this hypothesis, previous research has suggested that raters can inflate ratings without significantly reducing differential accuracy. This analysis supports that conclusion. Accountable subjects did rate subordinates more favorably, but they did so in a way that still recognized true differences in performance between subordinates and across diverse performance dimensions. As these results suggest, if they equally inflated all ratings to the same degree, they were not substantially penalized on a measure of differential accuracy.
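The reason uniform inflation carries no penalty can be shown in a short derivation. It assumes differential accuracy is computed from the rating-by-dimension interaction term in the manner commonly attributed to Cronbach (1955); the notation (x for a rating, c for a constant amount of inflation) is introduced here for illustration.

```latex
% Assumption: differential accuracy depends on each rating x_{ij}
% (ratee i, dimension j) only through its interaction residual.
% Let every rating be inflated by the same constant c.
\begin{align*}
x'_{ij} &= x_{ij} + c \\
\bar{x}'_{i\cdot} &= \bar{x}_{i\cdot} + c, \qquad
\bar{x}'_{\cdot j} = \bar{x}_{\cdot j} + c, \qquad
\bar{x}'_{\cdot\cdot} = \bar{x}_{\cdot\cdot} + c \\
x'_{ij} - \bar{x}'_{i\cdot} - \bar{x}'_{\cdot j} + \bar{x}'_{\cdot\cdot}
  &= x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}_{\cdot\cdot}
     + (c - c - c + c) \\
  &= x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}_{\cdot\cdot}
\end{align*}
```

The constant cancels, so a rater who raises every rating by the same amount changes the overall favorability of the ratings but leaves the differential accuracy score untouched.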










Hypothesis 7

Table 3-5 presents the overall means for each separate condition based on the overall mean ratings assigned by subjects to their female subordinates.


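TABLE 3-5
MEAN VALUES AND STANDARD DEVIATIONS FOR FEMALE MEAN RATINGS BY TREATMENT CONDITION

[Table layout largely illegible in the source. Rows: Accountability (Yes, No, Overall). Columns: Inflation, Equitable Treatment, Accuracy, Compare Group, Overall, each with MEAN and SD. Surviving entries, in the order they appear: 3.9 (.68), 3.7 (.70); 3.6 (.73), 3.8 (.51).]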


Subjects in the equitable treatment outcome condition were exposed to a rating context where a desirable outcome could be achieved by rating male and female subordinates equally. Hypothesis 7 predicts that only accountable raters in this condition would be affected by this manipulation and would rate their female subordinates more favorably. The results of earlier hypotheses have shown that accountable subjects in this condition were not less accurate and did not have poorer recall of specific behavioral events. Consistent with this, hypothesis 7 was also not supported. Contrary to prediction, mean ratings of female subordinates were not more favorable for accountable subjects in the [...] than for non-accountable subjects in the comparison group (M = 3.8, SD = .51). The use of one-tailed tests again makes this difference in means not interpretable.


Dilution Effect


A final research question presented in Chapter [illegible] dealt with the dilution effect, the possibility that accountability may lead subjects to give weight to irrelevant performance information. Since subjects who were accountable were more accurate on measures of observational accuracy and differential accuracy, there is a strong implication that accountable subject performance was not diluted by irrelevant information gathered about subordinate work performance.


Additional data gathered for this study supports this implication. Two dependent variables were constructed that summed the total number of positive and the total number of negative events recalled by subjects. The assumption behind the creation of these variables was that if accountable subjects had attended to and weighted irrelevant information, they would subsequently recall a higher number of events when they made their performance ratings.


Results of t-tests show there was no significant difference between the number of positive events recalled by accountable subjects (M = 17.8, SD = 6.96) as compared to non-accountable subjects (t = 1.51, p = .133). There were no significant differences in the number of negative scenes reported by accountable subjects (M = 11.5, SD = 3.62) as compared to non-accountable subjects (SD = 4.29), t(243) = .37, p = .171. These results do not support a dilution effect.
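As a rough check on reported statistics of this kind, the two-tailed probability for a t value can be computed directly from Student's t distribution. The following is a minimal sketch and is not part of the original analysis: the function names are hypothetical, and the 243 degrees of freedom used for the positive-events test are an assumption carried over from the degrees of freedom reported for the negative-events test.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value; integrates the density over [0, |t|] by Simpson's rule.

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


# t value reported for positive events; df = 243 is an assumption (see lead-in).
p_positive = two_tailed_p(1.51, 243)
print(round(p_positive, 3))
```

Under these assumptions the computed probability is close to the reported p of .133, which is above the .05 criterion and agrees with the conclusion that the difference is not significant.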


Summary of t-test results


The results of t-tests of hypotheses suggest that the accountability manipulation led to improved accuracy. Across three distinct rating contexts, accountable subjects rated their subordinates more accurately than their non-accountable counterparts. When accountable subjects were required to provide ratings they could not justify, as in the case of artificially inflating female ratings relative to males, they resisted situational pressure to do so. Within the inflationary and accuracy outcome conditions, accountable subjects recalled information about specific incidences of subordinate performance more accurately than non-accountable subjects. However, in the comparison group, accountable subjects did not have significantly more accurate recall than non-accountable raters. These results suggest that holding raters accountable may lead raters to attend to performance information in a way that makes them have better recall of specific performance events.







Additional Analyses


2 X 2 ANOVA Results


Whereas the t-tests of hypotheses compared results between selected conditions, they did not directly test the overall effects of different rating outcome conditions. As a result, additional analyses using ANOVA allowed a focused test of contrast differences between subjects in each rating outcome condition with subjects in the comparison group.
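The logic of such a 2 X 2 contrast (accountability by rating context) can be sketched for a balanced design as follows. This is an illustration only: the ratings are hypothetical values invented for the example, not data from this study, and the function name and dictionary keys are assumptions. The study's own cells may not have been balanced, in which case a different sum-of-squares method would be needed.

```python
from statistics import mean


def two_by_two_anova(cells):
    """Return F ratios (each with 1 numerator df) for a balanced 2 x 2 design.

    cells maps (accountability_level, context_level) to a list of ratings.
    Every cell must hold the same number of ratings.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if (len(cells) != 4 or len(levels_a) != 2 or len(levels_b) != 2
            or any(len(v) != n for v in cells.values())):
        raise ValueError("expected a balanced 2 x 2 design")

    grand = mean([x for v in cells.values() for x in v])
    cell_mean = {k: mean(v) for k, v in cells.items()}
    # With equal cell sizes, marginal means are means of the cell means.
    mean_a = {a: mean([cell_mean[(a, b)] for b in levels_b]) for a in levels_a}
    mean_b = {b: mean([cell_mean[(a, b)] for a in levels_a]) for b in levels_b}

    ss_a = 2 * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = 2 * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction sum of squares
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_error = 4 * (n - 1)
    ms_error = ss_error / df_error
    return {
        "accountability": ss_a / ms_error,
        "context": ss_b / ms_error,
        "interaction": ss_ab / ms_error,
        "df_error": df_error,
    }


# Hypothetical ratings: only accountable raters in the outcome condition differ.
ratings = {
    ("yes", "outcome"): [5, 4, 5, 4],
    ("yes", "comparison"): [3, 4, 3, 4],
    ("no", "outcome"): [3, 4, 3, 4],
    ("no", "comparison"): [3, 4, 3, 4],
}
f_ratios = two_by_two_anova(ratings)
print(f_ratios)
```

When one cell departs from the other three, as in this invented example, the departure is split across both main effects and the interaction, which is why a follow-up comparison of individual cell means is needed to locate the difference.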


Inflationary outcome condition


The table below presents the results of a 2 X 2 ANOVA considering the effect of a rating context where a desirable outcome can be achieved by inflating the rating of subordinates as compared to the comparison group. The ANOVA results show that there was an interaction between accountability and rating outcome (F(1,122) = 3.69, p < .05). Based on the means reported for these conditions, it appears that only when raters were held accountable were they influenced by contextual pressure to inflate their ratings. A Duncan's multiple range test reported significant (p < .05) differences between the mean ratings of accountable raters in the inflationary outcome condition and the other three conditions contrasted in this analysis. This supports the prediction that raters will be influenced to achieve specific rating outcomes when they are linked to their ratings by accountability.


TABLE. RESULTS OF 2X2 ANOVA COMPARING THE INFLATIONARY OUTCOME CONDITION WITH THE COMPARISON GROUP

Source of Variation      Mean Rating
                         F        Signif

Accountability           4.47     .037*
Rating Context           2.92     .090
Interaction AxR          3.69     .050*

Note: df = 1, 122
*p <.05  ** p <.01


Equitable treatment condition


The table below presents results of a 2 X 2 ANOVA considering the effect of a rating context where a desirable outcome can be achieved by inflating ratings of female subordinates to insure they are equal to ratings of male subordinates. The table shows a significant interaction between accountability and rating context. It was predicted that accountable raters in the equitable treatment condition would rate female subordinates more favorably than raters in the comparison group. However, the pattern of mean ratings of females indicates otherwise.


TABLE. RESULTS OF 2X2 ANOVA COMPARING THE EQUITABLE TREATMENT OUTCOME CONDITION WITH COMPARISON GROUP

Source of Variation      Female Mean Rating
                         F        Signif

Accountability           6.61     .011*
Rating Context           5.21     .024
Interaction AxR          4.76     .031*

Note: df = 1, 120
*p ≤ .05  ** p ≤ .01


Because of the significant interaction for female means (p < .05), a Duncan's Multiple Range test was conducted to test for significant differences between conditions. These results reveal significant differences (p < .05) between the favorability rating provided by non-accountable subjects in the equitable treatment condition and subjects in the other three conditions considered in this ANOVA. This result is interesting because, contrary to other findings in this study, in the equitable treatment condition subjects who were not held accountable for their ratings rated female subordinates' performance more favorably when presented with a rating context where a desirable outcome was to insure that ratings of female subordinates equal those of males.








Accuracy treatment condition


Table [...]-9 presents ANOVA results comparing differential accuracy of subjects in the comparison group with subjects in the accuracy outcome condition who were encouraged to provide more accurate ratings to facilitate organizational decision making.


TABLE [...]-9. RESULTS OF 2X2 ANOVA COMPARING THE ACCURACY OUTCOME CONDITION WITH COMPARISON GROUP

Source of Variation      Differential Accuracy
                         F        Signif

Accountability           8.13     .005**
Rating Context            .80     .374
Interaction AxR           .00     .991

Note: df = 1, 122
*p <.05  ** p <.01


The overall significant effect of accountability on differential accuracy has already been discussed. This analysis shows there was no significant effect for rating context and no interactions.


Summary of ANOVA results


The results of contrast analyses show that only accountable raters who were placed in the rating context where they could achieve a desirable outcome by inflating ratings did inflate their ratings. Accountable raters in the equitable treatment group were not affected by a rating context where a desirable outcome could be achieved by rating female subordinates more favorably. Similarly, while accountable subjects in the accuracy outcome condition had the lowest mean rating on differential accuracy across conditions, their ratings were not significantly more accurate than those of accountable raters in other conditions.


Structural Model Analysis


Results reported so far show that subjects who were required to justify their ratings recalled specific performance events more accurately (observational accuracy) and rated performance more accurately (differential accuracy). To explore factors that contribute to the observed relationship, several process variables were considered.


One self-report variable was constructed from student responses to the post-experiment questionnaire discussed in an earlier chapter. This variable measured the level of engagement subjects felt toward their participation in the simulation. Subjects who were more cognitively engaged in the simulation were expected to spend more time thinking about the simulation and the specific challenges presented in the simulation. In addition to the "engagement" variable, two behavioral variables were considered: attentiveness and note taking.

Attentiveness measured how attentive individual subjects were when subordinate performance information was presented via television monitor. The quality and quantity of note taking was measured based on how useful voluntary notes taken by the subjects would have been if they had been used during the performance evaluations.


The table below presents means, standard deviations, and correlations for accountability, observational accuracy, and differential accuracy as well as the self-report and behavioral variables.


TABLE. MEANS, STANDARD DEVIATIONS, AND CORRELATIONS BETWEEN BEHAVIORAL AND SELF-REPORT VARIABLES

Variables: Accountability, Observational Accuracy, Differential Accuracy, Engagement, Attentiveness, Note taking

Entries legible in the source (row and column positions not recoverable): .50, 24.21, -.18, -.27, 10.77, -.22, -.19**, -.26**, -.34**, .13*, -.40**, .26

Note: Two-tailed significance: * p < .05, ** p < .01. N = 247.
Note: Accountability was coded [...] for accountable subjects and a 2 for non-accountable subjects.
Note: Differential accuracy measures the average level of error in subject ratings. As a result, lower scores on the measure indicate increased accuracy.
Note: Reliabilities are reported in the diagonal.


Based on the correlations between the variables shown, further analyses explore the possibility that subjects required to account for their ratings responded to that pressure by adopting certain attitudes and behaviors relative to the rating task. These attitudes and behaviors might have contributed to improved observational accuracy and differential accuracy.


Rating Process Model


A model was developed which proposed that accountable subjects were more attentive to subordinate performance, took more notes about subordinate behavior, and were more engaged in dealing with problems presented in the simulation. As a result, higher scores on these three variables would be related to improved observational accuracy, which in turn would lead to more accurate performance evaluations.

Based on these proposed relationships, variables were analyzed in the following causal order: first, accountability; second, attentiveness, note taking, and engagement; third, observational accuracy; fourth, differential accuracy.


Structural equation modeling was used to test relationships between accountability, rating process variables, and both measures of accuracy. Structural equation modeling provides advantages in this situation because it allows both assessment and modification of the theoretical model (Anderson & Gerbing, 1988). Two significance tests were used to assess the structural model.








First, a model which specified a direct path between each variable and subsequent dependent variables was compared to a series of nested models where certain paths were restricted to zero, using the sequential chi-square difference approach recommended by Anderson and Gerbing (1988). Second, the significance level of individual paths was used for exploratory purposes to help determine the best fitting nested model.
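The chi-square difference test for nested models can be sketched as follows. This is an illustration under stated assumptions rather than the LISREL procedure used in the study: the function names are hypothetical, and the fit statistics in the example are invented values, not results from this analysis. The restricted model (some paths fixed to zero) has more degrees of freedom and a chi-square at least as large as the less restricted model; the difference is itself referred to a chi-square distribution.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k < df/2} y^k / k!
        term = math.exp(-y)
        total = term
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return total
    # Odd df: start from df = 1 and add one term per extra pair of df.
    total = math.erfc(math.sqrt(y))
    term = math.exp(-y) * math.sqrt(y) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Return (difference, df difference, p) for two nested models."""
    delta = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df <= 0:
        raise ValueError("restricted model must have more degrees of freedom")
    return delta, delta_df, chi2_sf(delta, delta_df)


# Hypothetical fit statistics for illustration only.
delta, delta_df, p_value = chi_square_difference(12.4, 5, 6.2, 3)
print(delta_df, round(p_value, 3))
```

A significant difference indicates that fixing the paths to zero worsens fit, so the less restricted model is retained; a non-significant difference favors the more parsimonious restricted model.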


One advantage of structural equation modeling using LISREL (Jöreskog & Sörbom, 1993) is that it allows significance testing of the structural model after correcting for attenuation due to unreliability of the measures.


In the following analyses, reliability of the accountability variable was assumed to be [...]. Reliability of differential accuracy is difficult to assess because each subject has only one score. However, that score actually represents the average error for the subject across their ratings. Cronbach's alpha was estimated by treating the error in each rating as a single item and then computing the internal consistency of the items (alpha = .65).
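The alpha computation described above can be sketched as follows, treating each rating's error as one item. This is a minimal illustration: the function name is hypothetical and the error scores are invented values, not data from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows, one row of k item scores per subject."""
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("each subject needs the same number (>= 2) of items")
    columns = list(zip(*item_scores))          # one tuple of scores per item
    item_variances = [variance(col) for col in columns]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical rating errors: four subjects, three ratings each.
errors = [
    [0.5, 0.7, 0.6],
    [1.2, 1.0, 1.4],
    [0.3, 0.4, 0.2],
    [0.9, 1.1, 0.8],
]
alpha = cronbach_alpha(errors)
print(round(alpha, 2))
```

Alpha approaches 1 when subjects who err on one rating err to a similar degree on the others, which is what justifies averaging the errors into a single differential accuracy score.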


Reliability estimates for the remaining variables were presented in an earlier chapter and are shown in the correlation table. Error terms were directly entered into the structural equations using the following formula: [1 - Reliability] X Variance.
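As a worked instance of this formula, the sketch below computes the fixed error variance for a single measure. The function name is hypothetical, the reliability of .65 is the alpha reported above for differential accuracy, and the observed variance of 2.0 is an invented value used only for illustration.

```python
def error_variance(reliability, observed_variance):
    """Error variance to fix in the model: (1 - reliability) * observed variance."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if observed_variance < 0:
        raise ValueError("variance cannot be negative")
    return (1.0 - reliability) * observed_variance


# Reliability .65 is taken from the text; the variance of 2.0 is hypothetical.
fixed_error = error_variance(0.65, 2.0)
print(round(fixed_error, 2))
```

A perfectly reliable measure yields an error variance of zero, so no correction for attenuation is applied to it.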


The base model (Model [...]) analyzed had direct paths from