Title: Token reinforcement, choice, and self-control in pigeons
Permanent Link: http://ufdc.ufl.edu/UF00102722/00001
 Material Information
Title: Token reinforcement, choice, and self-control in pigeons
Physical Description: Book
Language: English
Creator: Jackson, Kevin D., 1957-
Copyright Date: 1993
 Record Information
Bibliographic ID: UF00102722
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: ltuf - AKB4274
oclc - 30935145

Full Text

TOKEN REINFORCEMENT, CHOICE,
AND SELF-CONTROL IN PIGEONS

BY

KEVIN D. JACKSON


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


1993

ACKNOWLEDGEMENTS


I thank the members of my Ph.D. committee, Marc Branch, Marvin Harris, Hank Pennypacker, Donald Stehouwer, Frans van Haaren, and especially my committee chairs Timothy D. Hackenberg and E. F. Malagodi. Karen Anderson provided expert assistance with the figures. Jeff Arbuckle commented helpfully during the design of the experiment. Charlene Kruegar did most of the initial subject training and assisted with early program writing. Eric Jacobs and Cindy Pietras often served as surrogate experimenters and kept the lab running through the duration. A special thank you goes to the wonderful people of the Alachua County Association for Retarded Citizens for providing support throughout the conduct of this study and especially during the write-up. I thank my family for providing important social contingencies regarding my commitment to this project. I especially thank my wife, Linda, and my daughter, Julie, for their tolerance, patience, and love. Finally, my thanks go to Metallica and Ted Nugent for setting such high standards and for providing an auditory context in which to work.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

GENERAL INTRODUCTION
    Self-Control as Behavior
    Individual and Cultural Benefits of Self-Control
    Experimental Analyses of Self-Control
        Experiments with Pigeons
        Human Self-Control and Interspecies Differences

EXPERIMENT 1
    Method
        Subjects
        Apparatus
        Procedure
    Results
    Discussion

EXPERIMENT 2
    Method
        Subjects and Apparatus
        Procedure
    Results
    Discussion

GENERAL DISCUSSION

APPENDIX

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

TOKEN REINFORCEMENT, CHOICE,
AND SELF-CONTROL IN PIGEONS

By

Kevin D. Jackson

August, 1993

Chairperson: Dr. E. F. Malagodi
Cochair: Dr. Timothy D. Hackenberg
Major Department: Psychology

In a choice between an immediate small reinforcer and a delayed large reinforcer, an organism exhibits "self-control" if it chooses the delayed reinforcer and "impulsiveness" if it chooses the immediate reinforcer. Under such procedures, humans generally exhibit self-control but pigeons usually respond impulsively. Six pigeons were exposed to self-control procedures involving illumination of light-emitting diodes (LEDs) as a form of token reinforcement. In a discrete-trials arrangement subjects chose between 1 and 3 LEDs; each LED was exchangeable for 2-s access to food. In Experiment 1, subjects responded impulsively, consistent with predictions of the ideal matching law applied to LED reinforcement and with previous findings in pigeons. However, within-session patterns of responding were more consistent with predictions of the ideal matching law applied to food scheduling. Differences in food delays for the 2 choices, which favored the small-reinforcer choice, prevented a clear assessment of the role of LEDs in determining choice. In Experiment 2, the relative influence of LEDs and food was investigated in the same subjects, with delays to food from either choice response equal under most conditions but unequal in others. All subjects exhibited more self-control in Experiment 2 than in Experiment 1. Four subjects preferred the delayed large reinforcer during an arrangement that closely resembled typical human procedures, suggesting that the nature of the consequences of choice responding may account for previously reported differences in the choice responding of humans and pigeons. Token-reinforcer arrangements may promote self-control in a manner similar to commitment procedures. The LEDs probably functioned as conditioned reinforcers, although their discriminative properties may be more relevant to the obtained self-control.

GENERAL INTRODUCTION

Self-Control as Behavior

We speak of self-control when, despite the presence of contingencies that increase the likelihood of one class of behavior, an individual engages in an alternative behavior that is more beneficial in the long run. For example, a person may choose a piece of fruit from the refrigerator, instead of a favorite pastry, in order to improve overall health. Self-control is frequently used not only as a description of a valued form of behavior but mistakenly as an internalized explanation for that behavior. Unfortunately, this practice does little to promote an understanding of the origins and mechanisms of self-control, and perpetuates the myth that self-control and other behavioral patterns are the result of inexorably mysterious processes.

Behaviorists also recognize the importance of self-control, not as an internalized trait, but as behavior to be explained. Radical behaviorists in the tradition of B. F. Skinner focus on relations between historical and current contextual factors and the occurrence of self-control, as well as on technologies for enabling humans to acquire and benefit from repertoires that are sensitive to long-term consequences. In the seminal textbook chapter on this topic (Skinner, 1953, chap. 15), Skinner defined self-control as engaging in one behavior (controlling response) that alters the occurrence of another behavior (controlled response), thereby producing a more valuable outcome. Thus, the controlling response of counting to ten when angry may decrease the probability of hitting someone (controlled response), thereby avoiding the potentially aversive consequences of fighting. Skinner also discussed varied situations in which individuals produce or remove a controlling stimulus of some response, change the relationship between behavior and its consequences, arrange for deprivation, or manipulate an emotional variable. Often, self-control involves the manipulation of verbal stimuli: for example, making and then following a list of tasks to be completed. Stating to oneself the beneficial outcome(s) of some behavior--a rule about the behavior and its consequences--may also exemplify self-control.

Recognizing self-control as behavior may help reveal the variables of which self-control is a function. It may also yield important practical benefits, such as new self-control techniques and technologies for teaching self-control. Skinner attributed much of his own success to the use of behaviorally based strategies of self-control (Skinner, 1979), and even co-authored a book containing self-control techniques relevant to behavioral changes accompanying old age (Skinner & Vaughan, 1983). Others have adopted Skinner's strategy, endorsing the application of behavioral principles toward teaching self-control (e.g., Mahoney & Thoresen, 1974; Runck, 1982; Stuart, 1977). Thus, self-control may proliferate through exposure to scientifically based rules about behavior and through the application of scientifically based technologies.

Individual and Cultural Benefits of Self-Control

Much important human behavior can be viewed in terms consistent with self-control, that is, operant behavior functionally related to temporally remote consequences. For example, consider a person who encounters a valued item while shopping, perhaps a stereo system, but lacks the cash to purchase it. The person may use a credit card, gaining immediate possession of the stereo, but with the unfavorable remote consequence of less money due to interest payments on the credit card. Self-control is said to occur when, instead of purchasing on credit, the person saves enough cash to buy the item directly at some future time, thereby avoiding the added cost of interest on money borrowed. Techniques for achieving this type of self-control may include cutting up all one's credit cards or only buying items on a premade shopping list.

At the level of individual behavior, self-control often determines success in life. An individual who saves money for greater long-term gains or studies now because of job opportunities later is likely to benefit in the long run and be more successful over the course of a lifetime. Many stories of individual human success and greatness involve forgoing immediate gains and behaving instead toward some long-term objective such as solving an important problem or completing an extensive project.

Human cultural patterns may also be viewed in terms of self-control. No single individual can build a highway, operate a manufacturing plant, or cultivate the crops responsible for feeding a nation. Instead, such tasks require the collective behavior of many individuals, behavior that occurs because of its relationship to important deferred outcomes. Culture may thus be viewed as a system by which human behavior (cultural practices) is collectively brought under control of valuable deferred outcomes. Cultural evolution can be explained in terms of the relationship between cultural practices and important outcomes, particularly outcomes involving increased energy flow, decreased reproductive pressure, and, in hierarchically stratified societies, differential advantages for members of the upper strata (Harris, 1974, 1977, 1980, 1981, 1989). In the case of culture, the behavior of many individuals is brought under the control of remote consequences through the arrangement of more immediate socially administered reinforcement and punishment and through verbal practices that include rules relating behavior to arbitrary, nonarbitrary, and sometimes supernatural consequences (Glenn, 1985, 1988; Malott, 1988; Skinner, 1953, 1974).

As important as self-control is to human success, so is the failure to respond to deferred consequences at the root of many problems facing both individuals and the cultures of which they are members. Many stories of individual human failure involve "impulsive" responding or behavior controlled by relatively immediate consequences. An individual behaving under control of short-term outcomes, for example, by spending hours each day watching television instead of learning new job skills, by consuming goods and services at a rate in excess of income, or by the daily self-administration of drugs, will not fare well in the long run. A frightening implication of this account of self-control is that as the marketplace is increasingly flooded with electronic entertainment devices, video games, video tapes, advanced audio components, and other computerized toys capable of providing hours of seemingly endless varieties of relatively immediate reinforcing outcomes, individuals may be increasingly less likely to engage in behaviors related to long-term, individually beneficial consequences, and hence, less likely to succeed at life (Skinner, 1986).

Social problems ranging from the AIDS epidemic, in which the more immediate reinforcement of unprotected sex overrides the potentially lethal outcome, to pollution and the destruction of the earth's ozone layer, in which more immediate financial gains outweigh tremendous environmental costs, can be viewed as failures to respond to important deferred outcomes. Similarly, the growing national debt, substandard housing construction in hurricane-prone areas, and the needless depletion of natural resources all involve failures of deferred consequences to exert control over current behavior.

Although cultural evolution involves selection by deferred outcomes, a culture may also fail by not responding to even more remote consequences of some of its practices (Glenn, 1988). Indeed, the history of human cultural evolution reveals repeated cycles of adopting new modes of production, momentarily improving living standards, and intensifying production until ecological limitations are met, producing catastrophic consequences for participants in the culture (Harris, 1977, 1980). In response to such catastrophes, a process of radical transformation begins, new cultural practices are selected, and the pre-existing culture no longer survives. These catastrophes are avoidable by increasing investment in the development and adoption of more efficient technologies, adjusting the rate of production intensification, and tolerating sustained, but less severe, reductions in living standards. As Skinner put it, "The evolution of culture is a gigantic exercise in self-control" (Skinner, 1971, p. 205). In other words, a culture survives when it is responsive to the remote consequences (reinforcing and aversive) of its practices (Skinner, 1971, 1981). Responding to deferred outcomes is thus at the heart of behavioral ethics and the high value placed on cultural survival. For all of these reasons, self-control may be the most important problem faced by the behavioral and social sciences.

Experimental Analyses of Self-Control

Experimental analyses of self-control focus primarily on the role of procedural and historical factors in the choices of individual subjects. Typically, concurrent schedules with two response options are used, and each option (choice) is associated with its own reinforcement schedule (Herrnstein, 1961). The experimental arrangement for studying self-control typically involves a choice between a larger, delayed reinforcer and a smaller, more immediate reinforcer. Under these conditions, choice of the delayed reinforcer is defined as "self-control" whereas choice of the immediate reinforcer is defined as "impulsiveness." Investigations of self-control have focused on reinforcement schedule parameters, type of reinforcement, degree of deprivation, experimental history, the availability of different responses and stimuli during experimental sessions, and other characteristics of experimental subjects.

Experiments with Pigeons

Pigeons have served as subjects in most nonhuman studies of self-control, with access to food (grain) as the reinforcer and key pecking as the choice response. When faced with a choice between an immediate small reinforcer and a delayed larger reinforcer, pigeons almost invariably prefer the smaller, more immediate reinforcer (Ainslie, 1974; Logue & Pena-Correal, 1984; Logue, Rodriguez, Pena-Correal, & Mauro, 1984; Mazur & Logue, 1978; Rachlin & Green, 1972; see review by Logue, 1988). For example, Mazur and Logue (1978) exposed 4 pigeons to a choice procedure with 31 discrete choice trials per session. Reinforcement rate was held constant by starting each trial 1 min from the onset of the preceding trial. Trials began with the illumination of the left and right keys, green and red, respectively. A single peck on the right key (fixed-ratio 1, FR1) resulted in 2-s access to grain. Each left keypeck produced a 6-s delay period, followed by 6-s access to grain. All subjects preferred the immediate reinforcer, pecking the right key on nearly every trial.

Lea (1979) demonstrated that pigeons prefer a more immediate reinforcer over an equivalent delayed reinforcer, even when rate of reinforcer access is greater when the delayed reinforcer is chosen. This demonstrates the potent effects of reinforcement immediacy, for pigeons' choice responding is extremely sensitive to rate of reinforcer access when there is no prereinforcer delay across alternatives (de Villiers, 1977). In a related study, Logue, Smith, and Rachlin (1985) demonstrated that pigeons' choices in a self-control paradigm were insensitive to postreinforcer delay, except when prereinforcer delays were equal and postreinforcer delays affected the rate of reinforcer access.

A notable exception to the usual finding of impulsiveness in pigeons occurs if subjects are given an opportunity to commit in advance to receiving the larger delayed reinforcer (Rachlin & Green, 1972). In Rachlin and Green's experiment, five pigeons were first exposed to a standard self-control arrangement. Using a discrete-trials procedure, a single peck on a red choice key produced immediate 2-s access to food, whereas a single peck on the green key produced 4-s access to food after a 4-s delay. Within one session, all subjects showed exclusive preference for the red key (immediate reinforcer) that was maintained throughout subsequent exposures to this choice arrangement.

Next, subjects were presented with a concurrent-chains schedule. At the start of each choice trial both response keys were illuminated white (initial link), and a fixed ratio (FR) of 25 keypecks, distributed in any way between the two keys, produced a blackout of T seconds. The terminal link followed the blackout and depended on the location of the 25th keypeck. If the 25th keypeck was on the right key, the terminal link consisted of the original choice situation described above. If the 25th keypeck was on the left key, only the green key was illuminated in the terminal link, and only the larger delayed reinforcer was available. The value of T was manipulated across experimental phases. For all subjects, the number of large-reinforcer choices (left keypecks during the initial link) and entries into the terminal link associated with only the large reinforcer increased as the value of T was increased from 0.5 to 16 s. Preference reversals occurred in 3 subjects; that is, pigeons that primarily pecked the right key at shorter values of T switched over to the left key as the value of T was increased. These subjects preferred the delayed, larger reinforcer, and thus exhibited self-control, when given an opportunity to commit to that option far enough in advance of the availability of the smaller, more immediate reinforcer.

Impulsive responding under the standard self-control arrangement and the preference shifts observed in the Rachlin and Green study are consistent with the ideal matching law, an equation that is useful for describing and predicting pigeon performance under two-component concurrent schedule arrangements (Baum & Rachlin, 1969; Herrnstein, 1970):

B1/B2 = A1D2/A2D1.

In this equation B1 and B2 represent the number of responses on alternatives 1 and 2, respectively, and A1, A2, D1, and D2 represent the reinforcer amounts (A) and prereinforcer delays (D) associated with the two options. According to this equation, the proportion of responses allocated to an option is equal to the relative reinforcer value of that option, where reinforcer value is defined as the product of magnitude and immediacy (1/delay) of reinforcement. With concurrent FR1 schedules, subjects tend to choose the preferred option exclusively (e.g., Herrnstein, 1958; Logue & Pena-Correal, 1984); under such arrangements the matching law is useful primarily as a predictor of the direction of preference. If the ratio B1/B2 is greater than 1, preference for option 1 is predicted, and if less than 1, preference for option 2 is predicted. In Rachlin and Green's (1972) initial procedure, treating the large-reinforcer choice as option 1, the ratio B1/B2 would be less than 1 (substituting a small nonzero delay value for the small reinforcer), which is consistent with the obtained preference for the smaller immediate reinforcer. In later conditions the value of T is added to the delay value of each option; thus, as the value of T increases, so does the ratio B1/B2. The increasing number of large-reinforcer choices, observed as T increased in the Rachlin and Green experiment, was therefore in qualitative agreement with predictions of the ideal matching law. The matching equation also predicts a preference reversal, from the small reinforcer to the large reinforcer, as the value of T increases. This occurred in 3 of 5 subjects of the Rachlin and Green study and has since been replicated in many other studies with pigeons as subjects (e.g., Ainslie, 1974; Green, Fisher, Perlow, & Sherman, 1981; Navarick & Fantino, 1976).
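To illustrate these predictions concretely, the following sketch (an illustration for this summary, not part of the original study) applies the ideal matching law to Rachlin and Green's parameters, with the commitment blackout T added to both prereinforcer delays; the 0.1-s stand-in for the "immediate" delay is an arbitrary assumption.

```python
# Ideal matching law, B1/B2 = (A1 * D2) / (A2 * D1), with A = reinforcer
# amount (s of food access) and D = prereinforcer delay (s). Parameters
# follow Rachlin and Green (1972): option 1 = 4-s food after a 4-s delay,
# option 2 = 2-s food "immediately" (0.1 s substituted for zero delay).

def preference_ratio(a1, d1, a2, d2, t=0.0):
    """Return the predicted B1/B2 with the blackout T added to both delays."""
    return (a1 * (d2 + t)) / (a2 * (d1 + t))

for t in [0.5, 2, 4, 8, 16]:
    ratio = preference_ratio(a1=4, d1=4, a2=2, d2=0.1, t=t)
    outcome = "large (self-control)" if ratio > 1 else "small (impulsive)"
    print(f"T = {t:4} s: B1/B2 = {ratio:.2f} -> predicted choice: {outcome}")
```

Run as written, the predicted preference switches from the small to the large reinforcer between T = 2 and T = 4 s, in qualitative agreement with the reversals described above.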

Interestingly, Logue and Pena-Correal (1985) found that pigeons' choices in a self-control procedure were not affected by changes in deprivation. Four pigeons were each deprived to 65%, 80%, and 90% of their free-feeding weights and were exposed to 5 different choice arrangements under each deprivation level. As predicted by the matching law, large-reinforcer choices increased as delays to the small reinforcer approached the value of the large-reinforcer delay. The failure of deprivation to alter choice responding suggests that deprivation produces the same percentage change in the value of each reinforcer (Logue, 1988).

An important exception to the finding of impulsiveness in pigeons and to predictions of the ideal matching law involves a fading procedure developed by Mazur and Logue (1978). Two groups of 4 pigeons each were studied. Subjects in the experimental group were first exposed to a discrete-trials choice between 2- or 6-s access to grain, each delayed 6 s from a choice. All subjects preferred the large reinforcer. Over the next 11,000 trials, the delay to the small reinforcer was gradually reduced towards 0 s (fading). Subjects nearly always chose the large reinforcer across conditions in which the delay to the small reinforcer was greater than 3 s, a finding consistent with the matching law. When the delay to the small reinforcer was 2 s or less, a value at which the matching law predicts exclusive preference for the small reinforcer, 2 subjects continued to prefer the large reinforcer and all subjects continued to make large-reinforcer choices at least some of the time. Subjects in the control group were only exposed to the terminal condition of the experimental group and then to a condition in which the small reinforcer was delayed 5.5 s. Unlike subjects in the experimental group, these subjects showed nearly exclusive preference for the small reinforcer when it was delivered immediately, a finding consistent with predictions of the ideal matching law. Logue and Mazur (1981) showed that the self-control observed in the fading subjects partly depended on the presence of stimuli (overhead lights) during the delay that were differentially associated with the two choices. These stimuli apparently enhanced the value of the delayed larger reinforcer. A later study confirmed the effects of this fading procedure on self-control in pigeons. Using an equation that includes parametric estimations of sensitivity to delays and amounts of reinforcement, it was shown that the choices of pigeons exposed to the fading procedure were more sensitive to variations in reinforcer amount than to reinforcer delay (Logue et al., 1984).

Other exceptions to the adequacy of the ideal matching law for predicting preference in self-control arrangements with pigeons include some concurrent-chains schedule situations with equivalent variable-interval (VI) schedules in the initial links and fixed-interval (FI) schedules in the terminal links. With equivalent VI schedules in the initial link, responses are distributed across both options and terminal links are entered equally often from either option (Fantino, 1977). Relative response rate serves as the measure of preference under such schedules. Green and Snyderman (1980) manipulated reinforcer delay by altering the length of terminal-link FI components. Pigeons were exposed to a choice between 6-s access to grain after a long delay and 2-s access to grain after a shorter delay. When the ratio of delays was 6:1 and 3:1, preference for the large reinforcer decreased with increases in the absolute value of the delays. With a delay ratio of 3:2, the relative rate of large-reinforcer responses increased with increases in delay values. Both of these findings are inconsistent with matching-law predictions of no change in preference when delay ratios are constant. Green and Snyderman also examined predictions of the delay-reduction hypothesis (Fantino, 1969, 1977), a model that bases reinforcer value on the reduction in delay to food associated with the onset of terminal components. This model is consistent with the changes observed under delay ratios of 6:1 and 3:2, but, like the matching law, predicts no change when the delay ratio is 3:1. Navarick and Fantino (1976) obtained some results consistent with both the matching law and the delay-reduction model. When the value of the terminal-link FI (delay) associated with the small reinforcer was consistently 10 s shorter than that associated with the large reinforcer, the number of large-reinforcer choices increased as the value of both terminal FIs increased. However, similar increases in large-reinforcer choices occurred when reinforcer delays (FI values) were equal, a finding consistent with delay reduction, but not the matching law.

Grosch and Neuringer (1981) exposed pigeons to a series of self-control arrangements similar to those used by Mischel (1974) with human children as subjects. Trial durations alternated between 5 and 15 s; subjects could wait until the end of a trial and receive a preferred grain mixture or peck a key during the trial and receive an equal amount of a less preferred grain. Grain preferences were determined prior to the experiment by presenting both grains at once and observing which grain mixture was consumed first. Self-control was measured as the time subjects waited before responding. Self-control was influenced by a number of variables that, more or less, resembled those manipulated by Mischel. (Some of Mischel's research is discussed below.) Pigeons exhibited less self-control when food was visible (although the presence of food increased self-control when key pecks were required to obtain the preferred grain), when stimuli correlated with food (feeder lights) were present, or when food was delivered immediately before choice trials. Adding an alternative response manipulandum during the delay increased self-control (see Logue & Pena-Correal, 1984, for a similar finding). Prior reinforcement of waiting increased self-control, and prior punishment of waiting decreased self-control. While these findings illustrate some of the commonalities in the choice responding of humans and pigeons under self-control arrangements, substantial performance differences have also been observed.

Human Self-Control and Interspecies Differences

In contrast to pigeons, human subjects generally exhibit self-control in laboratory settings (Logue, Pena-Correal, Rodriguez, & Kabela, 1986). Logue et al. (1986) exposed adult females to choices between reinforcers of varying amounts and delays, similar to the choices given to pigeons by Logue et al. (1984). Subjects pressed a button that delivered points exchangeable for money following sessions. Access to the button was controlled by pushing a rod to the left or right (choice responses). The first experiment involved a discrete-trials self-control procedure. Unlike pigeons, humans in this study preferred the larger delayed reinforcer over the smaller more immediate reinforcer in most cases, although response bias made it difficult to interpret the choices of some subjects. During the remaining experiments, subjects were exposed to concurrent VI schedules with various arrangements of delays and magnitudes of reinforcement for the two options. When faced with a choice between a small, relatively immediate reinforcer and a larger delayed reinforcer, all subjects made a greater number of delayed-reinforcer choices than characteristically made by pigeons or predicted by the ideal matching law. In 30 of 38 cases in which the matching law predicted preference for the more immediate reinforcer, the humans preferred the delayed reinforcer. These findings are consistent with many other studies of human choice that deviate from matching-law predictions and from the usual pigeon findings. Instead of matching, humans' choices tend toward maximizing overall obtained reinforcement and are less sensitive to the diminishing effects of delay on reinforcer value (e.g., Belke, Pierce, & Powell, 1989; Flora & Pavlik, 1992; King & Logue, 1987; Mawhinney, 1982; Millar & Navarick, 1984; Navarick, 1986). There are various possibilities for explaining the differences in the choices of humans and pigeons, some of which will be reviewed below.

Molar maximization models of choice, which assume behavior maximizes overall obtained reinforcement, are most consistent with human self-control performance (e.g., Houston & McNamara, 1985; Rachlin, Battalio, Kagel, & Green, 1981). Some studies, upon which these models are based, have demonstrated preference for a larger more delayed reinforcer by nonhuman subjects, when such a choice maximizes energy intake and minimizes energy expenditure; procedural discrepancies, however, make it difficult to compare these findings directly with the studies reviewed here (for further discussion see Logue, 1988). From this perspective, the failure of molar maximization models to account for pigeons' performances under self-control arrangements is the result of limitations on the time frame over which costs and benefits are balanced. Such limitations could be argued for on an evolutionary basis or could be viewed as a result of historical or procedural factors. Unfortunately, it is unclear at present which of these variables is critical and even whether pigeon and human differences are best characterized in terms of maximization models of behavior.

Performance differences between humans and pigeons under self-control procedures might also result from the participation of human subjects in extensive verbal communities outside of the laboratory that, especially in capitalistic societies, are likely to support adherence to maximization strategies differentially (Mawhinney, 1982). In addition to directly reinforcing maximization, such histories likely establish repertoires of following maximization rules and stating rules to oneself about how to respond in ways that maximize reinforcement. Such an interpretation is consistent with behavioral theory (Skinner, 1974; also see Horne & Lowe, 1993, for an excellent discussion) and is supported by direct evidence that experimenter-provided rules can influence responding under experimentally arranged contingencies (Bentall & Lowe, 1987; Catania, Matthews, & Shimoff, 1982; Horne & Lowe, 1993; Solnick, Kannenberg, Eckerman, & Waller, 1980) and by inferential evidence that self-stated rules influence responding during some human experiments (Baron & Galizio, 1983; Horne & Lowe, 1993; Laties & Weiss, 1963; Lippman & Meyer, 1967; Logue et al., 1986; Lowe, Harzem, & Bagshaw, 1978; Matthews, Catania, & Shimoff, 1985; Sonuga-Barke, Lea, & Webley, 1989). Sometimes instructions explicitly encourage maximization patterns, as in the Logue et al. study, in which written instructions to the subjects included the statement, "Your task is to earn as many points as you can" (p. 161). That such rules contributed to the observed tendency to maximize is supported by post-experimental questionnaires in which subjects reported that they were attempting to maximize the total points earned and that they did this by trying to time the delays and durations characteristic of button availability. More recently, similar correlations between human subjects' verbal reports and patterns of responding were obtained under various concurrent schedules (Horne & Lowe, 1993). The authors of this study clarified how the responding of verbal adult humans in operant experiments often involves an interaction of verbal processes with experimental contingencies.

Human verbal and social histories are also implicated in developmental studies of self-control. Sonuga-Barke et al. (1989) exposed 4-, 6-, 9-, and 12-year-old children to choices between 1 and 3 tokens exchangeable for candy or toys after the session. Preference was assessed with concurrent VI schedules of block pressing. Presses on one block produced a 10-s delay and delivery of 1 token; presses on the alternate block resulted in 3 tokens after a delay that ranged from 20 to 50 s across different conditions. With these delay values, reinforcement could be maximized by shifting preference from the large to the small reinforcer as the delay to the large reinforcer was increased. Some of the 4-year-olds and all of the 12-year-olds showed this pattern. While the 12-year-olds showed dramatic preference shifts, however, the 4-year-olds' shifts were from near-indifferent responding to preference for the smaller more immediate reinforcer. The 4-year-olds reported a strategy of picking the large reinforcer, although they did not do so with any consistency. The 12-year-olds gave reports that corresponded to their performance and indicated a strategy of attempting to maximize reinforcement by timing the delays and counting tokens. The 6- and 9-year-olds showed consistent preference for the larger reinforcer and, like the 12-year-olds, their individual verbal reports corresponded well with their choice responding. The results suggest a developmental sequence in which, between the ages of 4 and 6, children learn to wait for larger delayed reinforcers, and between the ages of 9 and 12, learn to wait, or not wait, for a larger reinforcer depending on overall obtained reinforcement. These changes were likely aided by accompanying changes in rule-stating and rule-following repertoires.
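The maximizing pattern follows from simple arithmetic on the token rates the two options yield. Ignoring the concurrent-VI scheduling details and treating the stated delays as the only time costs (a simplifying assumption of this sketch, which is not from the original study), the comparison looks like this:

```python
# Tokens per second implied by the Sonuga-Barke et al. (1989) delay values:
# 1 token after a 10-s delay versus 3 tokens after a 20- to 50-s delay.
# A maximizer should choose whichever option yields tokens at the higher rate.

small_tokens, small_delay = 1, 10

for large_delay in [20, 30, 40, 50]:
    small_rate = small_tokens / small_delay   # 0.100 tokens/s throughout
    large_rate = 3 / large_delay
    better = "large" if large_rate > small_rate else "small"
    print(f"large-reinforcer delay {large_delay} s: "
          f"{large_rate:.3f} vs. {small_rate:.3f} tokens/s -> choose {better}")
```

By this arithmetic the two options are equivalent at a 30-s delay and the small reinforcer wins beyond it, which is the direction of the shift shown by the 12-year-olds.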

Other developmental studies, described by Logue (1988) and Mischel and Mischel (1983), also implicate verbal processes in choice. In these studies, children (3 to 12 years old) chose between preferred and nonpreferred edibles. The preferred edible was determined on the basis of prior verbal reports of the subjects. During single-trial experimentation, subjects were instructed to wait for the experimenter to return to get the preferred snack but to signal for the experimenter to return to get the less preferred snack. The measure of self-control was the time spent waiting for the experimenter to return. Generally, the longer the experimenter was away, the more likely it was that subjects would not wait. Subjects were also less likely to wait for the less preferred snack than the more preferred snack. Older children were more likely to wait and to wait longer than younger children (a similar developmental finding has been reported by Burns & Powers, 1975). Self-control improved in these studies when subjects engaged in distracting activities during the wait; restated the rule about getting the preferred snack by waiting; made general or abstract statements about the task (e.g., it is good to wait); avoided making statements about the taste, texture, or consumable characteristics of edibles; and avoided looking at the edibles. Older children were more likely to describe and engage in these strategies for improving self-control and to prefer choice situations more conducive to self-control (e.g., situations in which the edibles were out of sight). Verbal reports may be seen to correspond with performance in these studies, in that children who reported using the above strategies usually waited longer, and children who waited longer were usually better at describing these strategies for improving self-control. Waiting by children was also increased when the experimenter provided instructions describing successful waiting activities. Together, these findings suggest that the type of self-verbalization determines the length of waiting and that, as children grow older, they become more skilled at engaging in verbal strategies during the wait.

It is possible that some self-stated rules about forthcoming reinforcers serve a function analogous to the overhead lights present during the delay interval associated with the larger reinforcer in pigeon studies involving delay fading (Logue & Mazur, 1981; Logue et al., 1984; Mazur & Logue, 1978). With both pigeons and humans, events (lights or rules) during the delay that are differentially associated with obtaining the larger reinforcer enhance self-control. These delay-fading studies might also relate to pigeon and human self-control differences, in that human adults are more likely to have had experiences analogous to the fading history of the pigeons that demonstrated more self-control. In any case, results showing that both pigeons and younger (less verbal) humans tend to respond impulsively in self-control situations, and that verbal processes play a role in human performance, strongly suggest that verbal history is an important determinant of self-control in humans.

Verbal processes cannot explain all species differences in self-control, however (van Haaren, van Hest, & van De Poll, 1988). Van Haaren et al. investigated choices of male and female rats between 1 and 3 food pellets. Presses on the right lever produced the larger (3-pellet) reinforcer, whereas presses on the left lever produced the smaller (1-pellet) reinforcer. When each reinforcer was preceded by a 6-s delay, all subjects preferred the large reinforcer. When the delay associated with the small reinforcer was decreased to 0.1 s, all subjects continued to prefer the large reinforcer. When contingencies associated with the levers were reversed, most of the subjects switched levers and continued to prefer the larger, more delayed reinforcer. In a second experiment with different rats as subjects, the small reinforcer was always delivered after a 6-s delay and the large reinforcer was delayed either 9, 15, 24, or 36 s during different conditions. Most subjects consistently preferred the large reinforcer. Rats' choices in this study differed from those of pigeons under similar arrangements, more closely resembling human performance. Among the interpretations of the differences between pigeons and rats considered by van Haaren et al. was the notion that elicited key pecks might contribute to the impulsive responding typical of pigeons.

It is well known that a stimulus paired with food presentation will elicit stimulus-directed pecking in pigeons (Schwartz & Gamzu, 1977). Poling, Thomas, Hall-Johnson, and Picker (1985) demonstrated that a red key paired with a small reinforcer (3-s access to grain) was more often the target of elicited key pecks than a simultaneously presented blue key paired with a larger delayed reinforcer (9-s access to grain). Lopatto and Lewis (1985) investigated the role of elicited pecks in a single-key self-control arrangement, in which pecking a key during periodic 4-s presentations produced a small reinforcer (2-s access to grain), while not pecking resulted in a larger reinforcer (4-s access to grain) delivered after the key was darkened. Subjects responded impulsively, pecking the key on 95% of trials. When pecking no longer produced the small reinforcer and canceled the large reinforcer, pigeons continued to peck on 75% of key illuminations, suggesting that elicited pecks also contributed to the impulsiveness observed in the first procedure. The role of elicited key pecks in standard two-key self-control arrangements with pigeons has not been determined, although the studies cited here suggest that elicitation may add to the impulsiveness observed in some of these experiments.

Finally, procedural differences involving the nature of the consequences may contribute to the reported performance differences between humans and pigeons in studies of choice and self-control. In most human experiments, consequences consist of points (token reinforcers) that are exchangeable for money some time after the experimental session. Humans may be more likely to demonstrate self-control because there is no advantage to obtaining points quickly, since they cannot be exchanged until the session is over. Thus, the point arrangement characteristic of human studies may favor maximization over the length of the session. In pigeon studies, on the other hand, the typical consequence is food, an unconditioned reinforcer of more immediate consummatory value. This arrangement may favor impulsivity. Consistent with this interpretation are reports of impulsiveness in humans when food (Ragotzy, Blakely, & Poling, 1988) or escape from unconditioned aversive stimuli (Navarick, 1982; Solnick et al., 1980) are consequences of choice responding.

Ragotzy et al. (1988) demonstrated impulsiveness in humans when food was the consequence of choice responding. Severely retarded human adolescents chose between 1 and 3 Cocoa Puffs. Choices were made by touching one of two different colored cards, each associated with one of the reinforcer options. All 3 subjects preferred the large reinforcer when both reinforcers were delivered immediately, but as the delay to the large reinforcer was increased across conditions, preference shifted strongly in favor of the small reinforcer. The human subjects in this study responded somewhat differently than pigeons, preferring the large delayed reinforcer under some parameters in which the matching law predicts strong impulsiveness. However, unlike the human subjects in prototypical choice studies, and more like pigeons, these subjects failed to maximize reinforcement, responded impulsively, and were sensitive to the diminishing effects of reinforcer delay on reinforcer value. In a second phase of the experiment, the delay to the small reinforcer was increased across conditions and preference shifted back to the large reinforcer, a result that is also consistent with previous findings in pigeons (Green et al., 1981; Rachlin & Green, 1972). While the Ragotzy et al. (1988) study lends some support to the notion that impulsiveness is more likely with immediately consumable reinforcers, the atypical impulsive responding in their human subjects could also be related to the verbal deficiencies characteristic of the severely retarded.

Solnick et al. (1980) investigated choices of female college students who solved math problems while wearing headphones. In one condition, after 15 s of exposure to white noise (90 dba) played through the headphones, subjects were given a choice of pressing one button that turned the noise off immediately for a short duration (90 s) or pressing an alternate button that turned the noise off for a longer duration (150 s) after a delay of 30 s. Unlike the verbal human adults in most studies, these subjects responded impulsively, strongly preferring the immediate reinforcer (noise termination). A 15-s delay was added to both options for a second group of subjects, by scheduling the choice opportunity at the start of each trial. Subjects exposed to this condition showed exclusive preference for the larger, more delayed reinforcer, a finding that resembles previous reports with pigeons (e.g., Green et al., 1981).

Negative reinforcement by noise termination also produced impulsive responding in adult human college students in a study by Navarick (1982). Navarick's subjects increasingly preferred the small reinforcer as the delay to the large reinforcer was increased, preferred immediate reinforcement over an equal duration of delayed reinforcement, and preferred a large reinforcer over a small reinforcer when both were delivered immediately.

Navarick and associates have also examined the effects of other reinforcers with humans. Impulsivity was demonstrated in at least some of the human subjects when either access to a video game (Millar & Navarick, 1984) or slides of entertainment and sports personalities served as choice consequences (Navarick, 1986). Another study (Navarick, 1985) examined choice when illumination of indicator lights that the subjects were told to react to with a "pleasant feeling" served as consequences of choice responding. In this case, subjects demonstrated preference for large over small amounts of reinforcement (duration of illumination) when no delays were scheduled for either choice but did not prefer the immediate to the delayed reinforcer when reinforcer amounts were equal. This finding raises the possibility that the instructions regarding the point reinforcers in human self-control studies may also play a role in the obtained insensitivity to large-reinforcer delays, although insensitivity of the type demonstrated by Navarick was not apparent in the Logue et al. (1986) study. Together, Navarick's work shows that adult human choices are generally more sensitive to differences in reinforcer amount than reinforcer delay and, because the magnitude and reliability of delay sensitivity varied considerably between the reinforcer types investigated, that qualitatively different reinforcers likely have different propensities for producing impulsiveness (Navarick, 1986). With regard to the present discussion, the finding of impulsiveness in many of these studies when reinforcers of more immediate value serve as consequences of choice, and the failure to show impulsiveness in human studies when points serve as reinforcers, further suggest that the characteristic consequences of choice in pigeon and human studies (food vs. points) may contribute to the characteristic differences in choice and self-control.

In summary, the finding that pigeons respond impulsively under self-control arrangements and that adult humans typically demonstrate self-control is often explained in terms of the verbal processes characteristic of humans (e.g., Mawhinney, 1982) and the limited capacity for temporal integration in pigeons. The finding that adult humans respond impulsively when negative reinforcement or access to positive reinforcers with more immediate value serve as choice consequences suggests that the type of reinforcement may be involved in the previously reported species differences. The present experiments investigated this possibility with pigeons as subjects, using tokens as consequences of choice responding in a self-control arrangement that more closely resembles the typical human paradigm. Figure 1 illustrates the rationale for this investigation.

[Figure 1 appeared here: a 2 x 2 diagram headed "REINFORCEMENT WITH IMMEDIATE VALUE: YES / NO," crossing humans and pigeons. Its cells read: IMPULSIVENESS--UNCONDITIONED REINFORCEMENT (FOOD AND NOISE TERMINATION); IMPULSIVENESS--UNCONDITIONED REINFORCEMENT (FOOD); SELF-CONTROL--TOKEN REINFORCEMENT (POINTS); and ?--TOKEN REINFORCEMENT (LED ILLUMINATION).]

Figure 1: A summary of research findings in self-control experiments with pigeons and humans. The two left quadrants show that impulsiveness has usually been found with both pigeons and humans when reinforcement has immediate value. The upper right quadrant represents the usual finding of self-control in humans when token reinforcement is used. The present experiment was conducted to provide information for the lower right quadrant and assessed the responding of pigeons with token reinforcement.

EXPERIMENT 1

The points delivered as consequences in human operant studies may be viewed as token reinforcers (Gollub, 1977; Kelleher, 1958; Malagodi, 1967). Token reinforcers are usually physical objects, delivered according to some schedule of reinforcement, that can be exchanged for some other (terminal) reinforcer. Tokens, however, can be defined more generally as conditioned reinforcers "that the organism may accumulate and later exchange for other reinforcers" (Catania, 1992, p. 400). In token-reinforcer arrangements a discriminative stimulus is usually associated with exchange periods, during which a specified "exchange" response involving the token(s) is followed by presentation of the terminal reinforcer. Thus, the token-reinforcer paradigm involves a schedule of token reinforcement, a schedule of exchange periods (exchange schedule), and a schedule of reinforcement of exchange responses by the terminal reinforcer (Malagodi, Webbe, & Waddell, 1975; Waddell, Leander, Webbe, & Malagodi, 1972; Webbe & Malagodi, 1978). All three schedules of the token paradigm are also components of the point-reinforcer system used in human operant studies. The typical procedural arrangement with humans differs from the token-reinforcer paradigm, however, in the following 3 ways: (1) point delivery consists of incrementing a counter instead of delivering a physical object; (2) the exchange response involves manipulation of verbal stimuli that correspond to points instead of manipulating a token object itself; and (3) the terminal reinforcer consists of money (a generalized conditioned reinforcer) instead of an unconditioned reinforcer.

The token-reinforcer arrangement characteristic of human studies may produce self-control in a manner similar to the commitment-response procedure described earlier (Rachlin & Green, 1972). Recall that self-control was increased in this study when pigeons were provided with an opportunity for advance commitment to the large reinforcer. Similarly, by choosing a larger number of points during the session, humans are committing to a greater amount of money after the session. In both cases, a choice at time X determines the availability of reinforcement at a later time (X + T). With humans, in-session choices determine the magnitude of post-session (post-T) monetary reinforcement. For pigeons, commitment responses within a session determine food availability after T seconds. Interestingly, if the matching law were applied to humans' choices using the delays and magnitudes of monetary reinforcement, preference for the larger amount of reinforcement would be predicted. The pervasiveness of self-control in human subjects may simply be a replication of the effects of scheduling choices far enough in advance of the availability of reinforcement (e.g., Green et al., 1981).

The points delivered in studies with humans might also contribute to the obtained self-control. The correspondence of points to the amount of monetary reinforcement resembles the correspondence of overhead lighting to the amount of food reinforcement that was shown to promote self-control in pigeons (Logue & Mazur, 1981).

This interpretation de-emphasizes the importance of points and implies that they are subordinate to the scheduling of monetary reinforcement in determining humans' choices. Whether or not this is true of nonhumans' choices is not known. The token reinforcement schedule, however, is often considered to be subordinate to the exchange schedule: the token derives its reinforcing function from the terminal reinforcer that is available only during exchange periods. Also, while patterns of token-reinforced behavior usually resemble those characteristic of the token reinforcement schedule, the obtained rate of behavior, and within-session changes in patterns and rates across intertoken intervals, are determined by the exchange schedule (e.g., Malagodi et al., 1975; Waddell et al., 1972; Webbe & Malagodi, 1978). An extreme example of this is the extended pauses observed under token reinforcement schedules during times and stimulus conditions most remote from the exchange period (e.g., Kelleher, 1958; Malagodi et al., 1975; Waddell et al., 1972; Webbe & Malagodi, 1978).

The present experiment investigated pigeons' preference under a token-reinforcer arrangement similar to the typical human procedure involving point delivery. Choices (pecks on lighted side keys) during discrete trials resulted in the illumination (delivery) of either 1 or 3 LEDs (tokens). Each LED could be "exchanged" for 2-s access to grain by pecking a center key during exchange periods. Exchange periods were initially scheduled after each trial; the ratio of trials to exchange periods was then increased across phases until a single exchange period was scheduled at the end of the session. Increasing this ratio in successive phases was done to encourage the development of conditioned reinforcing properties of the LEDs by initially providing a strong correlation between LED presentation and food availability, before gradually increasing the periodicity of exchange periods. Gradually increasing the ratio of trials to exchange periods may also minimize the response-weakening properties of increasing exchange-schedule values (Waddell et al., 1972). Also, the exposure of subjects to exchange periods with increasing numbers of LEDs across phases provided a rich history of correspondence between LEDs and the number of food deliveries available. Thus, the correspondence of LEDs to food amounts resembled the correspondence of points to money amounts in human studies. Finally, if changing the exchange schedule were analogous to the manipulation of temporal variables (T, as discussed above), then under conditions with choices between 1 immediate LED and 3 delayed LEDs, preference for the larger delayed reinforcer (3 LEDs) might be expected to increase as the ratio of trials to exchange periods increased, that is, as choice responses became increasingly remote from food availability.

Method

Subjects

Six experimentally naive male White Carneau pigeons (Columba livia) served as subjects. All subjects were individually housed with water and health grit continuously available. Subjects were maintained at 80% of their laboratory free-feeding weights.

Apparatus

A standard 3-key pigeon chamber (Lehigh Valley) with a modified stimulus panel served as the experimental space. A minimum force of 0.14 N was required to activate either side key, and a minimum force of 0.12 N activated the center key. Thirty-four red light-emitting diodes (LEDs) were recessed in the panel, forming a horizontal row 5 cm below the ceiling and 0.7 cm below the houselight fixture (see Figure 2). The LEDs were evenly spaced and centered 1.7 cm from each end of the panel. Unless otherwise indicated, onset of LEDs always proceeded sequentially from left to right, with each onset accompanied by a brief tone. Offset of LEDs always proceeded sequentially from right to left. When operative, the left, center, and right keys were illuminated green, red, and blue, respectively. Primary reinforcement consisted of access to mixed grain through the stimulus-panel reinforcement aperture. During food delivery, all keylights and the houselight were dark and an orange light above the feeder was illuminated. White noise was present in the experimental room to mask extraneous sounds. Experimental contingencies were scheduled and recorded by an IBM 286-compatible computer with MED-PC software.

Procedure

Each subject was first exposed to a one hour session of

adaptation with the houselight and all LEDs illuminated but

no other programmed contingencies in effect. During

magazine training and exchange keypeck shaping, the number

of illuminated LEDs corresponded to the number of food

deliveries available. Magazine training sessions began with

the simultaneous illumination of the left-most 17 LEDs, the

white houselight, and the red center (exchange) key.

Intermittent hopper presentations were controlled by a hand-

held switch. When operated, the switch turned off 1 LED and

0.5 s later produced food. Alternate switch operations

withdrew the hopper. Magazine training ended when the











subject ate readily from the feeder for at least five

consecutive food deliveries.

Exchange-keypeck shaping began with the same stimulus

conditions as magazine training. Successive approximations

to keypecks on the center (exchange) key produced offset of

1 LED, followed 0.5 s later by a 2-s food delivery. Once a

keypeck (exchange response) occurred, each remaining food

delivery of the session required a single peck on the

illuminated exchange key. All subjects were then exposed to

two sessions of 34 LED exchanges each, with the same

contingencies on the exchange key.

Choice-key training began with the illumination of the

houselight and one choice key (left or right). Each subject

was exposed to two sessions of 34 food deliveries each, with

a different choice key available in each session. A single

peck on the illuminated choice key turned off the key and

turned on 1 LED, followed 0.1 s later by an exchange period,

signaled by illumination of the exchange key. A single peck

on the exchange key turned off the key and 1 LED, followed

0.5 s later by 2 s of food. Throughout the experiment,

exchange periods remained in effect until all illuminated

LEDs were exchanged. For one subject (1857), who did not

peck the choice key after 180 minutes in the chamber,

pecking was established by reinforcing successive

approximations with the onset of an LED followed by the

exchange period.











Throughout the remainder of the experiment, two

sessions were scheduled daily, five days per week, with a

5-min blackout between sessions. Each session consisted of

12 discrete trials, each beginning 60 s from the onset of

the preceding trial, excluding exchange periods. Failure to

respond for 45 s on a given trial delayed the onset of the

next trial an additional 60 s. During the intertrial

interval (ITI) the houselight and all keylights were dark.
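
The trial-timing rule can be stated concretely. What

follows is a minimal illustrative sketch in Python (the

actual contingencies were programmed in MED-PC; this code is

not from the original study):

    # Illustrative sketch of the discrete-trial timing rule
    # described above; not the original MED-PC program.  Trials
    # begin 60 s apart, measured onset to onset and excluding time
    # spent in exchange periods; a failure to respond within 45 s
    # postpones the next onset by an additional 60 s.
    def next_trial_onset(current_onset, responded_within_45_s,
                         exchange_duration):
        """Return the scheduled onset (seconds) of the next trial."""
        onset = current_onset + 60.0 + exchange_duration
        if not responded_within_45_s:
            onset += 60.0
        return onset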

The first two trials of each session were forced

exposure trials, designed to bring behavior into contact

with the consequences programmed on both keys. The key

available on the first trial (left or right) was determined

randomly with a probability of .5; the alternate choice key

was automatically illuminated during the second trial. The

contingencies correlated with the illuminated key on forced-

choice trials corresponded to those in effect on choice

trials.

Choice trials began with the illumination of the

houselight and both side (choice) keys. A single peck on

either side key (choice response) darkened both keys and

produced the associated consequences, the illumination of

either 1 or 3 LEDs. Large-reinforcer choices resulted in

the illumination of 3 LEDs--1 immediate, the other 2 spaced

0.6 s apart. Thus, it took 1.2 s to deliver 3 LEDs. Small-

reinforcer choices resulted in the immediate illumination of

1 LED.










All subjects were initially exposed to a choice between

1 and 3 LEDs, scheduled "immediately", with an exchange

period following each trial (designated condition 1). The

large reinforcer was arbitrarily assigned to the left key

for three subjects and to the right key for the other three

subjects (Table 1). This assignment was constant throughout

the experiment. When scheduled, exchange periods always

began 0.1 s after the last LED presentation. Thus, exchange

periods followed small-reinforcer (1 LED) choices by 0.1 s

and large-reinforcer (3 LED) choices by 1.3 s.1

Next, subjects were randomly divided into two groups of

three pigeons each. For Group A, large-reinforcer choices

produced 3 LEDs after a 6-s delay (condition 1D). The ratio

of choice trials to exchange opportunities was then

increased to 2:1, 5:1, and 10:1, across conditions 2D, 5D,

and 10D, respectively. For Group B, the ratio of trials to

exchange periods was first increased from 1:1 to 2:1 to 5:1

to 10:1, before adding the 6-s delay to the large reinforcer

in the final condition (10D). Figure 3 shows the sequence

of events following large- and small-reinforcer choices.


1 The LEDs are spoken of in terms of reinforcement
recognizing that strict behavior analytic criteria for doing
so have not been met. This is done on the basis of formal
similarities between the scheduling of LEDs here and the
scheduling of reinforcing consequences in other studies and
for convenience when discussing and evaluating the role of
LEDs in the current experiment; this is consistent with
discussions of analogous consequences in human operant
studies.











LEDs remained illuminated during the ITI after trials with

no scheduled exchange period. Whenever the ratio of choice

trials to exchange periods was greater than 1:1, only the

second forced trial was followed with an exchange period.

Table 1 summarizes the experimental conditions, order of

exposure, and number of sessions for all subjects.

Experimental phases were in effect for at least 20

sessions and until the following stability criteria were

met: (a) no trends evident in the number of choices

allocated to either alternative over the last 10 sessions

and (b) the number of choices of either option during the

last 5 sessions not outside the range of values obtained

during all previous sessions. Conditions were changed

arbitrarily if these criteria were not met in 80 sessions.
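
Although trends were judged by visual inspection, the

stability criteria lend themselves to a simple programmatic

check. The sketch below is not part of the original study;

in particular, operationalizing "no trend" as a near-zero

least-squares slope is an assumption introduced here only

for illustration:

    # Hypothetical check of the stability criteria, applied to a
    # list of per-session counts of choices of one alternative
    # within the current condition.  "No trend" is operationalized
    # as a near-zero least-squares slope over the last 10 sessions,
    # an assumption not specified by the original criteria.
    def is_stable(choices, slope_tolerance=0.1):
        if len(choices) < 20:          # at least 20 sessions required
            return False
        last10 = choices[-10:]
        n = len(last10)
        x_mean = (n - 1) / 2.0
        y_mean = sum(last10) / n
        slope = (sum((i - x_mean) * (y - y_mean)
                     for i, y in enumerate(last10))
                 / sum((i - x_mean) ** 2 for i in range(n)))
        no_trend = abs(slope) <= slope_tolerance
        # Criterion (b): the last 5 sessions fall within the range
        # of values obtained during all previous sessions.
        earlier, last5 = choices[:-5], choices[-5:]
        in_range = all(min(earlier) <= y <= max(earlier) for y in last5)
        return no_trend and in_range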



Results

Figure 4 shows the number of large-reinforcer choices

across all experimental conditions. Data from Group A are

displayed in the left panel and Group B in the right. The

bars are means from the last 10 sessions of each condition;

vertical lines show the range of values used to determine

the means. Because a session consisted of 10 free-choice

trials (the first 2 of each session's 12 trials were forced-

exposure trials), a value above 5 generally indicates

preference for the large

reinforcer, whereas a value below 5 indicates preference for

the small reinforcer. A mean value between 4 and 6, with a











range that extends above and below 5, indicates

indifference.

Condition 1, with no delay to small or large

reinforcers, resulted in strong preference for the large

reinforcer in 5 of 6 subjects; only Subject 1857 (Group A)

preferred the small reinforcer. For the other 2 subjects in

Group A (747 and 1383), preference reversed in favor of the

small reinforcer when the large reinforcer was delayed 6 s

in condition 1D. Large-reinforcer choices also decreased

for Subject 1857 during this phase. All three subjects in

Group A preferred the immediate reinforcer across phases 1D,

2D, 5D, and 10D. This preference was generally strong, with

an average of less than 2 large-reinforcer choices per

session, except during phase 2D in which the number of

large-reinforcer choices was somewhat elevated for Subjects

1857 and 1383.

For subjects in Group B, scheduling the exchange period

every second choice trial reduced the number of large-

reinforcer choices for Subjects 1732 and 1855 but not for

Subject 753. Further increases in the number of trials per

exchange period during conditions 5 and 10 shifted

preference in favor of the small reinforcer for Subjects

1855 and 753. The magnitude of this effect was greatest in

Subject 753, who in the previous two conditions chose the

large reinforcer on nearly all trials. For Subject 1732,

preference for the large reinforcer was recovered during











conditions 5 and 10 but reversed in favor of the small

reinforcer when a delay to the large reinforcer was added in

condition 10D. This added delay also resulted in fewer

large-reinforcer choices for Subject 753. In Subject 1855

the number of large-reinforcer choices increased slightly

during this condition, resulting in approximate

indifference.

Figure 5 shows within-session choice patterns. The

relative frequency of large-reinforcer choices is plotted

across trials preceding scheduled exchange periods over the

final 10 sessions of each condition. Only data from

conditions in which exchange periods occurred after two or

more trials are shown. As before, proportions above .5

indicate preference for the large reinforcer and proportions

below .5 indicate preference for the small reinforcer.

For subjects in Group A (left panels), the greatest

proportion of large-reinforcer choices occurred during the

1st trial of the block of trials preceding exchange periods.

This was consistent across subjects and conditions, except

for Subject 747 during condition 10D in which the proportion

of large-reinforcer choices varied unsystematically across

the 10 trials. The most pronounced differential control of

large-reinforcer choices by trial position occurred during

condition 2D in Subject 1383: the proportion of large

reinforcers chosen was .82 in the 1st trial but zero during

the 2nd trial of the block. For all subjects in Group A,











during conditions 2D and 5D the proportion of large-

reinforcer choices was greatest during the 1st trial and

dropped to zero or near zero levels during the remaining

trials of a block. During condition 10D, except for

Subject 747, the probability of a large-reinforcer choice

decreased across trials, reaching a level of zero during the

latter trials of the block.

Similar, though less pronounced, effects occurred with

subjects in Group B (right panels). The relative number of

large-reinforcer choices was greatest during the initial

trial of the block in 8 of 12 cases for the three subjects

and decreased to lower levels across remaining trials.

Figure 6 shows average choice latencies during

conditions in which exchange periods were scheduled after

two or more trials, from the last 10 sessions of each

condition. Latencies for subjects in Group A and B are

displayed in left and right panels, respectively. Note that

the Y axes are scaled individually to accommodate between

subject differences in latencies. Open symbols represent

latencies for large-reinforcer choices and filled symbols

for small-reinforcer choices. The absence of a data point

for either choice denotes conditions in which choices of

that type did not occur.

In 38 of 40 cases across subjects, latencies were

longest during the 1st trial of a block, decreasing across

trials. The 1st-trial latencies also tended to be longer as











the number of trials per exchange period was increased

across conditions. This effect was clearest for Subject

1857, in which 1st trial latencies were shortest during

condition 2D, somewhat longer during condition 5D, and

longest during condition 10D. With one exception (the 2nd

trial of condition 10D for Subject 1732), the longest

latency for each subject occurred on the 1st trial in

conditions with exchange periods scheduled every 10th trial.

Subjects 747 (condition 10D) and 1855 (condition 10)

regularly had 1st trial choice latencies longer than 45 s,

which postponed the onset of the 2nd trial. No trend was

evident in these latencies and they did not systematically

relate to choice.



Discussion

In Experiment 1, pigeons' choices were assessed in a

self-control arrangement with token-like reinforcers.

Despite the procedural similarities of this arrangement with

typical human procedures, the overall results of Experiment

1 support previous findings with pigeons (Logue et al.,

1984; Mazur & Logue, 1978) rather than with humans (Logue et

al., 1986). That is, subjects usually responded

impulsively, preferring the small immediate reinforcer over

the large delayed reinforcer (Figure 4). Such impulsive

responding is consistent with the matching law applied to

LED reinforcement.











Figure 7 shows matching-law predictions of the number

of large-reinforcer choices based on LED reinforcement. To

obtain meaningful predictions a delay of .01 s was used

instead of 0 s when reinforcement delivery was immediate.

Thus, both DL and Ds were .01 s when neither reinforcer was

delayed (no delay); when the large reinforcer was delayed, a

value of 6 s was used for DL. The reinforcer amounts used

to calculate predicted values were 3 (AL) and 1 (As) for the

large and small reinforcers, respectively (3 or 1 LEDs).

The relative number of large-reinforcer choices predicted by

the matching law was first calculated, then multiplied by 10

to obtain the predicted number of large-reinforcer choices

out of 10 trials. The matching-law predictions correspond

very well to obtained data from conditions in which the

large reinforcer was delayed (1D, 2D, 5D, and 10D in Figure

4). Here, in 14 of 15 cases the small reinforcer was

preferred; the only exception was the indifferent responding

of Subject 1855 under condition 10D. The matching-law

predictions correspond less well to obtained data from

conditions without a reinforcer delay (1, 2, 5, and 10).

Under these conditions the large reinforcer was preferred in

only 8 of 15 cases. Six of the 7 exceptions were from

conditions for Group B subjects in which the number of

trials per exchange period exceeded one. LED reinforcement

did not differ between these conditions, suggesting that











other factors were responsible for the lower number of

large-reinforcer choices.
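
The LED-based calculation can be stated compactly. The

following minimal sketch in Python is not from the original

study, but it uses the amount and delay values given above

and the ratio form of the matching law that is consistent

with the values reported in Tables 2 and 4:

    # Matching-law prediction based on LED amounts and delays:
    # relative large-reinforcer choice = (AL/DL) / (AL/DL + As/Ds),
    # multiplied by the 10 free-choice trials per session.
    def predicted_large_choices(AL=3.0, As=1.0, DL=0.01, Ds=0.01):
        relative = (AL / DL) / (AL / DL + As / Ds)
        return 10 * relative

    print(predicted_large_choices())        # no delay: 7.5 of 10 trials
    print(predicted_large_choices(DL=6.0))  # 6-s delay: about 0.05 of 10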

LED reinforcement parameters also cannot account for

the within-session patterns of choice shown in Figure 5.

For example, during conditions 2 and 2D, a greater

proportion of large-reinforcer choices occurred on the 1st

trial of a block than the 2nd. In four subjects (1857,

1383, 1732, and 1855) the large reinforcer was preferred on

the 1st trial while the small reinforcer was preferred on

the 2nd trial. Also, during conditions in which exchange

periods were scheduled after 5 or 10 trials, especially for

subjects in Group A, the proportion of large reinforcers

chosen tended to be greatest during the 1st trial, often

shifting downward abruptly from the 1st to the 2nd trial.

The within-session pattern of choices under conditions

2 and 2D is more consistent with the predictions of the

ideal matching law applied to food parameters than to LED

reinforcement. Figure 8 shows matching-law predictions

(based on food reinforcement) of the relative number of

large-reinforcer choices for each trial in the block

preceding exchange periods under all experimental

conditions.2 Predictions for Group A are shown in the top

graph and for Group B in the bottom. The displayed values

are based on food amounts and delays. The amounts used to

calculate these values are based on the total amount

(seconds) of food available during the exchange period










following all trials of a block. The delay values are based

on the minimum delays to the first food delivery of an

exchange period, excluding choice response and exchange

response latencies. For each experimental condition the

food-delay values on the trial immediately preceding an

exchange period, when LED reinforcement is immediate for

both options, are 1.8 s and 0.6 s for large- and small-

reinforcer choices, respectively. When the large reinforcer

is delayed, the food delay values are 7.8 s and 0.6 s for

large- and small-reinforcer choices, respectively. Because

on all trials except the trial just prior to an exchange

period the delays to food are equal for both large- and

small-reinforcer choices, the matching-law predictions

across these trials are determined solely by amount of food

ratios. Table 2 shows the amount of food values used in

calculating the proportions displayed in Figure 8 and the

results of all calculations. The effect of a choice in a

given trial on the relative amount of food obtained in a

subsequent exchange period depends on reinforcer choices

during all other trials of the block. For this reason, the

predicted proportion of large-reinforcer choices for each


2 Although there is no precedent for applying the
matching law to food parameters in an arrangement like the
one here, the matching law should have relevance to the
present data. The method of application described and
presented here was selected on rational but also pragmatic
grounds--it yielded results that were consistent with the
obtained choice in the current experiments.











trial number was determined by first calculating the ratio

assuming exclusively small-reinforcer choices for all other

trials of the block and then calculating the ratio assuming

exclusively large-reinforcer choices for all other trials.

These two ratios establish the range of predictions for a

given trial number under a particular experimental

condition. The shapes of the obtained functions in both

calculations were the same and the magnitude of difference

between the two values on any trial was always small. The

ratios were therefore averaged to obtain the values

displayed in Figure 8 (see the Appendix for complete

calculation examples). Because the experimental procedure

involved concurrent fixed-ratio 1 schedules, these

predictions were not expected to provide precise estimates

of choice response ratios but rather to predict the

direction of preference; the obtained choice ratios would

thus be expected to be more extreme than illustrated.
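
For readers who wish to reproduce Table 2, the following

minimal sketch in Python (not the original analysis)

implements the calculation just described, under the stated

assumptions that each LED buys 2 s of food and that food

delays are equal on all but the final trial of a block:

    # Reproduces the Table 2 / Figure 8 food-based predictions.  A
    # small choice adds 2 s of food (1 LED) and a large choice adds
    # 6 s (3 LEDs) to the exchange-period total for the block.
    def food_matching(n_trials, final_trial, d_large, d_small=0.6):
        """Mean predicted proportion of large-reinforcer choices on
        one trial of an n-trial block (Experiment 1 delay values)."""
        proportions = []
        for other in (2.0, 6.0):  # other trials all small or all large
            AL = 6.0 + other * (n_trials - 1)
            As = 2.0 + other * (n_trials - 1)
            if final_trial:       # delays to food differ on this trial
                DL, Ds = d_large, d_small
            else:                 # equal delays: amounts alone decide
                DL = Ds = 1.0
            proportions.append((AL / DL) / (AL / DL + As / Ds))
        return sum(proportions) / 2.0

    # Condition 5D: trials 1-4, then trial 5 with a 7.8-s food delay.
    print(round(food_matching(5, final_trial=False, d_large=7.8), 3))
    print(round(food_matching(5, final_trial=True, d_large=7.8), 3))
    # Prints 0.56 and 0.089; Table 2 reports .560 and .090 (the
    # small discrepancy reflects rounding before averaging there).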

During conditions 1 and 1D an exchange period occurred

after each trial so predictions are plotted only for trial

1, represented by the symbols "1" and "1D". Figure 8 shows

that the ideal matching law applied to condition 1 predicts

indifference between the small and the large reinforcer.

Figure 4, however, shows that 5 of 6 subjects strongly

preferred the large reinforcer under this condition, whereas

Subject 1857 preferred the small reinforcer. The matching

law predicts strong preference for the small reinforcer











under condition 1D for Group A (top graph of Figure 8),

which is in accord with the obtained preferences (see Figure

4). When the exchange period is scheduled after two trials,

the matching law predicts preference for the large

reinforcer on the 1st trial of the block and the small

reinforcer on the 2nd. The data shown in Figure 5 are in

qualitative, and sometimes quantitative, agreement with

these predictions. As predicted, Subjects 1857 and 1383

(left panel), and 1732 and 1855 (right panel), preferred the

large reinforcer on the 1st trial of a block and the small

reinforcer on the 2nd trial of a block of trials preceding

an exchange period. Subject 747 (left panel) preferred the

small reinforcer on both trials but a greater proportion of

large-reinforcer choices occurred on the 1st trial than the

2nd trial, yielding a curve in the direction predicted by

the matching law. Data from Subject 753 (right panel) did

not correspond to matching-law predictions; equally strong

preference for the large reinforcer was exhibited during

both the 1st and 2nd trials of condition 2.

Predictions of the matching law were less accurate

under conditions in which an exchange period was scheduled

following 5 or 10 trials. Preference for the large

reinforcer is predicted across all but the final trial of a

block, at which point the proportion of large-reinforcer

choices is predicted to drop steeply below .5. Instead, the

proportion of large reinforcers chosen tended to decrease











across trials of a block, often shifting downward abruptly

from the 1st to the 2nd trial (see Figure 5).

Given the well-established sensitivity of pigeons'

choices to even small differences in delays to food, it is

not surprising that unequal delays to food also affected

responding in the current experiment. In fact, small

differences in delays to food in Experiment 1 may have

precluded a clear assessment of choices maintained by LED

reinforcement. For example, as described earlier, on choice

trials immediately preceding exchange periods, under

conditions in which the large reinforcer was delayed, the

minimum delays to food were 0.6 s following small-reinforcer

choices but were 7.8 s following large-reinforcer choices.

Similarly, minimum delays to food on trials immediately

preceding an exchange period were 1.8 s and 0.6 s for large

and small-reinforcer choices, respectively, when LED

reinforcement was immediate for both options. These

different delays to food were a joint function of exchange

periods scheduled immediately after LED presentation, the

additional time taken to illuminate three LEDs in succession

following large-reinforcer choices, and the added delay to

the large reinforcer under conditions in which LED

presentation was delayed. The ideal matching law,

established largely on the basis of pigeons' choices under

food reinforcement schedules, applied to the choices in

Experiment 1 with food parameters, predicts preference for











the small reinforcer on trials immediately preceding the

exchange period, whenever exchange periods are scheduled

after two or more trials (Figure 8). Thus, at least on the

final trial of a block, food-reinforcement parameters would

be expected to have had a greater influence on choices than

the subordinate LED arrangements. This interpretation is

consistent with the choice patterns usually observed under

conditions 2 and 2D (Figure 5).

Under these conditions, differences between the 1st and

2nd trials in delays to food for the two choice responses

and in stimulus conditions, provided a basis for

discriminative control of choice. On the 1st trial, with no

LEDs illuminated, most subjects preferred the large

reinforcer. On the 2nd trial, when at least one illuminated

LED was always present and food was obtained sooner

following a small-reinforcer choice, most subjects preferred

the small reinforcer. Together, these results are in accord

with the predictions of the ideal matching law applied to

food delays (Figure 8) and extend the generality of previous

findings regarding the importance of food reinforcer delays

in controlling choice in pigeons.

Although food-based ideal matching-law predictions

corresponded less well to results from conditions in which

exchange periods occurred after 5 or 10 trials (Figures 5

and 8), stimulus generalization, based largely on the

presence of illuminated LEDs, might account for some of











these discrepancies between performances and matching-law

predictions. Recall that the presence or absence of LEDs

distinguished the 1st and 2nd trials of conditions 2 and 2D.

During subsequent conditions, when exchange periods were

scheduled after 5 or 10 trials, LEDs were illuminated on all

trials except the 1st trial of a block. The greater number

of large-reinforcer choices on the 1st trial of a block

occurred in the absence of illuminated LEDs, a situation

correlated with no differential delays to food. On the

final trial of a block, with LEDs present, food delays

favored small-reinforcer choices. Such control of small-

reinforcer choices may have generalized across earlier

trials, with LEDs present, resulting in fewer large-

reinforcer choices than predicted by the ideal matching law.

The latency data displayed in Figure 6 also support the

view that the presence or absence of LEDs contributed to the

choice patterns. For both large- and small-reinforcer

choices, latencies were generally longest during the 1st

trial of a block, the trial most temporally remote from

food, and the trial on which the large reinforcer was most

preferred. With the exception of Subject 1732, latencies

were short and nearly equal across the remaining trials in

which illuminated LEDs were always present. For Subject

1732, latencies tended to decrease across trials, apparently

under control of the increasing proximity to food delivery

and perhaps the increasing number of LEDs. That this











pattern did not occur in the other five subjects suggests

that control by presence or absence of LEDs was greater than

control by increasing numbers of LEDs or by temporal

proximity to food.

Interestingly, the differential preference for the

large reinforcer on the 1st trial in the current experiment

may be viewed as a kind of self-control, although it is not

clear from the present results if LEDs or food deliveries

should be treated as the effective reinforcers. Of course,

both LED and food parameters may have been relevant. The

relative influence of these reinforcement variables was

assessed in Experiment 2.










Figure 2: A diagram of the stimulus panel with the LEDs.
[Panel layout, top to bottom: the houselight; a horizontal
row of 34 LEDs; the left choice key, center exchange key,
and right choice key; the feeder opening.]












Figure 3: The sequence of events following large- and
small-reinforcer choices during conditions with (bottom
panel) and without (top panel) a large-reinforcer delay.
[Top panel, conditions 1, 2, 5, and 10. Large choice:
choice keys off; 1 LED on; 0.6 s; 1 LED on; 0.6 s; 1 LED
on; 0.1 s; exchange period or ITI. Small choice: choice
keys off; 1 LED on; 0.1 s; exchange period or ITI.
Bottom panel, conditions 1D, 2D, 5D, and 10D. Large
choice: choice keys off; 6 s; 1 LED on; 0.6 s; 1 LED on;
0.6 s; 1 LED on; 0.1 s; exchange period or ITI. Small
choice: choice keys off; 1 LED on; 0.1 s; exchange period
or ITI.]































Figure 4. The number of large-reinforcer choices per
session across experimental conditions. Data from Group A
subjects are shown in the left panel and Group B subjects in
the right panel. Values are means from the last 10 sessions
of each condition. Open bars indicate no delay to the large
reinforcer (3 LEDs). Striped bars indicate a 6-s delay to
the large reinforcer. Vertical lines show the range of
values used to determine the mean.










[Figure 4 graphs: one panel per bird (Group A: 1857, 747,
1383; Group B: 1732, 1855, 753); x-axis: experimental
conditions; y-axis: large-reinforcer choices per session.]
































Figure 5. The proportion of large-reinforcer choices at
each trial number of a block of trials preceding exchange
periods. Values are derived from choice trials during the
last 10 sessions of each experimental condition where an
exchange period occurred after two or more trials. Data
from Group A subjects are shown in the left panel and data
from Group B subjects are displayed in the right panel.










[Figure 5 graphs: one panel per bird; x-axis: trial number
(1-10); y-axis: proportion of large-reinforcer choices,
with a separate function for each experimental condition.]
































Figure 6. Average choice latencies during conditions where
exchange periods were scheduled after two or more trials.
Values were derived from choice trials during the last 10
sessions of each experimental condition. Open symbols
represent latencies for large-reinforcer choices and filled
symbols indicate small-reinforcer choices. Data from Group
A subjects are shown in the left panel and data from Group B
subjects are displayed in the right panel.










[Figure 6 graphs: one panel per bird; x-axis: trial number
(1-10); y-axis: choice response latency in seconds, scaled
individually; open symbols: large-reinforcer choices;
filled symbols: small-reinforcer choices.]











Figure 7. The number of large-reinforcer choices predicted
by the matching law when the large reinforcer (3 LEDs) is
delivered with no delay (open bar) or with a 6-s delay
(black bar). Values are based on the matching law applied
to the amounts and delays of LED reinforcement.
Calculations are described in the text.



















Figure 8. The proportion of large-reinforcer choices
predicted by the matching law applied to food reinforcement
for each trial number of a block of trials preceding
exchange periods. Group A data are displayed in the top
graph and Group B data in the bottom graph. The symbols 1
and ID represent predictions based on conditions 1 and lD.












TABLE 1

The experimental conditions, order of exposure, and
number of sessions for all subjects in Experiment 1.
Group A histories are summarized in the top panel and
Group B in the bottom. The key assigned to the large
reinforcer is indicated below each bird number.


Experimental    Trials per       Delay to Large      Number of
Condition       Exchange         Reinforcer          Sessions
                Period           (seconds)

                                             Bird    Bird    Bird
                                             1857    747     1383
Group A                                      right   left    left

1                    1                0        30      26      20
1D                   1                6        20      20      20
2D                   2                6        36      28      30
5D                   5                6        20      20      20
10D                 10                6        22      32      40

                                             Bird    Bird    Bird
                                             1732    1855    753
Group B                                      right   right   left

1                    1                0        32      21      42
2                    2                0        60      50      32
5                    5                0        78      22      58
10                  10                0        80      32      36
10D                 10                6        38      28      26












TABLE 2

The amount of reinforcement values and relative number of
large-reinforcer choices predicted by the ideal matching law
for each trial number of all experimental conditions in
Experiment 1. Values are based on food reinforcement. When
the same trial number is listed twice, the top listing shows
values when the small reinforcer is chosen on all other
trials and the bottom listing shows values when the large
reinforcer is chosen on all other trials. The mean values
displayed are the average of these two calculations for each
trial and correspond to the values plotted in Figure 8. The
food delay values are described in the text above.


Experimental                             Large/
Condition      Trial    AL    As    (Large + Small)    Mean

1                1       6     2        .500           .500

1D               1       6     2        .188           .188

2                1       8     4        .667           .634
                 1      12     8        .600
                 2       8     4        .400           .367
                 2      12     8        .333

2D               1       8     4        .667           .634
                 1      12     8        .600
                 2       8     4        .133           .118
                 2      12     8        .103

5               1-4     14    10        .583           .560
                1-4     30    26        .536
                 5      14    10        .318           .298
                 5      30    26        .278

5D              1-4     14    10        .583           .560
                1-4     30    26        .536
                 5      14    10        .097           .090
                 5      30    26        .082

10              1-9     24    20        .545           .531
                1-9     60    56        .517
                10      24    20        .286           .275
                10      60    56        .263

10D             1-9     24    20        .545           .531
                1-9     60    56        .517
                10      24    20        .085           .081
                10      60    56        .076















EXPERIMENT 2


The purpose of Experiment 1 was to clarify the role of

token reinforcement in accounting for previously reported

differences in the choices of pigeons and humans.

Unfortunately, the scheduling of exchange periods allowed

for food delays to differ for the 2 choices, which may have

prevented a clear assessment of the relative influence of

LED versus food reinforcement. To distinguish these

separate sources of reinforcement, the major manipulations

of Experiment 1 were replicated in Experiment 2 in the same

subjects, with delays to food from either choice response

equal under most conditions but unequal in others.



Method

Subjects and Apparatus

The pigeons from Experiment 1 served as experimental

subjects. Housing, feeding arrangements, and apparatus were

the same as in Experiment 1.

Procedure

Group and choice key reinforcer assignments were the

same as in Experiment 1. All subjects were initially

exposed to an arrangement similar to condition 1 of










Experiment 1 (also designated condition 1), except that the

exchange period occurred 1.5 s from either choice response.

For subjects in Group A, the large reinforcer was then

delayed by 6 s; the exchange period thus occurred 7.5 s

after a large-reinforcer choice (D1). Beginning with the

next condition (ED1), exchange periods were scheduled 9.5 s

from either choice, with no change in LED presentation.

Finally, the number of trials per exchange period was

increased across conditions (designated ED2, ED5, and ED10),

as in Experiment 1.

After condition 1, Group B subjects were first exposed

to increases in the ratio of trials to exchange periods

across conditions 2, 5, and 10. Then a 6-s delay was added

to the large-reinforcer choice (D10). Under this condition,

the exchange period occurred 7.5 s after a large-reinforcer

choice but still only 1.5 s after a small-reinforcer choice.

In the next condition (ED10), the exchange period was

scheduled 9.5 s from either choice. All subjects in Group B

and Subject 1857 from Group A were next exposed to a

reversal of contingencies on the choice keys (RED10),

followed by a return to the original contingencies (ED10).

Experiment 2 conditions are summarized in Table 3. Figure 9

shows the sequence of events following large- and small-

reinforcer choices during conditions with (bottom panel) and

without (top panel) a large-reinforcer delay. The values











from conditions D1 and D10 are shown in parentheses above

the exchange period in the bottom panel.

Results

Figure 10 shows the number of large-reinforcer choices

across all experimental conditions for subjects in both

groups. All subjects strongly preferred the large

reinforcer in condition 1 in which neither reinforcer was

delayed and the exchange period occurred 1.5 s after each

choice. For 2 subjects in Group A (1857 and 747),

preference reversed in favor of the small reinforcer when

the large reinforcer was delayed by 6 s and the exchange

period occurred 7.5 s after a large-reinforcer choice (D1).

Subject 1383's performance was less sensitive to this

change, as only a small decrease in large-reinforcer choices

occurred. During condition ED1, in which the exchange

period was scheduled 9.5 s from either choice, preference

for the large reinforcer was recovered in Subject 1857 but

not in 747. Subject 1383 continued to prefer the large

reinforcer during this condition.

Increasing the ratio of trials to exchange periods produced

different results between subjects in Group A. The number

of large-reinforcer choices decreased for Subject 1857,

resulting in indifference during conditions ED2 and ED5,

before increasing slightly during condition ED10.

Preference for the large reinforcer became stronger after

reversing the keys (RED10) and was maintained when the











original contingencies were reinstated in the final

condition. Subject 1857 was the only subject in Group A to

prefer the delayed large reinforcer in the terminal

arrangement, in which a single exchange period was scheduled

at the end of each session. Subject 747 was roughly

indifferent during conditions ED2 and ED5 and preferred the

small reinforcer during condition ED10. Subject 1383

preferred the small reinforcer across conditions ED2, ED5,

and ED10.

All three subjects in Group B exhibited self-control in

the terminal arrangement in which the large reinforcer was

delayed, exchange periods were scheduled after 10 trials,

and there was an equal delay to the exchange period from

either choice (conditions ED10 and the key contingency

reversal condition, RED10). Subjects 1732 and 1855 of Group

B preferred the large reinforcer across all conditions, even

when responding impulsively resulted in quicker access to

the exchange period (D10). For Subject 753, preference for

the large reinforcer decreased as the ratio of trials to

exchange period was increased across conditions 2, 5, and

10, resulting in indifference during the latter two

conditions. Preference shifted dramatically to the small

reinforcer when the large reinforcer was delayed (condition

D10) but then reversed sharply in favor of the large

reinforcer in condition ED10, in which the time to the

exchange period from either choice was increased to 9.5 s.










Preference for the large reinforcer was maintained over the

next two conditions in which the keys were reversed (RED10)

and then returned (ED10).

Figure 11 shows within-session choice patterns for

subjects in both groups. The relative frequency of large-

reinforcer choices is plotted across trials preceding

scheduled exchange periods. For Subjects 1732 and 1855, who

preferred the larger reinforcer across all conditions, the

relative frequency of large-reinforcer choices was high and

invariant across trial number. For the other 4 subjects no

general within-session choice patterns were observed.

Responding usually varied unsystematically across trials of

a block, although large-reinforcer choices tended to

decrease across trials for Subjects 1857 and 753.

Figure 12 shows average choice latencies during

conditions in which exchange periods were scheduled after

two or more trials, from the last 10 sessions of each

condition. Only latencies for the preferred option of each

condition are shown; for Subject 1857 under condition

ED5, in which each option was chosen equally often, only

large-reinforcer choice latencies are shown. Latencies from

the omitted option did not differ systematically from

preferred option latencies and showed the same general

trends.

In 24 of 32 cases across subjects, latencies were

longest during the 1st trial of a block and tended to











decrease across trials. Sometimes latencies dropped

abruptly from the 1st to the 2nd trial, as in condition ED5

for Subject 1857, condition ED5 and ED10 for Subject 747,

and condition 5 for Subjects 1732 and 1855. For Subject 747

this decrease was followed by an increase in latencies

across remaining trials, although latencies remained well

below their 1st-trial values. In 4 of 6 subjects the

longest average latency occurred during the 1st trial under

a condition in which the exchange period was scheduled after

10 trials but 1st-trial latencies did not generally increase

as the number of trials per exchange period was increased.

With the exception of Subjects 1383 and 1732, choice

latencies regularly exceeded 45 s, especially during earlier

trials of a block, which postponed the start of subsequent

trials. No consistent trend occurred with these latencies

and they were not systematically related to choice patterns.



Discussion

In contrast to the results of Experiment 1, self-

control was obtained in 4 of the 6 subjects during the

terminal choice arrangement of Experiment 2 (Figure 10). In

both experiments, the matching law applied to LED

reinforcement predicts preference for the large reinforcer

when no delays are programmed for LED presentation but

preference for the small reinforcer whenever the large

reinforcer is delayed by 6 s (see Figure 7, Experiment 1).










These predictions do not differ between experiments, so they

cannot account for the obtained choice differences.

Moreover, in Experiment 2, LED-based matching-law

predictions corresponded to choice responding in only 11 of

20 cases for Group A subjects and 11 of 24 cases in Group B

subjects (Figure 10). The primary exceptions to these

matching-law predictions in Experiment 2 were the high

number of large-reinforcer choices during conditions in

which presentation of the 3 LEDs was delayed.

The results of Experiment 2 and the choice differences

between the two experiments are more consistent with

matching-law predictions derived from food-schedule

parameters, which implicate programmed delays to food as

determinants of choice. Figure 13 shows matching-law

predictions, based on food-schedule parameters, of the

relative number of large-reinforcer choices for each trial

of a block of trials preceding scheduled exchange periods

under all experimental conditions of Experiment 2. Food was

delivered 0.5 s after an exchange response so the food

delays used in generating Figure 13 were 0.5 s greater than

the delays to the exchange period listed in Table 3.

Matching-law predictions were determined for each trial as

in Figure 8 and are also displayed in Table 4. In Figure

13, open bars represent values during conditions in which

predictions do not differ between trials. Under condition

D10, the coarsely striped bar indicates the predicted value











for each of the first 9 trials of the block and the finely

striped bar illustrates the matching-law prediction for the

10th trial. Values on a given trial depend partly on

choices during other trials of a block and error bars

indicate the range of predictions under all possible choice

patterns for these other trials. The upper and lower limits

of these bars indicate the predicted value when the small

reinforcer or large reinforcer, respectively, is chosen on

all other trials of a block. Under most conditions, the

delays from either choice to food are equal, so preference

for the large reinforcer is predicted; predictions also do

not differ between trials but are determined instead only by

differences in the effects of the two response options on

the amount of food obtained during the subsequent exchange

period. The predicted relative frequency of large-

reinforcer choices decreases as the number of trials per

exchange period increases, since the absolute amount of food

available in the exchange period increases, and the relative

effect of a single choice response on the amount of food in

the exchange period decreases. On all trials of condition

D1 and on the 10th trial of condition D10, minimum delays to

food differed between the two response options, 2 s for the

small-reinforcer choice and 8 s for the large-reinforcer

choice. This difference results in a prediction of

preference for the small reinforcer on every trial of

condition D1 and on the 10th trial of condition D10.
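
The food_matching sketch given with Experiment 1 reproduces

these values once the Experiment 2 food delays (the delay to

the exchange period plus 0.5 s to the first food delivery)

are substituted:

    # Reusing food_matching from the Experiment 1 sketch.  Under
    # the equal-delay (E) conditions, amounts alone determine the
    # prediction, so final_trial=False applies to every trial.
    print(round(food_matching(10, final_trial=False, d_large=2.0), 3))
    # D10, 10th trial: 8-s versus 2-s minimum delays to food.
    print(round(food_matching(10, final_trial=True,
                              d_large=8.0, d_small=2.0), 3))
    # Prints 0.531 and 0.221, matching Table 4 (conditions 10/ED10
    # and the final trial of condition D10).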











Excluding condition D10, choice responding was in

agreement with food-based matching-law predictions in 30 of

42 cases (Figure 10). Of the 12 exceptions to the matching

law, 11 involved fewer large-reinforcer choices than

predicted. In most of these cases, however, a greater

number of large-reinforcer choices occurred than is

characteristic of pigeons in the more typical self-control

arrangements (e.g., Mazur & Logue, 1978). For example,

neither Subject 1857 nor 747 preferred the large reinforcer

during conditions ED2 and ED5 (Group A, left panel),

although both subjects chose the large reinforcer during

nearly half of the trials. There also were more large-

reinforcer choices during these conditions than in similar

conditions (2D and 5D) of Experiment 1 in which food delays

favored small-reinforcer choices (Figure 4). Similar

results occurred with Subject 753 (Group B), who chose the

large reinforcer more often during conditions 5 and 10 of

Experiment 2 (Figure 10) than in analogous conditions in

Experiment 1 (Figure 4). In all subjects, the smaller number

of large-reinforcer choices sometimes observed during

conditions in which exchange periods followed 2 or more

trials, as compared with condition 1, might also result from

the diminishing relative influence of choices on the amount

of food during the exchange period, as reflected in the

decreasing number of large-reinforcer choices predicted by










the matching law as the number of trials per exchange period

increases (Figure 13).

The other (12th) exception to predictions of the

matching law, involves the large-reinforcer preference

exhibited by Subject 1383 under condition D1 (Figure 10,

bottom left graph). The insensitivity to delay evidenced

here differed from the delay sensitivity apparent in the

choice patterns of the same subject during analogous

manipulations of Experiment 1 (Figure 4, condition 1D).

Perhaps, as demonstrated in other experiments (e.g.,

Navarick & Fantino, 1976), the lower ratio of delays to food

from each choice in Experiment 2 accounts for this

difference. In condition 1D of Experiment 1, the minimum

delays to food were 7.8 s from a large-reinforcer choice and

0.6 s from a small-reinforcer choice, yielding a delay ratio

of 13:1 (large choice delay:small choice delay). In

condition D1 of Experiment 2, the minimum delays to food

from these choices were 8 s and 2 s, respectively, yielding

a delay ratio of 4:1. The lower delay ratio in Experiment 2

leads to a quantitatively different matching-law prediction,

in a direction favoring large-reinforcer choices (compare

Figure 8, condition 1D, to Figure 13, condition D1). Thus,

although the choice patterns of Subject 1383 during

condition D1 differ quantitatively from matching-law

predictions, the choice differences between condition D1 of











Experiment 2 and condition 1D of Experiment 1 are in

qualitative agreement with the matching law.
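
As a check on this account, the two predictions differ only

in their delay terms (the food amounts are 6 s and 2 s in

both cases). Working the same matching-law ratio used for

Figures 8 and 13 through both delay ratios:

    % Condition 1D, Experiment 1 (13:1 delay ratio):
    \frac{6/7.8}{6/7.8 + 2/0.6} \approx .188
    % Condition D1, Experiment 2 (4:1 delay ratio):
    \frac{6/8}{6/8 + 2/2} \approx .429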

The predominance of food delays over LED delays was

clearly demonstrated in the choice patterns of Subject 1857

(Figure 10). Preference for the small reinforcer occurred

during condition D1, in which presentation of the 3 LEDs was

delayed 6 s and food could be obtained quicker by choosing

the small reinforcer. When the delays to food were equated

for both options during condition ED1, however, preference

reversed in favor of the large reinforcer, although LED

presentation continued to be delayed 6 s following large-

reinforcer choices.

The choice patterns of Subject 753 during conditions

D10 and ED10 (Figure 10, lower right graph) were also

strongly influenced by food delays. During condition D10, a

6-s delay was added to the presentation of LEDs following

large-reinforcer choices and delays to food also differed

between the two response options, so food could be obtained

quicker by choosing the small reinforcer on the 10th trial

of a block. The shorter delay to food results in a

matching-law prediction of preference for the small

reinforcer on the 10th trial (Figure 13). It was argued

earlier that similar delay differences in Experiment 1

produced impulsive responding on the final trial of a block,

a pattern which generalized across trials. Consistent with

this interpretation, Subject 753 showed exclusive preference










for the small reinforcer across the last 6 trials of a block

during condition D10 and the number of large-reinforcer

choices decreased across trials (Figure 11). The number of

large-reinforcer choices per session also decreased during

condition D10 (Figure 10). The importance of delays to

food, as opposed to delays in LED presentation, for this

subject was also dramatically illustrated by the increase in

large-reinforcer choices during condition ED10, in which

food delays were equated for the two options but the delay

associated with LED presentation was not changed.

Unlike Subject 753, Subjects 1732 and 1855 preferred

the large reinforcer during condition D10 (Figure 10).

Surprisingly, choices of Subject 1732 were sensitive to

differential delays for the two options during a similar

condition in Experiment 1 (Figure 4, condition 10D).

Perhaps, as discussed above for Subject 1383, the smaller

food-delay ratios of Experiment 2 account for the failure of

impulsive responding to develop during condition D10 for

this subject. The choices of Subject 1855, on the other

hand, were insensitive to food and LED delays associated

with large-reinforcer choices during the analogous

manipulation of Experiment 1 (condition 10D, Figure 4) and

showed apparently similar insensitivity during condition D10

of Experiment 2. However, a review of choice patterns

across all sessions of condition D10 of Experiment 2 (not

shown), revealed that this subject never chose the small










reinforcer on the 10th trial of a block and therefore did

not contact the shorter delay to food associated with the

small-reinforcer choice.











Figure 9: The sequence of events following large- and
small-reinforcer choices during conditions with (bottom
panel) and without (top panel) a large-reinforcer delay.
The values from conditions D1 and D10 are shown in
parentheses above the exchange period in the bottom panel.
[Top panel, conditions 1, 2, 5, and 10. Large choice:
choice keys off; 1 LED on; 0.6 s; 1 LED on; 0.6 s; 1 LED
on; 0.3 s; exchange period or ITI. Small choice: choice
keys off; 1 LED on; 1.5 s; exchange period or ITI.
Bottom panel, conditions ED1, ED2, ED5, ED10, D1, and D10.
Large choice: choice keys off; 6 s; 1 LED on; 0.6 s; 1 LED
on; 0.6 s; 1 LED on; 2.3 s (0.3 s); exchange period or
ITI. Small choice: choice keys off; 1 LED on; 9.5 s
(1.5 s); exchange period or ITI.]
































Figure 10. The number of large-reinforcer choices per
session across all experimental conditions for both groups.
Graphing conventions are the same as in Figure 4, except the
finely striped bars indicate the first condition with the
large reinforcer delayed, the coarsely striped bars indicate
equal 10-s delays to the exchange period from either choice,
and the reversed coarse stripes indicate the key contingency
reversal (condition RED10).










[Figure 10 graphs: one panel per bird (Group A: 1857, 747,
1383; Group B: 1732, 1855, 753); x-axis: experimental
conditions; y-axis: large-reinforcer choices per session.]

































Figure 11. The proportion of large-reinforcer choices at
each trial number of a block of trials preceding scheduled
exchange periods. Graphing conventions are as in Figure 5
except for the additional symbols and new experimental
conditions depicted in the figure keys. The key for Group A
is located in the lower left graph and for Group B in the
upper right graph.











[Figure 11 graphs: one panel per bird; x-axis: trial
number (1-10); y-axis: proportion of large-reinforcer
choices, with a separate function for each experimental
condition.]
































Figure 12. The average choice latency during conditions
where exchange periods were scheduled after two or more
trials. Values are derived from preferred choice trials
during the last 10 sessions of each experimental condition.
The axes are scaled individually for each subject. Other
graphing conventions are the same as in Figure 6 except for
the different symbol correspondences indicated in the figure
keys.











[Figure 12 graphs: one panel per bird; x-axis: trial
number (1-10); y-axis: choice latency in seconds, scaled
individually; open symbols: large-reinforcer choices;
filled symbols: small-reinforcer choices.]




















Figure 13. The proportion of large-reinforcer choices
predicted by the matching law applied to food reinforcement
for each trial of a block of trials preceding exchange
periods. Open bars represent values during conditions where
predictions do not differ between trials. Under condition
D10, the coarsely striped bar indicates the predicted value
for each of the first 9 trials of a block and the finely
striped bar illustrates the matching-law prediction for the
10th trial. Error bars during conditions with more than one
trial per exchange period indicate the range of predictions
under all possible choice patterns for other trials of a
block.












TABLE 3

The experimental conditions, order of exposure, and
number of sessions for all subjects in Experiment 2.
Group A conditions are summarized in the top panel and
Group B in the bottom panel.


                 Time from a Choice
                 Response to the Exchange
Experimental     Period (seconds)           Number of
Condition(a)     Large       Small          Sessions

                                      Bird    Bird    Bird
Group A                               1857    747     1383

1                 1.5         1.5       27      28      27

D1                7.5         1.5       50      22      44

ED1               9.5         9.5       60      24      28

ED2               9.5         9.5       32      34      64

ED5               9.5         9.5       47      30      70

ED10              9.5         9.5       77      21      46

RED10             9.5         9.5       42      -       -

ED10              9.5         9.5       26      -       -

                                      Bird    Bird    Bird
Group B                               1732    1855    753

1                 1.5         1.5       28      26      30

2                 1.5         1.5       24      20      90(b)

5                 1.5         1.5       39      34      20

10                1.5         1.5       22      78      30

D10               7.5         1.5       33      22      42

ED10              9.5         9.5       26      80      30

RED10             9.5         9.5       20      80      36

ED10              9.5         9.5       40      34     106(c)











TABLE 3--continued

(a) The numbers 1, 2, 5, and 10 refer to the number of
trials per exchange period. The letter D indicates a
6-s delay to the large reinforcer (3 LEDs). The letter
E indicates an equal delay of 9.5 s from either choice
response to a scheduled exchange period. The letter R
indicates that the contingencies were reversed for the
choice keys.

(b) The choice key assignments were inadvertently
switched for two consecutive sessions and performance
was noticeably disrupted afterwards. The phase was
continued until stability criteria were met.

(c) Preference cycled between the large and small
reinforcer during most of the phase without meeting
stability criteria. At the 80th session a trend toward
the small reinforcer was evident and the phase was
continued until no trends were evident for 20
consecutive sessions.












TABLE 4

The relative number of large-reinforcer choices predicted by
the matching law for each trial number of all experimental
conditions in Experiment 2. Values are based on food
reinforcement. When there are two listings for the same
trial(s), the top listing shows values when the small
reinforcer is chosen on all other trials and the bottom
listing shows values when the large reinforcer is chosen on
all other trials. The mean values displayed are the average
of these two calculations for each trial and correspond to
the values plotted in Figure 13. The food delay values are
described in the text, the amount of food values are the
same as in Experiment 1.


Experimental                   Large/
Condition       Trial     (Large + Small)     Mean

1 and ED1         1           .750            .750

2 and ED2        all          .667            .634
                              .600

5 and ED5        all          .583            .560
                              .536

10 and ED10      all          .545            .531
                              .517

D1                1           .429            .429

D10              1-9          .545            .531
                              .517
                 10           .231            .221
                              .211














GENERAL DISCUSSION

All subjects chose the larger delayed reinforcer more

often in Experiment 2 than in Experiment 1. Self-control

increased in Experiment 2, consistent with predictions of

the matching law, primarily because the minimum delays to

food from choices were equated for both options during most

conditions. Together, these experiments confirm many

previous findings regarding the sensitivity of pigeons'

choices to delays in food presentation (e.g., Green et al.,

1981; Lea, 1979; Logue et al., 1984; Rachlin & Green, 1972).

When food delays were prevented from differentially

influencing choice, in the terminal condition (ED10) of

Experiment 2 that most resembles the typical human

procedure, 4 of 6 subjects (1857, 1732, 1855, and 753)

preferred the larger delayed reinforcer (Figure 10). The

levels of self-control observed in Experiment 2 are

comparable to those reported in a similar study with humans

(Logue et al., 1986) and those found in a previous

demonstration of self-control with pigeons involving delay-

fading histories (Mazur & Logue, 1978). Also, the

variability, within and between subjects, was well within

the range characteristic of similar studies (e.g., Logue &

Pena-Correal, 1984; Logue et al., 1984, Experiment 1; Logue











et al., 1986, Experiment 1; Mazur & Logue, 1978; Rachlin &

Green, 1972; van Haaren et al., 1988). The reliability of

these effects, in the 4 subjects showing the most self-

control, was further established by reversing the

contingencies on the choice keys. In all cases, subjects

continued to prefer the larger delayed reinforcer,

regardless of the key with which it was associated (Figure

10, conditions ED10 and RED10), ruling out key color and

position bias as alternative explanations. This

manipulation was important because 3 of the 4 subjects

exposed to the key reversals had prolonged recent histories

of preferring the same option; moreover, position and/or

color biases are especially common with concurrent FR1,

discrete-trials procedures like those used here (e.g., Logue

& Peña-Correal, 1984; Logue et al., 1984, Experiment 1; van

Haaren et al., 1988).

The self-control demonstrated in the present study may

be viewed in terms consistent with Skinner's (1953)

treatment of self-control. In Skinner's terms, choice of

the immediate small reinforcer might be considered the

controlled response and choice of the delayed larger

reinforcer the controlling response. In this case, choosing

the delayed reinforcer, as a form of self-control,

exemplifies the technique Skinner calls "doing something

else" (Skinner, 1953, p. 239). That is, choice of the

immediate smaller reinforcer is prevented by the emission of











an incompatible response (choice of the delayed larger

reinforcer). The process by which pigeons in the present

study came to exhibit self-control and acquire this

controlling response is worth considering.

One possibility is that choice of the 3 LEDs was

directly reinforced by the LEDs. Despite the present

finding that food delays affected choice more than did LED

delays, there are several reasons to suspect that the LEDs

did function as reinforcers. First, because the training

histories and LED arrangements in the present study closely

resemble the token reinforcer paradigm (Malagodi, 1967), it

is likely that the LEDs functioned as token reinforcers.

Although subjects in the present study did not directly

manipulate the LEDs, as do subjects in more typical token

reinforcement studies, it is not clear that such handling

enhances reinforcing efficacy. Also, the long latencies

characteristic of 1st-trial choices and latency reductions

once tokens were present (Experiment 1, Figure 6 and

Experiment 2, Figure 12) resemble previous findings with

token reinforcement (Kelleher, 1958; Malagodi et al., 1975;

Waddell et al., 1972; Webbe & Malagodi, 1978). Informal

observations revealed that all subjects did occasionally

orient toward the LEDs when they were presented and often

pecked at them during the ITI and prior to exchange periods.

Pecking is often elicited by conditioned stimuli paired with

food (Schwartz & Gamzu, 1977), stimuli that would also be










expected to have reinforcing properties (Gollub, 1977). LED

illumination might also be expected to function as

conditioned reinforcement because the accumulation of LEDs

was correlated with reductions in the delay to food

(Fantino, 1977).
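
Stated compactly, the delay-reduction hypothesis holds that a
stimulus functions as conditioned reinforcement to the extent
that its onset signals a reduction in the time remaining to
food. In a standard rendering (a textbook summary, not an
equation taken from this document),

    V_S ∝ T - t_S,

where V_S is the conditioned reinforcing strength of stimulus
S, T is the mean time to food measured from the start of the
trial cycle, and t_S is the mean time to food once S is
present. Each accumulated LED moved the subject closer to the
exchange period, so its onset signaled a positive delay
reduction.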

Although the reinforcing function of the LEDs is not

certain, the precise function of the LEDs in the present

study is no more mysterious than the function of points

delivered in similar experiments with human subjects (e.g.,

Logue et al., 1986). Although these experiments do not

usually include clear functional assessments of point

delivery, points are often presumed to function as

reinforcers in humans, even in the absence of explicit

instructions. This is presumably because human subjects

typically have extensive histories with points and numbers

outside of the laboratory. These histories likely establish

precise discrimination in humans of more from fewer points

over a wide range of absolute numbers of points. If points

are delivered as reinforcers, such histories may also

enhance sensitivity to the cumulative amount of

reinforcement--sensitivity that may be related to the

maximization and self-control often seen in human subjects

(e.g., Flora & Pavlik, 1992; King & Logue, 1987; Mawhinney,

1982). The present finding of self-control in subjects that

did not have such extensive verbal and social histories

reveals that training circumstances provided within the











token reinforcer arrangement may be sufficient to produce

self-control.

Previously reported differences in the performance of

humans and pigeons under self-control procedures may

therefore be the result of procedural differences, rather

than of the verbal processes characteristic of humans per se.

A number of studies have documented differences in self-

control when different consequences are arranged (e.g.,

Logue et al., 1984; Logue et al., 1986; Navarick, 1982;

Ragotzy et al., 1988; Solnick et al., 1980). In conjunction

with the current study, these experiments suggest that with

both humans and pigeons, self-control is less likely with

reinforcers of more immediate value, such as food and escape

from noise, but more likely with token reinforcers. The

self-control obtained with token reinforcement could thus be

viewed simply as a case of insensitivity to delays with

certain kinds of consequences. When this delay

insensitivity is related to other characteristics of the

token arrangement, however, more complex interpretations

emerge.

It is consistent with the token reinforcement

literature to discuss the exchange period as a reinforcer of

component token schedules (e.g., Webbe & Malagodi, 1978)

and, in the present study, of trial choices. In Experiment

1 of the current study, it was argued that quicker access to

the exchange period on the final trial of a block following



