The effect of reducing cognitive complexity on a hypothetico-deductive reasoning task

Material Information

Title:
The effect of reducing cognitive complexity on a hypothetico-deductive reasoning task
Physical Description:
137 leaves : ill. ; 29 cm.
Language:
English
Creator:
Marek-Lovejoy, Joan Pamela
Publication Date:

Subjects

Subjects / Keywords:
Psychology thesis, Ph. D   ( lcsh )
Dissertations, Academic -- Psychology -- UF   ( lcsh )
Genre:
bibliography   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1998.
Bibliography:
Includes bibliographical references (leaves 99-102).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by J. Pamela Marek-Lovejoy.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 029549600
oclc - 41372645
System ID:
AA00022687:00001

Table of Contents
    Title Page
    Acknowledgement
    Table of Contents
    Abstract
    Introduction
    Literature review
    Dissertation framework and plan
    General discussion
    Conclusions and future directions
    References
    Appendix A. Wording of problems
    Appendix B. Materials
    Biographical sketch
Full Text
















THE EFFECT OF REDUCING COGNITIVE COMPLEXITY ON A
HYPOTHETICO-DEDUCTIVE REASONING TASK








By

J. PAMELA MAREK-LOVEJOY


A DISSERTATION TO BE PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


1998














ACKNOWLEDGMENTS

I wish to express my enduring gratitude to my mentor, Richard A. Griggs, who

has served as an inspiration and a motivating force throughout this investigation, for his

ongoing concern and tenacious commitment. I also wish to thank the members of my

committee, Shari Ellis, Ira Fischler, Patricia H. Miller, and Chris Janiszewski, for their

continued support.

I am also indebted to my colleague, Andrew Christopher, who has been a constant

source of encouragement. My deepest appreciation goes to my husband, Leon Lovejoy,

for his emotional and material support during the time this dissertation was being

prepared.














TABLE OF CONTENTS


page

ACKNOWLEDGMENTS........................................................................ ii

ABSTRACT............................................................................................. iv

INTRODUCTION ........................................................................................ 1

LITERATURE REVIEW .......................................................................... 4
What is the Meaning of "Or"? ........................................................ 4
Introducing the THOG.................................................................... 8
Early Investigations ........................................................................ 9
Introducing Realism........................................................................ 17
The Era of Separation ...................................................................... 26
Providing Procedural Cues............................................................... 38
Summary ........................................................................................ 43

DISSERTATION FRAMEWORK AND PLAN............................................ 46

METHODS, MATERIALS AND RESULTS............................................. 56
General Procedures........................................................................... 56
Experiments 1a and 1b ..................................................................... 56
Experiments 2a and 2b ..................................................................... 62
Experiment 3 .................................................................................... 66
Experiments 4a and 4b ..................................................................... 68

GENERAL DISCUSSION............................................................................ 83

CONCLUSIONS AND FUTURE DIRECTIONS......................................... 95

REFERENCES.............................................................................................. 99

APPENDIX A WORDING OF PROBLEMS........................................... 103

APPENDIX B MATERIALS.................................................................... 112

BIOGRAPHICAL SKETCH ......................................................................... 137














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

By

J. Pamela Marek-Lovejoy

August 1998

Chairman: Richard A. Griggs
Major Department: Psychology

In adaptive reasoning, prior knowledge and selective attention may promote

efficiency by reducing information processing demands. However, in logical tasks, our

knowledge base and focusing strategies may lead us astray. The THOG problem is one of

a trilogy of tasks designed by Peter Wason to unveil systematic nonlogical tendencies that

divert people from the appropriate solution path. Based on an exclusive disjunctive rule,

the correct solution to the THOG problem involves hypothesis generation, classification

of four designs given alternative hypotheses, and simultaneous evaluation of information

from multiple hypotheses. Typically no more than one third of adult participants solve the

standard version of the problem. Confusion theory suggests that the cognitive complexity

of the problem leads people to adopt simplistic strategies such as focusing on properties

of the positive example. Clearly separating the hypothesized properties from those of the

positive example ameliorates this difficulty in some instances, but facilitation in an

abstract context is closely linked to the intricacies of instruction wording. This research

explicates difficulties encountered in the THOG problem by examining the effects of

restricting the number of alternative classifications and providing procedural assistance.

In seven experiments, 603 University of Florida undergraduates each completed one

pencil-and-paper version of the problem. Eliminating an "indeterminate" option from the

set of three possible classifications provided facilitation, as did wording the instruction to

focus attention on one other THOG in addition to the positive example. This facilitation

could, however, reflect nonlogical strategies such as focusing on uniqueness. A complete

explanation of the problem to the point of simultaneous hypothesis testing had a null

effect. Results strongly suggest that within the confines of the standard abstract problem

structure, consistent facilitation of logical reasoning is ultimately hampered by failure to

accurately combine information from multiple hypotheses. Uncertainty regarding which

of two hypotheses represents reality may influence people to bypass simultaneous

hypothesis testing. Given this requirement in the THOG problem, its complexity may

become an insurmountable challenge. Because problem demands simulate those in more

realistic decision-making scenarios involving several alternatives and constraints,

development of techniques to promote simultaneous evaluation of multiple hypotheses is

recommended.














INTRODUCTION

Reasoning, the basis for development and testing of hypotheses and formulation

of logical conclusions, requires filtering information to select relevant data for

goal-directed decisions. Prior knowledge and selective attention may promote efficiency

in reasoning by reducing information processing demands but may also inappropriately

override logic. "To capture some of the same processes, the puzzlement, the doubts, the

obsessive tendencies toward repetition, the compelling power of false clues..." (Wason,

1978, p. 20) manifested in reasoning beyond the laboratory, Peter Wason devised a

trilogy of experimental tasks: the 2-4-6 problem, the selection task, and the THOG

problem. These problems illuminated weaknesses in reasoning, including resistance to

falsification, a tendency to be misled by perceptual cues, and self-contradiction. The high

error rates contradicted prevailing ideas concerning our abilities to reason logically

(Evans & Newstead, 1995).

Initial research on the 2-4-6 task, an inductive reasoning problem designed to

explore the strategies people use to generate and test hypotheses, was published in 1960.

In this task, participants aim to determine the rule used to create a three-number

sequence. They do so by generating a series of three numbers, then receiving feedback

indicating whether the series fits the rule. When participants are confident they know the

rule, they report it to the experimenter. If they are incorrect, they resume generating

triples. Although effective strategies involve successively testing and attempting to falsify

different hypotheses, many participants tend to repeatedly test positive examples

consistent with their original hypothesis (Tweney et al., 1980).

The first major paper on the selection task, a deductive reasoning problem

involving a conditional rule, was published in 1966. In this task, participants are shown

one side of four cards (e.g., A, K, 4, 7), then indicate which cards they would turn over to

determine the truth or falsity of a rule (e.g., "If a card has a vowel on one side, then it has

an odd number on the other"). Typically, no more than 10% of adult participants make

the logically correct choices (Evans & Newstead, 1995). The fame accorded the selection

task, more widely researched than any other reasoning problem, was a primary impetus

for the development of the THOG problem ten years later. According to Wason (1977), in

the course of a decade, potential participants had become too familiar with the selection

task, although they often remained confused about its solution even after exposure to

relevant information (Wason, 1979; 1981).
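
The logic of the selection task described above can be illustrated with a brief sketch. It is not drawn from the original studies; it simply treats the rule as a material conditional and asks, for each visible face, whether any hidden face could falsify the rule. The function names are hypothetical, and the set of possible hidden faces (the letters A and K, the numbers 4 and 7) is an assumption made only for illustration.

```python
VOWELS = set("AEIOU")


def rule_holds(letter: str, number: int) -> bool:
    """Material conditional: 'if vowel on one side, then odd number on the other'."""
    return (letter not in VOWELS) or (number % 2 == 1)


def must_turn(visible: str, letters=("A", "K"), numbers=(4, 7)) -> bool:
    """A card is informative only if some possible hidden face could falsify the rule.

    The candidate hidden faces are an illustrative assumption, not part of the task wording.
    """
    if visible.isalpha():
        # Visible letter: the hidden face is a number.
        return any(not rule_holds(visible, n) for n in numbers)
    # Visible number: the hidden face is a letter.
    return any(not rule_holds(letter, int(visible)) for letter in letters)


def cards_to_turn(cards):
    """Return the visible faces whose hidden side could reveal a violation."""
    return [card for card in cards if must_turn(card)]


if __name__ == "__main__":
    print(cards_to_turn(["A", "K", "4", "7"]))
```

Under these assumptions the sketch selects the vowel (A) and the even number (4), the only cards that could conceal a combination violating the rule.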

To delve further into contradictory elements in reasoning leading to a "crisis in

belief," Wason created the THOG problem, a task requiring the generation of hypotheses

and application of an exclusive disjunctive rule. For the past 20 years, researchers have

been searching for reasons to explicate the difficulty of the THOG problem. This paper

reviews the highlights of that search, emphasizing attempts to reduce its cognitive

complexity within the confines of its standard structure. Facilitation efforts are designed

to overcome identified biases that impede solution.

As defined by Evans (1989), "bias" refers to "a systematic tendency to take

account of factors irrelevant to the task at hand or to ignore relevant factors" (p. 9). The

term does not necessarily imply that people are unable to reason logically. Rather, it

suggests that specific problem features provoke use of inappropriate strategies that thwart

solution. Knowledge of such bias facilitates understanding of the cognitive processes that

underlie reasoning performance.

To provide a framework for positioning subsequent experiments on the THOG

problem and its relations, Section 1 of the Literature Review encapsulates related research

on how people interpret disjunctive statements. Section 2 previews the standard version

of the THOG, introducing the cognitive activities presumed to underlie its solution.

Section 3 describes early studies of the THOG, including attempts at facilitation through

the use of realistic material. Section 4 discusses efforts to reduce cognitive complexity by

clearly separating the properties of the designated example from the hypothesized

properties to which the disjunctive rule is applied during classification. Section 5 outlines

attempts to guide people toward the correct solution by providing procedural cues.

Section 6 summarizes information presented in preceding sections.














LITERATURE REVIEW

What is the Meaning of "Or?"

Consider the two propositions (p and q) in the following sentence: "Jake is angry

(p) or Jason is friendly (q)." Under what conditions is this sentence logically false? If both

propositions (disjuncts) are false (if Jake is not angry and Jason is not friendly), then the

disjunction, p or q, is false as well. Under what conditions is this sentence logically true?

If p is true and q is false, then the sentence itself is true. Similarly, if p is false and q is

true, then the sentence itself is true. These three determinations of truth and falsity hold

regardless of the interpretation of "or." The fourth possible combination is not as

clear-cut. Suppose both disjuncts are true: Jake is angry and Jason is friendly. Given an

inclusive interpretation, the original sentence is true because inclusive disjunction permits

both propositions to occur. In contrast, given an exclusive interpretation, the original

sentence is false because exclusive disjunction, by definition, specifies that p and q

cannot both occur without falsifying a statement or violating a rule. Inclusive disjunction

allows, but does not require, an "or both" interpretation, whereas an exclusive disjunction

is limited to "but not both" scenarios.

In propositional logic, the relationship between the truth and falsity of each of the

propositions and the truth and falsity of the statement that they comprise is typically

illustrated via the use of truth tables. For example, the truth table that follows (in which

T = True and F = False) shows that inclusive and exclusive disjunction differ only

when both propositions p and q are true.

Connective              Linguistic Form          Proposition p   Proposition q   Truth of Rule

Inclusive disjunction   p or q (or both)         T               T               T
                                                 T               F               T
                                                 F               T               T
                                                 F               F               F

Exclusive disjunction   p or q (but not both)    T               T               F
                                                 T               F               T
                                                 F               T               T
                                                 F               F               F
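
The same truth table can be generated mechanically. The following is a minimal sketch, with hypothetical function names, that evaluates both readings of "or" for every combination of p and q and reports where they disagree.

```python
from itertools import product


def inclusive_or(p: bool, q: bool) -> bool:
    """'p or q (or both)'."""
    return p or q


def exclusive_or(p: bool, q: bool) -> bool:
    """'p or q (but not both)'."""
    return p != q


def truth_table():
    """Rows of (p, q, inclusive result, exclusive result) in TT, TF, FT, FF order."""
    return [
        (p, q, inclusive_or(p, q), exclusive_or(p, q))
        for p, q in product([True, False], repeat=2)
    ]


def differing_cases():
    """Combinations of p and q on which the two interpretations disagree."""
    return [(p, q) for p, q, inc, exc in truth_table() if inc != exc]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
    print("Interpretations differ on:", differing_cases())
```

The two interpretations disagree on a single row, the TT case, which is why responses to TT instances are used to infer which interpretation a participant holds.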

In linguistics, theorists have debated which of these two interpretations of

disjunctive statements is "basic." Newstead and Griggs (1983) cited key positions in this

debate. They noted that Gazdar (1979) argued for the primacy of the inclusive

interpretation but considered that "or" may take on an exclusive meaning when used

without qualification. This derived interpretation stems from the expectation that speakers

provide the maximum amount of information. Lakoff (1971) advocated that "or" implies

an alternative, and that most disjunctive sentences are congruent with an exclusive

interpretation. Hurford (1974) claimed that "or" has dual meanings, with other languages

assigning different words to each meaning. Evidence for the various positions in the

linguistic debate included theorists' interpretations of different sentences and their

meanings, rather than experimental data.

In the psychology of reasoning, Evans and Newstead (1980) experimentally

investigated people's perceptions of disjunctive statements involving letters and numbers.

In Experiment 1, participants read a rule concerning how certain letters could be paired

with certain digits, then generated letter-number pairs that conformed to or contradicted

the rule. In Experiment 2, participants indicated whether a rule was true or false in

relation to a specific letter-number pair. In both experiments, people who responded to

TT instances showed a preference for inclusive interpretation. Approximately 50% to

60% of the participants consistently favored an inclusive interpretation of these instances,

about 20% consistently favored an exclusive interpretation, and the remainder gave

inconsistent responses. However, Newstead and Griggs (1983) pointed out that focusing

primarily on TT instances provides a less rigid criterion than reconstruction of the entire

subjective truth tables used by participants. Using this approach, other studies (e.g.,

Braine and Rumain, 1981) have shown that exclusive disjunction is the primary

interpretation, at least if the material involved is abstract or only weakly linked to

real-world usage.

Using more realistic scenarios, subsequent research suggested that interpretation

of disjunctives is context-dependent. Newstead, Griggs, and Chrostowski (1984, Exp. 1)

presented participants with brief passages, each of which included a disjunctive

statement. Participants read three passages from each of seven different contexts (e.g.,

threat, choice, qualification). For each passage, they indicated whether each of four

possible outcomes (including each of the four possible disjunct pairs: TT, TF, FT, FF)

was consistent or inconsistent with the disjunctive statement. Although there was

considerable variation between contexts, a majority of responses (averages of 65% and

76% in two studies) were indicative of an exclusive interpretation. This was generally

true of all contexts except qualification, in which the inclusive interpretation

predominated. In qualification contexts (e.g., "The person I will vote for will have to be

either intelligent or open-minded"), it is generally understood that the presence of both

disjuncts most likely enhances a candidate's potential for gaining a vote rather than

reducing it.

In Experiment 2, participants read brief passages ending with a disjunctive

statement, then read a sentence that either affirmed or denied the first disjunct. Their task

was to indicate whether a conclusion that either affirmed or denied the second disjunct

followed from the information given in the disjunctive statement and the sentence that

followed it. Again, responses were typically indicative of an exclusive interpretation,

except in a qualification context.

Of the seven contexts studied, the abstract context is most directly relevant to

interpretation of the exclusive disjunctive rule in the standard THOG problem. In fact, the

disjuncts involved in one of the abstract scenarios included shape and color. Overall, in

the abstract context, participants who evaluated consistency of outcomes (Newstead et al.,

1984, Exp. 1) favored an exclusive rather than inclusive interpretation (67% vs. 18% for

the main experiment and 47% vs. 45% for the replication). Participants who evaluated

conclusions based on a disjunctive statement followed by an affirmation of its first

disjunct also favored an exclusive interpretation of the connective "or" (Newstead et al.,

1984, Exp. 2). However, the tendency to interpret a disjunctive statement as exclusive

appeared somewhat weaker given an abstract context than in other contexts (except

qualification). Additionally, the proportion of correct answers generally appeared lower in

abstract contexts than in other scenarios.

Introducing the THOG

The THOG problem takes us into a "looking glass world" (Wason, 1978, p. 50), a

world in which the pairs of features that define THOGness, if combined, create a design

that is not a THOG. This is one of two contradictions that spur interest in the processes

leading to solution of the THOG problem. The other contradiction, perhaps one that

makes that actual solution seem untenable, is that the two THOGs in the problem share

neither the same shape nor the same color.

The problem is built around four combinations of two shapes and two colors.

People are told that one of four designs is a THOG. Given a rule that defines THOGness,

they are asked to indicate whether each of the other designs is or is not a THOG or

whether there is insufficient information to make a decision. Typically, only about one

third or less of adult participants who attempt the standard version of this task correctly

classify the designs. This level of performance occurs despite explicit reinforcement of

the dominant exclusive interpretation of "or" by inclusion of the phrase "but not both" in

the problem statement that follows.

In front of you are four designs: Black Diamond, White Diamond, Black Circle
and White Circle. You are to assume that I have written down one of the colours
(black or white) and one of the shapes (diamond or circle). Now read the
following rule carefully: If, and only if, any of the designs includes either the
colour I have written down, or the shape I have written down, but not both, then it
is called a THOG. I will tell you that the Black Diamond is a THOG. Each of the
designs can now be classified into one of the following categories: A) Definitely
is a THOG, B) Insufficient information to decide, C) Definitely is not a THOG.
(Wason & Brooks, 1979, p. 80).








At this point, a review of logical steps leading to the correct solution of the THOG

problem is appropriate. First, knowing that the Black Diamond is a THOG, and

considering the rule "If, and only if, any of the designs includes either the color I have

written down, or the shape I have written down, but not both, then it is called a THOG,"

participants who follow algorithmic steps toward solution would hypothesize the set

of properties written down by the experimenter. There are two possible sets of properties:

either black and circle or white and diamond. Second, the identity of each of the other

three designs would be determined based on the first set of hypothesized properties (black

and circle) and the rule. The White Diamond is not a THOG because it contains neither of

these properties. The Black Circle is not a THOG because it contains both of them. The

White Circle is a THOG because it contains one of the properties (circle) but not the other

(black). Third, the identity of each of these designs would be determined based on the

second set of hypothesized properties (white and diamond) and the rule. The results of

this procedure are the same as those obtained with the black and circle set. Fourth, by

simultaneously evaluating classifications given under each of the hypotheses, participants

would reach the conclusion that the White Circle is a THOG, and the Black Circle and

White Diamond are not THOGs.
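
The four steps just described can be sketched in code. This is only an illustration of the logical procedure, not a model of how participants reason; the function and variable names are hypothetical. The sketch generates the hypotheses consistent with the positive example, classifies the remaining designs under each hypothesis, and then combines the classifications.

```python
from itertools import product

COLORS = ("black", "white")
SHAPES = ("diamond", "circle")
DESIGNS = list(product(COLORS, SHAPES))


def is_thog(design, written):
    """Exclusive disjunctive rule: the written color or the written shape, but not both."""
    color, shape = design
    written_color, written_shape = written
    return (color == written_color) != (shape == written_shape)


def consistent_hypotheses(positive_example):
    """Step 1: color/shape pairs that could have been written down."""
    return [h for h in product(COLORS, SHAPES) if is_thog(positive_example, h)]


def classify(positive_example):
    """Steps 2 to 4: classify each remaining design under every hypothesis, then combine."""
    hypotheses = consistent_hypotheses(positive_example)
    result = {}
    for design in DESIGNS:
        if design == positive_example:
            continue
        verdicts = {is_thog(design, h) for h in hypotheses}
        if verdicts == {True}:
            result[design] = "THOG"
        elif verdicts == {False}:
            result[design] = "not a THOG"
        else:
            result[design] = "insufficient information"
    return result


if __name__ == "__main__":
    example = ("black", "diamond")
    print(consistent_hypotheses(example))
    for design, verdict in classify(example).items():
        print(design, "->", verdict)
```

Because both hypotheses yield the same classification for every design, the "insufficient information" branch is never reached for this problem, which mirrors the fourth step of the solution.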

Early Investigations

Any of the four steps toward solution provides a potential juncture for error. To test

whether people understood the rule, Wason and Brooks (1979, Exp. 1) presented

participants with four designs, each containing a different combination of two colors and

two shapes. (They called the target designs CHUZ instead of THOG because the same

participants later attempted to solve the standard THOG problem.) Participants then wrote

down their own choice of one color and one shape, thereby constructing their own

hypotheses. They read the rule and classified each design into one of three categories: A)

Definitely is a CHUZ, B) Insufficient information to decide, C) Definitely is not a CHUZ.

All participants in this Constructed CHUZ condition correctly identified all four designs

based on their preceding choice of color and shape and application of the exclusive

disjunctive rule. This solution rate demonstrated that adults do understand the rule, and

can apply it to a single set of properties that they themselves have determined.

A second possible difficulty is that people cannot correctly identify what the

experimenter has written down. Wason and Brooks (1979, Exp. 2) presented participants

with the CHUZ designs, the rule, and a checklist of four combinations of two colors and

two shapes that the experimenter could have written down. Participants were told that one

design was a CHUZ, then asked to indicate whether or not the experimenter could have

written down each of the listed combinations. Sixty-four percent of the participants

correctly identified which hypotheses could and could not be written down.

In Experiment 3, Wason and Brooks (1979) showed participants the CHUZ

designs, the rule, and the positive example. Instead of being given a checklist of possible

hypotheses, these participants were asked to determine what was written down and

provide a rationale for their answer. Sixty percent correctly identified the two possible

hypotheses and backed their answer with appropriate reasons. Another 20% identified the

hypotheses without adequately supporting their answers. Thus, in the two experiments,

71% of the participants accurately interpreted the rule, using it to work backwards from

the positive example to derive the properties that could be written down. Later

experiments (Girotto & Legrenzi, 1989, Exp. 1; Girotto & Legrenzi, 1993, Exps. 1, 2, &

3; Griggs, Platt, Newstead, & Jackson, 1998, Exps. 1, 2, & 3; Smyth & Clark, 1986,

Exp. 3) replicated these findings regarding elicitation of hypotheses in a variety of

problem contexts, with correct identification of both possible hypotheses ranging from

50% to 95%. Clearly, at least a majority of people, typically more, are not stymied by

either rule interpretation or hypothesis generation.

However, the series of experiments conducted by Wason and Brooks (1979) also

illustrated that neither understanding the rule (Exp. 1) nor the ability to identify

hypotheses (Exp. 2) necessarily translated into an ability to solve the THOG problem. In

Experiment 1, although all participants correctly identified the CHUZ after constructing

their own hypothesis, only 28% were then able to solve the standard THOG problem. In

Experiment 2, only 33% of the participants who spontaneously indicated both possible

hypotheses in the CHUZ problem subsequently solved the problem. Participants who did

not spontaneously generate the possible hypotheses were told which ones were correct

and given an explanation of how they were derived. Prior to attempting to solve the

CHUZ problem, all claimed to understand the problem at least through the hypothesis

generation stage, yet none provided the correct answer. Moreover, prior experience with

this CHUZ problem, partitioned into the stages of hypothesis generation and hypothesis

testing, did not facilitate subsequent performance on the THOG problem.

The pattern of errors in these early investigations of the THOG problem

foreshadowed those in subsequent research. After completing either the Constructed

CHUZ or a standard CHUZ problem that differed from the THOG only in the assigned

name and color of designs, participants attempted to solve the THOG problem (Wason &

Brooks, 1979, Exp. 1). Forty-seven percent of the errors indicated either that the White Circle

was not a THOG and that there was insufficient information to decide about the Black

Circle and White Diamond (42%) or that the White Circle was not a THOG and that the

Black Circle and White Diamond were THOGs (5%). Collectively, Wason and Brooks

(1979) labeled these "intuitive" errors, because they "seem due to a plausible inference

based on properties of the designs rather than on the hypotheses" (p. 84). Griggs and

Newstead (1983) differentially labeled these errors Type A (with THOG responses

for the Black Circle and White Diamond) and Type B (with indeterminate responses for

the Black Circle and White Diamond).

One possible explanation for these errors is drawn from work on attainment of

disjunctive concepts (Bruner, Goodnow, & Austin, 1956). In the classic attribute learning

paradigm, two attributes defined a concept, and each instance contained four attributes.

Participants were shown a card and told whether or not it was illustrative of the concept.

Then, they selected additional cards, one at a time. For each card selected, the

experimenter indicated whether it was a positive or negative instance. After any selection,

a participant had the option of hypothesizing what the defining attributes were. The task

was complete when a participant hypothesized the correct attributes. For disjunctive

concepts, participants frequently adopted an erroneous strategy which, though appropriate

for attaining conjunctive concepts, led people astray when learning disjunctions.

Developing their hypotheses based on positive instances, participants proposed that

features shared by illustrative instances were those that defined the concept. This

common element fallacy often fails for disjunctive concepts because two members of a

category may have no feature in common. Bruner et al. also found that if the concept

attainment task began with a negative rather than a positive example, participants used

more efficient strategies to reach their initial hypotheses. Bruner et al. contended that the

negative example encouraged participants to focus on attributes not included in the

example, thus bypassing the common element fallacy.

In the THOG problem, given the Black Diamond as a positive example,

participants committing the common element fallacy hypothesize that the properties

written down are black and diamond. If they then apply the rule, the Black Circle and

White Diamond each appear to be THOGs, because each contains one of the

hypothesized attributes. Because the White Circle contains neither of these properties, it

does not appear to be a THOG. This response pattern corresponds to Type A errors.
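
The Type A pattern can be reproduced by applying the rule to the fallacious hypothesis. The sketch below is illustrative only, with hypothetical function names: it assumes, as the common element account proposes, that the written properties are taken to be those of the positive example itself.

```python
from itertools import product

COLORS = ("black", "white")
SHAPES = ("diamond", "circle")


def is_thog(design, written):
    """Exclusive disjunctive rule: the written color or the written shape, but not both."""
    color, shape = design
    written_color, written_shape = written
    return (color == written_color) != (shape == written_shape)


def common_element_responses(positive_example):
    """Classify the other designs as if the example's own properties were written down."""
    assumed_written = positive_example  # the fallacious hypothesis
    return {
        design: is_thog(design, assumed_written)
        for design in product(COLORS, SHAPES)
        if design != positive_example
    }


def fallacy_is_self_consistent(positive_example):
    """Under the fallacious hypothesis, would the positive example itself be a THOG?"""
    return is_thog(positive_example, positive_example)


if __name__ == "__main__":
    example = ("black", "diamond")
    print(common_element_responses(example))
    print(fallacy_is_self_consistent(example))
```

The sketch also makes the inconsistency of this hypothesis explicit: if black and diamond were the written properties, the Black Diamond would contain both and therefore could not itself be a THOG.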

Another possible explanation for the error patterns in the THOG problem has been

drawn from studies of conditional reasoning. In propositional logic, the affirmative rule

"If Neil is studying, then his door is closed" is considered false only if Neil is studying

and his door is open (the TF case). The cases in which Neil is not studying and his door is

closed (the FT case) and in which Neil is not studying and his door is open (the FF case)

are both considered true, verifying the rule. Evans (1972) investigated how people

actually interpreted conditional rules, using stimuli differing in color and shape. He

hypothesized that people considered the FT and FF cases irrelevant and would not

designate them as either verifying or falsifying instances. Evans (1972) asked participants

to construct examples that verified or falsified a variety of conditional rules, some of

which involved negatives (e.g., "If not p, then q"). The major finding of this research

overshadowed results concerning rule interpretation. When falsifying rules, people

attended not to the logical structure but rather to the particular instances named in a rule.

Their selections matched the named instances. For example, given the rule "If there is not








a Yellow Diamond on the left, there is a Purple Circle on the right," only 30% of the

participants correctly chose any color/shape other than the Yellow Diamond and any

color/shape other than the Purple Circle (the TF case) to falsify the rule. Instead, people

would initially select the Yellow Diamond on the left and the Purple Circle on the right

(the FT case) as a falsifying instance. Few (3%) initial selections for falsifying the other

three rule forms included the FT case, suggesting that its selection for rules with a

negative antecedent and an affirmative consequent was influenced by matching bias. The

idea that perceived relevance is linked to items that match those mentioned in the rule has

also been offered as an explanation for performance on the four-card selection task

(Evans, 1995). Response patterns on the THOG problem also show evidence of matching,

but in a somewhat different way.
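The logical status of the four cases can be illustrated with a short sketch. It assumes the material conditional of propositional logic, with the first letter of each label giving the truth of the antecedent and the second the truth of the consequent. The function names and the stimuli other than the Yellow Diamond and Purple Circle are hypothetical.

```python
from itertools import product


def conditional(p, q):
    """Material conditional 'if p then q': false only when p is true and q is false."""
    return (not p) or q


def label(p, q):
    """Two-letter case label: antecedent truth value, then consequent truth value."""
    return ("T" if p else "F") + ("T" if q else "F")


# Truth table for any rule of the form 'if p then q'.
truth_table = {label(p, q): conditional(p, q)
               for p, q in product((True, False), repeat=2)}


def evans_case(left, right):
    """Case label for the rule 'If there is not a Yellow Diamond on the left,
    there is a Purple Circle on the right.'"""
    p = left != "Yellow Diamond"   # negated antecedent
    q = right == "Purple Circle"   # affirmative consequent
    return label(p, q)


print(truth_table)
# The matching response names both items in the rule, yet it is the FT case.
print(evans_case("Yellow Diamond", "Purple Circle"))
# A falsifying (TF) instance avoids both named items (stimuli here are made up).
print(evans_case("Green Square", "Red Triangle"))
```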

Given the Black Diamond as a positive instance, the matching bias explanation

suggests that people bypass the difficulty of testing multiple hypotheses simultaneously.

Instead, to simplify the complexity of the task, people compare or match each of the

designs against the positive example. Because the White Diamond and the Black Circle

each share only one of the designated THOG properties, people are uncertain about the

THOGness of these designs. Reflecting this uncertainty, people indicate there is

insufficient information to make a decision. The White Circle does not match the Black

Diamond on either attribute dimension. Possessing neither the color nor the shape of a

Black Diamond, it is classified as "not a THOG." This response pattern corresponds to a

Type B error.
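The matching account described above can be sketched as a simple counting heuristic. This is an illustrative model only, assuming that a response depends solely on how many properties a design shares with the positive example; the function name `matching_response` is hypothetical.

```python
def matching_response(design, example):
    """Respond according to the number of properties shared with the example."""
    shared = sum(a == b for a, b in zip(design, example))
    if shared == 2:
        return "THOG"
    if shared == 1:
        return "insufficient information"
    return "not a THOG"


example = ("black", "diamond")
designs = [("white", "diamond"), ("black", "circle"), ("white", "circle")]

# Reproduces the Type B pattern described in the text.
type_b = {d: matching_response(d, example) for d in designs}
for design, response in type_b.items():
    print(design, "->", response)
```

No rule is applied at any point in this heuristic, which is why it predicts the same responses whatever the wording of the rule.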

Griggs and Newstead (1983) designed a series of variations on the THOG

problem to explore whether the common element fallacy or matching bias explanation








best explained its difficulty. They reasoned that use of a negative example (indicating the

Black Diamond was not a THOG) should facilitate identification of THOGs (the White

Diamond and the Black Circle) in either case, although subsequent facilitation on a

standard THOG problem would support a common element fallacy explanation. This

rationale followed from the understanding that directing focus to a negative instance

serves to initiate a logical pattern of reasoning, susceptible to transfer. In contrast, if

people based their responses on matching, then they would reach the correct answers on

the Not-THOG problem by matching to the example, albeit a negative one.1 This

nonlogical strategy would yield a correct answer only for the Not-THOG problem. Griggs

and Newstead (1983, Exp. 1) found that performance was indeed better on the

Not-THOG problem than on the standard version. However, participants who worked on

the Not-THOG problem first were not more likely to solve the standard THOG problem

than were those who worked on the standard THOG first.2 The lack of transfer favored

the matching bias explanation.
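The logical solution to the Not-THOG problem can be checked with a brief sketch. It assumes the standard exclusive rule and treats the Black Diamond as a negative example; the identifiers are illustrative.

```python
from itertools import product

DESIGNS = list(product(("black", "white"), ("diamond", "circle")))


def is_thog(design, written):
    """Standard rule: exactly one written-down property is present."""
    return (design[0] == written[0]) != (design[1] == written[1])


negative_example = ("black", "diamond")

# Hypotheses under which the Black Diamond is NOT a THOG.
hypotheses = [w for w in DESIGNS if not is_thog(negative_example, w)]

# Designs that are THOGs (or non-THOGs) under every remaining hypothesis.
thogs = [d for d in DESIGNS if all(is_thog(d, w) for w in hypotheses)]
non_thogs = [d for d in DESIGNS if not any(is_thog(d, w) for w in hypotheses)]

print("hypotheses:", hypotheses)
print("THOGs:", thogs)
print("not THOGs:", non_thogs)
```

The two surviving hypotheses are the example's own properties or neither of them, so the designs sharing exactly one property with the example are THOGs under both.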

An additional experiment introducing the Anti-THOG problem (Griggs &

Newstead, 1983, Exp. 2, adapted from Wason, 1978) yielded a conflicting conclusion. In

this problem, instead of being told that a design is a THOG if it contains one and only one

of the two features written down, participants were told: "There is a particular color and a



1 Among participants who constructed verifying and falsifying cases of conditional rules,
Evans (1972) found evidence of matching bias stemming from negative components of
antecedents and consequents. Similarly, in selection task research, Evans and Lynch
(1973) demonstrated that card choices tended to match the components named in the rule,
regardless of the presence of negatives.
2 In one of the two problems attempted by each participant, the shapes and shading of the
designs were changed, as was the label for the target design (from THOG to CHUZ).








particular shape such that any of the four designs which has either both these features, or

neither of them, is called a THOG." Note that if the Black Diamond is given as a positive

example, this rule yields the same answer as the standard rule, i.e., the Black Diamond

and White Circle are THOGs. Because of the conjunctive nature of the Anti-THOG rule,

Griggs and Newstead reasoned that if people relied on a common element strategy, then

performance would be better on the Anti-THOG problem than on the standard version. In

contrast, matching bias based on the Black Diamond as a positive example would yield

the same results for both versions. In a between-subjects design, the percentage of correct

answers was higher for the Anti-THOG problem than for the THOG, supporting use of

the common element fallacy.
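That the Anti-THOG rule and the standard rule yield the same answer can be verified by enumeration. The sketch below assumes the two rule wordings quoted in the text and the Black Diamond as the positive example; the function names are hypothetical.

```python
from itertools import product

DESIGNS = list(product(("black", "white"), ("diamond", "circle")))


def standard_rule(design, written):
    """THOG if the design has one and only one written-down feature."""
    return (design[0] == written[0]) != (design[1] == written[1])


def anti_rule(design, written):
    """THOG if the design has both written-down features, or neither."""
    return (design[0] == written[0]) == (design[1] == written[1])


def certain_thogs(rule, example):
    """Designs that are THOGs under every hypothesis consistent with the example."""
    hypotheses = [w for w in DESIGNS if rule(example, w)]
    return {d for d in DESIGNS if all(rule(d, w) for w in hypotheses)}


example = ("black", "diamond")
standard_answer = certain_thogs(standard_rule, example)
anti_answer = certain_thogs(anti_rule, example)
print(sorted(standard_answer), sorted(anti_answer))
```

The answers coincide even though the hypotheses differ: under the Anti-THOG rule, one consistent hypothesis is the example's own pair of properties, which is where a common element strategy happens to lead.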

But there is more to the story. Griggs and Newstead (1983, Exp. 3) devised a third,

more complex problem, the Denial THOG, to determine if matching bias plays a role in

the difficulty of the THOG. Because the matching bias explanation suggests that people

base their answers on the positive instance, not on the rule, a matching bias explanation

predicts the same pattern of errors for a rule with negatives as for the standard rule, if the

same positive example, the Black Diamond, is used. In the Denial THOG, the rule

involved three negatives: "If, and only if, a design does not include the color that I have

written down, or does not include the shape that I have written down, or does not include

both the color and the shape that I have written down, then it is a THOG." The answer to

this problem is that there is insufficient information to classify any of the designs,

because any combination of shape and color other than black and diamond could be

written down. The classification of the other designs differs depending on what is written.

The error patterns for the Denial THOG resembled those for the standard THOG, with a








majority of errors being intuitive, primarily Type B. These results were in accordance

with matching bias predictions.
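Why the Denial THOG leaves every other design undecidable can be seen in a minimal sketch. It assumes the three-part rule quoted above, which reduces to "not both written-down properties," and the Black Diamond as the positive example; the identifiers are illustrative.

```python
from itertools import product

DESIGNS = list(product(("black", "white"), ("diamond", "circle")))


def is_denial_thog(design, written):
    """Lacks the written color, or lacks the written shape, or lacks both together."""
    has_color = design[0] == written[0]
    has_shape = design[1] == written[1]
    return (not has_color) or (not has_shape) or not (has_color and has_shape)


example = ("black", "diamond")

# Any written-down pair other than the example's own properties is consistent.
hypotheses = [w for w in DESIGNS if is_denial_thog(example, w)]


def classify(design):
    verdicts = {is_denial_thog(design, w) for w in hypotheses}
    if verdicts == {True}:
        return "THOG"
    if verdicts == {False}:
        return "not a THOG"
    return "insufficient information"


answers = {d: classify(d) for d in DESIGNS if d != example}
print(hypotheses)
print(answers)
```

Each remaining design is a non-THOG under exactly one hypothesis (the one naming its own properties) and a THOG under the other two, so none can be classified.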

Thus, no firm conclusions could be drawn concerning the common element

fallacy versus matching bias explanations. However, Griggs and Newstead (1983)

suggested, based on evidence from the selection task, that people relied on matching

when no other solution path seemed viable (e.g., in more difficult problems). This

hypothesis might explain why matching bias was less prominent in the Anti-THOG

problem, which is logically less challenging than the THOG. Subsequent research

stretched beyond this unresolved controversy to examine whether realism enhanced

solution rates.

Introducing Realism

In other reasoning domains, the influence of realism on performance has been

equivocal. Newstead, Griggs and Warner (1982) reported that belief bias compromises

the conclusion that syllogistic reasoning is facilitated by realistic content (citing Wilkins,

1928). In studies using conditional rules, the influence of concrete content has been

inconsistent. Heightened performance originally attributed to realism (Johnson-Laird,

Legrenzi, & Legrenzi, 1972) subsequently appeared to be more appropriately explained

by memory cueing (Griggs & Cox, 1982). According to this explanation, realistic

problems prime preexisting knowledge that is then applied to "solve" a problem in lieu of

logical reasoning. Further, on a disjunctive reasoning task in which the meaning

expressed by the conjunct of two premises was incongruent with real-life expectations,

realism tended to impair rather than improve performance (Roberge, 1977, 1978, cited in

Newstead et al., 1982).








To study the effects of realism on the THOG problem, Newstead et al. (1982)

investigated performance on four problems similar in structure to the THOG problem.

The first (Newstead et al., 1982, Exp. 1), adapted from Stainton-Rogers (cited in Wason,

1978) described the preferences of four women for clothing (jeans and shirts or dresses)

and music (rock or classical), and designated one woman as having "style" (see

Appendix A, Newstead et al., 1982, Style, for exact wording). Participants were asked to

determine which other woman or women had style. The theme of the second problem

(Newstead et al., 1982, Exp. 2) was eligibility for a third-year psychology course, with a

prerequisite of one and only one previous course in cognitive psychology. Participants

were given information about four students, each of whom had completed one of two

first-year courses (social or cognitive psychology) and one of two second-year courses

(social or cognitive psychology), then given an example of a student who qualified for a

third-year course (see Appendix A, Newstead et al., 1982, Psychology, for a more

complete description). Participants were asked to determine which other student or

students, if any, qualified for the third-year course. Neither the Style nor the Psychology

problem produced facilitation compared to abstract versions.3

To assess the extent to which prior expectations might boost solution rates,

Newstead et al. (1982, Exps. 3 & 5) adapted a third problem related to food preferences

from Cordell (1978; Appendix III, cited in Newstead et al., 1982). The items involved in





3 Performance on the Style problem was compared to that on the standard THOG,
whereas performance on the Psychology problem was compared to that on a problem
involving letter combinations and an arbitrary "category P."








the Meat and Gravy problem included two foods (meat or ice cream) and two sauces

(gravy or chocolate sauce). Participants were given a rule, told that the experimenter

would eat meat and gravy, then determined whether or not the experimenter would eat

each of the other food-sauce combinations. The problem was designed so that the other

edible combination (ice cream and chocolate sauce) coincided with preexisting beliefs

(see Appendix A, Newstead et al., 1982, Meat and Gravy, for the exact wording of two

versions). Both versions of the Meat and Gravy problem produced similar and significant

facilitation compared to an abstract problem that contained letters and numbers instead of

foods and sauces (43% correct vs. 0% correct, Exp. 3). Justifications written by

participants supported the hypothesis that results were attributable to memory cueing

rather than logical reasoning. However, in Experiment 5, performance on an incongruent

version of this problem (with meat and chocolate sauce as the answer) was equivalent to

performance on replications of the original Meat and Gravy version. Moreover, the

proportion of correct answers to the original Meat and Gravy problem dropped to 20%,

suggesting that adults' preexisting expectations had only a small influence on realistic

versions of the THOG.4





4 In contrast, elementary school children (8 to 9 years of age) seemed highly susceptible
to memory cueing (Newstead et al., 1982, Exp. 4). Given a congruent problem
comparable to the Meat and Gravy scenario (using pictures of hamburger, mustard,
pancakes and syrup), 75% of the children responded correctly. Because it was unlikely
that these children had acquired the ability to combine information from two hypotheses
(typically demonstrated at about age 11 or 12, according to Inhelder & Piaget, 1958), the
high solution rate suggested that preexisting expectations influenced answer choices.
Supporting this idea, when a correct response conflicted with prior experience (in the
incongruent condition), no child solved the problem.








Subsequent research probed whether the effect of realism was context specific.

According to Wason (1978), an exclusive interpretation of "or" was particularly

appropriate in an imperative context. Thus, Griggs and Newstead (1982) devised

additional problems using imperatives (see Appendix A, Griggs & Newstead, 1982, Drug

and Diet problems, for exact wording). In the Drug problem, participants read a scenario

about administering drugs. Four drugs differed in content (calcium or potassium) and in

mode of administration (oral or injection). Nurses were instructed to give patients one

injection and one oral medication daily, containing one dose of calcium and one dose of

potassium. One permissible combination was presented as a positive example, then

participants determined whether or not each of the other combinations was appropriate. In

the Diet problem, four ladies in a diet class were instructed to have meat either for lunch

or dinner, but not both. When they took sandwiches on a picnic, they packed four boxes

of sandwiches. Two boxes were for lunch and two were for dinner. For each meal, one

box contained sandwiches with meat, the other sandwiches with cheese. Given a positive

example of one combination of boxes that conformed to the diet plan, participants

determined whether each of the other combinations fit the rule.

All participants did both problems, with order of presentation rotated (Griggs &

Newstead, 1982, Exp. 1). Compared to the abstract THOG problem (Newstead et al.,

1982), the Drug problem facilitated performance regardless of presentation order. The

Diet problem, however, facilitated performance only when it was presented second. This

inconsistency suggested that a factor other than an imperative context was involved in

facilitation. Griggs and Newstead (1982) posited that this factor related to problem

structure. In the Drug problem, both divisions of the structural tree (calcium and








potassium) were clearly specified and linked to the second property (oral and intravenous)

that in turn was linked to two specific drugs. This linkage was not present in either the

Diet problem (specifying meat and one meal but not indicating the content of the other) or

in the standard THOG problem (citing properties written down but not mentioning those

properties that were not written down). To test the hypothesis that facilitation related to

explicitly providing information needed to construct a binary symmetrical structural tree,

Griggs and Newstead (1982, Exp. 3) modified the Diet problem and created a structured

version of the abstract THOG (see Appendix A, Griggs & Newstead, 1982, for exact

wording of the Structured Diet problem).

In the Structured Abstract THOG, participants determined which of four objects,

each denoted by a nonsense syllable, conformed to a rule about correct combinations.

Two objects were squares (CHON and THIG) and two were circles (GREF and WULP).

One square and one circle were black and the others were white. Participants were not

told which name corresponded to which color. Given a positive example of a permissible

pair and a rule that a correct combination included one object of each color and one

object of each shape, participants indicated whether each of the remaining objects

conformed to the rule. In the Structured Diet problem, the phrasing of the rule was

changed to indicate that the ladies should have meat for one and only one meal and

cheese for one and only one meal. Thus, both branches of the structural tree were labeled.

In a between-subjects design, participants in the experimental groups completed one of

these problems. To serve as a baseline, participants in a control group completed the

standard THOG problem.








Participants who worked on the structured problems performed extremely well.

Solution rates were 90% for the Structured Abstract THOG, 85% for the Structured Diet

problem, and 10% for the standard THOG problem. Would this impressive facilitation

lead to transfer from a structured version to the standard THOG? Griggs and Newstead

(1982) did not conduct a transfer test between the Structured Abstract THOG and

standard THOG problems. Despite their shared underlying structure and level of

abstraction, these two problems appear to require different reasoning processes. For the

Structured Abstract THOG, the answer can be derived primarily by a process of

elimination. Given CHON-GREF as a positive example, the CHON-WULP combination

is disallowed because WULP is a different color than GREF and therefore must be the

same color as CHON. Similarly, the THIG-GREF combination is disallowed because

THIG is a different color than CHON and therefore must be the same color as GREF.

Other than the positive example, the only remaining square-circle possibility is

THIG-WULP, because THIG must be the opposite color of CHON (given both are

squares), and WULP must be the opposite color of GREF (given both are circles). Thus,

THIG-WULP is a permissible combination. This line of reasoning seems markedly less

complex than the simultaneous evaluation of two hypotheses required in the standard

THOG, reducing the likelihood of transfer.
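The elimination argument for the Structured Abstract THOG can be checked by enumerating the possible colorings. The sketch assumes, as in the description above, one black and one white object of each shape, and considers only square-circle pairs (so the shape requirement is always met); the function names are hypothetical.

```python
from itertools import product

SQUARES = ("CHON", "THIG")
CIRCLES = ("GREF", "WULP")


def colorings():
    """Every assignment with exactly one black square and one black circle."""
    for black_square, black_circle in product(SQUARES, CIRCLES):
        yield {name: "black" if name in (black_square, black_circle) else "white"
               for name in SQUARES + CIRCLES}


def permissible(pair, coloring):
    """A square-circle pair conforms when its two objects differ in color."""
    square, circle = pair
    return coloring[square] != coloring[circle]


example = ("CHON", "GREF")

# Colorings consistent with the positive example.
candidates = [c for c in colorings() if permissible(example, c)]

# Verdict for each remaining square-circle pair across all consistent colorings.
verdicts = {pair: {permissible(pair, c) for c in candidates}
            for pair in product(SQUARES, CIRCLES) if pair != example}
print(len(candidates), verdicts)
```

Two colorings remain, but every pair receives the same verdict under both, so the answer never requires holding the two alternatives in mind at once.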

Griggs and Newstead (1982, Exp. 4) did examine transfer between a rephrased

version of the Drug problem (see Appendix A, Griggs & Newstead, 1982, Rephrased

Drug, for more details) and the standard THOG. Order of presentation was balanced

between subjects. Unlike the transfer between the Drug and Diet problems, prior

experience with a structured problem clearly did not transfer to the standard THOG.








Although about half of the participants correctly responded to the Rephrased Drug

problem in each presentation order, the percentage solving the standard THOG ranged

from 0% to 6%.

Apparently, the difference in specificity of the branches of the structural tree was

too great, hampering participants' ability to link the two problems. The elusiveness of

transfer echoed earlier findings indicating that transfer from one problem-solving task to

another occurred only when participants clearly recognized structural similarities, an

effect mediated by problem complexity (Luger & Bauer, 1978; Reed, Ernst & Banerji,

1974). The Diet problem statement alluded to the binary nature of its structure, although

the non-meat side of the tree was neither positively defined nor named. Yet even though

the second branch was not explicit, its presence, perhaps in the description of the

sandwiches, sufficed to permit transfer from the more explicitly structured Drug problem.

In contrast, the THOG problem statement made no reference to properties not written

down, minimizing the probability that participants would recognize the two branches.

This failure may have led participants to create an inappropriate internal representation

that blocked subsequent achievement of the correct solution.

Rather than constructing a complex scenario to imbue the THOG problem with

realism, Smyth and Clark (1986) selected a real-life example of exclusive disjunction, the

half-sister relationship, and embodied it in the THOG format. This arrangement permitted

them to explore the effects of increasing cognitive complexity by comparing transfer to

the standard THOG problem for each of a series of Half-Sister problems. In Experiment

1, Smyth and Clark demonstrated that people understood the half-sister relationship (see

Appendix A, Smyth & Clark, 1986, Half-Sister, for exact problem wording), but that this








knowledge did not transfer into improved performance on the THOG problem. However,

Half-Sister wording did not parallel that of the THOG, nor did the Half-Sister problem

explicitly contain an exclusive disjunctive rule. When Smyth and Clark (Exp. 2)

rephrased the Half-Sister problem to approximate the phrasing of the THOG problem

statement, performance relative to the original Half-Sister problem dropped (from 93%

correct answers to 37% correct answers), even though the relationship was cued with the

words "my mother" and "my father" (see Appendix A, Smyth & Clark 1986, Cued

Half-Sister, for exact wording). The decline was attributable to a failure to correctly

identify women who were not half-sisters. Despite this difficulty, performance on the

Cued Half-Sister was better than performance on the standard THOG. There was no

evidence of transfer.
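The sense in which the half-sister relationship is an exclusive disjunction can be shown in a few lines. This is a sketch only: the parent names are invented for illustration and are not taken from the Smyth and Clark (1986) materials.

```python
def is_half_sister(me, other):
    """True when exactly one parent is shared (mother or father, but not both)."""
    same_mother = me["mother"] == other["mother"]
    same_father = me["father"] == other["father"]
    return same_mother != same_father


# Hypothetical parents, for illustration only.
me = {"mother": "Ann", "father": "Bob"}
women = {
    "shares both parents": {"mother": "Ann", "father": "Bob"},
    "shares mother only": {"mother": "Ann", "father": "Dan"},
    "shares father only": {"mother": "Cat", "father": "Bob"},
    "shares neither parent": {"mother": "Cat", "father": "Dan"},
}

results = {label: is_half_sister(me, w) for label, w in women.items()}
print(results)
```

The function has the same form as the standard THOG rule, with "my mother" and "my father" in place of the written-down color and shape.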

In a second step toward determining the effects of heightened task complexity,

Smyth and Clark (1986, Exp. 3) developed a problem that did not explicitly state it was

necessary to assume that either one of the mothers or one of the fathers was written down.

The problem statement in this Uncued Half-Sister problem also did not provide cues to

possible combinations of parents that could have been written down (see Appendix A,

Smyth & Clark, 1986, Uncued Half-Sister, for exact wording). Half of the participants

who attempted to classify the four women responded to a question "Who could my

parents be?" prior to classification (Structured Uncued Half-Sister problem). Performance

on both versions of the Uncued Half-Sister (10% correct) was no better than performance

on the standard THOG (8% correct). Although about two thirds of the participants

correctly identified both hypothesized sets of parents if asked to do so, only 15% of this

group then correctly classified the women. Thus, the difficulty appeared to stem primarily








from the need to simultaneously evaluate more than one alternative, rather than from an

inability to generate appropriate pairs, as was true for the CHUZ problem (Wason &

Brooks, 1979).

Smyth and Clark (1986) also investigated whether errors on the more complex

versions of the Half-Sister problem stemmed from inappropriate strategies similar to

those leading to errors on the standard THOG. To do so, they converted the Not-THOG

problem (Griggs & Newstead, 1983, Exp. 1) to the Not-Half-Sister problem (Smyth &

Clark, 1986, Exp. 4), by providing a negative example. Half of the participants who

attempted to classify the four women on the Not-Half-Sister problem responded to a

question "Who could my parents be?" prior to classification (Structured Not-Half-Sister

problem). Forty-eight percent of the participants answered the Not-Half-Sister problems

correctly on the first presentation of these problems. This solution rate was above the

solution rates for the standard THOG problem (8%), the Not-THOG problem (25%,

Griggs & Newstead, 1982), and the Cued and Uncued Half-Sister problems (36% and

10% respectively). Performance was similar on the Structured and Unstructured

Not-Half-Sister problems. Reaching a correct solution on these problems failed to transfer

to subsequent performance on the standard THOG problem. Because participants who

provided correct answers to the Not-Half-Sister problems were just as likely to make

nonintuitive as intuitive errors on the standard THOG problem, Smyth and Clark rejected

the matching bias explanation for the observed facilitation on the Not-Half-Sister

problems. They suggested instead that the high solution rate reflected a tendency to

consider the parents of the example conjunctively, simplifying subsequent operations.








The Era of Separation

In THOG-type problems, intuitive errors are typically more frequent than other

types of errors. In the studies reviewed to this point, intuitive errors accounted for 33%

(Wason & Brooks, 1979) to 60% (Griggs & Newstead, 1983) of all responses, generally

exceeding the percentage of correct answers. Both types of intuitive error stem from a

tendency to inappropriately base decisions (either determination of hypotheses or

classification of designs) on the properties of the positive example. Girotto and Legrenzi

(1989) proposed that a key to reducing this misleading strategy was to clearly separate the

data with which people are provided (the designated THOG) from the hypotheses they are

asked to generate (the properties written down). Girotto and Legrenzi (1989) suggested

that this separation could be achieved by creating a scenario in which there was a

temporal separation between the data and hypotheses. In the Two-Level Spy problem (see

Appendix A, Girotto & Legrenzi, 1989, Two-Level Spy, for exact wording), they devised

a thematic problem that required generating hypotheses based on an exclusive disjunctive

rule. To solve the problem, participants were required to alter the properties of the

positive example. The alteration was designed to "defocus" attention from this example,

and thereby encourage application of the rule to the hypothesized combinations.

The story involved four spies, each with two features (type of job and type of visa)

on their passports (Girotto & Legrenzi, 1989, Exp. 2). To return home in an emergency,

the spies altered one and only one of the features on their passports. Participants were

provided with information about the original versions of the passports for each spy and

told that one of the spies arrived home safely. Participants then determined which of the

other three spies, if any, also returned home without difficulty. Seventy-five percent of









the participants correctly solved this problem, compared to only 15% of those who were

presented with a similar thematic problem (see Appendix A, Girotto & Legrenzi, 1989,

One-Level Spy, for exact wording) that did not demand modification of the features of the

positive example, and 29% of those who worked on the standard THOG problem. In the

latter problems, 79% of the errors were intuitive, primarily Type B. Girotto and Legrenzi

(1989) hypothesized that the relatively poor performance on the One-Level Spy problem

resulted from its failure to clearly separate the data level (the properties of the original

passport of the spy who returned home safely) from the hypotheses level (the properties of the

altered passports). If this was indeed the case, then facilitation could occur using more

abstract material if data and hypotheses were adequately separated.

In the Pub problem, Girotto and Legrenzi (1989, Exp. 3) embedded the colors and

shapes from the standard THOG into a story about a card game in which the prize was a

free dinner (see Appendix A, Girotto & Legrenzi, 1989, Pub, for exact wording). One of

five men dealt himself and each of four friends a card. Each card contained one of the

four color-shape combinations from the THOG problem. The dealer offered to buy dinner

for whoever had a card that included either the color or shape of the design on his own

card, but not both. The person who held the Black Diamond was designated as someone

for whom the dealer would buy dinner. Participants then decided what card the dealer had

and whether he would buy dinner for anyone else. Eighty-nine percent of the participants

solved this problem, demonstrating that alteration of properties was not necessary for

facilitation. The problem structure provided sufficient incentive to inhibit prolonged

focus on the positive example (the card held by the person who was owed a dinner) and to

encourage concentration on the hypotheses (the cards that could be held by the dealer).








Although the presence of the story reduced the level of abstraction compared to the

standard THOG, results supported the idea that facilitation was linked to the structure of a

story, in this case the clarity with which it separated data and hypotheses, rather than the

introduction of realism.

Working with abstract material, O'Brien et al. (1990) compared the effects of

separation of the positive example from the properties that were written down (Trump

THOG), labeling of properties not written down (Blackboard and Blackboard Control

THOG), and instruction phrasing (One-Other THOG) on the proportion of correct

answers.5 Their work was designed to evaluate the explanation offered for the facilitatory

effect of the Structured Abstract THOG (Griggs & Newstead, 1982); namely, that

facilitation stemmed from providing labels for both sides of the structural tree. In the

Blackboard THOG condition (O'Brien et al., 1990, Exp. 3), participants were told that

one of the colors and one of the shapes were written on the left-hand side of a blackboard,

and that the other color and other shape were written on the right-hand side of a

blackboard. In the Blackboard Control condition, only the items on the left-hand side of a

blackboard were mentioned. The rule was identical in both cases, stating that if and only

if any of the designs included either the color or shape written on the left-hand side of

the blackboard, but not both, then the design was a THOG. The Blackboard THOG

problem, explicitly referring to the binary structure, was more frequently solved than the




5 O'Brien et al. (1990) used a triangular shape instead of a diamond. However, to avoid
disrupting comprehension of comparisons between the O'Brien et al. investigations and
other research, including the experiments conducted for this dissertation, the triangular
shape is referred to as a diamond throughout this manuscript.









standard THOG problem (40% versus 5% respectively, in a cross-experiment

comparison), but the Blackboard Control problem (15% correct) did not significantly

facilitate performance.6 These findings provided further support for the value of labeling

both branches of the structural tree. However, the Trump THOG (O'Brien et al., 1990,

Exp. 4) demonstrated that labeling both sides of the structural tree was not necessary for

facilitation.

The Trump THOG problem was designed to separate the properties of the THOG

example from the properties written down, as did the realistic Pub and Spy problems

(Girotto & Legrenzi, 1989). Instead of referring to properties written down, in the Trump

THOG problem, one of the colors was labeled TRUMP, and one of the shapes was

labeled FAFNER. A THOG was defined as a design that contained either the color

TRUMP or the shape FAFNER, but not both. This version of the problem also facilitated

performance (45% correct), attesting to the effect of separation with abstract material, at a

level comparable to the effect of labeling both sides of the structural tree.

In the One-Other THOG problem, the problem statement was identical to that of

the standard THOG. However, instead of being asked to classify each of the designs,

participants were explicitly told that only one design other than the Black Diamond was a

THOG. Their task was to correctly identify the other THOG. Sixty percent of the

participants did so. However, written justifications revealed that 67% of those who gave

the correct answer reached their conclusion using some type of exclusion strategy or by


6 O'Brien et al. asked participants to identify the Black Diamond, as well as the other
three designs, in all problems except for the task with the one-other instruction. Across
the Trump, Blackboard, and Blackboard Control THOG conditions, only 5% of the
participants did not correctly label the Black Diamond as a THOG.









focusing on uniqueness, without engaging in simultaneous testing of multiple hypotheses.

This suggested that the high solution rate for the Structured Abstract THOG may have

been partially attributable to participants who attained the correct answer by a process that

did not involve testing alternate hypotheses. Subsequent work with separation using

abstract material (Girotto & Legrenzi, 1993) was subject to a similar effect from

instruction wording.

Facilitation via separation suggests that a major difficulty in the THOG problem

stems from inappropriate focus on the positive example, leading to confusion between its

properties and the hypothesized properties to which the rule must be applied. In its

original form, confusion theory proposed that people assumed that the properties of the

positive example were those that were written down. However, in light of evidence that

many people who correctly identified the hypothesized combinations (white and diamond

or black and circle) failed to solve the problem, confusion theory has been extended to

incorporate any confusion between the hypothesized properties and the properties of the

positive example (Newstead, Girotto & Legrenzi, 1995). This revision allows for the

possibility that people correctly identify the hypotheses but then consider the properties

written down as (incorrectly) exemplifying THOGs rather than as properties that define

THOGness through the application of the rule. This seems akin to proposing that after

generating the hypotheses, people bypass applying the rule and select as THOGs those

designs that match the properties written down. Although it is difficult to determine

whether incorrectly identifying the White Diamond and Black Circle as THOGs stems

from inappropriate reasoning (considering the written-down properties as examples) or

perceptual bias (matching the designs to the properties written down), successful








separation overcomes these difficulties by drawing attention away from the positive

example. However, Newstead and Griggs (1992) demonstrated that separation provides

only a partial explanation.

In their initial replication of the Pub problem, Newstead and Griggs (1992)

specifically requested that participants respond to the question "Which card do you think

Charles could have?" prior to determining whether or not each of the other friends would

receive a free dinner (or if there was insufficient information to decide). Compared to the

standard THOG problem with an expanded rule,7 the Pub problem (Newstead & Griggs,

1992, Exp. 1) significantly facilitated performance (0% versus 41% correct answers

respectively). However, a direct comparison between performance on this Pub problem

with a version that omitted the question about Charles' card revealed that performance

was enhanced by inclusion of the question (53% correct with the question versus 7%

correct without the question). Thus, the facilitatory effect of the Pub problem was based

not only on thematic separation but also required explicitly asking a question about the

hypotheses.

Newstead and Griggs (1992, Exp. 3) probed the influence of adding a similar

Question 1 to the standard THOG problem. Prior studies that focused on generating

hypotheses about the properties written down on an abstract problem (O'Brien et al.,

1990; Smyth & Clark, 1986; Wason & Brooks, 1979) suggested that this should have

little effect on attaining the correct solution, and indeed it did not. The percentage of

correct answers was similar whether Question 1 was present (23%) or absent (13%). Also



7 The rule stated: "If a design includes either the color or the shape I have written down,
then it is not a THOG. If a figure has neither the color nor the shape, it is not a THOG."








consistent with earlier work, correctly generating the two possible hypotheses did not

necessarily lead to correct classification of the designs. Forty percent of the participants

indicated that either the Black Circle or the White Diamond could have been written

down, but of these, only 25% then proceeded to solve the problem. Thus, given

separation, the presence of Question 1, forcing participants to generate hypotheses, was

required for facilitation; however, without separation, hypotheses generation did not

inevitably lead to a correct answer. These results required a further modification to

confusion theory, because simply eliminating the confusion between the positive example

and the hypothesized properties was not sufficient to produce facilitation, unless people

initially generated hypotheses.

Newstead and Griggs (1992) suggested that versions of the THOG that produced

facilitation (e.g., the Drug, Restructured Diet, Spy, Pub, and Trump problems) either

directly or indirectly promoted both hypotheses generation and separation. For example,

in the Drug and Restructured Diet problems (Griggs & Newstead, 1982), clarifying the

binary nature of the problem may have facilitated hypotheses generation. In the Spy

problem (Girotto & Legrenzi, 1989), hypotheses generation may have been encouraged

by the requirement to modify the passports. In the Pub problem (Girotto & Legrenzi,

1989), hypotheses generation may have stemmed from asking about Charles' card. In the

Trump problem (O'Brien et al., 1990), labeling the properties may have induced

hypotheses generation. As Newstead and Griggs (1992) acknowledged, post hoc

explanation does not provide empirical justification. However, at the least, their

explanations illustrated that prior instances of facilitation did not preclude the possibility








that both separation and hypotheses generation jointly contributed to enhanced solution

rates.

Newstead and Griggs (1992) noted that even if confusion theory was modified to

encompass hypotheses generation, it would fail to explain why participants frequently

indicate that they cannot determine whether or not the White Diamond and Black Circle

are THOGs (Type B error). According to the theory, confusion would result in the

classification of these designs as "definitely not THOGs." This assertion appears to

ignore another source of confusion. If people generate two hypotheses, they may be

uncertain what to do next. From reading the problem statement, they may realize that only

one of these two hypotheses is actually written down. Failing to grasp the possibility that

both hypotheses may lead to the same conclusion, people may resolve the dilemma by

indicating that the status of the White Diamond and Black Circle is indeterminate. In

decision-making, Shafir and Tversky (1992) have demonstrated that when faced with

uncertainty, people tend to postpone a decision. The indeterminate response in the THOG

problem may be analogous to such postponement.

If confusion can be reduced by separation, then facilitation should occur with or

without a realistic context. The Trump THOG (O'Brien et al., 1990) demonstrated such

an effect, although its 45% solution rate did not reach the 89% level achieved in the Pub

problem (Girotto & Legrenzi, 1989). However, in addition to a realistic setting, the Pub

problem combined the two hypothesized properties under a single label (e.g., Charles'

card), whereas the Trump THOG assigned a label to one color and one shape but did not

label the hypothesized combination. Perhaps a single label is a prerequisite to facilitation,

even if data and hypotheses are separated.








Girotto and Legrenzi (1993) created an abstract problem, SARS, in which the

hypothesized properties of the designs were combined under the label SARS. Participants

were presented with the following problem statement.

In front of you are four designs: Black Diamond, White Diamond, Black Circle,
and White Circle (see Figure). I have defined one of these designs as a SARS.
You do not know which design this is. But you do know that a design is a THOG
if it has either the color or the shape of the SARS, but not both. Knowing for sure
that the Black Diamond is a THOG, you have to indicate which one or which
ones, among the remaining designs, could be the SARS. (Girotto & Legrenzi,
1993, p. 705)

Given the Black Diamond as a positive example of a THOG, participants first

indicated which of the remaining designs could be the SARS (Question 1). Next, they

responded to the instruction: "Could you also indicate whether, in addition to the Black

Diamond, there are other THOGs?" Note that the task requirements stemming from this

"other" THOG instruction differ from those of the standard THOG instruction

(classifying each design as either a THOG, not a THOG, or indeterminate).

Using a between-subjects design, Girotto and Legrenzi (1993, Exp. 1) compared

performance on the SARS problem to that on the THOG problem with the "other"

instruction and on the Hypotheses-THOG. In the Hypotheses-THOG, Girotto and

Legrenzi (1993) introduced Question 1 into the standard THOG problem, asking

participants to identify the properties written down. This Hypotheses-THOG also

included the "other" THOG instruction, in lieu of the standard three-choice inquiry. The

solution rate for the SARS problem (70%) exceeded that for the Hypotheses-THOG

(40%) and the standard THOG with the "other" instruction (25%). Consistent with

previous studies involving hypotheses generation, at least half of the participants in the

SARS (61%) and Hypotheses-THOG (50%) conditions correctly generated two








hypotheses. Among those who did so, 76% in the SARS condition and 60% in the

Hypotheses-THOG condition correctly identified the designs. Participants who were

unable to generate at least one hypothesis were also unable to correctly identify the

designs.

An examination of the error patterns for the three problems suggested that only

the SARS problem eliminated considerable confusion leading to intuitive errors. Only

14% of the incorrect answers in the SARS condition were classified as intuitive errors,

compared to 58% for the Hypotheses-THOG and 56% for the standard THOG with the

"other" instruction. This error pattern, together with relative performance on the SARS

versus the Hypotheses-THOG, supported the effectiveness of separation. On the other

hand, it was possible that after naming two SARS, participants then selected the only

remaining design, the White Circle, as the other THOG, without performing hypotheses

tests.

Attempting to eliminate this alternative explanation, Girotto and Legrenzi (1993)

conducted two subsequent experiments with the SARS, each with six designs. The two

additional shapes were a Gray Triangle and a Gray Rectangle. In Experiment 2, the

problem statement was identical to that of the SARS problem except that it specified six

shapes rather than four. In Experiment 3, the problem statement indicated that the

experimenter had chosen one shape and color, designating the combination a SARS (see

Appendix A, Girotto & Legrenzi, 1993, SARS 6, Id Color and Shape, for exact wording).

In this version, the phrasing of Question 1 was also altered, requesting participants to find

the possible SARS, rather than indicating which of the remaining designs could be

SARS. However, despite evidence of facilitation (67% and 63% solved the problems in








Experiments 2 and 3 respectively), it could be argued that the gray color and triangular

and rectangular shapes were simply considered irrelevant because they did not contain

any of the properties of either the SARS or the THOG, leaving only the White Circle as

the other possible THOG.
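
The irrelevance argument can be checked mechanically. The following Python sketch is illustrative only: it assumes that designs can be encoded as (color, shape) pairs and that, as in the Experiment 2 wording, the SARS is one of the six designs. The names `is_thog`, `candidate_sars`, and `other_thogs` are hypothetical and do not come from Girotto and Legrenzi (1993).

```python
# Six designs, encoded (by assumption) as (color, shape) pairs.
DESIGNS = [
    ("black", "diamond"), ("white", "diamond"),
    ("black", "circle"), ("white", "circle"),
    ("gray", "triangle"), ("gray", "rectangle"),
]


def is_thog(design, sars):
    """A design is a THOG if it has the color or the shape of the SARS, but not both."""
    return (design[0] == sars[0]) != (design[1] == sars[1])


positive = ("black", "diamond")  # known to be a THOG
remaining = [d for d in DESIGNS if d != positive]

# Question 1: which remaining designs could be the SARS?
candidate_sars = [d for d in remaining if is_thog(positive, d)]

# A remaining design is another THOG only if it is a THOG under every candidate SARS.
other_thogs = [d for d in remaining if all(is_thog(d, s) for s in candidate_sars)]

# The gray designs share no property with any candidate SARS.
gray_shares_nothing = all(
    d[0] != s[0] and d[1] != s[1]
    for d in remaining if d[0] == "gray"
    for s in candidate_sars
)

print(candidate_sars)
print(other_thogs)
print(gray_shares_nothing)
```

Under these assumptions, the two added designs never enter the analysis: they cannot be the SARS and cannot be THOGs, so the six-design task reduces to the four-design task.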

Girotto and Legrenzi (1993) did not address the issue of whether Question 1 was

required to obtain facilitation, nor did they examine the effect of the "other" THOG

instruction on facilitation, compared to the standard three-choice instruction. In the task

used by O'Brien et al. (1990) with a one-other THOG instruction, 67% of those who

correctly identified the White Circle as a THOG used an exclusion strategy or focused on

uniqueness in lieu of logical reasoning. This sensitivity to information that focuses

attention on a single design may also explain performance on the SARS problem.

The possibility of alternative explanations motivated Griggs, Platt, Newstead and

Jackson (1998) to conduct a series of experiments in which they systematically

manipulated the requirement for hypotheses generation (Question 1 present or absent),

and the wording of the instruction. In the first series of experiments (Exp. 1 and four

replications), Griggs et al. (1998) compared performance on the SARS problem with and

without a request to generate hypotheses. Both versions included the standard

three-choice instruction, rather than the "other" THOG instruction. Among a variety of

participant populations (ranging from high school students to undergraduates to graduate

students), Griggs et al. failed to replicate the Girotto and Legrenzi results. Solution rates

ranged from 0% to 12% in the five studies, with no differences in performance related to

the presence or absence of Question 1. Only the "other" THOG instruction (Griggs et al.,








1998, Exp. 3) produced facilitation, with approximately half of the participants

identifying the White Circle as a THOG.

Using a 2 x 2 factorial design, Griggs et al. (1998, Exp. 4) also examined

performance on the SARS and standard THOG problems (both without Question 1) with

either the one-other or standard three-choice instruction. On both problems, the one-other

instruction produced facilitation compared to baseline solution rates for the three-choice

instruction (11% on the standard THOG problem and 6% for the SARS problem). There

was no significant difference between the solution rates for the SARS (72%) and standard

THOG (53%) problems with the one-other instruction. This pattern of results testified

that the presence of Question 1 was not necessary for facilitation if the instruction stated

(O'Brien et al., 1990) or suggested (Girotto & Legrenzi, 1993) that there was one other

THOG.

As O'Brien et al. (1990) mentioned, the one-other THOG instruction did not

consistently enhance logical reasoning. Instead, for many participants, it served to narrow

down the possibilities, rather than separate the properties of the example and the

hypotheses. It achieved its effect in abstract problems that did not involve a single label

for the pairs of hypothesized properties and did not require hypotheses generation.

Perhaps telling participants that there is only one other THOG discourages intuitive errors

by implying that both the Black Circle and White Diamond cannot be THOGs. From an

attentional perspective, when people cannot justify choosing either the White Diamond as

opposed to the Black Circle or vice versa, they may restrict their search to a design that is

unique. The White Circle is the only available candidate.








This attentional cueing explanation (Griggs et al., 1998) suggests that participants

may "identify the correct designs but for the wrong reasons" (p. 12). If the one-other

instruction encourages shifting the attentional spotlight to a design that bears a singular

relationship to the positive example, participants may adopt a strategy of elimination to

arrive at the correct answer, bypassing the more demanding tests of multiple hypotheses.

Moreover, the classification task itself is abbreviated by the request to identify one design

rather than three. Thus, the one-other instruction may serve to reduce cognitive

complexity in two respects, neither of which necessarily promotes application of the

appropriate logical reasoning strategies.
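
A minimal sketch shows how an elimination strategy could reach the correct design without any hypothesis testing. It is a hypothetical model written for illustration, not a procedure proposed by Griggs et al. (1998) or O'Brien et al. (1990): it assumes that a participant merely counts the features each remaining design shares with the Black Diamond and selects the design whose count is unique. The function name `shared_features` and the (color, shape) encoding are assumptions.

```python
from collections import Counter

positive = ("black", "diamond")
remaining = [("white", "diamond"), ("black", "circle"), ("white", "circle")]


def shared_features(a, b):
    """Number of properties (color, shape) that two designs have in common."""
    return sum(x == y for x, y in zip(a, b))


# Similarity of each remaining design to the positive example.
overlap = {d: shared_features(d, positive) for d in remaining}

# Keep only designs whose similarity score is not shared by any other design.
frequency = Counter(overlap.values())
unique_designs = [d for d in remaining if frequency[overlap[d]] == 1]

print(overlap)
print(unique_designs)
```

The White Diamond and Black Circle are tied (one shared feature each), so the heuristic singles out the White Circle. The exclusive disjunction rule is never applied, which is the sense in which the answer may be correct "for the wrong reasons."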

Providing Procedural Cues

Given a variety of unsuccessful attempts to promote logical facilitation using the

standard problem context and instructions, Smyth and Clark (1986, Exp. 5) devised an

ordered set of questions to guide participants through the logical steps leading to the

solution of the Uncued Half-Sister and the standard THOG problems. For the Uncued

Half-Sister, prior to indicating whether or not each woman was a half-sister (or whether

there was insufficient information to decide), participants answered the following

questions:

1. Given that Robin is my half-sister, who could my parents be?

2. If the first pair of parents you gave in answer to Question 1 actually were my
parents, which of the other women would be my half-sister?

3. Did you write down two possible pairs of parents in answer to Question 1? If
so, which of the women would be my half-sister if the second pair you choose
actually were my parents? (Smyth & Clark, 1986, p. 284)

Similar questions were developed for the colors and shapes in the standard THOG

problem. No feedback was given regarding answers to these questions. For both the








Uncued Half-Sister and the standard THOG problems, the effect of the three questions on

solution rate was minimal. Realism did, however, enhance performance on the three

preclassification questions. The proportion of correct answers to Question 1 was higher

for the Uncued Half-Sister problem (78%) than for the THOG problem (47%), as was the

proportion of correct answers to Questions 2 and 3 (47% for the Uncued Half-Sister

problem vs. 31% for the THOG problem). However, the solution rate was comparable for

the two problems (22% for the Uncued Half-Sister problem, and 19% for the THOG

problem).

Results highlighted the confusion stemming from dealing with two hypotheses.

Across both problems, a substantial minority of participants (23%) answered Question 1

correctly but offered incorrect responses to Questions 2 and 3. In fact, across both

problems, 19% of the participants answered all three questions correctly, yet failed to

correctly solve the problem. The addition of realism could not overcome apparent

confusion resulting from uncertainty regarding which pair of parents or properties was

relevant to the ultimate classification.

In the standard THOG context, O'Brien et al. (1990, Exp. 2) attempted to

ameliorate one source of difficulty by explicitly specifying what combination of

properties could be written down and providing a rationale for these possibilities. In the

Pretest THOG problem, after the problem statement, participants read the following.

Because you know that a THOG has either the shape I have written down or the
colour I have written down, but not both, it follows that either the colour I have
written down is black or the shape I have written down is diamond, but not both.
Further, because there are only two shapes, it follows that if the colour I have
written down is black, then the shape I have written down is circle; and because
there are only two colours, it follows that if the shape I have written down is
diamond, the colour I have written down is white. There are, therefore, two








possible combinations that I could have written down: diamond and white, or
circle and black. (O'Brien et al., 1990, p. 340)

Participants then responded to a group of four questions regarding which designs

could or could not be THOGs given each possible combination (see Appendix A, O'Brien

et al., 1990, Pretest THOG, for exact wording). The standard three-choice classification

instruction followed. To reveal possible confusion related to the status of the Black

Diamond, O'Brien et al. asked participants to classify all four designs.

Slightly more than half (55%) of the participants correctly answered all four

pretest questions. In this group, similar proportions of respondents then misidentified the

Black Diamond as NOT a THOG (45%) and correctly classified all four designs (45%).

The written justifications of those who indicated that the Black Diamond was not a

THOG suggested that these participants mistakenly considered black and diamond to be

the properties that had been written down. Overall, 25% of the participants offered the

correct solution to the Pretest THOG, compared to the standard THOG baseline of 5%

(O'Brien et al., 1990, Exp. 1) that also requested classification of all four designs.

However, in the baseline condition, no one misidentified the Black Diamond, whereas

45% of the participants did so after responding to pretest questions. This high level of

misidentification overshadowed the proportion of intuitive errors on the Pretest THOG

(20%), which was considerably less than the proportion of intuitive errors on the standard

THOG baseline (55%).

O'Brien et al. (1990) concluded that many participants harbored the faulty

assumption that black and diamond were the written-down properties. Explicitly providing

information that refuted this conception appeared to result in further confusion rather than








clarification. Further, participants who did understand how to determine hypotheses may

have lacked the procedural knowledge to follow through with the classification

procedures, thereby minimizing the effect of knowledge about the hypothesized

properties.

In the Explanation THOG, O'Brien et al. (1990, Exp. 2) addressed the issue of

how to classify designs given each hypothesis. In this condition, participants read a

detailed rationale that not only identified which properties were written down but also

explicitly indicated which designs were and were not THOGs given each possibility. This

explanation, shown below, thereby provided procedural guidance to lead participants to

the point of comparing the results of design identification across two hypotheses:

Consider all of the combinations that I might have written down: diamond and
black, circle and black, diamond and white, and circle and white. I could not have
written down both diamond and black because the Black Diamond includes both
of these features. Similarly, I could not have written down both circle and white
because the Black Diamond contains neither of these features. But the remaining
combinations each include one of the features of a Black Diamond, and hence I
could have written down either circle and black, or diamond and white.

Consider first the possibility that I wrote down circle and black. In this case, a
Black Circle cannot be a THOG because it includes both features, and a White
Diamond cannot be a THOG because it includes neither feature. But a White
Circle is a THOG because it includes one of the features and not the other.
Consider next the possibility that I wrote down diamond and white. In this case, a
Black Circle cannot be a THOG because it includes neither of the features, and a
White Diamond cannot be a THOG because it includes both features. Again, a
White Circle is a THOG because it includes one of the two features and not the
other. (O'Brien et al., 1990, pp. 340-341)

The standard three-choice instruction followed this explanation. Participants were

asked to classify all four designs.
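
The logic of the quoted explanation can be sketched as three steps: generating the hypotheses, classifying each design under each hypothesis, and combining the results. The Python sketch below is illustrative; the (color, shape) encoding and the function names `is_thog`, `generate_hypotheses`, and `classify` are assumptions introduced here, not part of the O'Brien et al. (1990) materials.

```python
from itertools import product

# Assumed encoding: a design is a (color, shape) pair.
COLORS = ("black", "white")
SHAPES = ("diamond", "circle")
DESIGNS = list(product(COLORS, SHAPES))


def is_thog(design, written):
    """Exclusive disjunction: the design has exactly one written-down property."""
    return (design[0] == written[0]) != (design[1] == written[1])


def generate_hypotheses(positive):
    """Step 1: combinations that could be written down, given a known THOG."""
    return [w for w in DESIGNS if is_thog(positive, w)]


def classify(design, hypotheses):
    """Steps 2 and 3: test the design under each hypothesis, then combine."""
    verdicts = {is_thog(design, w) for w in hypotheses}
    if verdicts == {True}:
        return "THOG"
    if verdicts == {False}:
        return "not a THOG"
    return "indeterminate"


positive = ("black", "diamond")
hypotheses = generate_hypotheses(positive)
classification = {d: classify(d, hypotheses) for d in DESIGNS}

for design, verdict in classification.items():
    print(design, "->", verdict)
```

Because both hypotheses yield the same verdict for every design, the "indeterminate" branch is never reached. This is the point at which, according to the literature reviewed here, many participants appear to falter.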

Participants were unable to use this information appropriately. Those who were

provided with an explanation did no better at solving the problem than did those who








attempted to classify the four designs in the standard THOG problem (5% correct answers

for each). Rather than reducing confusion, the explanation appeared to enhance it. A large

majority of participants who were presented with the explanation (70%) proceeded to

classify the Black Diamond as "not a THOG" (60%) or "indeterminate" (10%). As noted

previously, no participants exhibited this form of confusion in the standard THOG

condition (O'Brien et al., 1990, Exp. 1).

O'Brien et al. (1990) reported that written justifications indicated that many

participants perceived a contradiction in the problem. Some reacted to this by classifying

the Black Diamond as "not a THOG." In other instances, participants neglected to

consider the statement identifying the Black Diamond as a THOG, perhaps because they

were overloaded by the detailed explanation that followed it. O'Brien et al. concluded

that participants "who do not appreciate the task on their own are not apt to benefit by

having it explained to them" (O'Brien et al., 1990, p. 343).

Although it is possible that the lack of facilitation reflected the complexity of the

particular explanation provided, the Explanation THOG highlights the challenge of

determining how to overcome or bypass the dual-hypotheses stumbling block. If

participants did understand the explanation, then its null effect identifies simultaneous

testing of multiple hypotheses as the locus of major difficulty. However, the low solution

rate may reflect in part an incremental creation of ambiguity (above that of the standard

THOG problem) related to the request to classify all four designs. The fact that it was

necessary to classify the Black Diamond may have created doubt about the veracity of the

statement indicating that the Black Diamond was a THOG.








Summary

Evidence regarding the primary interpretation of disjunctions as inclusive or

exclusive has been mixed; however, with a strict criterion, an exclusive interpretation

seems favored (Newstead & Griggs, 1983). Interpretations are also context dependent,

although with the exception of qualification scenarios, the exclusive interpretation tends

to predominate. Participants are more consistently geared toward the exclusive meaning

when they evaluate the conclusions of syllogisms than when they indicate whether an

outcome is consistent with a disjunctive rule. Because the THOG problem explicitly

includes the phrase "but not both," it would tend to propel participants toward an

exclusive interpretation.

Early investigations of potential sources of difficulty (e.g., Wason & Brooks,

1979) have revealed that people can understand and apply the exclusive disjunctive rule.

Given the Black Diamond as a positive example, a majority of people also display

competence in generating the dyad of hypotheses (the written-down properties), a finding

that has been consistently replicated (Girotto & Legrenzi, 1989, Exp. 1; Girotto &

Legrenzi, 1993, Exps. 1, 2, & 3; Griggs, Platt, Newstead, & Jackson, 1998, Exps. 1, 2, &

3; Smyth & Clark, 1986, Exp. 3). Beyond this point, however, difficulties emerge.

Intuitive errors suggest that people do not typically proceed to perform the necessary

analyses based on the hypotheses. Instead, participants often seem distracted by the

positive example, classifying designs by assessing their similarity to the Black Diamond.

This primitive matching bias reportedly leads to an answer pattern in which the White

Diamond and Black Circle have indeterminate status and the White Circle is not a

THOG. The common element fallacy, an assumption that the properties of the Black








Diamond are also the hypothesized properties, reportedly results in a mirror image of the

correct response. Griggs and Newstead (1983) have presented evidence favoring, but not

consistently supporting, the matching bias alternative.
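
The "mirror image" produced by the common element fallacy can be illustrated with a short sketch. It assumes a (color, shape) encoding of the designs and models the fallacy as applying the exclusive disjunction rule with black and diamond treated as the written-down properties. The names `is_thog`, `correct`, and `fallacy` are hypothetical and serve only this illustration.

```python
def is_thog(design, written):
    """Exclusive disjunction: the design has exactly one written-down property."""
    return (design[0] == written[0]) != (design[1] == written[1])


positive = ("black", "diamond")
others = [("white", "diamond"), ("black", "circle"), ("white", "circle")]

# Correct analysis: the written-down pair is either circle and black
# or diamond and white; a design is a THOG only if both hypotheses agree.
hypotheses = [("black", "circle"), ("white", "diamond")]
correct = {d: all(is_thog(d, w) for w in hypotheses) for d in others}

# Common element fallacy: the properties of the positive example are
# (mistakenly) taken to be the written-down properties.
fallacy = {d: is_thog(d, positive) for d in others}

# The fallacious pattern reverses the correct verdict for every design.
is_mirror_image = all(fallacy[d] != correct[d] for d in others)

print(correct)
print(fallacy)
print(is_mirror_image)
```

Under this assumption, the White Diamond and Black Circle are classified as THOGs and the White Circle as not a THOG, which is the exact reverse of the correct classification.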

Interweaving the THOG problem in realistic scenarios has not consistently

improved performance (Evans, Newstead, & Byrne, 1993). Moreover, when realism does

enhance solution rate, the improvement may stem from nonlogical factors such as

memory cueing (Newstead, Griggs, & Warner, 1982). However, isomorphs of the THOG

problem that clarify the binary nature of the structural tree do boost solution rates (Griggs

& Newstead, 1982), as do problems that clarify the separation between the properties

written down and the positive example (Girotto & Legrenzi, 1989). However, separation

alone does not appear to be sufficient for facilitation (Newstead & Griggs, 1992); rather,

the effect of separation seems dependent on prior generation of hypotheses.

Girotto and Legrenzi (1993) have extended the influence of separation to abstract

versions of the THOG problem. Explicitly labeling each combination of properties

written down as a possible SARS boosts accurate identification of the White Circle as a

THOG, but again this facilitation has been linked to the presence of a request to generate

hypotheses (Griggs, Platt, Newstead, & Jackson, 1998). Further, facilitation is dependent

on the use of an "other" instruction rather than the standard three-choice version (Griggs

et al., 1998). The "other" instruction may serve to focus attention on the design that has

not been previously classified as a SARS, thus leading to the correct answer without

requiring testing of multiple hypotheses. Similar facilitation using the one-other THOG

instruction (Griggs et al., 1998; O'Brien et al., 1990) supports this hypothesis.








Efforts to enhance performance by providing procedural clues have had minimal

effects on solution. Both Smyth and Clark (1986) and O'Brien et al. (1990) have

introduced a series of questions designed to illuminate logical steps toward correct

classifications. However, even among participants who correctly answered all questions,

the ultimate solution remained elusive. Explicitly providing information identifying the

correct hypotheses and identifying which designs are THOGs and are not THOGs given

each alternative has also failed to facilitate performance (O'Brien et al., 1990).

Thus, two decades of research have identified multiple steps along the THOG

solution path that divert participants from correct answers. First, whereas most people can

correctly generate two hypotheses, some cannot. Second, people who can correctly

generate two hypotheses do not consistently identify which designs are THOGs and

which are not THOGs for each of the two possible combinations of hypothesized

properties. Third, people who can correctly identify the THOGs and not-THOGs for both

possible combinations of hypothesized properties often cannot accurately combine this

information to classify the designs. My dissertation research addresses each of these

difficulties in an attempt to find clues for reducing the cognitive complexity of this

intriguing problem. To provide a concise framework for my investigation, the following

section highlights those existing studies that are most directly relevant to my research and

outlines the design of my series of experiments.














DISSERTATION FRAMEWORK AND PLAN

The literature review suggests that participants often appear to react to the

complexity of the THOG problem by circumventing logical reasoning through the use of

what Yachanin and Tweney (1982) termed "cognitive short-circuiting." When faced with

a task requirement that they do not understand or which overburdens working memory,

participants opt to continue by employing nonlogical but less demanding strategies such

as matching. In so doing, participants focus on an irrelevant problem feature, or

reconsider a relevant feature at an inappropriate time. The concept of cognitive

short-circuiting was initially applied to the matching and verification biases in Wason's

selection task (Yachanin & Tweney, 1982), but the use of simplified strategies to cope

with arduous problems has emerged in more generalized hypothesis-testing paradigms as

well. Doherty, Mynatt, Tweney and Schiavo (1979) noted that rather than face the

challenge of evaluating data relevant to each of two hypotheses, participants chose to

examine multiple pieces of information bearing on a single hypothesis. Such

inappropriate selection of nondiagnostic information (pseudodiagnosticity) was

positioned as a response to the problem's conceptual difficulty. In an attempt to alleviate

reliance on use of dysfunctional strategies in the THOG problem, my dissertation aims to

clarify and expand on prior efforts to reduce its cognitive complexity.

Confusion between the properties of the positive example and those of the

hypothesized combinations of properties has been identified as a major roadblock to








solution in both realistic (Girotto & Legrenzi, 1989) and abstract (Girotto & Legrenzi,

1993) THOG scenarios. When Girotto and Legrenzi (1989) compared performance on

two realistic Spy problems, one that explicitly required a transformation of the presented

properties of passports (Two-Level Spy problem) and another that did not (One-Level

Spy problem), performance on the Two-Level Spy problem was significantly superior to

performance on the One-Level version and the standard THOG baseline. Girotto and

Legrenzi (1989) concluded that the transformation highlighted the distinction between the

properties of the positive example (those on the original passport of the spy who arrived

safely in Moscow) and the hypothesized properties (the altered properties that permitted

safe arrival); furthermore, this separation was designated as the source of facilitation. The

high solution rate on the Pub problem, devised to achieve separation without the need for

transformation, offered support for the separation hypothesis, using the same abstract

designs as in the standard THOG problem but positioning them in a thematic context.

Thus, it appeared that providing a problem statement that inhibited the tendency to

conflate the hypotheses and the properties of the positive example was sufficient to boost

solution rates. However, Newstead and Griggs (1992) discovered a boundary condition

for the facilitation noted in the Pub problem; namely, solution rates were boosted only if

people were asked to generate hypotheses.

A similar pattern of facilitation and identifiable constraints has emerged for

separation in an abstract context. To clearly differentiate between the designated example

and the properties written down, Girotto and Legrenzi (1993) created a problem that

combined each color and shape combination written down under a single label, SARS.

Prior to identifying the THOG designs, people were asked to identify the SARS, based








on the knowledge that (a) the Black Diamond was a THOG, and (b) a THOG had either

the color or shape of the SARS, but not both. This version of the problem produced

significant facilitation compared to the standard THOG and a THOG problem that

required generation of hypotheses (Hypotheses-THOG). However, in lieu of the standard

three-choice instruction, the SARS problem asked participants to indicate whether there

were other THOGs. Griggs et al. (1998) demonstrated that this "other" instruction, rather

than the SARS label, was the impetus for facilitation.
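
The logical structure of the task described above can be made concrete with a brief sketch. The Python fragment below is an illustration only and is not part of the original materials. It assumes the four standard designs (Black Diamond, Black Circle, White Diamond, White Circle), and all function names are hypothetical. It enumerates the color-shape combinations that could be the SARS given that the Black Diamond is a THOG, then classifies each design under the rule that a THOG has either the color or the shape of the SARS, but not both.

```python
from itertools import product

COLORS = ("black", "white")
SHAPES = ("diamond", "circle")
# The four designs, each a (color, shape) pair (assumed standard set).
DESIGNS = list(product(COLORS, SHAPES))


def is_thog(design, written_down):
    """A design is a THOG if it shares exactly one property (color or
    shape, but not both) with the written-down combination."""
    same_color = design[0] == written_down[0]
    same_shape = design[1] == written_down[1]
    return same_color != same_shape


def possible_sars(known_thog=("black", "diamond")):
    """Combinations that could have been written down, given one known THOG."""
    return [combo for combo in DESIGNS if is_thog(known_thog, combo)]


def classify_designs(known_thog=("black", "diamond")):
    """Classify each design by checking it against every possible combination."""
    hypotheses = possible_sars(known_thog)
    labels = {}
    for design in DESIGNS:
        verdicts = {is_thog(design, h) for h in hypotheses}
        if verdicts == {True}:
            labels[design] = "definitely a THOG"
        elif verdicts == {False}:
            labels[design] = "definitely not a THOG"
        else:
            labels[design] = "insufficient information to decide"
    return labels


if __name__ == "__main__":
    print(possible_sars())
    for design, label in classify_designs().items():
        print(design, "->", label)
```

Under these assumptions, the sketch yields two possible combinations (black circle and white diamond) and classifies only the White Circle, in addition to the Black Diamond, as a THOG. No design falls into the indeterminate category, because both combinations lead to the same classification for every design.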

Thus, separation was not sufficient to enhance performance, in either a realistic or

abstract scenario. If abstract designs were used, facilitation was directly linked to

instruction wording, suggesting that the instruction may serve to direct attention to an

aspect of the problem that leads people to the correct answer. This attentional explanation

was supported by results of investigations using the one-other instruction, initially

introduced by O'Brien et al. (1990). In their investigation, O'Brien et al. informed

participants that in addition to the Black Diamond, one other design was a THOG. They

then asked people to identify the singular THOG. This instruction significantly facilitated

performance compared to the standard THOG instruction. Griggs et al. (1998) replicated

this finding and extended it to the SARS problem. If the one-other instruction was used,

performance on the SARS problem was enhanced even in the absence of a requirement to

generate hypotheses. O'Brien et al. and Griggs et al. agreed that the enhanced solution

rate may have stemmed from attentional factors rather than appropriate logical reasoning

strategies.

Attentional cueing has been implicated as a source of enhanced performance in

reasoning tasks other than the THOG problem. For example, in the Wason selection task,









rule variations devised by Evans, Ball and Brooks (1987) suggested that people attend to

cards that are mentioned in the rule, making selections based on linguistic cues, without

necessarily considering the implications of the hidden sides of the cards. If this tendency

to match was reduced by incorporating a description of all four cards in a rule explication,

performance was enhanced (Platt & Griggs, 1993), possibly stemming from an equalized

distribution of attention. In the THOG problem, an instruction that served to narrow the

range of candidates for possible THOGs evoked a higher solution rate (Girotto &

Legrenzi, 1993; Griggs et al., 1998; O'Brien et al., 1990). The first three experiments in

this dissertation research were designed to replicate and extend findings pertaining to the

effects of separation, generation of hypotheses, and instruction complexity.

Experiments 1a and 1b were designed to explore the possibility that narrowing the

range of classification options may result in facilitation comparable to that achieved by

the "other" or one-other instructions. In these experiments, the category "insufficient

information to decide" was eliminated from the standard three-choice instruction for both

the THOG and SARS problems, with and without Question 1 (a request to generate

hypotheses). In Experiment 1a, problem type (THOG or SARS) and number of

classification options (two-choice or three-choice), all without Question 1, were

manipulated using a 2 x 2 factorial design. In Experiment 1b, problem type (THOG or

SARS), number of classification options (two-choice or three-choice), and hypotheses

generation (Question 1 present or Question 1 not present) were manipulated using a

2 x 2 x 2 factorial design.
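
The factorial structure of these two experiments can be laid out explicitly. The following Python sketch is offered only as an illustration; the dictionary keys and the helper name are hypothetical labels chosen for this example, while the factor levels are those named in the paragraph above. It crosses the factors to list the four conditions of the 2 x 2 design and the eight conditions of the 2 x 2 x 2 design.

```python
from itertools import product

# Factor levels as described for the first pair of experiments.
# (In the 2 x 2 design, all conditions are presented without Question 1.)
EXPERIMENT_1A = {
    "problem_type": ("THOG", "SARS"),
    "classification_options": ("two-choice", "three-choice"),
}
# The 2 x 2 x 2 design adds the hypotheses-generation factor.
EXPERIMENT_1B = dict(EXPERIMENT_1A, question_1=("present", "not present"))


def conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


if __name__ == "__main__":
    for label, design in (("2 x 2", EXPERIMENT_1A), ("2 x 2 x 2", EXPERIMENT_1B)):
        cells = conditions(design)
        print(label, "design:", len(cells), "conditions")
        for cell in cells:
            print("  ", cell)
```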

Experiments 2a and 2b were designed to replicate and generalize the combined

effect of separation via labeling and the use of the "other" instruction in an abstract








context, initially demonstrated by Girotto and Legrenzi (1993) with a sample of Italian

technical high school students. Both experiments included the same three conditions used

by Girotto and Legrenzi -- the SARS, the Hypotheses-THOG and a standard THOG

baseline -- all of which used the "other" instruction. To determine if the "other"

instruction served as a source of facilitation, Experiments 2a and 2b also incorporated a

standard THOG baseline with a three-choice instruction. Participants in Experiment 2a

were high-ability undergraduates enrolled in a special section of introductory psychology,

whereas participants in Experiment 2b were drawn from a general pool of introductory

psychology students.

To further explore the influence of instruction wording, Experiment 3 extended

the work of O'Brien et al. (1990) and Griggs et al. (1998) in which participants were

asked to identify one other THOG. Unlike these preceding investigations, Experiment 3

included conditions in which participants were requested to generate hypotheses. In

Experiment 3, problem type (THOG or SARS) and hypotheses generation (Question 1

present or Question 1 not present) were manipulated using a 2 x 2 factorial design. The

one-other instruction was employed in all conditions.

Changing the number of alternatives or the wording of the instruction provides

participants with linguistic cues that may prompt correct classification of designs or the

correct selection of one other THOG design. However, these manipulations do not

necessarily guide participants along the logical solution path, nor do they necessarily

indicate which steps toward solution are particularly challenging or prone to error. Evans

(1984, 1989) suggests that in complex reasoning tasks, people may err in a very early

stage of reasoning by misrepresenting the problem. According to Evans (1984, 1989),









people may initially make inappropriate distinctions between relevant and irrelevant

information. These judgments of relevance are made preattentively, in what Evans labels

a heuristic stage, and may stem from matching to an item named in the problem

statement. The rapid and preconscious characterization of this heuristic stage has been

considered analogous to the manner in which we assess relevance in language comprehension

(Evans, 1995). The second stage in Evans' model encompasses analytic processing,

whereby people apply logical reasoning to make inferences based on the problem

representation they have constructed. Given the possibility of bias in the heuristic stage,

sound analytic processing does not guarantee an accurate solution.

In an attempt to bypass misrepresentation of the THOG problem, O'Brien et al.

(1990) provided participants with a complete explanation of the problem up to the point

of combining information from the two hypotheses. The explanation indicated which two

combinations of color and shape could be written down, why these combinations were the

only logical possibilities, which designs were and were not THOGs given each

hypothesis, and how decisions regarding design identification were made for each

hypothesis. After reading the problem statement and explanation, participants classified

the four designs. The solution rate for this Explanation THOG problem was identical to

that for the standard THOG baseline, but the error patterns differed for the two problems.

Providing an explanation increased the possibility that participants would incorrectly

indicate that the positive example of THOGness, the Black Diamond, was not a THOG,

or that its identity could not be determined from the information given. Experiment 4a

was designed to replicate this counterintuitive result and to refine another attempt by

O'Brien et al. (1990) to guide participants toward the correct solution.









In the Pretest THOG, O'Brien et al. (1990) offered participants a partial

explanation and then provided them with structured pretest questions to further direct

reasoning. After the problem statement, participants were told which two combinations of

color and shape could be written down and why these combinations were the only logical

possibilities. Then, participants answered a group of four questions by indicating which

designs could and could not be THOGs given each possible combination of written-down

properties. Active involvement in the process of generating design identifications given

the correct hypotheses did not significantly enhance performance compared to the

standard THOG baseline. Although about half of the participants correctly answered all

four pretest questions, less than half of this group then correctly identified the four

designs. Because grouping questions about multiple hypotheses may not have provided an

optimal format for encouraging participants to compare answers for the two hypotheses, a

revised format for the Pretest THOG was introduced in Experiment 4a. Performance on

the Pretest THOG with all four questions grouped together (Pretest THOG,

grouped-questions) was compared to performance using the same questions, but with a

separate space provided for the answer to each (Pretest THOG, split-questions). For

comparative purposes, Experiment 4a also included a standard THOG baseline. As in the

O'Brien et al. investigation, participants in all conditions in Experiment 4a were

instructed to classify all four designs.

Because the standard THOG problem statement used by O'Brien et al. (1990)

indicated that the Black Diamond was a THOG, their subsequent request to classify all

four designs may have created rather than resolved confusion. To investigate this

possibility, in Experiment 4b, the two Pretest questions concerning which designs could









be THOGs included the qualifier "other than the Black Diamond." Manipulations also

addressed two other potential sources of difficulty. It appeared possible that variance in

responses to the Pretest questions and the concluding classification instruction reflected

differences in degree of commitment involved. Whereas the classification instruction

included the standard alternatives (e.g., "definitely a THOG" and "definitely not a THOG"

together with the indeterminate option), the pretest questions inquired about what design

or designs could or could not be THOGs. Thus, participants' answers to the pretest

questions may have been based on a less-than-certain probability.

This uncertainty may have played a role in subsequent design classification.

Decision-making investigations have demonstrated that uncertainty about preexisting

conditions can create an unwillingness to commit to a decision (Shafir & Tversky, 1995;

Tversky & Shafir, 1993). Logic dictates that if Alternative 1 is preferred over

Alternative 2 if a specific event occurs, and if Alternative 1 is preferred over

Alternative 2 if the specific event does not occur, then Alternative 1 should be preferred

over Alternative 2 even in the absence of information about event occurrence. However,

violations of this sure-thing principle have been demonstrated in decision-making

paradigms. Uncertainty about event occurrence leads people to postpone decisions, rather

than select the alternative that was favored in either case. In the THOG problem, the

absence of information about which hypothesis is actually written down may adversely

affect correct classifications. Adding further uncertainty by inquiring about which designs

could be THOGs might compound indecision.
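
The sure-thing principle described above can be expressed as a simple check. The Python sketch below is a minimal illustration under stated assumptions: the function name and the state labels are hypothetical, and the two written-down combinations in the second example (black and circle, white and diamond) are derived from the THOG rule rather than quoted from the problem materials. The function returns the option favored in every possible state, or nothing when the states disagree.

```python
def sure_thing_choice(preference_by_state):
    """Return the option preferred in every state, or None if the
    preferred option differs across states."""
    choices = set(preference_by_state.values())
    if len(choices) == 1:
        return choices.pop()
    return None


# Decision-making case: the same alternative is preferred whether or
# not the event occurs, so it should be chosen without knowing which.
decision = sure_thing_choice({
    "event occurs": "Alternative 1",
    "event does not occur": "Alternative 1",
})

# THOG analogue (assumed labels): the White Circle receives the same
# classification under either hypothesis about what was written down.
white_circle = sure_thing_choice({
    "black and circle written down": "THOG",
    "white and diamond written down": "THOG",
})

if __name__ == "__main__":
    print(decision)
    print(white_circle)
```

Framed this way, a participant who withholds a classification because the written-down combination is unknown is making the same kind of error as a decision maker who postpones a choice that would be the same in either case.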

To explore the potential differential effects of the word "could" versus the word

"definitely" on answers to pretest identification questions for each hypothesis and on









subsequent design classification, Experiment 4b included conditions in which the pretest

questions asked which design or designs were definitely or definitely not THOGs. In

addition, because O'Brien et al. (1990) suggested that participants may have assumed that

black and diamond were the written down properties (despite an explanation to the

contrary), Experiment 4b included conditions in which a reminder that black and

diamond could not be written down was inserted prior to the classification instruction.

Thus, Experiment 4b used a 2 x 2 factorial design, in which the manipulated variables

were the word "definitely" in the pretest questions (present or not present) and a reminder

about hypothesized properties (present or not present).

To summarize my dissertation plan, the first three experiments primarily focused

on effects of instruction wording and the final experiment was concerned with training.

Experiments 1a and 1b compared performance with two-choice and three-choice

instructions on the SARS and THOG problems, with and without a request to generate

hypotheses. Experiments 2a and 2b evaluated the effects of the "other" instruction on the

standard THOG, and on the SARS and THOG problems with a request to generate

hypotheses (based on Girotto & Legrenzi, 1993). Experiment 3 extended the work of

O'Brien et al. (1990) with the one-other instruction. Experiments 4a and 4b aimed to

direct attention toward the appropriate solution path, modeled on manipulations by

O'Brien et al. (1990). Experiment 4a revisited efforts to explain the THOG problem to

the point of combining the results for the two hypotheses, including conditions that

introduced four pretest questions to encourage design classification under alternative

hypotheses. Experiment 4b included modifications to pretest questions to uncover the

possible influence of uncertainty and to address misconceptions stemming from








inappropriate selection of hypotheses. Together, these experiments were designed to

identify key barriers to problem solution and suggest factors involved in facilitating

performance.














METHODS, MATERIALS AND RESULTS

General Procedures

To verify that each participant was involved in only one experiment, the

experimenter checked participant code numbers against a listing of code numbers from

preceding experiments. In all experiments, participants were randomly assigned to

conditions within groups. Prior to distributing the problem, the experimenter gave the

following verbal instructions.

This experiment is designed to investigate how people solve a deductive
reasoning problem. You will be given a piece of paper providing instructions for
the problem and the problem itself. You will be asked to read the instructions,
then attempt to solve the problem. The problem requires logical reasoning and
does have a correct answer. I am interested in your best performance, so please
take your time and do not rush to make a judgment. (From Informed Consent
form)

The experimenter then distributed the problems, with problem version

manipulated between-subjects. All participants completed only one problem, presented on

one side of an 8-1/2" x 11" paper. Appendix B includes copies of materials for all

problems. Participants were given as much time as they needed to complete the problem.

All completed it within the 30-minute time slot set aside for each session. Most did so

within 10 to 15 minutes after beginning work on the problem.

Experiments 1a and 1b

The cognitive complexity of the THOG problem may be heightened in the final

classification stage by inclusion of the option "insufficient information to decide." By









allowing an escape from firm commitment, this possible choice may mislead some

participants to the erroneous conclusion that answers depend on an unknown quantity, the

hypothesis that is written down. Thus, eliminating the potential doubt created by the

indeterminate option may result in enhanced performance. Experiments 1a and 1b were

designed to test this possibility. In these experiments, performance on SARS and THOG

problems with three classification possibilities (definitely a THOG, insufficient

information to decide, and definitely not a THOG) was compared to performance on

problems with two classification choices.

Experiment 1a

Participants. Eighty introductory psychology students and 39 upper-division

psychology students at the University of Florida voluntarily participated in the pilot study.

Participants from the introductory psychology course completed the two-choice THOG

(N = 20), three-choice THOG (N = 20), two-choice SARS (N = 20) and three-choice

SARS (N = 20). Participants from the upper level psychology course completed only the

two-choice THOG (N = 19) and two-choice SARS (N = 20). Because there were no

differences related to class level, the data for the two-choice conditions were combined

prior to analysis.

Materials. Four versions of the THOG problem were created by manipulating type

of problem (THOG and SARS) and number of alternatives in the instruction (standard

three-choice or two-choice, excluding the insufficient information option). Participants

were required to identify the possible SARS in the SARS conditions but were not

required to generate the color and shape written down in the THOG conditions. Designs








were presented vertically at the bottom of the page, following the problem statement and

instructions. Participants wrote their answer for each design on a blank line to the right of

the design.

The wording of the THOG problem statement was identical to that of the standard

THOG except that it began with the phrase "At the bottom of the page" instead of "In front of

you." The wording of the SARS problem statement was identical to that used by Girotto

and Legrenzi (1993) through the rule statement, with the exception of the "At the bottom

of the page" phrase. The following question was added prior to the instructions for

classification: "Knowing for sure that the Black Diamond is a THOG, your first task is to

indicate which design(s) could be the SARS." This differed from the Girotto and

Legrenzi question in that it did not specifically state "among the remaining designs."

Results. Narrowing the range of alternatives enhanced overall solution rates for

both the THOG and SARS problems. Overall, 34% solved the two-choice versions

compared to 8% who solved the versions with three classification options,

χ²(1, N = 119) = 10.02, p = .002. In the THOG conditions, 38% of the participants

solved the problem given the two-choice instruction and 5% solved it given the

three-choice instruction. In the SARS conditions, 30% of the participants solved the

problem given the two-choice instruction and 10% solved it given the three-choice

instruction.
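
The overall comparison reported above can be checked with a short calculation. The Python sketch below is illustrative only. The cell counts are not given in the text; they are inferred from the reported percentages and group sizes (27 of 79 participants solving the two-choice versions and 3 of 40 solving the three-choice versions) and should be treated as an assumption, as should the use of the Pearson chi-square statistic without a continuity correction. The function name is hypothetical.

```python
def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2 x 2 table
    [[a, b], [c, d]], where rows are groups and columns are outcomes."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Assumed counts, inferred from the reported percentages and group sizes:
# two-choice: 27 solved, 52 did not; three-choice: 3 solved, 37 did not.
chi_square_1a = pearson_chi_square_2x2(27, 52, 3, 37)

if __name__ == "__main__":
    print(round(chi_square_1a, 2))  # 10.02
```

With these inferred counts, the uncorrected statistic agrees with the reported value of 10.02, which suggests that no continuity correction was applied in the original analysis.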

Even though the SARS problems included a request to generate hypotheses and

the THOG problems did not, performance was similar for both types of problems. Overall,

23% of the participants solved the SARS problems, and 27% solved the THOG problems.








Sixty-seven percent of the participants in the SARS conditions correctly identified

both possible SARS combinations in Question 1 (65% given the two-choice instruction

and 70% given the three-choice instruction). However, among participants who correctly

generated two hypotheses, only 30% then accurately classified the designs (42% given the

two-choice instruction and 7% given the three-choice instruction). Among those who did

not correctly generate two hypotheses, one participant in each condition accurately

classified the designs.

Intuitive errors were more prevalent than other types of errors in all conditions

(ranging from 37% to 45% of all errors). No more than two participants in any condition

made any other specific error.

Experiment 1b

Participants. One hundred forty-four undergraduates enrolled in introductory

psychology courses at the University of Florida took part in this experiment to fulfill a

portion of their experimental participation requirement. Eighteen participants completed

each of eight versions of the THOG problem described in the Materials section.

Materials. Eight different versions of the THOG problem were created by

manipulating type of problem (SARS and THOG), presence of a request to generate

hypotheses (with Question 1 and without Question 1) and number of alternatives in the

instruction (three-choice and two-choice). Materials were worded and formatted as in

Experiment 1a, with the following exceptions. In the THOG conditions, for versions

requiring generation of hypotheses, the following question was added prior to the

instructions for classification: "Knowing for sure that the Black Diamond is a THOG,








your first task is to indicate which color and shape combination(s) I could have written

down." In the SARS conditions, for versions that did not require generation of hypotheses,

the following sentence was added prior to classification instructions: "I will tell you that

the Black Diamond is a THOG." For all versions, six design orders were used.

Results. Participants performed similarly on the THOG and SARS problems, as

shown in Table 1. Given a standard problem with three possible classifications and

without a requirement to generate hypotheses, 11% of the participants solved the THOG

problem and 6% solved the SARS problem. When required to generate hypotheses, 17%

solved each problem.

More participants solved the problems with the two-choice instruction than solved

those with the three-choice instruction in both the SARS and THOG conditions. Overall,

33% solved the two-choice versions compared to 13% who solved the versions with three

classification options, χ²(1, N = 144) = 8.85, p = .003. In the THOG conditions, 33% of

the participants solved the problem given the two-choice instruction and 14% solved it

given the three-choice instruction. In the SARS conditions, 33% of the participants solved

the problem given the two-choice instruction and 11% solved it given the three-choice

instruction. Facilitation was more noticeable when participants were also required to

generate hypotheses, 42% for the two-choice instruction versus 17% for the three-choice

instruction, χ²(1, N = 72) = 5.45, p = .020, but also occurred when hypotheses generation

was not required, 25% for the two-choice instruction versus 8% for the three-choice

instruction, χ²(1, N = 72) = 3.60, p = .058.
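
The probability levels attached to the three chi-square values in this paragraph can be recovered from the statistics themselves. The Python sketch below is an illustration, not the original analysis; the function and variable names are hypothetical. It relies on the fact that, for one degree of freedom, the upper-tail probability of a chi-square value x equals erfc(sqrt(x / 2)), where erfc is the complementary error function.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of
    freedom, using the identity P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi_square / 2))


# Chi-square values as reported in the paragraph above.
REPORTED = {
    "overall": 8.85,
    "with Question 1": 5.45,
    "without Question 1": 3.60,
}

p_values = {label: p_value_df1(x) for label, x in REPORTED.items()}

if __name__ == "__main__":
    for label, p in p_values.items():
        print(f"{label}: p = {p:.3f}")
```

Rounded to three decimals, the computed probabilities are .003, .020, and .058, matching the reported values.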








Eighty-nine percent of the participants in the SARS conditions with Question 1

correctly identified both possible SARS combinations, compared to 50% of the

participants in the THOG conditions who correctly identified the two possible

color-shape combinations written down, χ²(1, N = 72) = 12.83, p < .001. However,

among participants who correctly generated two hypotheses, only 34% in the SARS

conditions and 50% in the THOG conditions then accurately classified the designs.

Among those who did not correctly generate two hypotheses, no participants in the SARS

conditions and only 6% of those in the THOG conditions accurately classified the

designs.

The majority of errors were intuitive. For conditions including the three-choice

instruction, 50% to 60% of the participants who failed to solve the problems indicated

that the White Circle was not a THOG, and that the Black Circle and White Diamond

were THOGs or that there was insufficient information to classify them. Type B errors

predominated. For conditions including the two-choice instruction, error patterns were

more varied. Between 38% and 81% of the errors were intuitive, with Type A being the

only intuitive possibility, given that participants were forced to make a binary "Yes-No"

decision for each design.

Summary. Experiments 1a and 1b revealed that narrowing the range of

alternatives in the instruction led to facilitation. Solution rates appeared higher in

two-choice than in three-choice conditions for both the THOG and SARS problems.

Facilitation for the two-choice instruction generally occurred whether or not hypotheses

generation was required. Solution rates were similar for the THOG and SARS problems,








mirroring the findings of Griggs et al. (1998). More participants correctly generated two

hypotheses in the SARS conditions than in the THOG conditions. However, among those

who correctly generated two hypotheses, solution rates did not differ by problem type.

Experiments 2a and 2b

The conditions in these experiments were designed to replicate those of Girotto

and Legrenzi (1993, Exp. 1) with Italian high-school students. Experiment 2a was based

on a classroom demonstration with students enrolled in a special honors section of

introductory psychology. Experiment 2b included undergraduates enrolled in standard

introductory psychology courses. The identical materials were used in both experiments.

Each experiment included the three conditions employed by Girotto and Legrenzi (1993),

plus a baseline condition using the standard THOG problem.

Experiment 2a

Participants. Forty-seven undergraduates enrolled in an honors introductory

psychology course at the University of Florida voluntarily took part in this experiment as

a classroom exercise. Twelve participants completed each of the following: the SARS problem, the standard

THOG with an "other" instruction, and the standard THOG with a three-choice

instruction, and 11 completed the Hypotheses-THOG problem.

Materials. Four different versions of the THOG problem were used. The first three

were identical to the problems used by Girotto and Legrenzi (1993, Exp. 1). All included

the "other" THOG instruction: "Could you also indicate whether, in addition to the Black

Diamond, there are other THOGs?" The SARS and Hypotheses-THOG problems each

incorporated a request for generation of hypotheses, whereas the standard THOG did not.








The fourth condition (not included in the Girotto and Legrenzi experiment) involved the

standard THOG problem with the three-choice instruction ("definitely a THOG,"

"definitely not a THOG," and "insufficient information to decide"). In all versions, the

designs were presented horizontally at the top of the page, prior to the problem statement.

For all versions, three design orders were used.

Results. A majority of participants (83%) solved the SARS problem by correctly

identifying the White Circle as a THOG. This solution rate was significantly greater than

that for the standard THOG with a three-choice instruction (33%), χ²(1, N = 24) = 6.17,

p = .013, but only marginally greater than that for the Hypotheses-THOG (45%),

χ²(1, N = 23) = 3.63, p = .057, and statistically similar to that for the standard THOG

with an "other" instruction (58%). Performance in the Hypotheses-THOG condition was

similar to that in the two standard THOG conditions (which did not differ significantly

from each other).

All participants in the SARS condition correctly generated two hypotheses, as did

73% of those in the Hypotheses-THOG condition. Among participants who correctly

generated two hypotheses, 83% in the SARS condition and 50% in the

Hypotheses-THOG condition subsequently identified the White Circle as a THOG.

Among those who did not correctly generate two hypotheses in the Hypotheses-THOG

condition, only 33% accurately identified the White Circle.

Intuitive errors accounted for 50% of all errors on the standard THOG problem

with a three-choice instruction. Among participants completing the standard THOG with

an "other" instruction, 60% of those who made errors identified the Black Circle and









White Diamond as THOGs, and did not classify the White Circle. In the

Hypotheses-THOG and SARS conditions, there were no intuitive errors. No error pattern

occurred more than once. Girotto and Legrenzi (1993) also found that only a small

proportion of the errors in the SARS condition were intuitive (14%). However, intuitive

errors were predominant in both the Hypotheses-THOG and standard THOG with "other"

instruction conditions (58% and 56% respectively).

Experiment 2b

Participants. Eighty-five undergraduates enrolled in introductory psychology

courses at the University of Florida took part in this experiment to fulfill a portion of their

experimental participation requirement. Eighteen participants completed the SARS

problem, 32 completed the Hypotheses-THOG problem, 18 completed the standard

THOG problem with an "other" instruction, and 17 completed the standard THOG with a

three-choice instruction.

Materials. The materials used were identical to those in Experiment 2a. For all

versions, six design orders were used.

Results. As shown in Table 2, a majority of participants (61%) solved the SARS

problem by correctly identifying the White Circle as a THOG. This solution rate was

significantly greater than that for the standard THOG (three-choice instruction), 18%,

χ²(1, N = 35) = 6.88, p = .009, but did not significantly surpass the solution rates for

either the Hypotheses-THOG or the standard THOG ("other" instruction), 38% and 44%

respectively. Performance in these latter two conditions was similar, with neither problem








producing significant facilitation compared to the standard THOG (three-choice

instruction).

In the SARS and Hypotheses-THOG conditions, similar proportions of

participants correctly generated the two hypotheses (72% and 66% respectively). Among

participants who correctly generated two hypotheses, 85% in the SARS condition and

52% in the Hypotheses-THOG condition subsequently identified the White Circle as a

THOG. Among those who did not correctly generate two hypotheses, none of the

participants in the SARS condition and 9% of those in the Hypotheses-THOG condition accurately

classified the designs.

The majority of errors (64%) were intuitive only on the standard THOG

(three-choice instruction) problem. On the Hypotheses-THOG problem, the predominant

error was indicating that all designs could be THOGs (45%), with "none could be

THOGs" and intuitive errors each accounting for 15% of the incorrect answers. On the

standard THOG ("other" instruction) problem, the predominant error was indicating that

the Black Circle and White Diamond could be THOGs (40% of all errors), with "all could

be THOGs" accounting for 30% of the incorrect answers. On the SARS problem, only

one error pattern occurred more than once. Two participants (29% of those who gave an

incorrect answer) indicated that the Black Circle was a THOG.

The pattern of solution rates in the present experiments was generally comparable

to the results of Girotto and Legrenzi (1993). In Experiments 2a, 2b, and Girotto and

Legrenzi (1993, Exp. 1), more participants solved the SARS problem than solved either

the Hypotheses-THOG problem or the standard THOG ("other" instruction) problem, but








these differences reached significance only in the Girotto and Legrenzi research. In

Experiments 2a and 2b, performance on the Hypotheses-THOG was equivalent to that on

the standard THOG ("other" instruction), paralleling the Girotto and Legrenzi findings,

and to the standard THOG (three-choice instruction), not included in the Girotto and

Legrenzi investigation.

Summary. Although more participants solved the SARS problem than solved

the Hypotheses-THOG, standard THOG ("other" instruction), or standard THOG

(three-choice instruction) problem, the advantage in Experiments 2a and 2b was

significant only relative to the standard THOG (three-choice instruction)

problem. The significant facilitation noted by Girotto and Legrenzi (1993) for the SARS

problem versus the Hypotheses-THOG problem and the standard THOG problem, all

with an "other" instruction, did not occur in the present experiments, although the

solution-rate patterns were generally comparable. Any facilitation attributable to the

"other" instruction does not necessarily imply that participants evaluated multiple

hypotheses. Instead, as noted by Griggs et al. (1998), the "other" instruction may have

encouraged people to solve the problem through a process of elimination, selecting the

White Circle as a THOG because the Black Circle and White Diamond had been

classified as SARS.

Experiment 3

If participants are adopting an elimination strategy, then reinforcing this strategy

by explicitly stating there is one other THOG would continue to reveal, and possibly

increase, facilitation. O'Brien et al. (1990) found such facilitation for a one-other








instruction on the THOG problem without a request to generate hypotheses. Similarly,

Griggs et al. (1998) found facilitation for a one-other instruction on both the THOG and

SARS problems, with neither containing a request to generate hypotheses. Experiment 3

aimed to assess whether the facilitation attributed to the one-other instruction varied with

problem type (SARS vs. THOG) and with the request to generate hypotheses (Question 1

vs. no Question 1).

Participants. Seventy-three undergraduates enrolled in introductory psychology

courses at the University of Florida took part in this experiment to fulfill a portion of their

experimental participation requirement. Eighteen participants completed each of three

versions (the THOG problem with Question 1, the SARS problem with Question 1, and

the SARS problem without Question 1), and 19 participants completed the standard

THOG problem without Question 1.

Materials. Four versions of the THOG problem were created by manipulating type

of problem (THOG and SARS) and presence of the request to generate hypotheses

(Question 1 and no Question 1). All four versions used the one-other instruction. The

wording of the problem statements and of the request to generate hypotheses was the

same as in Experiment 2b, as was the format. For all versions, six design orders were

used.

Results. The one-other instruction enhanced solution rates in all conditions except

among participants who worked on the standard THOG problem without Question 1. In

the other three conditions, performance was comparable (solution rates of 72%, 78% and

83% for the THOG with Question 1, SARS without Question 1, and SARS with Question








1 conditions respectively) and significantly higher than the 37% solution rate on the

standard THOG problem without Question 1, χ²(3, N = 72) = 11.18, p = .011.

Similar proportions of participants who completed the SARS and THOG

problems correctly generated two hypotheses when requested to do so (90% and 87%

respectively). The vast majority of those who generated the two hypotheses subsequently

identified the White Circle as the THOG (88% and 87% for the SARS and THOG

problems respectively). Among those who did not identify the two hypotheses, one

participant identified the White Circle as the THOG in the SARS condition and none did

so in the THOG condition.

Solution rates indicated that the request to generate hypotheses for the SARS

problem had little effect beyond that of the one-other instruction. In contrast, a high

solution rate was observed on the THOG problem only when it included both Question 1

and the one-other instruction. For no apparent reason, these results conflicted with prior

findings demonstrating that the one-other instruction significantly facilitated performance

on the standard THOG problem even without a request to generate hypotheses (Griggs

et al., 1998; O'Brien et al., 1990).

Experiments 4a and 4b

Results of prior experiments typically demonstrated that facilitation was elusive

unless the instruction was worded to focus attention on a limited number of

classifications or designs to be classified. In Experiments 1a and 1b, such focus was

attained by reducing the number of alternatives. In Experiments 2a and 2b, facilitation

stemmed from asking if there were other THOGs, permitting use of an elimination









strategy. In Experiment 3, telling participants that there was only one other THOG led to

a higher solution rate. These experiments provided insight regarding how to overcome

difficulties impeding identification of the White Circle as a THOG; however, they did not

address how to achieve facilitation on the standard THOG problem with the three-choice

instruction. Experiments 4a and 4b aimed to effect such facilitation by guiding

participants to the correct solution.

Experiment 4a

O'Brien et al. (1990) were unsuccessful in their attempts to enhance solution rate,

despite providing either a complete explanation (Explanation THOG) or an explanation

of what could be written down and questions about design classification for each possible

hypothesis (Pretest THOG). Experiment 4a was designed to re-examine these

manipulations. In their Pretest experiment, O'Brien et al. introduced a single grouping of

four pretest questions. Because it is possible that using a large group of questions

diminished the effectiveness of these inquiries, Experiment 4a included an additional

reformatted version of the Pretest THOG in which a separate space was provided for the

answer to each question.

Participants. Sixty introductory psychology students at the University of Florida

took part in this experiment to fulfill a portion of their experimental participation

requirement. Fifteen participants completed each of four versions of the THOG problem

described in the Materials section. In addition, fifteen participants from the same participant

pool were included in a replication of the Pretest THOG (split-questions) condition.








Materials. Four different versions of the THOG problem were used. Consistent

with O'Brien et al. (1990), all versions required participants to classify all four designs

(including the Black Diamond) as "Definitely a THOG," "Definitely not a THOG," or

"Insufficient information to decide." The standard THOG, Explanation THOG, and

Pretest THOG (grouped-questions) were replications of O'Brien et al. conditions.

Appendix A provides exact wording for the latter two versions. Wording of the problem

in the fourth condition, Pretest THOG (split-questions), was identical to the wording of

the Pretest THOG (grouped-questions). However, space for an answer was provided after

each question, instead of grouping all four questions together. Designs were presented

horizontally, prior to the problem statement. For all versions, six design orders were used.

Results. Providing participants with an explanation that included the correct

classification of designs given either hypothesis failed to enhance solution rates.

Congruent with O'Brien et al. (1990), performance on the Explanation THOG (47%

correct) was no better than performance on the standard THOG baseline (40% correct).

Table 3 illustrates that error patterns were similar on the two problems as well. However,

the error pattern for the Explanation THOG in the present study differed from the O'Brien

et al. findings. A majority of participants (70%) in the O'Brien et al. study did not classify

the Black Diamond as "Definitely a THOG," compared to only one participant (7%) in

Experiment 4a. The O'Brien et al. examination of written justifications suggested to them

that participants responded as if black and diamond were written down or reacted to a

perceived contradiction in the explanation by classifying all designs in the "not THOG"








category. In the present experiment, the explanation appeared to have a null effect, rather

than compounding confusion.

Similarly, requesting participants to answer a group of questions by indicating

which designs could or could not be a THOG given each hypothesis did not enhance

performance (47% correct) relative to the standard THOG. Of the nine participants (60%)

who answered all four questions correctly, six (67%) then correctly identified all four

designs, two (22%) indicated that there was insufficient information to decide about all

four designs, and one (11%) gave another response. One participant (7%) who indicated

that both the Black Diamond and White Circle could be THOGs given either hypothesis

but only gave partial answers (the design that was written down) to pretest questions

regarding what designs could not be THOGs, then correctly identified all four designs.

Five participants (33%) did not identify both the Black Diamond and White Circle as

THOGs in both relevant pretest questions. None of this group solved the problem.

Performance on the Pretest THOG (grouped-questions) differed from performance

in the O'Brien et al. (1990) investigation in two respects. First, O'Brien et al. found that

pretest questions had "some salutary effect" (25% solved the problem) compared to their

baseline (5% correct), whereas performance on the two versions in the present experiment

was similar. Second, a seemingly higher proportion of participants in the O'Brien et al.

study (45%) did not classify the Black Diamond as a THOG, compared to 20% in the

present study. The proportion of participants answering all four questions correctly in

Experiment 4a was similar to that in the O'Brien et al. study.








Separating the four pretest questions to enable respondents to more systematically

answer and inspect their answers appeared to enhance ability to subsequently classify all

four designs correctly (73%). However, the level of facilitation did not reach significance

in either a four-condition comparison, χ²(3, N = 60) = 3.94, p = .268, or in a direct

comparison with performance on the Explanation THOG or Pretest THOG

(grouped-questions), both 47% correct, χ²(1, N = 30) = 2.22, p = .136 for both

comparisons, although it did approach significance compared to performance on the

standard THOG, 40% correct, χ²(1, N = 30) = 3.39, p = .065. Correct answers on the

Pretest THOG (split-questions) problem stemmed in part from participants who gave only

partial answers to at least two of the four pretest questions. For example, three

participants (20%) who correctly indicated that both the White Circle and Black Diamond

could be THOGs, but only the design not written down could not be a THOG, then

correctly identified all four designs, whereas one participant (7%) with a similar pattern

of responses to the pretest questions did not. One participant (7%) who suggested that if

White Diamond was written down, then the White Circle could be a THOG and the Black

Circle could not, but if Black Circle was written down, then the Black Diamond could be

a THOG and the White Diamond could not, also correctly identified all four designs.

Seven participants (47%) answered all four questions correctly and subsequently

correctly classified all four designs; one (7%) incorrectly classified the designs after

offering correct answers to all four pretest questions. Neither of the two participants

(13%) who offered another combination of at least partially correct answers to the pretest

questions solved the problem.
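
The chi-square values reported for Experiment 4a can be checked from the cell counts. The following is a minimal Python sketch, not the analysis procedure used in the original study. The counts are an assumption inferred from the reported percentages of 15 participants per condition (40% taken as 6, 47% as 7, and 73% as 11), and the function names are illustrative.

```python
import math


def pearson_chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # With 1 df the statistic is the square of a standard normal deviate.
    return math.erfc(math.sqrt(stat / 2))


# Rows are conditions; columns are (correct, incorrect).
# Counts are inferred from the reported percentages (an assumption).
split_vs_standard = [[11, 4], [6, 9]]
split_vs_grouped = [[11, 4], [7, 8]]
all_four_conditions = [[6, 9], [7, 8], [7, 8], [11, 4]]

for label, table in [
    ("split-questions vs. standard", split_vs_standard),
    ("split-questions vs. grouped-questions", split_vs_grouped),
    ("four-condition comparison", all_four_conditions),
]:
    stat, df = pearson_chi_square(table)
    print(f"{label}: chi-square({df}) = {stat:.2f}")
```

Under these assumed counts, each two-condition comparison has one degree of freedom, and `p_value_df1` can be applied to its statistic. The four-condition comparison has three degrees of freedom and would need a general chi-square distribution function for its p-value.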









In a replication of the Pretest THOG (split-questions), 60% of the 15 participants

correctly identified all four designs. The distribution of answers to pretest questions

closely resembled that in the original experiment (53% answered all correctly, 33%

correctly reported that the Black Diamond and the White Circle were THOGs under both

hypotheses, but only one of the remaining designs was not a THOG, and 13% gave

another pattern of answers). All eight participants (53%) who correctly answered all four

pretest questions solved the problem, as did one participant (7%) who correctly

designated the two possible THOGs for both hypotheses but offered only a single correct

answer to each of the "not THOG" questions.

Experiment 4b

The common element fallacy proposes that participants err on the THOG problem

because they think black and diamond are the written down properties. If that is the case,

then applying the rule should reveal a contradiction; namely, that the Black Diamond is

not a THOG. As noted, to investigate this possibility, O'Brien et al. (1990) asked

participants to classify all four designs. In their baseline condition, all participants

correctly classified the Black Diamond; however, when given either a full explanation or

pretest questions, many concluded that the Black Diamond was not a THOG or not

classifiable based on the information given. Participants reached this conclusion despite

the fact that they were specifically told what could be written down.
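
The contradiction described above can be made concrete by enumerating the possible written-down combinations. The sketch below is an illustration only. It assumes the standard form of the disjunctive rule (a design is a THOG if it has exactly one of the two written-down properties), and the names `is_thog`, `DESIGNS`, and `consistent` are hypothetical.

```python
from itertools import product

COLORS = ("black", "white")
SHAPES = ("diamond", "circle")
DESIGNS = list(product(COLORS, SHAPES))  # the four designs


def is_thog(design, written):
    """Assumed rule: a THOG has exactly one of the written-down properties."""
    color, shape = design
    written_color, written_shape = written
    return (color == written_color) != (shape == written_shape)


positive_example = ("black", "diamond")

# Hypotheses under which the Black Diamond is in fact a THOG.
consistent = [w for w in DESIGNS if is_thog(positive_example, w)]
print("Consistent hypotheses:", consistent)

# Classification of every design under each consistent hypothesis.
for written in consistent:
    thogs = [d for d in DESIGNS if is_thog(d, written)]
    print(f"If {written} is written down, the THOGs are {thogs}")

# The common element hypothesis: black and diamond are written down.
common_element = ("black", "diamond")
print("Black Diamond a THOG under common element hypothesis:",
      is_thog(positive_example, common_element))
print("Designs classified as THOGs under that hypothesis:",
      [d for d in DESIGNS if is_thog(d, common_element)])
```

Under this assumed rule, only black with circle and white with diamond are consistent with the Black Diamond being a THOG, and the White Circle is a THOG under both. Treating black and diamond as the written-down properties makes the Black Diamond itself a non-THOG, which is the contradiction noted in the text.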

Experiment 4b was designed to examine whether or not asking participants to

classify the Black Diamond actually created confusion that would not typically arise when

attempting to solve the standard problem. Pretest questions were altered to request








classification of designs other than the Black Diamond. In addition, some versions

included a reminder that black and diamond could not be written down, whereas others

asked which designs could definitely be or definitely not be a THOG given each

hypothesis.

Participants. Sixty introductory psychology students at the University of Florida

took part in this experiment to fulfill a portion of their experimental participation

requirement. Fifteen participants completed each of four versions of the THOG problem

described in the Materials section.

Materials. Four versions of the THOG problem were created by manipulating the

presence of a reminder (whether or not the problem included the caveat "Reminder:

Given the rule and the fact that the Black Diamond is a THOG, I could not have written

down Diamond and Black without creating a contradiction." prior to the classification

instruction) and the presence of the word "definitely" in the pretest questions (present or

not present), using a 2 x 2 factorial design. All included four pretest questions, with a

separate space after each for an answer. In all versions, questions regarding which designs

could be a THOG included the qualifier "other than the Black Diamond." Designs were

presented horizontally, prior to the problem statement. For all versions, six design orders

were used.

Results. As shown in Table 4, the proportion of correct answers was similar in all

conditions, ranging from 33% to 47%. Without any further modifications, adding the

qualifier "other than the Black Diamond" to pretest questions concerning which designs

could be THOGs appeared to create rather than resolve confusion about the ultimate








classification of the designs, yielding 47% correct classifications compared to 73% for the

Pretest THOG (split-questions) in Experiment 4a and 60% in its replication (both without

the qualifier). Although the decrement in correct classifications was not significant,

χ²(1, N = 45) = 1.67, p = .197, there was a marginally significant drop in the proportion

of participants who answered all four pretest questions accurately. In Experiment 4a and

its replication, 53% of the participants answered all four pretest questions correctly

compared to 27% who did so in the Pretest THOG plus Qualifier condition in Experiment 4b,

χ²(1, N = 45) = 2.88, p = .090. The decline appeared to be primarily attributable to fewer

completely correct identifications of designs that could not be THOGs (even though the

qualifier was added only to the "could be a THOG" pretest questions). In Experiment 4a

and its replication, seven participants (53%) correctly indicated the two designs that could

not be THOGs given either hypothesis, compared to four participants (27%) who did so

when the qualifier was added to the "could be" questions. In response to the "could be a

THOG" pretest questions, in Experiment 4a and its replication, 13 participants (87%)

indicated that both the Black Diamond and White Circle could be THOGs given either

hypothesis, compared to 11 participants (73%) who indicated that the White Circle could

be a THOG given either hypothesis when the qualifier "other than the Black Diamond"

was added in Experiment 4b.

Asking participants which designs definitely could or could not be THOGs did not

influence the overall proportion of completely correct classifications, compared to

conditions in which "definitely" was not included. However, in the "definite" conditions,

fewer participants answered all four pretest questions correctly. Among 49 participants in








all four conditions who offered either completely or partially correct answers to all pretest

questions, those in conditions requesting a definite commitment were significantly less

likely to provide completely correct answers to all four questions (16% of 25 participants)

than were those in conditions in which a definite commitment was not required (42% of

24 participants), χ²(1, N = 49) = 3.95, p = .047. Further, Table 5 shows that in the

"definite" conditions, a majority of participants who correctly classified the designs

offered only partially correct answers to the question regarding which designs could not

be THOGs. Among the 24 participants in all four conditions who correctly classified all

four designs, these correct classifications were significantly less likely to follow from

completely correct answers to pretest questions in "definite" conditions (in which only

18% of 11 participants offered completely correct pretest answers) than in conditions not

requiring a definite commitment (in which 61% of 13 participants offered completely

correct pretest answers), Fisher's Exact Probability test, two-tailed, p = .047.
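
For small samples such as the 24 correct classifiers, a Fisher exact test can be computed directly from the hypergeometric distribution. The sketch below is a minimal Python illustration rather than the original analysis. The cell counts are an assumption inferred from the reported percentages (18% of 11 taken as 2, and 61% of 13 taken as 8), and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Rows: "definite" conditions, conditions without "definitely".
# Columns: all pretest answers correct, not all correct (inferred counts).
p = fisher_exact_two_tailed(2, 9, 8, 5)
print(f"Two-tailed Fisher exact p = {p:.3f}")
```

The two-tailed definition used here sums all tables whose probability does not exceed that of the observed table. Other definitions of the two-tailed value exist and can give slightly different results.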

Summary. Experiment 4a revealed that neither identifying the hypotheses and how

they were determined nor providing a complete explanation of the reasoning underlying

the THOG problem up to the point of comparing the results for the two hypotheses

facilitated performance if participants were asked to classify all four designs. If questions

were provided to guide participants toward the solution, encouraging participants to

answer one question at a time appeared to offer some assistance compared to grouping

the questions together, but the effect did not attain significance. Experiment 4b indicated

that the difficulty was not attributable to being asked questions about all four designs

rather than three nor to forgetting that black and diamond could not be the hypothesized








properties. Participants who were asked to make definite commitments about what

designs could not be THOGs under either hypothesis seemed somewhat more apt to

generate only one possibility rather than two, compared to participants who were asked

simply what designs could not be THOGs. Further, among participants who correctly

classified all four designs, those who made definite commitments on pretest questions

were more likely to derive their correct classifications from only partially correct pretest

answers than were participants not asked to make definite commitments on pretest

questions.








Table 1

Correct Responses by Problem Type, Hypotheses Generation, and Instruction
(Experiment 1b)


                 With hypotheses generation      Without hypotheses generation

Problem type     Two-choice     Three-choice     Two-choice     Three-choice

THOG             7              3                5              2

SARS             8              3                4              1


Note. There were 18 participants in each condition.








Table 2

Responses by Problem Type (Experiment 2b)


                                                        Responses

Condition                                    Correct    Intuitive error    Other

SARS                                         11         0                  7

Hypotheses-THOG                              12         3                  17

Standard THOG ("other" instruction)          8          4                  6

Standard THOG (three-choice instruction)     3          9                  5


Note. There were 18 participants in the SARS and standard THOG ("other" instruction)
conditions, 32 in the Hypotheses-THOG condition, and 17 in the standard THOG
(three-choice instruction) condition.









Table 3

Responses by Problem Type (Experiment 4a)


                                                          Responses

                                                                     Black Diamond
                                               Intuitive   Near      not THOG or
Condition                            Correct   error       insight   indeterminate   Other

Standard THOG

Explanation THOG

Pretest THOG (grouped-questions)

Pretest THOG (split-questions)


Note. There were 15 participants in each condition. All conditions included the standard
three-choice instruction.









Table 4

Responses by Problem Type: Pretest THOG (split-questions) with Qualifier
(Experiment 4b)


                                                          Responses

                                  Correctly                            Black Diamond
                                  identified    Intuitive   Near       not-THOG or
Condition                         all designs   error       insight    indeterminate   Other

With Qualifier only               7             3           1          3               1

Qualifier + Reminderᵃ             6             3           2          0               4

Qualifier + "Definite"ᵇ           6             1           4          1               3

Qualifier + Reminder
+ "Definite"ᵃ, ᵇ                  5             3           3          2               2


Note. There were 15 participants in each condition. All conditions included the standard
three-choice instruction. All pretest questions related to identifying THOGs included the
qualifier "other than the Black Diamond."

a In Reminder conditions, the following sentence was inserted after the pretest questions
and before the instruction: "Reminder: Given the rule and the fact that the Black
Diamond is a THOG, I could not have written down Diamond and Black without creating
a contradiction."
b In "Definite" conditions, the pretest questions instructed participants to indicate which
designs were definitely or definitely not THOGs, rather than asking which designs could
and could not be THOGs.









Table 5

Correct Classifications as a Function of Answers to Pretest Questions
(Experiment 4b)


                                                    Answers to pretest questions

                                                                  Indicated White Circle was
                                                                  THOG for both hypotheses

                                  Correct          Pretest        Not-written     Written
                                  classifi-        questions      combination     combination
Condition                         cations          correct        was not THOG    was not THOG

With qualifier only               47%              20%            13%             13%

Qualifier + Reminderᵃ             40%              33%            7%              0

Qualifier + "Definite"ᵇ           40%              7%             13%             20%

Qualifier + Reminder
+ "Definite"ᵃ, ᵇ                  33%                             13%             13%


Note. There were 15 participants in each condition. All conditions included the standard
three-choice instruction. All pretest questions related to identifying THOGs included the
qualifier "other than the Black Diamond."

a In Reminder conditions, the following sentence was inserted after the pretest questions
and before the instruction: "Reminder: Given the rule and the fact that the Black
Diamond is a THOG, I could not have written down Diamond and Black without creating
a contradiction."
b In "Definite" conditions, the pretest questions instructed participants to indicate which
designs were definitely or definitely not THOGs, rather than asking which designs could
and could not be THOGs.














GENERAL DISCUSSION

Seven experiments highlight sources of difficulty that thwart solution to the

THOG problem. Each experiment represented an attempt to relieve complexity at

different steps along the solution path. In the standard version of the THOG problem,

there is no explicit request for hypotheses generation, no attempt to differentiate the

properties of the positive example from those of the hypothesized properties, no

suggestion to initially classify the designs given each hypothesis, and no indication in the

response categories that all designs can be accurately classified as either a THOG or not a

THOG based exclusively on the information provided. Given these multiple sources of

confusion, people veer away from the route to the solution at different junctures. Some

ultimately arrive at the correct answer despite detours by resorting to nonlogical

strategies. Most do not.

If hypotheses generation is not requested, a substantial minority of participants act

in a manner taken to indicate that they are unaware of the necessity of this activity.

Instead, they may adopt nonlogical tactics, such as comparing each design to the positive

example, then basing THOGness decisions on perceived similarity to the Black Diamond

(matching bias). With the standard three-choice instruction, such a strategy is reflected by

the proportion of Type B intuitive errors, e.g., classifying the White Circle as "not a

THOG" because it shares no property with the Black Diamond, and classifying the Black

Circle and White Diamond as indeterminate because they each share only one property








with the Black Diamond. Overall, across five experimental conditions that included the

standard THOG problem with a three-choice instruction, 27% of 82 participants

committed a Type B error.

Unlike the Type B error pattern, Type A errors have historically been interpreted as

reflecting an incorrect attempt at hypotheses generation, namely considering black and

diamond as the properties written down (the common element fallacy). By then correctly

applying the disjunctive rule, participants who conjecture that black and diamond are

written down reach the conclusion that the White Circle is not a THOG and the Black

Circle and White Diamond are THOGs. Overall, across five experimental conditions

which included the standard THOG with the three-choice instruction, 16% of 82

participants committed a Type A error. Error patterns for the SARS problem paralleled

those for the THOG. In the only SARS condition that included the three-choice

instruction and did not require hypotheses generation, Type B errors (made by 44% of the

18 participants) were considerably more common than Type A errors (11%).

To encourage hypotheses generation, versions of the THOG problem have been

developed that include an initial question asking participants to indicate which color and

shape combination(s) could be written down. Consistent with prior findings, in five

experimental conditions in which participants generated hypotheses prior to classifying

designs, 64% of 97 participants generated the two correct combinations. The proportion

of participants who did so was lower among those who then responded to the standard

three-choice instruction (39%) than among those who responded to other instructions

(two-choice, 61%; "other," 73%; "other" replication, 65%; one-other, 83%). It is possible

that this range of answers is attributable to random variation; however, an alternative








explanation may be that participants were not consistently proceeding linearly through the

problem. As expected, correctly generating hypotheses substantially increased the

likelihood of correctly classifying designs, but it clearly did not guarantee success.

An examination of the incorrect answers to hypotheses generation questions

suggested that the common element fallacy played only a minor role in design

classification. Across the five THOG conditions mentioned above, only 6% of the 97

participants reported that only the combination black and diamond could be written

down, although an additional 10% reported that the combination black and diamond

could be written down in conjunction with other combinations. The standard three-choice

condition was the only one in which it was possible to determine the ultimate

classification of all designs across three answer choices. In that condition, one of five

people who wrote down black and diamond as an hypothesis then committed a Type A

error. Thus, evidence for explaining Type A errors in the context of the common element

fallacy is weak.

Participants in the SARS conditions also readily generated hypotheses when asked

to do so. Overall, 79% of 144 participants correctly indicated which two designs could be

SARS. Across seven experimental conditions, a range of 65% to 100% of participants

offered the two correct hypotheses. Misconstruing the Black Diamond as an hypothesis

by identifying it as a SARS was a rare phenomenon. Among participants working on

SARS problems requiring hypotheses generation, only two of 144 participants made this

error. As was true in the THOG conditions, generation of correct hypotheses did not

consistently lead to correct classification of designs. Again, the proportion of correct








classifications appeared more related to specific classification instructions than to an

ability to determine hypotheses.

Assignment of the SARS label to the written-down properties appeared, however,

to positively influence correct generation of hypotheses compared to the standard request

to indicate the combination(s) that could be written down. Across all types of instruction,

the SARS label seemed to result in a somewhat higher proportion of correct hypotheses

generation compared to versions that did not mention the SARS (three-choice, 89%

versus 39%; two-choice, 89% versus 61%; high ability "other," 100% versus 73%;

"other," 72% versus 65%; and one-other, 89% versus 83%). Thus, reducing complexity

by combining two properties under one label did provide some assistance. However,

differences in hypotheses generation seldom translated into higher solution rates.

In contrast, the pattern of results related to instructional manipulations revealed a

clear influence on solution rates. In the present series of experiments, performance given

the standard three-choice instruction was compared to performance with only two

choices, with the "other" THOG instruction and with the one-other THOG instruction. As

choices were narrowed, performance generally improved. For example, across full-scale

experiments with typical introductory psychology students on the standard THOG

problem without hypotheses generation, from a baseline of 11% to 18% correct, the

two-choice instruction led to a 28% solution rate, the "other" instruction to a 44%

solution rate, and the one-other instruction to a 37% solution rate. The latter figure may

be an anomaly, given the Griggs et al. (1998) finding of 53% correct with a similar

sample versus their 11% baseline. If hypotheses generation was included, performance








given either the two-choice or "other" options was comparable (39% and 38%

respectively), but the solution rate with the one-other instruction reached 72%.

A similar pattern emerged for the SARS problems, from a baseline of 6%

correct with a three-choice instruction without hypotheses generation. Comparable figures

for two-choice and one-other conditions were 22% and 78%, respectively. The SARS

problems that requested generation of hypotheses also clearly revealed an increase in

correct solutions as the number of alternatives narrowed: 17% with three choices, 44%

with two choices, 61% with the "other" instruction, and 83% with the one-other

instruction.

The alternative instructions reduce complexity in different ways. The THOG

problem inherently creates uncertainty regarding which hypothesis is written down.

Among those participants who do not attempt to test multiple hypotheses or who do not

realize that answers are the same given either hypothesis, the classification of designs

may appear dependent on what is written down. Without knowing for sure which specific

combination is actually written, many participants may opt for the insufficient

information alternative. In two-choice conditions, removing this option may lead some

participants toward the realization that there is a definite answer.

Does the two-choice alternative indirectly reduce uncertainty about which

hypothesis was written down by implying that it doesn't matter? Or does it more directly

simplify the problem by narrowing the range of possibilities? If the former is true, then

participants may more closely examine the possible THOGs and not-THOGs. Among

those who have actively attempted to determine what is written down, the White Circle

may appear to be a THOG given either hypothesis. Because there is no indeterminate

option, the Black Circle and White Diamond may then be classified as not-THOGs.

This rationale admittedly contains multiple suppositions, as yet unsupported.

However, the explanation and training experiments (Experiments 4a and 4b) indicate that

a majority of participants do correctly identify the White Circle as a THOG given either

hypothesis and tend to make more errors regarding the designs that cannot be THOGs. On

the other hand, the increased solution rate may also be attributable to improved "odds" for

guessing. To tease apart alternative explanations might require changing the structure of

the problem. If participants are asked to consider only one possible combination of

written down properties, a primary source of uncertainty would be eliminated. Under

these conditions, if the two-choice instruction remains more effective, its effectiveness

seems less likely to be attributable to reduction of uncertainty.

Both the "other" and one-other instructions may achieve their effects by focusing

attention on a reduced set of possibilities, rather than heightening people's willingness to

test hypotheses. As O'Brien et al. (1990) mention, the statement that there is only one

other THOG may encourage participants to choose a design that has a unique relationship

to the positive example. Because the White Diamond and Black Circle both share one

feature with the Black Diamond, it may be difficult to justify choosing one instead of the

other. The White Circle is the only design that does not share any feature with the Black

Diamond and may be selected as a THOG strictly on the basis of its uniqueness.

The "other" instruction may operate in a similar manner. For example, many

participants may reason that because White Diamond and Black Circle are the possible

written-down combinations, the White Circle is the only remaining candidate for the THOG

classification. Such a rationale replaces simultaneous evaluation of multiple hypotheses

with an exclusion strategy. Thus, as Griggs et al. (1998) suggest, the one-other and

"other" instructions may encourage participants to select the correct answer for the wrong

reasons. The training conditions provided by the Explanation and Pretest THOG

manipulations suggest that this may indeed be the case.

The Explanation THOG provides participants with a detailed rationale for each of

the hypotheses that could be written down and for the designs that can and cannot be

THOGs given each hypothesis. Only two pieces of information remained unstated. First,

the explanation does not explicitly indicate that it is necessary to combine the answers

from the two hypotheses to achieve a correct solution. Second, the explanation does not

clarify how to test multiple hypotheses. After reading the problem statement and detailed

explanation, participants in the O'Brien et al. (1990) experiment performed no better than

those in their baseline condition. Experiment 4a replicated this counterintuitive null

result.

One possible reason for the lack of facilitation relates to the complexity of the

explanation and the inclusion of potentially misleading information. First, some

participants may interpret the explanation as containing a contradiction. The explanation

initially asks participants to consider black and diamond and white and circle among the

combinations that might be written down but then provides reasons why these

combinations are not permissible. Second, it mentions that the White Circle is a THOG

given either combination but fails to mention the Black Diamond. Information about the

Black Diamond is provided only at the conclusion of the problem statement preceding the

explanation. Thus, if participants chose to check the explanation when making their

classifications, they would find no explicit statement that the Black Diamond is a THOG. This

might raise doubts about the status of the Black Diamond, leading to its classification as

not a THOG or as indeterminate. A vast majority (70%) of the participants in the O'Brien

et al. experiment reached this conclusion, but in the replication only one participant (7%)

did so. The reason for this difference is unclear but may relate to problem

format. However, what is clear is that leading participants directly to the point of testing

multiple hypotheses does not prod them to do so or to consider how to do so correctly.

Thus, the failure of the Explanation THOG to facilitate performance serves to

identify the activity of testing multiple hypotheses as a major roadblock to the correct

solution. If the null effect remains after future experimentation that removes potential

sources of confusion from the explanation itself, results of the Explanation THOG will

offer a persuasive rationale for the classification behavior of those who correctly generate

two hypotheses. Existing data clearly indicate that simultaneously testing multiple

hypotheses is not a natural activity; in decision-making, for example, people typically opt to attend

primarily to one hypothesis when making inferences (Mynatt, Doherty, & Dragan, 1993).

In the THOG problem, faced with uncertainty concerning which of the two hypotheses is

written down, participants may abandon any attempt to perform hypotheses tests. Among

those who do make such an attempt, lack of procedural knowledge may block the correct

solution (or may lead to correct answers for the wrong reasons).

Pretest THOG data support this thesis. The Pretest conditions were similar to the

Explanation THOG because participants were told what combinations of properties could

be written down and were given a rationale for each. Following the O'Brien et al. (1990)

procedure, pretest questions in Experiment 4a were worded to encourage participants to

classify the Black Diamond, as well as the other three designs. However, unlike the

Explanation THOG, the Pretest versions required participants to make their own

determinations of which designs could or could not be THOGs given each of the

hypotheses. If questions regarding design classification given each hypothesis were

presented as a single group, performance was comparable to baseline, obscuring the small

effect in the O'Brien et al. study. However, a change in format that allowed individual

inspection of answers to each of the four questions appeared to facilitate performance,

although the level of facilitation did not reach significance using the standard .05

criterion. This may reflect in part the small sample sizes (N = 15 per group). If each

original group had included 30 participants, a similar pattern of results would have been

interpreted as revealing statistically significant facilitation for the Split-Questions Pretest

THOG compared to the standard THOG baseline, Explanation, and Grouped-Questions

Pretest. A replication of the Split-Questions Pretest with an additional 15 participants

provided possible evidence for some facilitation (60%), but to a lesser extent than in the

original experiment (73%).
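
The dependence of significance on group size can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not a reanalysis: the test actually used is not restated here, so a two-sided Fisher exact test is swapped in; the function name `fisher_two_sided` is hypothetical; and the baseline count of 5 of 15 is a placeholder rather than a figure from Experiment 4a. Only the 11 of 15 (73%) for the Split-Questions Pretest comes from the text.

```python
from math import comb


def fisher_two_sided(success_a, n_a, success_b, n_b):
    """Two-sided Fisher exact p-value for solvers in two independent groups."""
    total = n_a + n_b
    successes = success_a + success_b
    low = max(0, successes - n_b)
    high = min(n_a, successes)

    def weight(k):
        # Exact integer count of tables with k solvers in group A.
        return comb(successes, k) * comb(total - successes, n_a - k)

    observed = weight(success_a)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(weight(k) for k in range(low, high + 1) if weight(k) <= observed)
    return extreme / comb(total, n_a)


# 11 of 15 is the reported 73%; the baseline of 5 of 15 is HYPOTHETICAL.
p_15_per_group = fisher_two_sided(11, 15, 5, 15)
# Same proportions with each group doubled to 30 participants.
p_30_per_group = fisher_two_sided(22, 30, 10, 30)

if __name__ == "__main__":
    print(f"N = 15 per group: p = {p_15_per_group:.4f}")
    print(f"N = 30 per group: p = {p_30_per_group:.4f}")
```

With these placeholder counts, the comparison falls short of the .05 criterion at 15 per group but meets it at 30 per group, which is the pattern described above. Different baseline counts would yield different values.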

Using the Pretest THOG (split-questions) format, qualifying the pretest questions

concerning which designs could be THOGs by inserting the phrase "other than the Black

Diamond" failed to produce facilitation and may have added to, rather than relieved,

potential confusion. Fewer participants correctly answered pretest questions when the

qualifier was present than when it was absent. Additionally, asking participants to make a

definite commitment in the pretest questions regarding which designs could and could not

be THOGs seemed to reduce the proportion of completely correct answers to these

questions. Instead, participants tended to produce only one of the two possible designs

that could not be THOGs for each hypothesis. Responses were about equally split

between those indicating that the combinations written down could not be THOGs and

those indicating that the combinations not written down could not be THOGs. Thus,

insertion of the word "definitely" may have uncovered a specific stumbling block to

solution, even among participants who generated hypotheses. This might help explain the

indeterminate status of the White Diamond and Black Circle in the most prevalent type of

intuitive error.

Across the seven Pretest conditions (including the replication of the Pretest

THOG (split-questions)), 39% of 105 participants answered all four pretest questions

correctly, 44% correctly identified the White Circle as a THOG but provided only one

correct answer to each question regarding which design or designs could not be a THOG,

and 19% made some other error. Among those who answered all four pretest questions

correctly, 79% subsequently classified the designs accurately. Additionally, in the group

that correctly identified the White Circle but gave only a partial answer to the

"not-THOG" pretest questions, 41% correctly classified the designs. This percentage

suggests that a minority block of participants attained the correct answer via a path that

circumvents appropriate logical inferences. The failure of this group to exhibit uncertainty

by choosing the insufficient information category for the White Diamond and Black

Circle suggests that they combined design classifications for the two hypotheses in a

unique manner. Perhaps people reasoned that if the White Diamond could not be a THOG

given one hypothesis and the Black Circle could not be a THOG given the other, then

both of these designs were definitely not THOGs. This implies that these participants

neglected to consider that only one hypothesis could actually be written down.

More direct evidence for the inability of some participants to perform the required

multiple hypotheses tests comes from the responses of those participants who offered

completely correct answers to all pretest questions but did not accurately classify the

designs (nine participants, representing 21% of those who correctly answered all pretest

questions). Error patterns were varied. Two participants indicated that the status of all

designs was indeterminate. No other response pattern was offered by more than one

individual.

Others reached inaccurate conclusions by applying logic to incorrect assumptions.

In the group that correctly identified the White Circle as a THOG but gave only a partial

answer to the "not-THOG" pretest questions, 24% proceeded logically (based on their

responses to pretest questions) to classify the White Circle and Black Diamond as

THOGs and the White Diamond and Black Circle as indeterminate. The extent of this

"near insight" illustrates that an incorrect answer does not necessarily indicate the

complete absence of logical reasoning. On the other hand, it is difficult to reconstruct the

basis for the intuitive errors committed by 17% of this group. After indicating that the

White Circle was a THOG given either hypothesis, they then concluded that it was

definitely not a THOG. Perhaps the confusion resulting from uncertainty regarding the

classification of the Black Circle and White Diamond led some to abandon logical

reasoning and resort to a matching strategy at this final stage of problem-solving.

Supporting this possibility, an additional 9% of this group of participants offered a

nonintuitive error pattern indicating that the White Circle was not a THOG or that there

was insufficient information to decide.
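
The "near insight" pattern can be reconstructed from partial pretest answers. The Python sketch below is a hypothetical model of one such participant, not recorded data; the dictionaries `complete` and `partial` and the function `combine` are illustrative names. It assumes that a design is classified definitely only when the participant holds the same verdict for it under both hypotheses, and that a design omitted from a pretest answer carries no verdict.

```python
# True = could be a THOG, False = could not be a THOG, None = not mentioned.
complete = {
    "black and circle": {"Black Diamond": True, "White Circle": True,
                         "Black Circle": False, "White Diamond": False},
    "white and diamond": {"Black Diamond": True, "White Circle": True,
                          "Black Circle": False, "White Diamond": False},
}

# Hypothetical partial answers: only the written-down design is named
# as a design that could not be a THOG under each hypothesis.
partial = {
    "black and circle": {"Black Diamond": True, "White Circle": True,
                         "Black Circle": False, "White Diamond": None},
    "white and diamond": {"Black Diamond": True, "White Circle": True,
                          "Black Circle": None, "White Diamond": False},
}


def combine(pretest):
    """Merge per-hypothesis verdicts into one final classification per design."""
    designs = next(iter(pretest.values())).keys()
    final = {}
    for design in designs:
        verdicts = {answers[design] for answers in pretest.values()}
        if verdicts == {True}:
            final[design] = "THOG"
        elif verdicts == {False}:
            final[design] = "not a THOG"
        else:
            final[design] = "insufficient information"
    return final


if __name__ == "__main__":
    print("complete:", combine(complete))
    print("partial: ", combine(partial))
```

Under these assumptions, the complete answers yield the correct solution, whereas the partial answers yield THOG for the White Circle and Black Diamond and insufficient information for the White Diamond and Black Circle, which matches the pattern produced by participants who reasoned logically from incomplete premises.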

The classification of the White Circle as indeterminate may echo abandonment of

the sure-thing principle in decision-making in uncertain situations (Shafir & Tversky,

1992). The sure-thing principle suggests that "if we prefer x to y given any possible state

of the world, then we should prefer x to y even when the exact state of the world is not

known" (Shafir & Tversky, 1992, p. 450). Similarly, if the White Circle is a THOG if

White Diamond is written down and the White Circle is a THOG if Black Circle is

written down, then the White Circle will be a THOG even if participants do not know

which of the two combinations is written down. In decision-making, however, when the

exact state of the world is unknown, people do not consistently act in accordance with the

sure-thing principle, instead opting to maintain a "wait and see" position by postponing

the decision. This example of nonconsequentialist reasoning may be analogous to

selection of the "insufficient information to decide" option in the THOG task.

Thus, the complexity of the THOG problem in its abstract form renders it resistant

to facilitation at multiple junctures. This set of experiments clarifies how people err at

different stages of the problem: generating hypotheses, classifying designs given each

hypothesis, and combining the results of these initial classifications to reach a logical

conclusion. Results clearly demonstrate that answers alone provide a misleading picture

of underlying logical processes. Some people may logically reach an incorrect answer if

they commit a single error in any of the preceding stages. Others may provide a correct

answer based on tactics that are inherently nonlogical.














CONCLUSIONS AND FUTURE DIRECTIONS

Within the confines of the problem structure of the standard abstract THOG

problem, consistent facilitation of logical reasoning is ultimately hampered by difficulties

related to testing multiple hypotheses. Understanding of the exclusive disjunctive rule is

not a major roadblock. Hypotheses generation can typically be encouraged by explicit

requests, and a majority of people generally succeed at this task. Separating the properties

of the positive example from those of the hypotheses by providing a single label is helpful

at this point. However, hypotheses generation alone does not reliably facilitate solution.

Given each hypothesis, most people are capable of determining which designs are

THOGs and, to a somewhat lesser extent, which designs are not THOGs, for both

possibilities. Even at this stage, however, correct classification of the designs remains

elusive. Perhaps stemming from the uncertainty surrounding which hypothesis is actually

written down, people either fail to consider how to combine information from the two

hypotheses or do so incorrectly. Resistance to simultaneous evaluation of two hypotheses

appears to be strong.

Although some instruction wordings result in higher solution rates, the evidence

suggests that the facilitation may stem from nonlogical factors. It appears that people

resort to heuristic tactics, even after engaging in some form of logical reasoning that

leads them to a point of confusion. In this case, heuristics such as matching or a strategy

of elimination are not necessarily preattentive. Thus, they should not be confused with the