CHOICE OF ON-FARM TRIAL SITES
Training Working Document No. 5
Prepared by
Roger Mead
Consultant
in collaboration
with CIMMYT staff
CIMMYT
Lisboa 27
Apdo. Postal 6641,
06600 México, D.F., Mexico
PREFACE
This is one of a new series of publications from CIMMYT entitled Training Working
Documents. The purpose of these publications is to distribute, in a timely fashion,
training-related materials developed by CIMMYT staff and colleagues. Some Training
Working Documents will present new ideas that have not yet had the benefit of extensive
testing in the field while others will present information in a form that the authors have
tested and found useful for teaching. Training Working Documents are intended for
distribution to participants in courses sponsored by CIMMYT and to other interested
scientists, trainers, and students. Users of these documents are encouraged to provide
feedback as to their usefulness and suggestions on how they might be improved. These
documents may then be revised based on suggestions from readers and users and
published in a more formal fashion.
CIMMYT is pleased to begin this new series of publications with a set of six documents
developed by Professor Roger Mead of the Applied Statistics Department, University of
Reading, United Kingdom, in cooperation with CIMMYT staff. The first five documents
address various aspects of the use of statistics for on-farm research design and analysis,
and the sixth addresses statistical analysis of intercropping experiments. The documents
provide on-farm research practitioners with innovative information not yet available
elsewhere. Thanks go to the following CIMMYT staff for providing valuable input
into the development of this series: Mark Bell, Derek Byerlee, Jose Crossa, Gregory
Edmeades, Carlos Gonzalez, Renee Lafitte, Robert Tripp, and Jonathan Woolley.
Any comments on the content of the documents or suggestions as to how they might be
improved should be sent to the following address:
CIMMYT Maize Training Coordinator
Apdo. Postal 6641
06600 Mexico D.F., Mexico.
Document 5
CHOICE OF ON-FARM TRIAL SITES
1. Objectives
There are two distinct objectives which may influence the choice of sites. One is simply to achieve a
"representative" sample. Alternatively, or additionally, we may wish to assess the extent to which
differences of yield effects are attributable to variation in one or more characteristics of the sites. In either
case we shall be considering the particular levels of a small number of site characteristics.
2. Random or Systematic?
The choice of sites should always be systematic, or at least have a major systematic component. With a
large sample, as in many sample surveys, a random sample is virtually certain to be representative. When
we are choosing between three and ten sites, random selection merely guarantees that we
will sometimes select a set of sites which are very similar. For example, if we are selecting three sites from a set
of six possible sites, then random selection ensures that one time in twenty we will select the three most
similar sites (since there are 20 possible selections of three from six).
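The count of 20 can be checked directly. The following is a minimal sketch; the helper name `chance_of_one_particular_set` is illustrative and not from the original document.

```python
from math import comb

def chance_of_one_particular_set(n_available: int, n_chosen: int) -> float:
    """Probability that simple random selection picks one specific subset,
    for example the most similar set of sites."""
    return 1 / comb(n_available, n_chosen)

# The case in the text: three sites chosen from six.
print(comb(6, 3))                          # 20
print(chance_of_one_particular_set(6, 3))  # 0.05
```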
3. Representative Variation
One way to achieve representativeness in a sample of sites is to require that each pair of sites is adequately
separated. How much separation should we expect? If we were considering only a single site characteristic,
with a variance of $\sigma^2$, then the expected squared difference between two observations from the distribution
of the characteristic would be
$2\sigma^2$.
If there are c characteristics then, assuming that within the domain the characteristics occur independently,
and that the scales of the characteristics have been standardized to a common variance, the expected
squared distance between two sites measured over all c characteristics would be
$2c\sigma^2$.
A reasonable target for our selection of sites would then be that the average squared inter-site distance
should be
$2c\sigma^2$.
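The factor of two comes from the variance of a difference. As a short sketch, write $X_{1j}$ and $X_{2j}$ (notation introduced here, not in the original) for the standardized values of characteristic $j$ at two sites drawn independently from the domain, so that both have the same mean and variance $\sigma^2$:

```latex
\begin{align*}
E\left[(X_{1j} - X_{2j})^2\right]
  &= \operatorname{Var}(X_{1j} - X_{2j}) + \left(E[X_{1j} - X_{2j}]\right)^2 \\
  &= \operatorname{Var}(X_{1j}) + \operatorname{Var}(X_{2j}) + 0
   = 2\sigma^2, \\
E\left[\sum_{j=1}^{c} (X_{1j} - X_{2j})^2\right]
  &= \sum_{j=1}^{c} 2\sigma^2
   = 2c\sigma^2 .
\end{align*}
```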
4. Selection From a Known Set of Sites
To illustrate the selection of a set of "averagely" different sites, consider the situation where we have
advance information on the inter-site distances. We use data on seven sites from the 1981 experiment in
Document 4D. The squared distance matrix is
Site        2       3       4       5       6       7
   1    42291   38505    5846   38304   32750   13942
   2            32988   38153   95343   29549   32316
   3                    30297   95729   19398   19479
   4                            24390   27623   14711
   5                                    87457   69708
   6                                            14227
The average of these squared distances is 38238. If we assume that this average also represents the
variation in the population of sites of which the seven are a representative sample, then we should try to
select a set of sites for which the inter-site distances are similar, with an average of about 38238.
If three sites are to be selected then the selection of sites 1, 2 and 3 is obvious (mean squared distance =
37928) and further inspection does not reveal (to me!) any better set.
If four sites are to be selected then there is no such satisfactory set. Possible alternatives are
(1,2,3,6) average 32580, minimum 19398, maximum 42291;
(2,3,4,6) average 29668, minimum 19398, maximum 38153;
(1,5,6,7) average 42731, minimum 13942, maximum 87457.
Although the average for (1,5,6,7) is nearest to the target, its two smallest distances, for (1,7) and (6,7), are rather too small,
and I would probably choose the (1,2,3,6) set.
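The averages, minima and maxima quoted above can be reproduced from the distance matrix. The sketch below is one possible way to do the inspection by computer. The names `SQ_DIST`, `summarize` and `rank_subsets` are illustrative. Ranking subsets purely by how close their mean is to the target is a simplification: as the four-site case shows, the smallest distances in a set also matter.

```python
from itertools import combinations

# Upper triangle of the squared distance matrix given in the text.
SQ_DIST = {
    (1, 2): 42291, (1, 3): 38505, (1, 4): 5846, (1, 5): 38304,
    (1, 6): 32750, (1, 7): 13942,
    (2, 3): 32988, (2, 4): 38153, (2, 5): 95343, (2, 6): 29549, (2, 7): 32316,
    (3, 4): 30297, (3, 5): 95729, (3, 6): 19398, (3, 7): 19479,
    (4, 5): 24390, (4, 6): 27623, (4, 7): 14711,
    (5, 6): 87457, (5, 7): 69708,
    (6, 7): 14227,
}
SITES = range(1, 8)
TARGET = sum(SQ_DIST.values()) / len(SQ_DIST)   # about 38238

def summarize(sites):
    """Return (mean, min, max) of the squared distances among the given sites."""
    d = [SQ_DIST[pair] for pair in combinations(sorted(sites), 2)]
    return sum(d) / len(d), min(d), max(d)

def rank_subsets(k):
    """All k-site subsets, ordered by closeness of their mean squared distance to TARGET."""
    return sorted(combinations(SITES, k),
                  key=lambda s: abs(summarize(s)[0] - TARGET))

for subset in [(1, 2, 3), (1, 2, 3, 6), (2, 3, 4, 6), (1, 5, 6, 7)]:
    mean, smallest, largest = summarize(subset)
    print(subset, round(mean), smallest, largest)

print(rank_subsets(3)[0])   # (1, 2, 3)
```

Calling `rank_subsets(4)` lists the four-site candidates in the same way; the minimum and maximum from `summarize` can then be used to screen out sets containing pairs that are too close.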
5. Sequential Selection from Uncharacterized Sites
It would be possible to start with one site, measure the values of its site characteristics,
and then consider other sites in turn, measuring their characteristics to see whether they are
appropriately different from the first (and, later in the process, from each previously included site). For this
it would be necessary to have some idea of $\sigma$, which means knowing the likely range of values for each
characteristic.
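One way such a sequential procedure might be organized is sketched below. The text does not say what counts as "appropriately different", so the acceptance rule here is an assumption: a candidate is kept only if its squared distance from every site already included is at least a chosen fraction (`min_fraction`, a hypothetical parameter) of the target $2c\sigma^2$. The characteristics are assumed to be on a common standardized scale, and the candidate values are invented purely for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two sites' characteristic values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_sequentially(candidates, sigma, n_wanted, min_fraction=0.5):
    """Greedy sketch: take the first site, then accept each later candidate
    only if it is far enough from every site already included.

    candidates   : sites in the order they are visited, each a tuple of
                   characteristic values on a common (standardized) scale
    sigma        : assumed common standard deviation of the characteristics
    min_fraction : hypothetical acceptance threshold, as a fraction of
                   the target 2 * c * sigma**2
    """
    chosen = []
    for site in candidates:
        c = len(site)
        threshold = min_fraction * 2 * c * sigma ** 2
        if all(squared_distance(site, s) >= threshold for s in chosen):
            chosen.append(site)
        if len(chosen) == n_wanted:
            break
    return chosen

# Invented candidate sites with two standardized characteristics (sigma = 1).
visited = [(0.0, 0.0), (0.3, -0.2), (1.5, 0.4), (1.4, 0.6), (-0.8, 1.6)]
print(select_sequentially(visited, sigma=1.0, n_wanted=3))
# [(0.0, 0.0), (1.5, 0.4), (-0.8, 1.6)]
```

The second and fourth candidates are rejected because each lies too close to a site already chosen. A stricter or looser `min_fraction` changes how many candidates need to be visited.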
6. Site Selection to Obtain Information About the Effects of Characteristics
For c characteristics the minimum number of sites required to provide information about all c effects would
be (c+1). Some ideal patterns for sites to investigate the effects of one or two characteristics can be
constructed geometrically.
For one characteristic the best set of two sites would be $0.707\sigma$ above and below the mean (approximately
at the quartile points). The best set of three sites would be at the mean and $\sigma$ above and below the mean,
giving squared distances of
$\sigma^2$, $\sigma^2$ and $4\sigma^2$, with average $2\sigma^2$.
For two characteristics the best set of three sites would be an equilateral triangle centred on the means, at
$(k, 0)$, $(-0.5k, m)$ and $(-0.5k, -m)$,
where $m = (\sqrt{3}/2)\,k$
and $k = \sqrt{4/3}\,\sigma$.
A suitable set of four sites would be in a square with values of
$\pm\sqrt{2/3}\,\sigma$.
Further geometrical manipulation could produce ideal sets for three or four characteristics (I think!).
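These designs can be checked against the target average squared distance of $2c\sigma^2$ with a few lines of code. The sketch below rests on some readings of the text that are assumptions: 0.707 is taken as $1/\sqrt{2}$, the two lower points of the triangle are taken as $(-0.5k, m)$ and $(-0.5k, -m)$, and the square is taken to have corners $(\pm a, \pm a)$ with $a = \sqrt{2/3}\,\sigma$. The function name is illustrative.

```python
from itertools import combinations
from math import sqrt

def mean_squared_distance(points):
    """Average squared Euclidean distance over all pairs of sites."""
    pairs = list(combinations(points, 2))
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)

sigma = 1.0   # work in units of sigma

# One characteristic (target 2 * sigma**2).
two_sites = [(-sqrt(0.5) * sigma,), (sqrt(0.5) * sigma,)]
three_sites = [(-sigma,), (0.0,), (sigma,)]

# Two characteristics (target 4 * sigma**2): equilateral triangle.
k = sqrt(4 / 3) * sigma
m = sqrt(3) / 2 * k
triangle = [(k, 0.0), (-0.5 * k, m), (-0.5 * k, -m)]

# Square, read here as corners (+-a, +-a) with a = sqrt(2/3) * sigma.
a = sqrt(2 / 3) * sigma
square = [(a, a), (a, -a), (-a, a), (-a, -a)]

for name, design in [("two sites", two_sites), ("three sites", three_sites),
                     ("triangle", triangle), ("square", square)]:
    print(name, round(mean_squared_distance(design), 3))
```

With $\sigma = 1$ the two one-characteristic designs average 2 and the triangle averages 4, matching their targets. Under the reading used here the square averages $32/9 \approx 3.56$, a little below the target of 4.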
7. Estimating Effects of Characteristics
This can be done by regression using the within-site estimates of precision for the yield effects being
compared between sites.
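The text does not spell out the form of the regression. One common reading, assumed here, is a weighted least squares fit in which each site's yield effect is weighted by the inverse of its squared within-site standard error. The sketch below does this for a single characteristic in plain Python. The function name and the example numbers are hypothetical and chosen only so that the result is easy to check.

```python
def weighted_line_fit(x, effect, se):
    """Weighted least squares fit of effect = intercept + slope * x,
    weighting each site by 1 / se**2 (its within-site precision)."""
    w = [1 / s ** 2 for s in se]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effect)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, effect))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Standard error of the slope if the within-site standard errors
    # are treated as known.
    slope_se = (1 / sxx) ** 0.5
    return intercept, slope, slope_se

# Invented values: site characteristic, yield effect at each site,
# and the within-site standard error of that effect.
x = [-1.0, 0.0, 1.0, 2.0]
effect = [0.5, 1.0, 1.5, 2.0]      # lies exactly on 1.0 + 0.5 * x
se = [0.2, 0.3, 0.2, 0.4]

intercept, slope, slope_se = weighted_line_fit(x, effect, se)
print(round(intercept, 6), round(slope, 6))   # 1.0 0.5
```

With c characteristics the same idea extends to a weighted multiple regression, which needs at least c+1 sites.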
