Title: Multivariate methods for pattern analysis and zonation
Permanent Link: http://ufdc.ufl.edu/UF00085565/00001
 Material Information
Title: Multivariate methods for pattern analysis and zonation
Physical Description: Book
Publisher: International Institute of Tropical Agriculture
United States Agency for International Development
National Cereals Research and Extension Project
Place of Publication: Washington, D. C.
Cameroon
Publication Date: October, 1993
 Notes
General Note: Paper presented at the North American Symposium of the Association for Farming Systems Research-Extension; University of Florida, Gainesville, Fla., October 12-16, 1993
 Record Information
Bibliographic ID: UF00085565
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.









MULTIVARIATE METHODS


FOR PATTERN ANALYSIS AND ZONATION









Paper presented at

the North American Symposium

of the Association for Farming Systems Research-Extension

University of Florida, Gainesville, October 12-16, 1993






Doyle Baker and Karen Dvorak





October 1993







Financial support for this research was provided by the International Institute of Tropical Agriculture and
USAID through the National Cereals Research and Extension Project, Contract III (6310052-C-00-1011-00)









CONTENTS


1. INTRODUCTION

2. DATA

2.1 Hypotheses
2.2 Data Collection
2.3 Variables

3. PATTERN ANALYSIS

3.1 Methods
3.2 Components Interpretation
3.3 Canonical Correlations

4. ZONATION

4.1 Methods
4.2 Linear Discrimination
4.3 Percentile Classification
4.4 Clustering

5. DISCUSSION

5.1 Implementation and Interpretation
5.2 Strengths and Weaknesses

6. CONCLUSIONS


ANNEXES

A. Sampling Methodology and Enumeration
B. Derivation of Ranks and Indices


REFERENCES









TABLES


1. Hypotheses on resource pressure-response
2. Variables used in multivariate analyses
3. Variable means before standardization by ecological zone
4. Component loadings for subject matter models
5. Interpretation of subject matter models
6. Development index regressions: probability of error in attributing significance
7. Loadings for components of complete model
8. Variable loadings for significantly correlated canonical variates
9. Interpretation of canonical variates (variables with high loadings)
10. Subject matter zone discriminant models
11. Selected zone discriminant model rotated loadings
12. Percent correct "optimistic" classification using discriminant models
13. Variable means before standardization for score quartiles
14. Loadings for MARKET and ACTIVITY indices used in subject matter clustering
15. Cluster size and mean component scores for subject matter model clusters
16. Cluster size and mean component scores for complete model clusters



FIGURES


1. Cameroon provinces and ecological zones
2. Eigenvalue scree plot for complete components model
3. High loading variables in complete components model
4. Discriminant score 50 percent regions for select model first and second factors
5. Discriminant score 50 percent regions for select model third and fourth factors
6. Quartile zonation based on variance weighted scores of first four components of adjusted complete model
7. Partial dendrogram for hierarchical clustering using subject matter indices
8. Zonation based on clustering subject matter model scores
9. Dendrogram for component score clustering
10. Zonation based on clustering four components of complete model








MULTIVARIATE METHODS
FOR PATTERN ANALYSIS AND ZONATION



1. INTRODUCTION

No systems-oriented research approach can succeed unless the system is properly identified
and differentiated from similar systems. Farming systems research-extension (FSRE)
methodology development correspondingly has given substantial attention to research (or
recommendation) domain definition (e.g., Byerlee, Collinson et al., 1980; Harrington and
Tripp, 1984; Hildebrand and Poey, 1985). Most attention has focused on when to define and
characterize domains, and how to go about collecting necessary information (Franzel, 1992).
Consensus has favored relatively early definition, following review of secondary sources and
rapid appraisal diagnosis. Validation generally has entailed comparison of descriptive
statistics for a priori defined domains.

Because of emphasis on early definition and rapid appraisal, domain definition in low income
applications of FSRE nearly always has been based on a small number of criteria. In nearly
all programs, the main criteria for zonation have been ecology and/or cropping patterns.
Many FSRE programs have further differentiated sub-domains within agroecological zones,
but such differentiation usually has been based on one to three criteria with few distinct cells.
Most common has been use of technology categories such as irrigated versus non-irrigated
and animal traction or not. Wealth proxies and gender (particularly gender of household
head) have been used effectively in some FSRE programs (see, for example, ATIP, 1986).

Despite the potential value of economic and institutional circumstances for characterizing and
differentiating domains, statistical methods for pattern analysis and zonation using socio-
economic criteria remain relatively unexplored. This is a major weakness in FSRE
methodology, and could prove to be a fatal flaw in adaptations of FSRE to high income
agricultural systems. For both low and high income agricultural systems, refined pattern
analysis and zonation can be achieved by use of multivariate data analysis methods--including
principal components, canonical correlation, linear discriminant and cluster analysis.

The paper illustrates uses of multivariate methods for pattern analysis and zonation. Data and
variables used in the analyses are described in Section Two. The third and fourth sections
cover pattern analysis and zonation methods. A discussion of implementation, interpretation,
strengths and weaknesses follows in Section Five.

2. DATA

The data used for the multivariate analyses are from a resource management survey (RMS)
carried out in Cameroon in 1992.

















2.1 Hypotheses


Two primary goals of the resource management survey were to: (a) identify developmental
patterns, and (b) differentiate research zones (or domains) on the basis of developmental
patterns--not just cropping systems or ecology. Rather than focus on a single driving force
such as population growth, the survey was designed to address multivariate hypotheses
linking pressures on the resource base to farmers' response to pressure. While the hypotheses
linked response patterns to growing pressure on resource access, no a priori hypotheses were
developed about specific responses to specific pressures. A major objective of the subsequent
multivariate analysis was to empirically isolate specific linkages.

The main indicators of resource access pressure, and their hypothesized effects on farm-
household system management are summarized in Table 1.



Table 1: Hypotheses on resource pressure-response


Indicators of Resource Access Pressure


Land: (a) population density, (b) village size,
(c) perceptions of land shortages, (d)
interaction with non-community and non-household
members, (e) formalized and monetized modes of
compensation, (f) restrictions on access to
common resources.

Labor: (a) extensive use of non-family labor,
(b) formalized and monetized modes of
compensation, (c) perceptions of labor
shortages.

Market contact: (a) distance and cost to nearest
market, (b) distance to nearest urban center,
(c) distance to nearest town, (d) traders come
to village, (e) input availability in nearest
market.


Expected Farmer Responses


Crop patterns and land use: (a) shortening
fallow periods, (b) increased field amendments
(organic and chemical), (c) intensified soil
management practices, (d) field type and crop
pattern differentiation.

Monetization of production enterprises: (a) use
of purchased inputs, (b) food purchases, (c)
sales of crop products.

Household economy: (a) livestock husbandry
increase in importance relative to hunting and
fishing, (b) more non-crop income sources, (c)
agricultural income falls in importance relative
to non-agricultural income, (d) income from bush
products and artisan activities falls relative
to salaried employment and services.


Developmental interventions: (a) NGO and
government activities, (b) extension contact and
demonstrations, (c) infrastructure (water,
roads, electricity, etc.), (d) active
cooperative or credit association.









2.2 Data Collection


The RMS was implemented through village-level meetings in 85 randomly selected villages,
chosen from a nationwide, stratified area-based sampling frame (see Annex A).1 Figure 1
shows the provinces of Cameroon and its primary ecological zones.



Figure 1: Cameroon provinces and ecological zones


1The village was the unit for enumeration and analysis.









2.3 Variables


The analyses in Sections Three and Four are restricted to a sub-set of 23 variables, out of
several hundred generated through the survey (Table 2). All variables represent exogenous
sources of resource pressure or farmers' resource use patterns,2 and all are continuous. Most
index variables were derived from rank and classification data, but were combined so as to
give cardinal orderings (Annex B).


Table 2: Variables used in multivariate analyses

Model        Acronym   Transform   Derivation

MARKET       DISURB    log         kilometers to nearest urban center
             POPDEN    log         rural population density for department
             COSTM     sq.root     transport cost to nearest daily market (US$)
             SERVICE   sq.root     index for public services
             DISTWN    log         index for distance to nearest town
             VILPOP    log         number of households in village

FIELD        FALO      sq.root     average fallow period for main food fields
             LUI       log         index for land use intensity
             FLDINT    sq.root     index for intense soil management
             NOTYPE    log         number of sole and mixed crop patterns
             FLDTYP                index for distinct field types

DEPENDENCE   DEPSF     log         index for proportion buying staple foods
             DEPMT                 index for proportion buying meat
             DEPVL     log         index for proportion buying vegetables and legumes
             SELLC     qd.root     index for proportion selling crops

ACTIVITY     ACROP     log         activity rank for crop production
             ATREE     log         activity rank for tree crop production
             APROC                 activity rank for food processing for sale
             ALVSK                 activity rank for livestock husbandry
             AFISH     log         activity rank for fishing
             AHUNT     log         activity rank for hunting and gathering
             ACOMM                 activity rank for artisan activities, commerce, services
             AWAGE     qd.root     activity rank for wage and salary employment





2Variables for distances to urban centers and towns, departmental population density, and government
services are (for practical purposes) exogenous. Conversely, modes of compensation, forest access
restrictions, and land shortages are affected by village institutions and social organization. Due to the
influence of village institutions, variations in several non-exogenous resource pressure indicators did not
have consistent associations with hypothesized farmer responses.









Before undertaking the multivariate analyses all variables were checked for normality. Most
were skewed because villages in Cameroon generally fall toward the less-developed end of a
pressure-response development continuum. Variables found to be significantly skewed were
normalized through log or square root transformation. Square root transformation was
necessary for variables having zero as a valid observation. For two variables, it was
necessary to go to a quadratic root transformation before an acceptably normal distribution
was obtained.3 Following normalization, all variables were standardized to have zero mean
and unit variance. This is standard practice in multivariate analysis so model results are not
affected by different measurement units.
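
The preparation steps described above can be sketched in a few lines of Python. This is an illustration only, not the procedure applied to the RMS data: the sample values are simulated, the function names are hypothetical, and reading "quadratic root" as a fourth root is an assumption.

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

def normalize(x, transform):
    """Apply one of the transformations named in Table 2."""
    x = np.asarray(x, dtype=float)
    if transform == "log":
        return np.log(x)        # only valid for strictly positive values
    if transform == "sqrt":
        return np.sqrt(x)       # usable when zero is a valid observation
    if transform == "qdroot":
        return x ** 0.25        # assumption: "quadratic root" read as fourth root
    return x                    # no transformation

def standardize(x):
    """Rescale to zero mean and unit (sample) variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Simulated right-skewed variable for 85 villages (hypothetical values).
rng = np.random.default_rng(0)
popden = rng.lognormal(mean=2.5, sigma=1.0, size=85)

z = standardize(normalize(popden, "log"))
print(round(skewness(popden), 2), round(skewness(z), 2))
```

Comparing the skewness before and after transformation gives a quick check on whether the chosen transformation moved the variable toward symmetry.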

For reference, means of the variables before normalization and standardization are given in
Table 3. Clear differences in development patterns are apparent between the highlands and
the three lowland forest zones (evergreen, semi-evergreen and forest-savanna transition). The
highlands zone has higher population density, greater market integration and more services.
Corresponding to this, the highlands has a more intense resource use pattern for essentially all
the posited response variables. The north is distinct in several respects from the other zones
due to its arid ecology. Livestock are substantially more important than in the other zones,
both in production and consumption. Cropping systems are sorghum or maize based, while
root crops and trees play a minor role.


Table 3: Variable means before standardization by ecological zone

                            Ever-Semi-ever-    Trans-     High-      Arid
Group        Variable       green     green     ition     lands   savanna       All

             Villages          20        21        18         8        18        85

MARKET       DISURB            93        76       109        63       115        94
             POPDEN          10.9      13.2       3.9      79.9      27.8      20.0
             VILPOP            65       189       106       773        95       180
             COSTM           3.79      3.94      4.21      3.19      1.57      3.41
             SERVICE          1.2       1.0       1.2       2.3       1.6       1.3
             DISTWN            59        69        85        38       105        75

DEPENDENCE   DEPSF            3.6       1.4       2.5       4.0       7.0       3.6
             DEPMT            5.8       4.6       3.8       7.1       5.4       5.1
             DEPVL            1.2       0.9       1.8       2.9       1.8       1.5
             SELLC           12.0      10.9      13.4      12.7       6.9      11.0

FIELD        FALO             4.4       4.4       4.8       0.6       4.5       4.1
             LUI              .41       .40       .42       .84       .58       .49
             FLDINT           1.6       1.5       2.0       7.3       1.8       2.2
             NOTYPE           3.8       4.3       5.3       9.3       5.7       5.1
             FLDTYP           3.9       4.0       4.6       5.3       3.4       4.1

ACTIVITY     ACROP            4.4       4.6       4.3       4.6       4.6       4.5
             ATREE            7.3       6.8      12.3       6.7      20.6      11.0
             APROC           17.0      17.3      15.2      19.0      17.7      17.0
             ALVSK           17.7      16.0      15.1      13.0       9.8      14.6
             AFISH           18.5      15.3      15.7      25.1      21.2      18.2
             AHUNT           13.4      14.1      14.6      17.3      20.7      15.7
             ACOMM           15.1      16.8      16.2      13.3      17.7      16.2
             AWAGE           20.4      23.5      21.7      19.4      22.3      21.8


3Some variables representing hypothesized pressure-response relationships were excluded from the
analysis because it was not possible to derive normal indices. Of particular interest were indices for
chemical use on food crops and investments in field infrastructure. Non-zero values were obtained for less
than 15 percent of the villages.


3. PATTERN ANALYSIS

The goal of pattern analysis is to improve and refine diagnosis of farmer circumstances.
Insights from pattern analysis can be used to refine hypotheses (and checklists) relating to
population behavior, and system performance, dynamics and sustainability.

3.1 Methods

The first part of the pattern analysis section presents results from principal components
analysis. The concept of principal components analysis is to identify a small number of
linear variates, based on the original variables, which account for a substantial proportion of
variation in the sample data. The derived linear variables are interpreted as underlying
dimensions (or factors) which, while unobserved, are responsible for variable correlation.

The procedure followed in principal components analysis is to decompose the covariance or
correlation matrix of variables into a set of linear combinations which account for maximum
possible variation. This is done by finding the largest eigenvalue and associated eigenvector
for the variance-covariance matrix or correlation matrix.4 Subsequent eigenvalues and
corresponding eigenvectors are sequentially extracted, each accounting for maximum variation




4In all analyses given below, standardized variables were used, so interpretation corresponds to use of
correlation matrices.









and subject to the constraint that it is uncorrelated with all prior components.5
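
The extraction procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used for the analyses: the data are simulated, the function name is hypothetical, and all eigenvalues are obtained in one call rather than sequentially (the result is the same).

```python
import numpy as np

def principal_components(Z):
    """Eigen-decompose the correlation matrix of an (n, p) data matrix.

    Returns eigenvalues in declining order, the matching eigenvectors
    (as columns), and the component loadings.
    """
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Loadings: correlations between variables and components.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, eigvecs, loadings

# Simulated standardized data: 85 villages, 5 variables (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(85, 5))
X[:, 1] += 0.8 * X[:, 0]                      # induce some correlation
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, eigvecs, loadings = principal_components(Z)
scores = Z @ eigvecs                          # component scores per village
print(np.round(eigvals, 2), round(eigvals.sum(), 2))
```

With standardized variables the eigenvalues sum to the number of variables, and the component scores are mutually uncorrelated, which is the constraint stated above.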

Implementation of the principal components analysis started with four subject matter sub-sets
of the variables, as identified above in Table 2. Interpretation of trade-offs is easier for
fewer variables, particularly if there is subject matter coherence. For each subject matter
model, components with eigenvalues greater than one were retained and orthogonally rotated
to improve interpretation.6 Basing rotation and interpretation on components with eigenvalues
greater than one is a common rule-of-thumb in components analysis (Dillon and Goldstein,
1984; Karson, 1982).7
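
The retention rule and rotation described above can be illustrated with the sketch below. The data are simulated, the names are hypothetical, and the rotation routine is a commonly used SVD-based varimax iteration, assumed here for illustration; it is not necessarily the algorithm in the statistical package used for this paper.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (p, k) loading matrix toward the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the varimax criterion.
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break                              # no further improvement
        objective = new_objective
    return loadings @ rotation, rotation

# Simulated data: six variables driven by two underlying factors.
rng = np.random.default_rng(2)
f = rng.normal(size=(85, 2))
X = np.hstack([f[:, [0]] + 0.5 * rng.normal(size=(85, 3)),
               f[:, [1]] + 0.5 * rng.normal(size=(85, 3))])

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = int((eigvals > 1.0).sum())                 # "eigenvalue greater than one" rule
unrotated = eigvecs[:, :k] * np.sqrt(eigvals[:k])
rotated, rotation = varimax(unrotated)
print(k)
print(np.round(rotated, 2))
```

Rotation redistributes variance among the retained components but leaves each variable's communality, and the total variance accounted for, unchanged.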

The rotated loadings were interpreted to identify underlying components (or factors or
dimensions). Component scores were calculated for each village. These represent indices of
the identified underlying components. The component scores (indices) for the
DEPENDENCE, ACTIVITY, and FIELD models were regressed against the population and
market exogenous variables in order to differentiate effects of various exogenous
circumstances.

The third step of the analysis was to do a combined principal components analysis (all
variables). This analysis was used to identify broader development patterns, cutting across
and linking food dependence, activity emphasis, field management practices, and exogenous
population and market circumstances. A sub-set of components was selected and
orthogonally rotated. The "eigenvalue greater than one" rule-of-thumb was not used since, as
is often the case with models having a large number of variables, too many components for
useful interpretation and presentation had eigenvalues exceeding one. To choose the number
of eigenvalues to retain, the eigenvalues were plotted in declining order--called a scree plot
(Dillon and Goldstein, 1984). Choice was based on the most identifiable gap between the
first set of large eigenvalues and the other roots.
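
A crude numerical stand-in for reading a scree plot is to look for the largest drop between consecutive eigenvalues that exceed one. The sketch below assumes that rule; the function name and the eigenvalues are hypothetical and are not the values obtained in this study, where the choice was made by visual inspection.

```python
import numpy as np

def components_before_largest_gap(eigvals, min_eig=1.0):
    """Count the components that precede the largest drop between
    consecutive eigenvalues, considering only eigenvalues above min_eig."""
    ev = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    big = ev[ev > min_eig]
    if big.size < 2:
        return int(big.size)
    drops = big[:-1] - big[1:]          # gap after each component
    return int(np.argmax(drops)) + 1

# Hypothetical leading eigenvalues, seven of which exceed one.
example = [4.1, 3.6, 3.2, 2.9, 1.6, 1.4, 1.2, 0.9, 0.7]
n_retained = components_before_largest_gap(example)
print(n_retained)
```

A rule of this kind can point to the first component when the leading eigenvalue dominates, so it is best treated as a check on, not a substitute for, inspection of the plot.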

The second part of the section extends and refines pattern characterization through use of
canonical correlation analysis. The function of canonical correlation analysis is to identify


5Most major statistical software programs do principal components analysis (and/or factor analysis), so
mathematical procedures for implementing the analysis are, for purposes of generating and interpreting
results, not essential. For those who are interested, detailed model explanation can be found in several
excellent texts, including Karson (1982), Dillon and Goldstein (1984), and Mardia, Kent and Bibby (1979).
The first two, in particular, give a large number of examples along with discussion of underlying
mathematical and statistical properties.
6Varimax orthogonal rotation, used in the analyses presented in this paper, maximizes variation in
component coefficients. Following orthogonal rotation, components continue to have no correlation, and the
total variance accounted for by all rotated components remains the same. The variance accounted for by each
rotated component is redistributed, so eigenvectors, component loadings and resulting interpretation are
affected by the number of components included in the rotation.
7The intuitive logic is as follows: since each variable has been standardized to unit variance, and the
sum of all eigenvalues equals the number of variables, any linear variate having an eigenvalue less than one
accounts for less variation than any one of the original variables.









sets of linear variates which are maximally correlated. Mathematically, neither set is
dependent on the other set, but most applications of canonical correlation analysis identify
one set as independent and the other set as dependent. In this sense, interpretation of highly
weighted (or loaded) variables for significantly correlated variates corresponds--in a general
sense--to coefficient interpretation in a multiple regression model.

The same procedures used for the components analysis were repeated for the canonical
correlation analysis, starting with subject matter models and then going to a combined model.
To further ensure comparability in interpretation--highlighting model features not model
treatment--the same variables were used in each model, significant dependent canonical
variates were rotated and interpretations were based on canonical loadings.8

It is worth highlighting, before turning to results, that the difference between principal
components analysis and canonical correlation is in what is maximized. Components analysis
maximizes variation within a single variable set, while canonical correlation maximizes
correlation between two data sets. In both analytical methods, maximization is accomplished
through sequential matrix decomposition into the largest eigenvalue and associated
eigenvector, with subsequent decompositions subject to zero correlation with prior
components or variate sets. Differences in interpretation between canonical correlation and
principal components analysis stem from the matrix which is decomposed. In one case, the
matrix is the variance-covariance (or correlation) matrix for one set of variables. In the
other, a composite matrix is formed, comprising one block each of separate correlation
matrices for each set, and two blocks of correlations in the pooled data set.
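
The contrast with components analysis can be made concrete with a small sketch of canonical correlation. It is an illustration under stated assumptions: the data are simulated, the function name is hypothetical, and the computation uses a singular value decomposition of the whitened cross-covariance block, which is an equivalent route to the eigen-decomposition described above and not necessarily the one used by the authors' software.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations (declining order) and coefficient matrices
    for two variable sets measured on the same n observations."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)           # within-set block for X
    Syy = Yc.T @ Yc / (n - 1)           # within-set block for Y
    Sxy = Xc.T @ Yc / (n - 1)           # between-set block
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance: inv(Lx) @ Sxy @ inv(Ly).T
    K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)        # coefficients for the X variates
    B = np.linalg.solve(Ly.T, Vt.T)     # coefficients for the Y variates
    return s, A, B

# Simulated sets sharing one latent dimension (hypothetical data).
rng = np.random.default_rng(3)
latent = rng.normal(size=(85, 1))
X = latent + rng.normal(size=(85, 3))
Y = latent + rng.normal(size=(85, 2))

canon_corr, A, B = canonical_correlations(X, Y)
U_var = (X - X.mean(axis=0)) @ A        # canonical variates for X
V_var = (Y - Y.mean(axis=0)) @ B        # canonical variates for Y
print(np.round(canon_corr, 2))
```

Each pair of variates is maximally correlated subject to being uncorrelated with earlier pairs, which parallels the sequential constraint in components analysis.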

3.2 Components Interpretation

Component loadings for the subject matter models are given in Table 4. In both the
ACTIVITY and MARKET models, three components had eigenvalues greater than one.
These components were rotated and rotated loadings are shown in Table 4. Only a single
component had a large eigenvalue for the DEPENDENCE and FIELD models. Table 4
shows loadings for the respective first unrotated component. Descriptive interpretations of
the development dimensions represented by the various components are given in Table 5.








8In principal components analysis, interpretation is nearly always based on rotated components, but
rotation is not as common in canonical correlation analysis. Mathematically, both procedures entail matrix
decomposition into eigenvalues and associated eigenvectors (only the matrix being decomposed differs), so
reasons for the different practices are not entirely clear. Furthermore, interpretation in canonical
correlation analysis often is based on standardized coefficients rather than loadings, but Dillon and
Goldstein (1984) convincingly argue that loadings provide more reliable interpretation.









Table 4: Component loadings for subject matter models

                            Component
Model        Variable      1      2      3

ACTIVITY     ACROP      -.06   -.62   -.39
             ATREE       .80    .14    .12
             APROC       .01    .04   -.84
             ALVSK      -.86    .05    .10
             AFISH       .18   -.65    .34
             AHUNT       .53   -.43    .30
             ACOMM       .08    .12    .62
             AWAGE       .09    .71    .16

MARKET       DISTWN     -.23    .83    .04
             DISURB      .10    .13    .80
             POPDEN      .18   -.09   -.83
             COSTM       .14    .83    .16
             VILPOP      .70    .04   -.47
             SERVICE     .88   -.09    .13

DEPENDENCE   DEPSF       .61
             DEPMT       .81
             DEPVL       .78
             SELLC      -.59

FIELD        NOTYPE      .51
             LUI         .87
             FALO       -.89
             FLDTYP      .51
             FLDINT      .61


Table 5: Interpretation of subject matter model components

Model          Interpretation

ACTIVITY 1     Main non-food crop enterprise: livestock versus tree and hunting
ACTIVITY 2     Modernization: traditional activities (crop, fish, hunt) versus wage
ACTIVITY 3     Diversification: farm (processing) versus non-farm (commerce, artisan)

DEPENDENCE     Sufficiency: buying versus selling food

FIELD          Intensification: soil management practices

MARKET 1       Village: size and services
MARKET 2       Integration: closeness to local markets and nearby town
MARKET 3       Population: population density and nearness to urban center



The main variation among villages in rural income generation is due to the relative
importance of livestock versus trees and hunting. This reflects ecological differences between
the arid savanna and highlands, relative to southern forest zones. Once variance due to
ecology-affected enterprise emphasis is removed, two interesting patterns of development emerge.
One, shown in the second activity model component, is a tradeoff between traditional rural
activities and wage employment. The other, seen in the third component, is a distinction in
the nature of supplemental income generation--whether food processing or non-farm activity
emphasis.









For both the DEPENDENCE and FIELD models, the first component captured nearly 50
percent of the variance. In each case, the first unrotated component gives a general index,
for food dependence and field management intensity respectively.

In the MARKET model, three components had eigenvalues greater than one. Following
rotation of these components, three patterns of exogenous circumstances were identified. As
seen in Tables 4 and 5, one pattern corresponds to local circumstances, the second to local
market and town integration, and the third to macro circumstances of population density and
urban center proximity.

Scores obtained from the food dependence, activity and field management models were
regressed against the set of exogenous population and market variables, along with a
categorical variable for ecological zone. The purpose, as mentioned above, was to
distinguish effects of different exogenous variables and, in particular, to identify patterns
which are significantly associated with market circumstances. Given the purpose of analysis
(and use of normalized and standardized data), the focus of interpretation is on significance of
the regression coefficients, not the size and signs of the coefficients. Results, giving the
probability of error in identifying a variable as being significant, are in Table 6.


Table 6: Development index regressions: probability of error in attributing significance

              LDIST  DISTURB  POPDEN   COST  VILPOP  SERVICE   ZONE     R2

ACTIVITY 1      .90      .67     .60    .54     .84      .79    .00    .70
ACTIVITY 2      .17      .04     .10    .54     .12      .40    .22    .40
ACTIVITY 3      .01      .04     .34    .53     .38      .30    .55    .24

DEPENDENCE      .38      .43     .02    .37     .18      .01    .03    .44

FIELD           .48      .61     .04    .86     .25      .36    .00    .49



For the activity model, ecological zone is the only exogenous variable that has a highly
significant effect on the livestock-trees-hunting balance. This was as expected, given the
substantial and well-known differences between southern and northern Cameroon, but
presents a problem in evaluating the hypothesis that livestock are more important in more
intensive resource use systems. In Cameroon, zone is the dominant determinant of livestock
importance but zone is confounded with population, services and distance to town. Due to
resulting multicollinearity, potential influence of developmental intensification on the relative
importance of livestock is difficult to assess.

Regressions for the second and third activity models more clearly show and differentiate the
hypothesized market integration effects on activity emphasis. Distance to urban center has a
significant effect on the relative importance of natural resource activities--fishing and
hunting--versus wage employment. The effect of population density is also significant, with a
ten percent probability of error. Proximity to towns and urban centers has a significant effect
on the relative importance of non-farm activities versus food processing.

The balance in food self-sufficiency, represented by the first component of the
DEPENDENCE model, is significantly affected by zone, population density and presence of
local services. In the arid savanna and highlands, farmers are less self-sufficient than those
in the southern forest. In the northern savanna, this is due to harshness of the environment
while in the highlands it is attributable to high population density and diversification of the
rural economy. Land and field management intensification (FIELD model) is also
significantly affected by population and ecological zone, for the same reasons.

An interesting result for both the DEPENDENCE and FIELD models is lack of significance
for any of the market integration/proximity variables. A key question in interpretation is
whether integration/proximity does not affect dependence and intensification, or whether
market effects exist but are secondary relative to population pressure. These cases cannot
necessarily be distinguished with standard multiple regression and correlated independent
variables.

An alternative to deriving indices and regressing them against independent variables is to
include all variables in a complete components model. Interpretation of such a complete
model must be restricted to the correlation concept of principal components, but this is not
necessarily a problem when the goal is pattern characterization. In fact, second and
subsequent components can reveal secondary patterns which otherwise might be hidden in
analyses focused on univariate significance.

In the components analysis of the complete model, seven eigenvalues exceeded one. To
achieve further parsimony in data reduction for interpretation, a scree plot of eigenvalues
(Figure 2) was used to identify a gap between the most important components and the other
components with eigenvalues greater than one.

Based on the scree plot, four components were rotated. Rotated loadings for these
components are given in Table 7. To facilitate interpretation, high loading variables are
plotted in Figure 3.
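
The retention and rotation steps described above can be sketched as follows. This is an illustrative sketch only: the data are simulated (not the Cameroon survey), the number of retained components `k` is chosen by hand as it would be from a scree plot, and varimax is assumed as the rotation criterion since the paper does not name the one used.

```python
import numpy as np

def component_eigen(X):
    """Eigenvalues and eigenvectors of the correlation matrix, largest first."""
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)          # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    p, k = loadings.shape
    T = np.eye(k)                            # accumulated orthogonal rotation
    crit_old = 0.0
    for _ in range(max_iter):
        L = loadings @ T
        grad = loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        T = U @ Vt
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return loadings @ T

# Hypothetical data: 60 villages by 8 variables driven by two underlying patterns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(60, 8))

eigenvalues, eigenvectors = component_eigen(X)
n_kaiser = int((eigenvalues > 1.0).sum())    # eigenvalue-greater-than-one rule
drops = eigenvalues[:-1] - eigenvalues[1:]   # successive gaps, as read off a scree plot

k = 2                                        # assumed choice after inspecting the gaps
unrotated = eigenvectors[:, :k] * np.sqrt(eigenvalues[:k])
rotated = varimax(unrotated)
```

An orthogonal rotation redistributes variance among the retained components but leaves each variable's communality (row sum of squared loadings) unchanged, which is a convenient check on any rotation routine.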











Figure 2: Eigenvalue scree plot for complete components model


Table 7: Loadings for components of complete model

Variable       1       2       3       4

DISTWN       .20    -.68            -.16
DISTURB     -.14    -.56            -.01
POPDEN       .55     .44            -.10
COST         .14    -.47             .14
VILPOP       .53     .33             .19
SERVICE      .08     .12             .51

DEPSF        .11     .36            -.32
DEPMT        .09     .79            -.17
DEPVL        .38     .35            -.37
SELLC       -.15     .04             .63

NOTYPE       .42    -.02             .59
LUI          .77    -.03            -.02
FALO        -.84    -.05             .12
FLDTYP       .47    -.01             .28
FLDINT       .54     .33             .01

ACROP        .21     .22            -.70
ATREE        .01    -.25             .17
APROC        .42    -.37            -.38
ALVSK       -.19     .02            -.11
AFISH        .19     .75             .11
AHUNT        .13     .42             .06
ACOMM       -.46     .10             .18
AWAGE       -.23    -.35             .21

[Loadings for the third component are missing from the source text.]











Figure 3: High loading variables in complete components model

[Figure: high loading variables plotted by loading (vertical axis, -1.0 to 1.0) for the first, second, third and fourth components]


The first and second rotated components can be interpreted as representing two important
dimensions of development, both in accord with hypothesized patterns. The first component
represents "population driven intensification and differentiation." High population density
and village size are linked to short fallow, land use intensity, intensity of soil management,
larger numbers of cropping patterns and field types, and increasing importance of non-farm
activities. Quite important, from the standpoint of the overall multivariate model, is low
correlation between the various market proximity and integration variables and land use
intensification. Population pressures would seem to be the main driving force of land use
intensification, at least in Cameroon.

The second component provides insight into the effects of urban proximity and market access.
Proximity to local markets, towns and urban centers is linked to increased food purchases,
greater importance of wage employment and food processing, and reduced importance of
fishing and hunting. This component might reasonably be labelled "market-influenced
diversification and dependence." Even though urban centers and local markets have weaker
linkages with land use intensification than anticipated, patterns of non-agricultural resource
use and income generation nearer to urban centers and local markets conform closely to
hypothesized patterns. In brief, the first two components provide substantial empirical
support for the hypothesized resource pressure-response model, while helping to differentiate
effects of population pressure and market integration.








The third component identifies the dominant non-food crop activity. Based on the size and
signs of the loadings, the importance of livestock is contrasted to the importance of tree crop
production and hunting. Livestock importance is also linked to staple food dependence (high
purchases and low sales). This component captures enterprise differences between the arid
savanna and, to a lesser extent, the highland and transition zones versus the southern forest
zones. An analysis of variance of scores for this component confirmed the significant effect
of ecological zone.

The fourth component can be labelled "food crop production surplus pattern." A high
ranking for food crop production is linked to crop sales, importance of food processing for
sale, a large number of cropping patterns, low food purchases, and presence of village
services. Since there are no highly loaded variables with opposite signs, this component
distinguishes successful, surplus food crop-producing villages from those where one or more
of the "successful pattern" circumstances do not hold.

The main contribution of the complete model relative to the subject matter models is the
linkages shown between exogenous circumstances and resource use patterns in the first two
components. To further investigate these developmental patterns, rotations were carried out
with additional components having eigenvalues greater than one. Most interesting was the
effect of including the fifth component (eigenvalue of 1.33). Rotation with five components
resulted in a split of the market linkages identified in component two, thereby distinguishing
local market access patterns from urban proximity patterns. The main pattern associated with
local markets and towns is declining importance of the "natural resource" activities--fishing
and hunting--and increasing importance of meat purchases. Urban proximity (along with
higher loading for population density) was strongly associated (loadings above .5) with all
three categories of food purchases and the importance of wage employment. Soil
management intensity also had a high loading, showing the hypothesized association between
intensification and urban proximity exists, even if it does not account for as much variability as
does population driven intensification.

3.3 Canonical Correlations

Matrix decomposition in principal components analysis depends on covariation (or
correlation). There is no guarantee that component patterns will provide desired information
about linkages between exogenous circumstances and hypothesized responses or behavior.
This was the case, for example, with the third and fourth components of the complete model.
When the main focus of analysis is on linkages between one set of circumstances and another,
such as in the pressure-response hypotheses of the Cameroon study, canonical correlation is a
good alternative to principal components analysis. Canonical correlation is also a good
alternative to multiple regression analysis when independent variables are highly correlated.
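
To make the idea concrete, the following sketch computes canonical correlations, variates and loadings for two variable sets. It is a minimal illustration under stated assumptions: the data are simulated, the function name `canonical_analysis` is hypothetical, and the QR/SVD route shown is one standard numerical approach rather than the procedure of the statistical package used in the study. Both sets are assumed to have full column rank.

```python
import numpy as np

def canonical_analysis(X, Y):
    """Canonical correlations, variates and loadings for two variable sets."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)                 # orthonormal basis of each centered set
    Qy, _ = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    x_variates = Qx @ U                      # unit-length, mean-zero variates
    y_variates = Qy @ Vt.T

    def loadings(Z, V):
        # Correlation of each original variable with each variate of its own set.
        Zs = Z / np.linalg.norm(Z, axis=0)
        return Zs.T @ V

    return (np.clip(s, 0.0, 1.0), x_variates, y_variates,
            loadings(Xc, x_variates), loadings(Yc, y_variates))

# Hypothetical data: 4 "exogenous" variables and 3 "response" variables for 50 villages.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=(50, 3))
Y[:, 0] = X[:, 2] + 0.3 * rng.normal(size=50)   # one built-in linkage between the sets

corr, xv, yv, x_load, y_load = canonical_analysis(X, Y)
```

In this simulated case the first canonical correlation is high and the third variable of the first set should carry the dominant loading on the first variate, while the remaining correlations reflect only sampling noise.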

The purpose of canonical correlation analysis with the Cameroonian data was to identify
highly correlated patterns between exogenous circumstances (represented by the MARKET
variable set) and various resource use responses. As with the components analysis, subject
matter models were evaluated as well as a complete model incorporating all variables.
Loadings for the subject matter and complete models are given in Table 8. For significantly
correlated variates, dependent sets were rotated to improve interpretation. Canonical
correlations given in Table 8 are between unrotated independent set variates and rotated
dependent set variates. To facilitate interpretation, variables with high loadings are
highlighted in Table 9.



Table 8: Variable loadings for significantly correlated canonical variates 1

DEPENDENCE FIELD ACTIVITY COMPLETE
1 2 1 1 2 1 2 3


MARKET
DISTWN
DISTURB
POPDEN
COST
VILPOP
SERVICE
DEPENDENCE
DEPSF
DEPMT
DEPVL
SELLC
FIELD
NOTYPE
LUI
FALO
FLDTYP
FLDINT
ACTIVITY
ACROP
ATREE
APROC
ALVSK
AFISH
AHUNT
ACOMM
AWAGE
Canonical
Correlation


.07
-.12
.93
-.23
.44
.10


.24
.29
-.88
.57
-.46
-.05

-.39
-.83
-.31
.10

-.12
-.10
.20
.04
-.11


-.38
-.62
.91
-.06
-.62


.09
.58
.70
-.59
-.18
.12
-.06
.10


.28
.28
.08
.61
.33
-.43

-.18
-.12
-.48
.18

-.01
-.54
.58
-.30
-.52

-.34
.35
-.22
.25
-.10
-.05
.36
.04


.62 .39 .70 .70 .60 .55 .59 .51


To illustrate interpretation, reference can be made to the DEPENDENCE model results. The
first independent variate represents population pressure, along with local market access. Meat
purchases (DEPMT) is the highest loaded variable in the first dependent variate. A usual
interpretation of these patterns might be that a pattern of increasing population pressure
and/or local market access is correlated with increasing meat purchases.9 The second set of
variates in the DEPENDENCE model can similarly be interpreted as showing a linkage
between town proximity and village services, from the independent set, and the balance
between food purchases and sales.


Table 9: Interpretation of canonical variates (variables with high loadings)

Model          Independent Variate                   Dependent Variate

DEPENDENCE 1   population pressure and integration   meat purchases
DEPENDENCE 2   town proximity and village services   balance between food (other than
                                                     meat) purchases and sales
FIELD          population pressure and village size  land and field management
                                                     intensification (not differentiation)
ACTIVITY 1     population pressure and village size  declining importance of forest
                                                     resource activities (fish and hunt)
ACTIVITY 2     urban and town proximity              tree and processing versus livestock
COMPLETE 1     population pressure and local market  meat purchases and declining
               access                                importance of forest activities
COMPLETE 2     urban and town proximity              livestock versus trees and
                                                     processing; second pattern linked to
                                                     crop sales and long fallow
COMPLETE 3     services and local market access      land and field intensification;
                                                     balance between crops and trees



In the FIELD model, only one canonical correlation was highly significant. The independent
variate is highly correlated with population variables, while the dependent variate is highly
correlated with intensification variables. The straightforward interpretation is that population
pressure is associated with resource use intensification, giving empirical validation of the
Boserup hypothesis (Boserup, 1965). The resource intensification response shown in the
FIELD canonical correlation model conforms to results of the field intensification index
regression and to the first rotated component of the complete model. In the principal
components analysis, however, intensification and differentiation variables both had high
loadings in the population driven component. The canonical correlation analysis suggests that
intensification is a more common response to population pressure than is crop pattern and
field type differentiation.

In the ACTIVITY model, the first independent variate can be interpreted as being a measure
of population pressure. This is highly correlated with a variate representing the importance
of fishing and hunting. In simple language, as population goes up, the importance of natural
resource-dependent activities declines.

9 High variable loadings in significantly correlated variates do not necessarily mean that the variables
themselves are highly correlated. Nevertheless, with proper caution, high loadings are used to identify and
interpret inter-set correlation patterns.

Exactly analogous to the principal components analysis, a shift from subject matter models to
a complete model was used to assess associated patterns in field intensification, food
dependence and rural income generation, and also to help refine hypotheses about linkages
among response patterns and exogenous circumstances.

The first variate of the COMPLETE model is most correlated with population density, just as
was the case for all the subject matter models. Variables with high loadings in the first
variate for the dependent set are those for fishing and hunting, along with meat purchases.
This variate combines patterns shown in the DEPENDENCE model and the first variate of
the ACTIVITY model.

Once variation due to correlation between forest activities and population pressure is
removed, the second set of variates links integration (town, urban and local market) to tree
versus livestock balance, but also--and more interestingly--to fallow periods and crop sales.
The enterprise balance is mainly a sample artifact, but the other variables with high loading
for the dependent set variate isolate two key resource management responses to market-
induced pressures.

The third set of variates is perhaps the most interesting. Once having removed resource
pressure-response variation due to population and urban and town proximity, the remaining
variation among the independent variables is correlated with variables characterizing local
circumstances: village size, services and transport cost to the nearest market. The highest
loading variables on the dependent variate are those for land use and field management
intensity. The highest loading activity variable is ACOMM, representing non-farm activities.
From the standpoint of developmental characterization, this result shows that dominant
covariance between population and intensification (as shown in the components model) does
not tell the whole story. Village services and local markets also stimulate (or at least are
correlated with) land use intensification and non-farm diversification. In essence, there are
small developmental nodes, in a spatial sense, which are hidden by more dominant population
and urban center driven patterns.10

4. ZONATION

The goal in zonation is to identify and differentiate research domains. This section illustrates
three approaches to zonation, all of which contribute to system characterization and pattern
analysis.

10 The "small developmental nodes" concept has particular value from a policy standpoint. While the
public sector can do little about population and urbanization patterns, at least in a reasonable planning
horizon, it might be able to work on village services and local markets to stimulate rural income
generation. Of course, introduction of sustainable, profitable technologies for intensified resource use
systems would be an important complementary public sector priority.


4.1 Methods

The first approach used for zonation is linear discriminant analysis. The purpose of linear
discriminant analysis is to identify linear variates which best distinguish among predefined
groups or, in this case, zones. Conceptually and mathematically, linear discriminant analysis
is quite similar to principal components and canonical correlation analysis. In the case of
linear discriminant analysis, the linear variates are derived by maximizing between group
variance relative to within group variance. Maximization proceeds by composing an
appropriate matrix and, as before, sequentially decomposing into eigenvalues and associated
eigenvectors.
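
A minimal sketch of this decomposition is given below, assuming simulated data and hypothetical names (`discriminant_functions`, `zones`). The discriminant variates are taken as eigenvectors of the within-group scatter inverse times the between-group scatter, so that each variate maximizes between-group relative to within-group variation. A Cholesky factor of the within-group matrix is used so that a symmetric eigenvalue routine can be applied; this is a numerical convenience, not a step described in the paper.

```python
import numpy as np

def discriminant_functions(X, groups):
    """Eigenvalues and coefficient vectors of inv(W) @ B, plus Wilks' Lambda."""
    labels = np.unique(groups)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))                     # pooled within-group scatter
    B = np.zeros((p, p))                     # between-group scatter
    for g in labels:
        Xg = X[groups == g]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
        d = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(d, d)
    # With W = L L', the symmetric matrix inv(L) B inv(L)' has the same eigenvalues as inv(W) B.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    vals, vecs = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(vals)[::-1][:min(p, len(labels) - 1)]
    eigenvalues = np.clip(vals[order], 0.0, None)
    coefficients = L_inv.T @ vecs[:, order]  # back-transform to the original variables
    wilks = float(np.prod(1.0 / (1.0 + eigenvalues)))
    return eigenvalues, coefficients, wilks, W, B

# Hypothetical data: 3 zones, 20 villages each, 4 variables with shifted zone means.
rng = np.random.default_rng(0)
zones = np.repeat([1, 2, 3], 20)
shift = np.array([[0, 0, 0, 0], [2, 0, 1, 0], [0, 3, 0, 0]], dtype=float)
X = rng.normal(size=(60, 4)) + shift[zones - 1]

eigenvalues, coefficients, wilks, W, B = discriminant_functions(X, zones)
scores = (X - X.mean(axis=0)) @ coefficients   # discriminant function scores
```

With three groups at most two discriminant functions exist, and Wilks' Lambda computed from the eigenvalues equals the ratio of determinants det(W)/det(W+B).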

In line with the illustrative orientation of this paper, nearly the same procedures used for the
components analysis and canonical correlation analysis were followed for the linear
discriminant analysis. First, a series of discriminant functions were derived for the various
subject matter variable sets, and then a combined model was developed. Functions with
significant eigenvalues were rotated and interpretations were based on function loadings.

For each model, optimistic classification rates (Dillon and Goldstein, 1984) were obtained by
using derived discriminant functions to classify the original data, and then comparing
predicted groups to actual groups. This does not constitute model validation, but is useful for
one-sided interpretation; i.e., poor classification clearly indicates a poor model, while good
classification does not necessarily indicate the model will be useful for new observations.
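
The optimistic (resubstitution) rate can be illustrated with a short sketch. It assumes simulated data, equal prior probabilities, and classification by Mahalanobis distance to each group mean using the pooled within-group covariance, which is one common linear rule and not necessarily the exact rule applied in the study. The name `resubstitution_rates` is hypothetical.

```python
import numpy as np

def resubstitution_rates(X, groups):
    """Classify the original observations and report percent correct by group."""
    labels = np.unique(groups)
    means = np.array([X[groups == g].mean(axis=0) for g in labels])
    resid = X - means[np.searchsorted(labels, groups)]
    pooled = resid.T @ resid / (len(X) - len(labels))   # pooled within-group covariance
    inv = np.linalg.inv(pooled)
    diff = X[:, None, :] - means[None, :, :]
    d2 = np.einsum('ngp,pq,ngq->ng', diff, inv, diff)    # squared Mahalanobis distances
    predicted = labels[d2.argmin(axis=1)]
    per_group = {int(g): 100.0 * np.mean(predicted[groups == g] == g) for g in labels}
    overall = 100.0 * np.mean(predicted == groups)
    return predicted, per_group, overall

# Hypothetical data: 3 well separated zones of 20 villages, 3 variables.
rng = np.random.default_rng(0)
zones = np.repeat([1, 2, 3], 20)
centers = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0]], dtype=float)
X = rng.normal(size=(60, 3)) + centers[zones - 1]

predicted, per_group, overall = resubstitution_rates(X, zones)
```

Because the same villages are used to derive and to test the rule, the resulting percentages are biased upward; a held-out or leave-one-out check would be needed for validation.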

In a slight change from the earlier applications, not all variables were included in the
complete discriminant function model. A small number of variables were excluded after
univariate F-statistics obtained when estimating the subject matter models showed no
significant differences among zones. Following Dillon and Goldstein (1984), lack of
significance was confirmed for the set of excluded variables as a whole (by comparing
reduced and full models using an F-statistic) and individually (based on partial F-statistics
after regressing out effects of the entire set of included variables).

To facilitate interpretation of the complete linear discriminant model, factor scores were
calculated and plotted against each other. Rather than show all observations, ellipsoids
showing fifty percent population regions were calculated on the basis of pairwise score
variances and covariances.
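
One way such a region can be derived from pairwise variances and covariances is sketched below. The sketch assumes the pair of scores is approximately bivariate normal, in which case the region containing a proportion p of the population is bounded by a squared Mahalanobis distance of -2 ln(1 - p). The simulated scores and the function name `population_ellipse` are assumptions for illustration, and the software used in the study may have computed its regions differently.

```python
import numpy as np

def population_ellipse(scores_xy, level=0.5):
    """Center, semi-axes and tilt of the ellipse expected to contain `level` of the population."""
    center = scores_xy.mean(axis=0)
    cov = np.cov(scores_xy, rowvar=False)
    c = -2.0 * np.log(1.0 - level)           # chi-square quantile with 2 degrees of freedom
    vals, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
    semi_axes = np.sqrt(c * vals[::-1])      # major axis first
    major = vecs[:, -1]
    angle_deg = float(np.degrees(np.arctan2(major[1], major[0])))
    return center, semi_axes, angle_deg, cov, c

# Hypothetical scores on two functions for one zone (large sample to check coverage).
rng = np.random.default_rng(1)
scores = rng.multivariate_normal([0.5, -1.0], [[2.0, 0.8], [0.8, 1.0]], size=2000)

center, semi_axes, angle_deg, cov, c = population_ellipse(scores, level=0.5)

# Share of simulated points falling inside the region.
dev = scores - center
d2 = np.einsum('ni,ij,nj->n', dev, np.linalg.inv(cov), dev)
fraction_inside = float(np.mean(d2 <= c))
```

The semi-axes grow with the score variances, and the tilt follows the sign of the covariance, which is the behaviour described for the plotted regions.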

The second part of the section turns to zonation methods when zones are not known in
advance. The first method shown, and in most cases the easiest to implement, is percentile
division. In the Cameroon study, a goal in zonation was to derive zones which are relatively
homogeneous with respect to the hypothesized resource pressure-intensification-diversification
development model. To do this a single development score was calculated using component
scores from the complete components model (Section 3). The relative importance of
livestock, fishing and tree production in Cameroon is mainly determined by ecology, not
developmental stage, as shown by the components regression and canonical correlation
analysis. To derive a developmental score, without arbitrarily treating livestock emphasis as
more developed than tree production (which is linked to fishing in Cameroon), all three
activities were dropped and the components analysis rerun. Four components were retained
and rotated. Component scores were determined and then a single weighted score calculated
using the four component scores. The weights used were the proportion of variance
accounted for by each component. Since relative signs are all that matter in interpreting
components (Wilkinson, 1990), component signs were switched for one of the four components
to ensure compatibility with the intended development-intensification score.

Once the overall weighted development score was calculated, quartile groups were formed.
Development-linked groups were plotted according to village location, to illustrate differences
between area-based and development score zonation.
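
The score and quartile steps can be sketched as follows. All inputs here are hypothetical: the component scores are simulated, the variance proportions and the sign vector are made-up values, and numbering the highest-scoring group as quartile 1 is an assumption about the convention.

```python
import numpy as np

def development_quartiles(scores, var_explained, signs):
    """Variance weighted combined score and quartile group (1 = highest scores)."""
    oriented = scores * np.asarray(signs)              # flip components that run the "wrong" way
    combined = oriented @ np.asarray(var_explained)    # weights = proportion of variance
    cuts = np.percentile(combined, [25, 50, 75])
    quartile = 4 - np.searchsorted(cuts, combined, side='left')
    return combined, quartile

# Hypothetical inputs: 40 villages, 4 rotated component scores.
rng = np.random.default_rng(0)
scores = rng.normal(size=(40, 4))
var_explained = [0.30, 0.20, 0.12, 0.08]   # assumed proportions of variance
signs = [1, 1, -1, 1]                      # assumed: third component reversed

combined, quartile = development_quartiles(scores, var_explained, signs)
counts = np.bincount(quartile, minlength=5)[1:]
```

Because the cut points are fixed percentiles, each group receives the same number of villages regardless of how tightly the scores cluster, which is the arbitrariness noted in the limitations that follow.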

The development score and percentile approach has two clear limitations. One is that it
might not always be possible to identify and quantify a single continuous variable
representing the range of factors one wants to consider in zonation. Second, the decision on
number of groups is arbitrary, with no particular reason for small within group variance.

An alternative approach is clustering. Clustering, as a multivariate technique, is quite distinct
from the methods based on matrix decomposition. Hierarchical clustering, the approach used
in this paper, entails a search across all objects (villages in this case) to find the two which
are closest with respect to the variables included in the analysis. Once the first cluster is
formed, a search is carried out to find the next closest set of objects. At any point, an object
might be clustered with another object or with an existing cluster. Differences in how
distances are calculated distinguish various clustering algorithms. In this analysis, Euclidean
distances were used, as is normal practice, along with complete linkage.11
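
The agglomeration procedure described above can be sketched in a few lines. This is a deliberately naive implementation for small samples, written with simulated data and a hypothetical function name; statistical packages use more efficient algorithms, and the stopping rule here (a fixed number of clusters) stands in for the distance-based cut-off used in the study.

```python
import numpy as np

def complete_linkage(X, n_clusters):
    """Agglomerative clustering with Euclidean distance and complete linkage."""
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = {i: [i] for i in range(n)}    # every object starts as its own cluster
    merge_distances = []
    while len(clusters) > n_clusters:
        keys = list(clusters)
        best = (np.inf, None, None)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Complete linkage: distance between the farthest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merge_distances.append(float(d))
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels, merge_distances

# Hypothetical data: three groups of five villages scored on two indices.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.repeat(centers, 5, axis=0) + 0.5 * rng.normal(size=(15, 2))

labels, merge_distances = complete_linkage(X, n_clusters=3)
```

The recorded merge distances increase as clustering proceeds, and a large jump between successive merges is the kind of distance break that can be used to decide where to stop.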

Clustering results are presented for two sets of variables. The first clustering analysis is
based on developmental indices derived through components analysis of the subject matter
models. The second cluster analysis was based on scores for the first four rotated
components of the complete model. In both analyses, villages were assigned to a maximum
of ten clusters using within-cluster distance to determine the cut-off. In both cases, a
pragmatic decision was taken to extend within cluster distances for certain small, nearby
clusters, thereby joining those clusters.12

11 A key difference in clustering algorithms is treatment of objects which have already been clustered.
The two most common approaches are: (a) distance based on the nearest cluster member (single-linkage), and
(b) distance based on the farthest cluster member (complete-linkage). The single linkage approach tends to
give many clusters which then come together, with relatively little distance among higher order clusters.
The complete linkage approach tends to give fewer, grosser clusters, with greater distances at the final
stages of clustering. The complete linkage approach gives clearer distance breaks for identifying a limited
number of zones (or research domains) and so was the method used.


4.2 Linear Discrimination

Ecological zones were used for the linear discriminant analysis. The purpose of the analysis
was to identify variables or patterns which best distinguish the zones. The patterns
responsible for zone differentiation are identified through interpretation of loadings (or
coefficients, if one prefers) for the significant discriminant functions. These loadings are
shown in Table 10 for the subject matter models.


Table 10: Subject matter zone discriminant models

                  Prob.      Rotated Loadings [1]
Variable          F [2]       1       2       3

DEPENDENCE (.387) [3]
DEPSF              .00      -.36     .84     .39
DEPMT              .14       .10     .05     .84
DEPVL              .03       .04     .74    -.06
SELLC              .00       .84    -.06     .09

ACTIVITY (.139)
ACROP              .98       .01    -.07
ATREE              .00       .59     .34
APROC              .10       .04    -.34
ALVSK              .00      -.52     .08
AFISH              .00       .13    -.69
AHUNT              .00       .35    -.07
ACOMM              .12       .11     .34
AWAGE              .08       .04     .40

FIELD (.340)
NOTYPE             .00       .60    -.02
LUI                .00       .49    -.01
FALO               .00      -.61     .01
FLDTYP             .00      -.00     .89
FLDINT             .00       .60     .31

MARKET (.280)
DISTWN             .49      -.09     .30
DISURB             .30      -.14     .26
POPDEN             .00       .84    -.03
COSTH              .01      -.10    -.57
VILPOP             .00       .51    -.33
SERVICE            .34       .22    -.02

1. Rotated loadings shown only for significant functions (prob.<.05).
2. Significance probability for univariate F-statistic; ANOVA by zone.
3. Wilks' Lambda given in parentheses; highly significant for all models (prob.<.01).


Three significant discriminant functions were identified from the four variables in the
DEPENDENCE model. The first function represents staple food self-sufficiency. This
corresponds to the first unrotated component and the second canonical variate of the
DEPENDENCE model in the earlier analyses. The third discriminant function relates to
importance of meat buying, and it corresponds to the first canonical variate. The main
new information from the discriminant analysis is from the second function, which
discriminates zones on the basis of staple food and vegetable purchases. In an integrated
national sample, staple food and vegetable buying is not a major source of inter-village
variation, but buying is concentrated in the arid savanna, transition, and highland zones
relative to the southern forest zones, so this pattern is significant for inter-zone
differentiation.

12 There were four main reasons for doing so:
a) village circumstances should be reasonably representative because of the sampling procedure and
sample size, so outlying clusters comprising one to a few villages are not likely to represent major
blocks of villages
b) research can only be targeted to a manageable number of domains
c) joined clusters were in all cases closer to each other than they were to all remaining clusters
d) there is no reason research domains should be equally homogeneous as long as the nature of the domain
is understood

The first discriminant function for the FIELD model represents land use and soil management
intensification, corresponding to the only significant component and variate identified in
earlier analyses. A second discriminant function, attributable to the number of cropping
patterns, is also significant.

Both of the ACTIVITY model functions show patterns identified in the principal components
and canonical correlation analyses. This reinforces the interpretation that ecological zone has
a major influence on enterprise emphasis.

The biggest difference between zonal discrimination patterns and patterns identified in the
earlier analyses is for the MARKET model. The discriminant functions link population
pressure and village size in one pattern, and village size and local market access in another
pattern. In contrast, components analysis linked population pressure and local market access
in one component. Services and town proximity were linked in a second component.13

Before calculating a complete model, univariate F-statistics, obtained when deriving the
subject matter functions, were reviewed to determine which variables contributed little to
zonal discrimination. These F-statistics showed that values for seven variables did not differ
significantly by zone. Partial F-statistics were calculated to verify lack of zonal effect,
controlling for covariance with all remaining variables. Only one variable, DISURB, had a
significant partial F-statistic. This variable was included along with all variables having
significant univariate F-statistics in a "selected" complete discriminant model.14

Loadings for the selected discriminant model are given in Table 11. All four discriminant
functions were significant, and were therefore included in the rotation to improve
interpretation. The four identified patterns correspond closely to the four rotated components
of the principal components analysis, albeit in different order and with different loadings.

13 In the discriminant analysis, population and village size differentiate villages in the north and
highlands from those in the other zones. The main remaining variation for distinguishing among zones comes
from the pattern of village size and local market access. For example, villages are big and population
pressure is high in both the highlands and arid savanna (similar scores for first function), but cost of
transport to local markets is less in the highlands than in the arid north (different scores for second
function).

14 Model results for the selected model and a complete model involving all variables showed little
difference in either patterns identified or classification performance. The selected model is presented
since it is more efficient in the sense of achieving comparable discrimination with fewer variables.


Table 11: Selected zone discriminant model rotated loadings [1]

                   Discriminant Function
Variable        1       2       3       4

NOTYPE        -.09     .43     .14    -.03
LUI            .03     .32     .02    -.07
FALO          -.08    -.41     .03     .26
FLDTYP        -.41     .21     .04     .14
FLDINT        -.18     .47     .02    -.16

ATREE          .03    -.09     .59    -.11
ALVSK         -.17    -.31    -.35    -.11
AFISH          .09     .17    -.06    -.49
AHUNT          .09     .15     .24     .01

AWAGE          .07    -.00     .05     .43

DEPSF          .10     .03     .15    -.52
DEPVL         -.23     .15     .15    -.16
SELLC         -.44    -.05    -.12    -.05

POPDEN         .37     .57    -.17    -.20
DISURB        -.10    -.05     .17     .07
COSTH         -.16    -.05    -.13     .08
VILPOP        -.06     .44    -.01     .10

1. Wilks' Lambda for selected complete model =.032 (prob<0.00).
Significance of residual roots for discriminant functions: (1)
0.00, (2) 0.00, (3) 0.00, (4) 0.07.


The first discriminant function differs the most, but the pattern based on crop sales and field
types clearly corresponds to the fourth rotated principal component of the complete model.
Food crop production was the highest ranked activity in all zones (and thus lacks significance
for zonal discrimination) but there are significant differences in crop selling among the zones.

The second discriminant function reflects population driven intensification. This function
corresponds to the first component and the third set of canonical variates. The third
discriminant function distinguishes the livestock-dominated zones from those where tree crop
production is the main non-food crop activity. The fourth function discriminates on the basis
of wage employment and staple food buying versus fishing.









The zonal discrimination patterns of the selected model can be graphically illustrated by
plotting population regions for function scores. Figure 4 shows joint regions for the first and
second functions. Figure 5 shows joint regions for the third and fourth functions. Population
regions were calculated at a 50 percent level, meaning that half the villages in each zone can
be expected to fall within the region.15


Figure 4: Discriminant score 50 percent regions for select model-first
(vertical) and second (horizontal) factors (scale=standard deviations)


Figure 4 shows that zone four, the highlands, is differentiated by the second discriminant
function, that for population pressure-intensification. There is little, if any, difference among
the other zones with respect to this function. The first function discriminates between zone
five (arid savanna) versus the highlands and zone three (forest-savanna transition). Crop
selling is more common in the latter zones, while population pressure is greater (at least
relative to the transition zone) in the arid savanna.


15 For interpretation, areas of the ellipsoid depend on variance for the function scores. Shapes of the
ellipsoid are affected by relative variance for the two scores. Population regions appear more circular if
variances for the function scores are similar, but appear more elongated in the direction of the function
for which variance is greater. Tilt of the ellipsoid depends on covariance between the factor scores.











Figure 5: Discriminant score 50 percent regions for
select model-third (vertical) and fourth (horizontal)
factors


Figure 5 shows that zone five (arid savanna) is differentiated from the other zones because of
the emphasis on livestock. Zone three (transition) is distinguished from the other zones on
the basis of function three, but less distinctly so because of large within zone variance. The
fourth function differentiates zones four and five (highlands and arid savanna) from zone two
(semi-evergreen forest). Fishing is particularly important in the semi-evergreen forest, in
contrast to wage employment and staple food buying in the other zones.

In general, the various discriminant models were effective in differentiating the highlands and
arid savanna from the other zones, but were not particularly effective in differentiating among
the forest zones in southern Cameroon. This is reflected in Table 12.


Table 12: Percent correct "optimistic" classification using discriminant models

               Ever-    Semi-Ever-   Savanna                   Arid
Model          green    green        Transition   Highlands    Savanna   Overall

SUFFICIENCY     35        33           61            57          67        55
MARKET          45        26           81            88          65        56
FIELD           30        19           61           100          67        48
ACTIVITY        65        52           50           100          89        67
SELECTED        75        74           94           100         100        86









Optimistic classification percentages in Table 12 give the percentage of sample villages which
are correctly classified into their actual zone using the selected model discriminant functions.
As can be seen, classification rates are consistently lower for the first three zones compared
to the highlands and arid savanna. In most cases, misclassification in one forest zone was
due to predicted classification in the other forest zone or the transition zone. Figures 4 and 5
and Table 12, together, suggest there is little reason to treat the two forest zones as separate
domains for future research.

4.3 Percentile Classification

Linear discrimination is not useful if groups cannot be determined a priori. This is typical of
analysis directed at domain differentiation within agro-ecological zones. Even if dealing with
clearly defined agro-ecological zones, an alternative zonation might be needed if variables
expected to affect farmer interests and responses are not correlated with ecological zones.

One alternative is to derive a single measure and then classify objects based on percentile
division. As described in Section 4.1, a single measure of development integration and
intensification was derived for Cameroon by redoing the components analysis without three
activities which were so dominated by ecological zone that they could not serve as useful
developmental proxies. The variance weighted-combined development score was then used to
classify villages into developmental quartiles.

Variable means, before normalization and standardization, are shown by development quartile
in Table 13. For nearly all variables, the first quartile is clearly differentiated from the
fourth.


Table 13: Variable means before standardization for score quartiles

Variable        1       2       3       4

DISURB         58      83     106     136
POPDEN         41      16      11       6
VILPOP        458     137      72      39
COSTH        1.60    1.33    2.51    3.02
SERVICE       1.9     1.5     1.5     0.7
DISTWN         45      44      78     128

FALO          2.0     4.4     5.3     4.4
LUI           .67     .46     .41     .42
FLDINT        3.9     2.3     1.4     1.2
NOTYPE        7.9     4.1     4.7     3.8
FLDTYP        4.7     3.9     4.0     3.8

DEPSF         4.2     3.8     4.0     3.3
DEPMT         6.8     6.3     4.4     2.8
DEPVL         2.2     1.9     1.2     1.2
SELLC        11.7    11.1    11.4     9.5

ACROP         4.2     4.9     4.0     4.9
ATREE        13.1     9.1     9.2    12.1
APROC        16.2    16.9    16.9    18.5
ALVSK        13.1    14.8    15.2    15.8
AFISH        22.9    19.2    17.8    13.6
AHUNT        16.1    15.2    17.4    16.1
ACOMM        20.2    22.1    22.4    22.5
AWAGE

1. Based on variance weighted scores for first four rotated components of complete model.


The effectiveness in zonation reflected in Table 13 can be compared to Table 2 which
showed means according to ecological zones. For many variables, there is clearer zonal
differentiation using a score classification approach--even though ecological zones in
Cameroon represent greater extremes than in most countries.

Spatial contiguity of zones is reduced but, in practice, not eliminated when using percentile
classification. Figure 6 shows a geographical plot of village location for sample villages.
The numbers give class assignment. The map covers 2 degrees north latitude to 12 degrees
50 seconds, by 9 degrees east longitude to 15 degrees east longitude. The boundary of
Cameroon is not shown, but should be clear by comparison to Figure 1.

As can be seen in Figure 6, the most resource intensive zones surround the various urban
centers of Cameroon. The higher population and intensity of the western highlands is well
identified, as is the less developed and integrated pattern of the forest zone south of Yaounde
(where there is a block of villages in the fourth quartile). Even the corridor from Yaounde,
in the center of southern Cameroon, going toward the western highlands, where there is
relatively high population and intensified production, is shown (blocks of first and second
quartile).











Figure 6: Quartile zonation based on variance weighted scores of first four components of adjusted complete model (map of sample villages labeled by quartile)



4.4 Clustering

The first approach to clustering was based on indices for resource pressure-response derived
from the subject matter models. For the FIELD, DEPENDENCE and MARKET models,
the first unrotated component was used. In the first two models, additional components
account for little variation (eigenvalues less than one). The MARKET model revealed three
distinct sources of variation (Table 3), but the first unrotated component alone accounted for
32 percent of variation in the model variables. Without rotation, all variables except
SERVICE had loadings of .49 or greater in absolute value, with expected signs (Table 14), so
scores derived from this component reflect the market proximity-population dimension well.

The activity model posed a problem due to the substantial effect of ecology on livestock,
fishing and tree crop production. As in the score classification approach, these activities
were dropped and new components derived for use as indices. Two distinct, interesting,
components were revealed, so both were included in the cluster analysis. The first
component (Table 14) represents the nature of secondary income generation: through
processing for sale or non-agricultural activities. The second represents modernization,
identified from relative emphasis on food crop production and hunting versus wage
employment.


Table 14: Loadings for MARKET and ACTIVITY indices used in subject matter clustering

MARKET         1        ACTIVITY       1        2

DISTWN        .56       ACROP        -.11      .80
DISURB        .62       APROC        -.75      .18
POPDEN       -.73       AHUNT         .65      .49
COSTM         .49       ACOMM         .65     -.21
VILPOP       -.63       AWAGE         .13     -.68
SERVICE      -.34



Hierarchical clustering with the five subject matter indices gave three large clusters and five
small clusters at a single distance measure. A subsequent increase in within cluster distance
would have led to the merger of two of the large clusters. This can be seen in the dendrogram
presented in Figure 7. As an alternative, four of the small clusters which joined at a slightly
greater distance were put together, shown as clusters one and four.










Figure 7: Partial dendrogram for hierarchical clustering using subject matter indices


To characterize the various clusters, group means were calculated for the variables used in
the model. These are given in Table 15.


Table 15: Cluster size and mean component scores for subject matter model clusters

                                                 ACTIVITY
Cluster    Size    DEPEND    MARKET    FIELD      1        2

First       11      1.60     -1.10      1.30      .55      .86
Second      19       .06      -.50      -.11     -.10      .38
Third       22      -.55       .08      -.41      .61     -.57
Fourth       6       .73       .49       .85    -1.54      .44
Fifth       19      -.59      1.12      -.39     -.51     -.59
Sixth        2      1.71     -1.05      -.63      .30     2.80


Villages in cluster one represent the high end of a population pressure-resource
intensification spectrum. They have high field management intensity, high food purchases,
and greater than average emphasis on food processing for sale and wage employment. The
small cluster, number six, is similar to cluster one, but the latter has even greater field
management intensity. The second large cluster, cluster two, has moderately high market
integration-population pressure, but in most respects represents a mid level of development.
This is seen in the number of means near zero. Clusters three and five include villages which
are less integrated and intensified. In both, the traditional activities of food crop production
and hunting have above average importance. Food purchases are low relative to sales,
identifying these areas as staple food self-sufficient. The main distinction between clusters
three and five is greater population pressure and market integration in cluster three villages.
Correspondingly, processing for sale is more important in cluster three villages. Cluster four
villages are at a moderately high level of resource pressure-intensification, not as much as
cluster one villages, but more so than villages in the other clusters.

Figure 7 branch labels, given as cluster (number of villages): 6 (2); 4 (2); 4 (4); 2 (19);
1 (7); 1 (4); 3 (22); 5 (19).

A geographic representation of subject matter clustering is given in Figure 8.


Figure 8: Zonation based on clustering subject matter model scores (map of sample villages labeled by cluster)








Several patterns correspond to those identified with development score classification. The
most integrated villages, where resource management intensity is greatest, are again near
urban centers in the north and highlands. Of particular interest in Figure 8 are the patterns
shown for clusters three and five. Villages in both clusters are more distant from urban
centers and, in most cases, from towns and local markets. Cluster five villages are even
more isolated than are cluster three villages.

The clusters shown in Figure 8 have relatively distinct characteristics, but they do not
necessarily represent the most important dimensions of development. This is because the
individual subject matter dimensions are not weighted. Thus food dependence is equally
weighted with field management intensity, whereas one or the other might empirically be a
more significant dimension of development. An alternative is to cluster on the basis of the
major components of the complete model. This clustering will distinguish villages with
respect to the major sources of variation in resource pressure-intensification. For this
analysis, scores were calculated for the first four components. Interpretation of the scores
corresponds to the loadings shown in Table 7:
a) population driven land use intensification
b) market influenced diversification and food dependence
c) dominant non-food crop enterprises (livestock versus trees)
d) food crop production-sales dominance

Hierarchical clustering with the four complete model component scores gave four large clusters
and six small clusters at the same within group distance. As in the subject matter cluster
analysis, the cutoff within group distance was chosen to avoid further clustering of two large
clusters. The pattern of final stage clustering is shown in the dendrogram in Figure 9. After
review of component scores, four small clusters were merged into two, following the
dendrogram structure but at greater within group distance.

To characterize the resulting eight clusters, cluster means were calculated for the four
component scores used for clustering. Results are shown in Table 16. Villages in clusters
seven and eight have high scores for population pressure-field intensification, while villages
in clusters one and five have low population-intensification scores. With respect to market
integration-diversification, clusters with high scores are four through six, and eight. Thus,
cluster eight represents the most intensified and differentiated resource use pattern. Clusters
two and three have low scores for the market integration dimension of development. Villages
in clusters five and seven have a strong emphasis on livestock, in contrast to villages in
clusters three, four and six where the emphasis is on tree production. The crop
production-sales dominance pattern distinguishes villages in clusters one, two and eight
from those in clusters three, six and seven.










Figure 9: Dendrogram for component score clustering


Table 16: Cluster size and mean component scores for complete model clusters

Cluster    Size    COMP1    COMP2    COMP3    COMP4

First       17      -.60      .11     -.29      .67
Second      13      -.05    -1.05      .03      .62
Third       12      -.01     -.82     -.66     -.82
Fourth      15       .05      .69     -.67     -.39
Fifth        9     -1.04      .62     1.74      .23
Sixth        2      -.11     1.57     -.70    -2.61
Seventh      5      1.23     -.48     1.88    -1.37
Eighth       6      2.18      .80     -.19     1.04

Comparison across component means is useful for distinguishing clusters. For example,
villages in clusters three and four both have above average emphasis on tree production, but
cluster three villages are areas with lower population and market integration. Similarly,
cluster five and cluster eight villages score high in market integration, but cluster five villages
score low in population pressure-intensification.

Figure 9 branch labels, given as cluster (number of villages): 5 (2); 5 (7); 6 (2); 4 (15);
3 (12); 2 (13); 1 (17); 8 (6); 7 (3); 7 (2).


Geographic representation of the eight identified clusters is shown in Figure 10. The most
intense and differentiated pattern of resource use in Cameroon, represented by cluster eight,
is shown to be the western highlands. The relative isolation of villages in clusters two and
three is clear, seen by their location in relation to urban centers. Cluster one villages are
mainly located in the forest-savanna transition zone, where there is low population but little
tree crop production or livestock, resulting by default in a high activity emphasis on crop
production and sales.

Figure 10: Zonation based on clustering four components of complete model (map of sample villages labeled by cluster)








The corridor of villages in cluster four represents an interesting and useful domain, where
market integration and diversification are high, as is the emphasis on tree crop production.
These villages run from a high population division in the urban periphery of Yaounde up to
the western highlands, encompassing much of the most important coffee-producing areas
in Cameroon.

5. DISCUSSION

5.1 Implementation and Interpretation

There is a great deal of similarity among principal components, canonical correlation and
linear discrimination analysis in implementation and interpretation. In all three models,
matrices are decomposed into their eigenvalues and corresponding eigenvectors. Orthogonal
rotation is used, as in the above examples, to improve interpretation while retaining the
desirable feature of no correlation between variates. Correlations between original variables
and the derived linear variates, referred to as loadings, are used to identify dimensions (or
factors) represented by the significant variates. If the hypotheses are correct, the component
loadings should show the anticipated patterns or linkages. Important variables will have high
loadings. Signs of particular variables are interpreted relative to signs of other variables;
signs per se do not mean anything (Wilkinson, 1990).
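
As an illustration of loadings as variable-variate correlations, the sketch below extracts the first principal component of a correlation matrix by power iteration and converts the eigenvector to loadings. For a correlation-matrix decomposition, each loading equals the eigenvector element times the square root of the eigenvalue. The data and function names are hypothetical, and power iteration is used only to keep the example dependency-free; it is not the procedure of the statistical package used in the study.

```python
import math

def correlation_matrix(data):
    """Pearson correlations; rows are observations, columns are variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_component(corr, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v

# Hypothetical villages: distance to town, population density, village size.
data = [[10, 90, 800], [20, 60, 500], [40, 35, 300],
        [60, 20, 150], [90, 8, 60], [120, 5, 40]]
corr = correlation_matrix(data)
eigenvalue, vector = first_component(corr)
loadings = [x * math.sqrt(eigenvalue) for x in vector]
```

In this toy case distance loads with the opposite sign to density and village size. Multiplying all three loadings by minus one would describe the same component, which is the sense in which signs matter only relative to each other.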

Alternative approaches exist to those shown in Sections Two and Three. The most important
choice, for interpretation, is whether or not variates are rotated. When variates are not rotated,
the first variate accounts for maximum variation (in principal components), variate
correlation, or group discrimination. This variate is used for assessing hypothesized linkages
and serves as a general index--whether for activity pattern, land use intensification or
whatever else. If rotation is done, which is the normal practice, one encounters the problem
of how many components to rotate. The eigenvalue-greater-than-one guideline is a good
starting point, but flexibility is needed, particularly when working with large models. There
is a reasonably clear trade-off: when fewer components are rotated, more variables will have
higher loadings. This is helpful when exploring hypotheses about possible system patterns,
but is less useful for creating indices, due to problems with confounding and interpretation.
Rotating a larger number of components is useful for isolating covariation among smaller sets
of variables, after variation due to other variables is removed. This, for example, was a
valuable outcome of rotating additional components in the complete components model.

Once the basic concepts of principal components, canonical correlation and linear
discrimination analysis are understood, and the mathematical similarity appreciated, it is a
short step to begin differentiating the methods with respect to research application.

The components model does not impose structure, nor are statistical properties particularly
important. In effect, components analysis is straightforward matrix decomposition, relying
on subject matter knowledge and simple rules-of-thumb for interpretation. An important use
of components analysis is to identify patterns for characterizing and distinguishing systems,
and refining general multivariate hypotheses. In the Cameroon study, principal components
analysis was particularly useful in refining hypotheses about market versus population induced
system dynamics. Identified patterns and refined hypotheses can be used for rapid appraisal
checklists when assessing circumstances in new localities. Scores derived from the
components analysis give indexes for identified patterns (or underlying components) which
can serve as variables for further analysis.

Principal components arise from variable covariance (or correlation). As a result, there is
no reason a priori that potentially interesting patterns linking exogenous circumstances and
system performance or dynamics will be revealed. This was found in the third rotated
component of the complete model. Canonical correlation analysis avoids this problem.
Correlation between sets of variables, rather than within sets, is maximized so linkages
between an "independent" and a "dependent" set are necessarily identified. In the Cameroon
study, canonical correlation analysis helped identify secondary determinants of livestock
importance, aside from agro-ecology, and suggested an interesting linkage between local
market access and land use intensification. These were "hidden" in components analysis by
greater variable correlations.

Another role for canonical correlation analysis is in situations in which a set of independent
variables is so highly correlated that multiple regression is unlikely to reveal significant
relationships between individual exogenous variables and component analysis-derived indices.
This clearly was an element affecting analyses with the Cameroon data set. Several variables
with high loadings in the complete components model and the various canonical correlation
models did not show up as significant in multiple regressions of subject matter component
scores.

Linear discriminant analysis is useful for determining whether zones are different and, if so,
how they differ. The question of "whether" is in practice less interesting than "how." This
is for two reasons. One is that zones used in discriminant analysis often have been defined
on the basis of institutional mandates or study objectives. In the Cameroon study, little was
found to discriminate between the two forest zones, but the entire research system is
structured on an ecological zone basis (which, in fact, does not differentiate between
evergreen and semi-evergreen forests). This is not going to change because of multivariate
analysis results. The second reason is equally pragmatic. It is difficult to obtain two large
independent samples, one for model development and the other for validation. Therefore, in
most applications, the issue of "whether" cannot be statistically verified, only assessed in the
sense of a working hypothesis.

How zones differ leads to the pattern analysis aspect of discriminant analysis. One issue of
interest is whether certain sets of variables differ across zones. In the Cameroon study, rural
income activity patterns differed greatly by ecological zone, whereas market access
circumstances did so to a lesser extent. Comparison of discriminant function and component
patterns is particularly useful for evaluating which variables differ by zones and which do
not.








Linear discriminant analysis requires prior group or zone assignment. Percentile
classification and clustering are valuable methods for identifying groups when there are not
predefined groups for systems analysis.

Percentile classification is easy to implement and interpret. This is why classification and
grouping is already quite common in farming systems research. The distinguishing feature of
the approach presented in Section 4.3 was use of a score derived through multivariate
analysis. The outcome of score classification is to create domains which have similar
circumstances, either disregarding agro-ecology or within ecology (not shown in this paper
but easily done as a second stage analysis following zonation). In the Cameroon study, the
score used for classification was based on resource use intensification but any other set of
criteria could have been used.

Clustering is quite different from the other multivariate methods, in implementation and
interpretation. There is no model structure, variance is not addressed, and the only variable
weighting used is an implicit weight of one. In clustering for the Cameroon study, several
options were tried: all variables separately, subject matter indices, complete model
components, and weighted intensification score. Each was assessed using different linkage
methods. This exercise showed that clustering is an art, more than it is a science. Different
clustering patterns were obtained depending on variable treatment and linkage method, even
though the underlying data were the same. Single linkage gave too many small clusters to be
particularly useful for FSRE zonation. Clustering using many variables gave complex
patterns with several small outlying clusters only joining after all other clusters had already
merged. At least with the Cameroon data, the best combination was clustering with a small
set of variables, all of which could be accepted as having equal weight, using complete
linkage. Even with this approach, some subjective adjustments were needed, merging a few
small clusters at greater within cluster distance.
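
The combination described above, a small set of equally weighted variables clustered with complete linkage up to a chosen within cluster distance, can be sketched as follows. Euclidean distance, the cutoff value and the village scores are assumptions made for illustration; the paper does not state the distance measure used.

```python
import math

def complete_linkage(points, cutoff):
    """Merge clusters until the closest pair is farther apart than cutoff.

    The distance between two clusters is the largest Euclidean distance
    between any member of one and any member of the other.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > cutoff:
            break  # remaining clusters are too far apart to merge
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical index scores for six villages on two indices.
villages = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (2.0, 2.0), (2.1, 1.8), (5.0, 0.0)]
groups = complete_linkage(villages, cutoff=1.0)
```

The isolated sixth village stays in a cluster of its own at this cutoff, a small-scale version of the outlying clusters that only join after all others have merged.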

Once clusters are formed, the clusters must still be identified. In the Cameroon study, this
was done using descriptive statistics. Even this straightforward approach led to some
interpretation problems. Clustered observations are similar only in the sense of being more
like each other than they are like observations in other clusters. This does not mean they are
sufficiently alike to form a coherent domain for systems research. To better understand
cluster features and ensure reasonable cluster coherence, it is advisable to diagnose patterns
within clusters using either principal components or linear discriminant analysis. This was
done in the Cameroon study, revealing interesting differences in food dependence, market
linkages, field intensification and rural income generation depending on stage of resource
pressure-response (i.e., within and between cluster group analysis; results not presented).

5.2 Strengths and Weaknesses

Because of the similarities of principal components, canonical correlation and linear
discriminant analysis, they share certain strengths and weaknesses. An obvious strength is
capacity for empirically assessing multivariate relationships. It is not entirely clear why
technical research in FSRE uses statistical techniques based on variance analysis, whereas
most diagnostic research is based on intuition and descriptive statistics. Given the normal
level of analytical sophistication in FSRE characterization and diagnosis, it is no wonder
many experienced researchers now rely on participatory methods rather than data analysis to
achieve understanding of system behavior and dynamics.

Shared weaknesses of the methods stem from their basis in matrix decomposition. Data
variance and covariance are handled differently in each model. Consequently, use of any one
model by itself can lead to misinterpretation of relationships. To avoid this, it is particularly
important to understand how the different models decompose variance. This can be
explained and shown mathematically but, as with much research, intuition comes with
experience.

Another common problem is judging what is important relative to study objectives. Not all
significant components, variates or functions are of particular interest from a development
theory or pattern analysis perspective. This is even more true for variate loadings. Both
variates and loadings on variates are often due to sample specific circumstances.

For pattern analysis, the principal components model seems less fraught with danger for
interpretation than is canonical correlation analysis. Certainly, the tendency in canonical
correlation analysis to attribute causality is not as likely in components interpretation.
However, a legitimate concern raised with respect to principal components analysis is lack of
a model structure based on theory and explicit hypotheses. Without adequate formulation of
hypotheses and foundation in subject matter knowledge, patterns and linkages unjustly can be
assumed to have general validity when they obtain due to sample specific relationships.
Potential problems with uncontrolled empiricism can be minimized by starting with a clearly
defined set of multivariate hypotheses and only including variables which relate to the
hypotheses. Moreover, as the Cameroonian study shows, components analysis lends itself to
hypothesis testing through two-stage analysis: construction of indices using components
analysis followed by regression against exogenous variables.

The canonical correlation model has an independent versus dependent variables set
interpretation which conforms better to scientific enquiry. Linear discrimination has an even
stronger structure since groups are pre-defined. Canonical correlation and linear discriminant
analysis also have statistical properties which facilitate interpretation, but these depend on an
assumption of multivariate normality. This can complicate implementation. While both
models are relatively robust, care is needed in interpretation and validation should be a
standard practice. Ideally, models and hypotheses should be explored by analysis of multiple
sub-samples of the data (or verified against new data). Whether this is essential depends
greatly on the intended use. For the most part, these techniques are valuable in developing
insight and refining hypotheses; they are not good techniques for proving hypotheses.

A difficulty encountered in both canonical correlation and principal components is treatment
of grouping and ordinal data. The usual practice of decomposing product moment correlation
matrices is not correct if ordinal data dominate a model. In this case, rank order correlations
(e.g., Spearman) should be substituted for Pearson correlations. More serious is the problem
of categorical data, particularly if one or more key categorical variables are confounded with
model hypotheses. For principal components analysis, two solutions are feasible and
relatively easily implemented. One is to regress all variables on the main grouping
variable(s), such as ecological zone, and then use a correlation matrix of the residuals for the
components analysis. Alternatively, and preferably, if sufficient cases are available, do the
components analysis on within group sub-samples.
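
The first solution can be sketched briefly. When the grouping variable is categorical and entered as a full set of dummy variables, the regression residual for each observation is simply its deviation from the group mean, so no regression routine is needed. The variable names and values below are hypothetical and chosen so that zone differences mask the within zone relationship.

```python
import math
from collections import defaultdict

def residuals_within_groups(values, groups):
    """Subtract each group's mean from its members.

    This equals the residual from regressing the variable on a full set
    of group dummy variables.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def pearson(x, y):
    """Product moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: both variables are higher in the savanna, but within
# each zone they move in opposite directions.
zone = ["forest"] * 3 + ["savanna"] * 3
livestock = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
wage = [3.0, 2.0, 1.0, 13.0, 12.0, 11.0]

raw_r = pearson(livestock, wage)
resid_r = pearson(residuals_within_groups(livestock, zone),
                  residuals_within_groups(wage, zone))
```

In this toy case the raw correlation is strongly positive while the correlation of residuals is negative, showing how a dominant grouping variable can hide the relationship a components analysis is meant to reveal.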

In essentially all respects, zonation achieved through classification and clustering corresponds
well to perceptions based on experience; only this zonation is empirical, with slightly
different emphasis depending on variable treatment. This in itself represents an improvement
relative to use of pre-defined zones or qualitatively defined groups.

The main strengths of percentile classification are logical clarity and ease of implementation.
Difficulties arise when potential classification criteria cannot be reduced to a single
dimension. Another problem is arbitrariness in defining group boundaries. However, this
second problem can be overcome by deriving a single measure and then determining groups
by clustering.

A particularly valuable contribution of both classification and clustering is the ability to escape
the bonds of spatial zonation. Zonation can be done on any quantifiable dimension, and
zones might be contiguous or not. Such flexibility in zonation is likely to be necessary in
applications of FSRE to high income agriculture. While classification and clustering do not
necessarily lead to spatially contiguous zones, villages (or farmers) sharing several
characteristics often are near each other in a spatial sense. As a result, spatial grouping
generally is still feasible following classification and clustering.

Between the two clustering approaches, the subject matter index approach gave a clearer
identification and interpretation of clusters. This probably would be the preferred approach if
an analyst were confident of the dimensions to be considered in zonation, and if those dimensions
can be represented by relatively few subject matter indices.

The main strength of the second clustering approach is use of the major sources of variation
in the rural economy (at least as measured by model variables) for zonation. While the basis
for clustering is more substantial, by comparison to use of selected indices, cluster
membership is more diverse. For example, several villages having low scores for crop
production-sales dominance actually had above average rankings for crop production. Low
scores for the component resulted from other variables included in the component (such as
services and crop sales). Thus, zonation based on general components is not necessarily
efficient for targeting.








6. CONCLUSIONS


This paper has addressed one of the most critical aspects of the systems research methodology
used in FSRE: system definition and characterization. System, or research domain, definition
in low income countries has relied primarily on ecology and cropping patterns. While several
FSRE programs have differentiated domains within agroecological zones, statistical methods
for pattern analysis and zonation using socio-economic criteria remain relatively unexplored.
This notable weakness in FSRE methodology, stemming from emphasis on rapid appraisal
diagnosis, might well be responsible for lack of substantial and sustained impact in areas
where highly profitable technologies are not quickly identified.

Legitimate arguments can be, and have been, presented in support of ecology zonation, rapid
appraisal techniques and researcher experience--especially when considering the unique
circumstances of resource poor farmers and limited capacity of public sectors in low income
countries to provide individualized research and extension support. In the face of challenging
organizational and structural dynamics in high income agriculture coupled with declining
public sector resources, FSRE methods have a role to play beyond a developing country
context. However, lack of refined system characterization and domain definition could prove
to be a fatal flaw in adaptations of FSRE to high income agricultural systems.

For both low and high income agricultural systems, refined pattern analysis and zonation can
improve systems diagnosis and targeting. Particularly important is the need to go beyond
ecology-based zonation by more explicitly taking into account system dynamics affecting
farmers' circumstances, incentives and resource use. The challenge to improve the quality of
diagnostic analysis can be met through use of multivariate statistical methods--including
principal components, canonical correlation, linear discriminant and cluster analysis.








ANNEXES


A. Sampling Methodology and Enumeration

Stratification. Cameroon was divided into four agro-climax vegetation strata, derived from
rainfall, altitude and dry months by the Agroclimatological Studies Unit of IITA. The large
semi-evergreen forest stratum, as identified by IITA, was sub-divided into six sub-strata,
taking into account provincial boundaries and ecological zones used for research planning by
the national research system. The evergreen stratum was similarly sub-divided into three
sub-strata.

Sampling unit. Sampling units were all ten minute longitude by ten minute latitude squares
falling in each stratum. Stratum membership was based on the center of the square for
squares on the border. Maps (1:200,000) were used to identify and eliminate cells with no
villages. These were considered to be areas not having village-level resource management
even though isolated households and individuals might be resident. Cells having at least one
village were randomly selected within each stratum. Sampling fractions ranged from nine to
eleven percent.
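
The cell selection step can be sketched as below. The grid, the village counts, the fixed ten percent fraction and the function name are all hypothetical; the survey itself used fractions of nine to eleven percent and the sketch does not reproduce the actual strata.

```python
import random

def select_cells(cells_by_stratum, fraction=0.10, seed=0):
    """Randomly pick about `fraction` of the village-bearing cells per stratum."""
    rng = random.Random(seed)
    selected = {}
    for stratum, cells in cells_by_stratum.items():
        # Cells with no villages are eliminated before sampling.
        eligible = sorted(cell for cell, n_villages in cells.items()
                          if n_villages > 0)
        k = round(fraction * len(eligible))
        selected[stratum] = rng.sample(eligible, k)
    return selected

# Hypothetical grid: keys are (longitude index, latitude index) of ten-minute
# cells; values are the number of villages shown on the map.
strata = {
    "highlands": {(i, j): (i + j) % 3 for i in range(6) for j in range(5)},
    "savanna": {(i, j): 1 for i in range(5) for j in range(4)},
}
sample = select_cells(strata)
```

Fixing the random seed makes the draw repeatable, which helps when an alternate cell has to be drawn later for an inaccessible village.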

Enumeration unit. The enumeration unit for selected cells was the village nearest to the
center of the geographical square (an area of 18 kilometers by 18 kilometers). In a small
number of cases where the selected village no longer existed or was inaccessible, the next
closest village to the center of the square was used. If there was a single village in a cell,
and it no longer existed or was inaccessible, an alternate cell was randomly selected. There
were only a few cases of villages being eliminated due to inaccessibility.

Implementation. Survey implementation was carried out at village meetings called by the
village chiefs. All villagers were encouraged to attend. The village interview took place in
two stages: a village meeting followed by three concurrent small group interviews.
Questions on food consumption, rural activities and cash revenue were addressed to a plenary
meeting. The remaining questions were posed in the small group interviews. All villagers
were encouraged to attend. An attempt was made to ensure even spread of women across the
groups. In villages where too few women attended to make this feasible, women were
concentrated in group interviews covering land and vegetation, and crop management.

B. Derivation of Ranks and Indices

Activity ranks: total of three rankings, each from 1 (highest) to 9 (lowest), for men's labor
time, women's labor time, and cash revenue.

Food sufficiency indexes: total of qualitative scores for proportion of main foods purchased
outside the village. Scores: 0 none, 1 little, 2 half, 3 most, 4 all. Number of foods
in index: staple foods 6; meats 3; vegetables and legumes 3.









Sell crop index: total of qualitative scores for proportion of households selling main crops.
Scores: 0 none, 1 little, 2 half, 3 most. Crops in index: groundnut, cassava, maize,
and plantain.

Land use intensity index: one if continuous cropping; otherwise, ratio of seasons with crops
to sum of seasons with crops and seasons in fallow.
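
As a worked illustration of this definition, a short Python sketch follows. The function and argument names are hypothetical.

```python
def land_use_intensity(crop_seasons, fallow_seasons, continuous=False):
    """Return 1.0 for continuous cropping, else crop / (crop + fallow) seasons."""
    if continuous:
        return 1.0
    return crop_seasons / (crop_seasons + fallow_seasons)

# Hypothetical village: four cropped seasons followed by six fallow seasons.
lui = land_use_intensity(crop_seasons=4, fallow_seasons=6)
```

For this hypothetical village the index is 0.4, close to the mean LUI values reported for the less intensive quartiles in Table 13.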

Field types index: number of following field types found in village, ranging from one to six.

Field intensity: sum of crop and soil management practices--such as tillage, residue
incorporation and manuring--ranging from zero to thirteen.

Village services: sum of services active in village, including extension, credit associations,
veterinary services, cooperatives; range from zero to five.

Distance to town: sum of four "time-adjusted" kilometer distances to town: paved roads
divided by 1.0; graded dirt roads divided by 0.7; unkept dirt roads divided by 0.4; paths
divided by 0.2.
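
The time adjustment can be written as a short Python sketch. The divisors are those given above; the dictionary keys, the function name and the example route are hypothetical.

```python
# Divisors from the index definition; the key names are illustrative only.
SURFACE_DIVISORS = {
    "paved": 1.0,
    "graded_dirt": 0.7,
    "unkept_dirt": 0.4,
    "path": 0.2,
}

def time_adjusted_distance(km_by_surface):
    """Sum kilometers on each surface after dividing by that surface's divisor."""
    return sum(km / SURFACE_DIVISORS[surface]
               for surface, km in km_by_surface.items())

# Hypothetical route of 22 actual kilometers over four surface types.
distwn = time_adjusted_distance(
    {"paved": 10.0, "graded_dirt": 7.0, "unkept_dirt": 4.0, "path": 1.0})
```

The hypothetical 22 kilometer route comes to about 35 time-adjusted kilometers, since each kilometer of path counts as five paved kilometers.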








REFERENCES


Agricultural Technology Improvement Project (ATIP). 1986. Farming systems research at
Mahalapye: Summary of findings. 1982-1985. ATIP Paper No. 1. Gaborone: Ministry of
Agriculture.

Boserup, E. 1965. The conditions of agricultural growth: The economics of agrarian change
under population pressure. New York: Aldine Publishing Company.

Byerlee, D., M. Collinson, et al. 1980. Planning technologies appropriate to farmers:
Concepts and procedures. El Batan, Mexico: CIMMYT.

Dillon, W., and M. Goldstein. 1984. Multivariate analysis: Methods and applications.
New York: John Wiley & Sons.

Franzel, S. 1992. Extension agent surveys for defining recommendation domains: A
case study from Kenya. Journal for Farming Systems Research-Extension 3: 71-86.

Harrington, L., and R. Tripp. 1984. Recommendation domains: A framework for on-
farm research. Economics Program Working Paper 2/84. El Batan, Mexico: CIMMYT.

Hildebrand, P., and F. Poey. 1985. On-farm agronomic trials in farming systems
research and extension. Boulder: Lynne Rienner Publishers.

Karson, M. 1982. Multivariate statistical methods. Ames: The Iowa State University
Press.

Mardia, K., J. Kent, and J. Bibby. 1979. Multivariate analysis. London: Academic
Press.

Wilkinson, L. 1990. SYSTAT: The system for statistics. Evanston, IL: SYSTAT Inc.



