Historic note

Group Title: AREC-H research report - Agricultural Research and Education Center-Homestead ; SB-82-2
Title: Analysis of nematode survey data by computer
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00067849/00001
 Material Information
Title: Analysis of nematode survey data by computer
Series Title: Homestead AREC research report
Physical Description: 10 leaves : ; 28 cm.
Language: English
Creator: McSorley, R ( Robert )
Agricultural Research and Education Center, Homestead
Publisher: University of Florida, Agricultural Research and Education Center
Place of Publication: Homestead Fla
Publication Date: 1982
Subject: Nematoda -- Data processing -- Florida   ( lcsh )
Genre: government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
non-fiction   ( marcgt )
Bibliography: Includes bibliographical references (leaf 6).
Statement of Responsibility: Robert McSorley
General Note: "June 23, 1982."
 Record Information
Bibliographic ID: UF00067849
Volume ID: VID00001
Source Institution: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: oclc - 72539270

Full Text


The publications in this collection do
not reflect current scientific knowledge
or recommendations. These texts
represent the historic publishing
record of the Institute for Food and
Agricultural Sciences and should be
used only to trace the historic work of
the Institute and its staff. Current IFAS
research may be found on the
Electronic Data Information Source
site maintained by the Florida
Cooperative Extension Service.

Copyright 2005, Board of Trustees, University
of Florida

Homestead AREC Research Report SB82-2

Analysis of Nematode Survey Data by Computer

Robert McSorley
Assistant Professor
University of Florida, IFAS
Agricultural Research and Education Center
Homestead, Florida 33031


Surveys measuring the frequency of occurrence and/or population levels of plant-parasitic
nematodes in soil samples taken from a specific crop or geographical region have long been
an important tool for obtaining information on the distribution, abundance, and plant
associations of plant-parasitic nematodes. Because nematology is a young science, the
nematological survey is likely to remain useful as long as new crop-nematode associations
are explored in various geographical regions. Recent examples include surveys of nematode
occurrence in apple orchards in Quebec (12) and on bananas in the Philippines (4).

Analysis of data collected in nematological surveys can be tedious because of the large
number of samples involved. A computer can greatly facilitate the analysis of survey data
and may reveal relationships that would otherwise be extremely difficult to detect (8). As
reliance on nematode diagnostic services (3) increases, regional diagnostic labs are likely
to accumulate much incidental data on nematode numbers and plant associations. Such data
bases could, in a sense, function as nematological surveys, providing detailed information
on nematode distribution in a given region. However, organizing and interpreting such large
amounts of data is severely limited without a computer.

This report demonstrates simplified methods for using the computer to analyze survey data,
with data from a specific nematode survey as an example. The methodology could be adapted
to surveys in related disciplines, such as entomology or plant pathology. The examples use
SAS, the Statistical Analysis System (6), as the data analysis system of choice because it
is readily available at the University of Florida. Other data analysis systems, such as
SPSS, the Statistical Package for the Social Sciences (11), could be used, provided that the
structure of the data set is changed to meet their format requirements.


A survey of nematodes associated with mango (Mangifera indica L.) in southeastern
Florida is used as an example to demonstrate data analysis by computer. Results of
this survey, emphasizing the relationship of Hemicriconemoides mangiferae Siddiqi to
tree condition, have been presented elsewhere (9).

The survey consisted of 123 soil samples collected from 20 different mango groves
in Dade County, Florida; the number of samples collected per grove was based on
the size of the grove (9). Soil samples were collected to a depth of 15 cm from
three locations around each of five trees. A 100-cm3 portion of each sample was
processed for nematodes by a combination of sieving and centrifugation (7), and counts
of plant-parasitic nematodes in each soil sample were obtained.

June 23, 1982


The data from the soil samples alone would be sufficient to develop tables showing
the frequency of occurrence and average population level of each nematode found in
the mango survey. However, the usefulness of the survey was greatly enhanced by
recording detailed supplementary information along with the nematode counts.
Table 1 lists the categories of supplementary information obtained at the sampling
sites; it can serve as a guideline for selecting the kinds of information that may
benefit a survey. Additional information, such as local weather data, may be pertinent
as well.

In the mango survey, detailed data on tree condition and weed density were collected
along with each soil sample. Each of the five trees comprising a single sample was
rated for condition on a 1-6 scale, where 1 = healthy; 2 = first signs of
decline; 3 = unthrifty, with bare twigs evident; 4 = many bare twigs and some dead
branches; 5 = nearly dead; 6 = dead (10). A mean rating of tree condition was then
calculated for each sample.
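The per-sample rating described above is the arithmetic mean of the five tree ratings. The following Python sketch makes the calculation and its range check explicit. The function name is illustrative only and does not come from the report, and the ratings in the example are made up.

```python
from statistics import mean


def sample_condition(ratings):
    """Mean condition rating for one sample of five trees.

    Each tree is rated 1 (healthy) to 6 (dead), as in the scale above.
    """
    if len(ratings) != 5:
        raise ValueError("expected ratings for exactly five trees")
    if any(not 1 <= r <= 6 for r in ratings):
        raise ValueError("each rating must lie on the 1-6 scale")
    return mean(ratings)


# Made-up ratings for the five trees of one sample (not survey data).
print(sample_condition([1, 2, 2, 3, 1]))  # 1.8
```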

Weed density was rated by estimating the percentage of ground covered by weeds within
a 2.0-m radius of the trunks of the five trees in each sample. Ground cover was
rated on a 1-5 scale, where 1 = weeds absent; 2 = trace (0-10% of ground covered);
3 = light (10-30% of ground covered); 4 = medium (30-50% of ground covered); 5 =
heavy (more than 50% of ground covered). For each sample, a rating was obtained
for each of the dominant weed species present, as well as a composite rating for all
weeds together.
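Converting an estimated percentage of ground cover to the 1-5 scale can be sketched as follows. The stated ranges share their end points (10%, 30%, 50%), and the report does not say how such boundary values were scored. This sketch therefore assumes that a boundary value falls in the lower class.

```python
def weed_rating(percent_cover):
    """Convert estimated % ground cover by weeds to the 1-5 rating scale.

    Assumption: a value on a class boundary (10, 30 or 50) goes to the
    lower class, because the ranges given in the text overlap at their ends.
    """
    if not 0 <= percent_cover <= 100:
        raise ValueError("percent cover must be between 0 and 100")
    if percent_cover == 0:
        return 1  # weeds absent
    if percent_cover <= 10:
        return 2  # trace
    if percent_cover <= 30:
        return 3  # light
    if percent_cover <= 50:
        return 4  # medium
    return 5      # heavy (more than 50%)
```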


Data for analysis are most conveniently entered into computer programs in either
numerical or coded form. Numerical values of nematode counts, tree condition, weed
ratings, and tree age were entered directly into a data set for analysis by SAS, the
Statistical Analysis System. Values for other variables were coded (Table 2); e.g., if
the trees in a sample were on Turpentine rootstock, the value "T" was entered for
rootstock. In SAS, codes can be numerical, alphabetical, or a combination of both. An
additional code was established for grove locations and growers: each grove was given
a number from 1 to 20.
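A minimal Python sketch of the Table 2 coding is shown below, for readers who want to translate stored codes back to their meanings. Each variable gets its own lookup table because the same letter means different things in different variables: "T" is Turpentine for rootstock but Tommy Atkins for variety. The names ROOT and VART come from the text. IRR is a hypothetical name for the irrigation variable, whose actual name is not legible in Table 4.

```python
# Lookup tables transcribed from Table 2, one per variable.
# ROOT and VART are named in the text; IRR is a hypothetical name.
CODES = {
    "ROOT": {"T": "Turpentine", "S": "Seedling"},
    "VART": {"T": "Tommy Atkins", "K": "Kent", "M": "Mixture"},
    "IRR": {
        1: "Drip",
        2: "Sprinkler",
        3: "Overhead Sprinkler",
        4: "Drip & Sprinkler",
        5: "Drip & Overhead Sprinkler",
    },
}


def decode(variable, code):
    """Translate a stored code back to its meaning (KeyError if unknown)."""
    return CODES[variable][code]


print(decode("ROOT", "T"), "/", decode("VART", "T"))
```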

Some supplementary information (variables) was not entered into the computer data set
at all. For example, no soil fumigant had been used at any site, and soil type was
similar for 19 of the 20 groves surveyed, so including these variables would have added
almost no information. Counts of very rare nematodes, such as Hoplolaimus sp. or
Paratylenchus sp., were also excluded from the data set. If a computerized record of
such items is desired, it can be kept in a separate file, apart from the SAS programs
discussed here.


Table 3 illustrates the structure of the SAS data set for the mango survey; it is a
simplified program that can be used to print out the data. This program
was run at the University of Florida on an Amdahl 470 V/6 II computer via a remote

The first three lines of Table 3 are job control statements, which will vary from
system to system or even from operator to operator (2). The DATA statement (line 4)
names the data set. Naming is especially useful when more than one data set is
involved. For example, if several years of data are involved, an individual data
set could be constructed for each year, enabling the researcher to extract and
analyze data from only one year if desired. The INPUT statement (lines 5-6)
specifies the format in which the data will be entered. The CARDS statement (line 7)
indicates that the data follow, with each line of data entered on a separate computer
card or on a separate line at a remote terminal. The following lines in Table 3
contain the data from the 123 samples in the survey, with the data from one sample
occupying one line in the program. Various commands for analyzing the data are placed
after the data lines.

The most important step in facilitating the analysis of survey data by computer is
organizing the data for each sample into a SAS data set, since data organized in this
manner can be readily subjected to any of the analyses available in SAS (6). By the
same token, if an alternative data analysis package such as SPSS (11) is used, all of
the analyses in that package can be easily executed once the samples are organized
according to the appropriate data formats.

In this example, each data line of Table 3 contains data from a single sample,
entered according to the format specified in the INPUT statement. The code words
following the word INPUT are the names of the variables used in the data set; they
are defined in Table 4, which also gives the values of the variables for the first
sample, corresponding to the first line of data (line 8, Table 3). Note that the
variable names in the INPUT statement, like the individual values in a data line,
are separated by single spaces. This is the simplest form of SAS INPUT statement;
more complex alternatives are available (5, 6). Note also that VART and ROOT are each
followed by a "$" sign in the INPUT statement, because the values of these variables
are not numerical but may contain letters. The value "." entered for MOIS in the last
data line of Table 3 indicates a missing data point.
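The list-input rules just described can be imitated in a few lines of Python: values are separated by spaces, "$" marks a character variable, and a lone "." stands for a missing value. This is a rough sketch, not a reimplementation of SAS. The variable layout below is a hypothetical, shortened one rather than the survey's full INPUT statement, and the data line is made up. The sketch also rejects a line with the wrong number of values instead of imitating SAS's own handling of short lines.

```python
def parse_input_spec(spec):
    """Turn an INPUT-style list such as "NO VART $ TC" into (name, is_character) pairs.

    A "$" marks the preceding variable as character rather than numeric.
    """
    fields = []
    for token in spec.split():
        if token == "$":
            if not fields:
                raise ValueError('"$" must follow a variable name')
            name, _ = fields[-1]
            fields[-1] = (name, True)
        else:
            fields.append((token, False))
    return fields


def read_line(fields, line):
    """Read one space-separated data line; a lone "." becomes None (missing)."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    record = {}
    for (name, is_character), raw in zip(fields, values):
        if raw == ".":
            record[name] = None
        elif is_character:
            record[name] = raw
        else:
            record[name] = float(raw)
    return record


# Hypothetical, shortened layout and a made-up data line (not the survey's).
FIELDS = parse_input_spec("NO VART $ ROOT $ TC MOIS HEMI ROT")
print(read_line(FIELDS, "7 K T 2.2 . 45 130"))
```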

If a user prefers not to keep the data inside the program as in Table 3, it is
possible to establish separate data files using the same format for each data line
specified here. Data can then be read into the SAS program from the separate data
file, which is maintained on either disk or tape storage. Specialized control
commands are required for this (1), and these can vary from system to system.

Numerous statements are available in SAS for examining and analyzing data sets
(5, 6). The following examples illustrate the use of some of those statements
that could be useful in examining survey data. The PROC PRINT and TITLE
statements of Table 3 will cause all of the data to be printed in a table entitled
"Mango Data File."

Means and standard deviations of variables used in the survey can be obtained with
the SAS procedure PROC MEANS. Table 5 illustrates several alternative SAS statements
that can be used in a variety of situations; these statements can be placed in the
SAS program after the TITLE statement in Table 3. The PROC MEANS statement alone will
cause the mean, standard deviation, maximum and minimum values, number of
observations, sum, variance, standard error, and coefficient of variation of every
variable in the data set to be printed. The VAR statement in Table 5 restricts those
statistics to the variables HEMI and ROT. Thus the first two statements in Table 5
will obtain means of Hemicriconemoides mangiferae and Rotylenchulus reniformis counts
over all of the samples in the data file.

Often it is desirable to obtain means for only certain categories of data rather than
over the entire data set. An example showing means arranged by type of rootstock is
given in Table 6; the statements needed to develop a table of this type are given in
lines 3-7 of Table 5. In order to compute means by rootstock, the data set must first
be sorted by rootstock value. This is the function of the statements PROC SORT and
BY ROOT. These statements must precede the statements PROC MEANS and BY ROOT, which
actually compute the means by rootstock. The VAR statement indicates that means by
rootstock are to be obtained only for the variables HEMI and ROT, rather than for all
variables in the data set. These statements illustrate the advantage of supplementary
data in the nematological survey: means can be computed for as many categories as
were recorded in the survey.
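The sort-then-group logic of PROC SORT followed by PROC MEANS with a BY statement can be sketched in Python with sorted() and itertools.groupby. Like a BY group, groupby only merges adjacent records that have equal keys. The statistics returned are the ones listed above for PROC MEANS, and the standard deviation uses the n - 1 divisor. This is an illustrative analogue with made-up records, not output from the survey. Missing values are represented as None, and records with a missing rootstock sort first, which mirrors SAS placing missing values before other values.

```python
import math
from itertools import groupby
from statistics import mean, stdev, variance


def describe(values):
    """Statistics listed in the text for PROC MEANS; None (missing) is skipped."""
    x = [v for v in values if v is not None]
    n = len(x)
    if n == 0:
        raise ValueError("no non-missing values")
    m = mean(x)
    sd = stdev(x) if n > 1 else None  # sample SD, n - 1 divisor
    return {
        "N": n,
        "MEAN": m,
        "STD": sd,
        "MIN": min(x),
        "MAX": max(x),
        "SUM": sum(x),
        "VAR": variance(x) if n > 1 else None,
        "STDERR": sd / math.sqrt(n) if sd is not None else None,
        "CV": 100 * sd / m if sd is not None and m != 0 else None,
    }


def sort_key(value):
    """Order missing (None) first, then the codes themselves."""
    return (value is not None, value)


def means_by(records, by, var):
    """Sort on `by`, then summarize `var` within each group of equal keys."""
    ordered = sorted(records, key=lambda r: sort_key(r[by]))
    return {
        key: describe([r[var] for r in group])
        for key, group in groupby(ordered, key=lambda r: r[by])
    }


records = [  # made-up values, not survey data
    {"ROOT": "T", "HEMI": 0}, {"ROOT": "S", "HEMI": 10},
    {"ROOT": "T", "HEMI": 100}, {"ROOT": None, "HEMI": 25},
    {"ROOT": "S", "HEMI": 30}, {"ROOT": "T", "HEMI": 50},
]
for root, stats in means_by(records, "ROOT", "HEMI").items():
    print(root, stats["N"], stats["MEAN"], stats["STD"])
```

Forgetting the sort has the same effect in this sketch as in SAS: groups with the same key would be split wherever they are not adjacent.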

Frequency distributions can also be obtained by SAS. Frequency distributions for
counts of H. mangiferae and R. reniformis can be obtained by the statements:


These statements are placed near the end of the SAS program, before or after the
PROC MEANS or PROC PRINT statements. The exact location is not critical, provided
that: 1) these statements are placed after the actual data lines in the program,
2) all statements for the same procedure are together, and 3) PROC SORT is performed
before any procedure that uses the sorted data. These statements will cause a
tabulated frequency distribution to be printed for the variables HEMI and ROT.


will cause the frequency distribution for HEMI to be printed in the form of a bar
graph. More detailed frequency distributions can be obtained by using PROC FREQ
or PROC CHART in conjunction with PROC SORT and BY statements (5).
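Below is a Python sketch of a tabulated frequency distribution and a simple text bar chart. It is only an analogue of PROC FREQ and PROC CHART output and does not reproduce their layout. The optional bin_width parameter, an assumption of this sketch, groups counts into classes. Grouping is usually needed for nematode counts, which span a wide range of values.

```python
from collections import Counter


def frequency_table(values, bin_width=None):
    """Rows of (value or class lower bound, frequency, cumulative frequency, percent).

    Missing values (None) are skipped. With bin_width, each value is replaced
    by the lower bound of its class (for integer counts and bin_width=10:
    0-9, 10-19, ...).
    """
    data = [v for v in values if v is not None]
    if bin_width is not None:
        data = [(v // bin_width) * bin_width for v in data]
    counts = Counter(data)
    total = len(data)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], cumulative, 100 * counts[value] / total))
    return rows


def bar_chart(rows, symbol="*"):
    """Horizontal text bar chart with one bar per row; bar length = frequency."""
    width = max(len(str(value)) for value, *_ in rows)
    return "\n".join(
        f"{str(value).rjust(width)} | {symbol * freq}" for value, freq, *_ in rows
    )


hemi = [0, 5, 5, 20, 135, None]  # made-up counts per 100 cm3 of soil
print(bar_chart(frequency_table(hemi, bin_width=50)))
```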

Several other statements can be used following the data lines in Table 3. The


will obtain correlation coefficients between all pairs of variables in the data

The two statements


will obtain a correlation coefficient only between TC (tree condition) and HEMI
(H. mangiferae counts). PROC CORR can also be used with PROC SORT and BY to obtain
correlation coefficients for data sorted into categories. The statements



will cause a graph to be printed. The PLOT statement indicates that ROT (R.
reniformis counts) will be on the Y-axis and BP (Bidens pilosa density) on the
X-axis. Additional statements useful in analyzing survey data are available in
SAS as well (5, 6).
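The correlation coefficient that PROC CORR computes by default is the Pearson product-moment coefficient. Below is a Python sketch of that calculation, both overall and within sorted categories. Pairs with a missing value are dropped, which matches PROC CORR's default pairwise handling of missing values for a single pair of variables. The records are made up, and the grouping variable is assumed to have no missing values.

```python
import math
from itertools import groupby


def pearson_r(pairs):
    """Pearson correlation coefficient; pairs with a missing (None) value are dropped."""
    pts = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def corr_by(records, by, xvar, yvar):
    """Sort on `by` (assumed non-missing), then correlate within each group."""
    ordered = sorted(records, key=lambda r: r[by])
    return {
        key: pearson_r([(r[xvar], r[yvar]) for r in group])
        for key, group in groupby(ordered, key=lambda r: r[by])
    }


records = [  # made-up values, not survey data
    {"ROOT": "T", "TC": 1, "HEMI": 3}, {"ROOT": "T", "TC": 2, "HEMI": 2},
    {"ROOT": "T", "TC": 3, "HEMI": 1}, {"ROOT": "S", "TC": 1, "HEMI": 2},
    {"ROOT": "S", "TC": 2, "HEMI": 4}, {"ROOT": "S", "TC": 3, "HEMI": 6},
]
print(corr_by(records, "ROOT", "TC", "HEMI"))
```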


Occasionally it may be necessary to transform survey data prior to subsequent
analyses. For example, a logarithmic transformation of the H. mangiferae counts
can be accomplished by the statement:


A statement of this type should be placed just before the CARDS statement of
Table 3. A new variable, LHEM, is created by this statement, and can be used
in any of the future analyses performed on the data set.
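The exact SAS statement that creates LHEM is not legible in this copy. Nematode counts often include zeros, and the logarithm of zero is undefined, so a common convention is to add 1 before taking the logarithm. The Python sketch below assumes the form log10(count + 1); both the base and the +1 offset are assumptions. Missing values stay missing.

```python
import math


def log_transform(count, offset=1):
    """Base-10 log of (count + offset); None (missing) stays missing."""
    if count is None:
        return None
    if count + offset <= 0:
        raise ValueError("count + offset must be positive")
    return math.log10(count + offset)


hemi = [0, 9, 99, None]  # made-up H. mangiferae counts per 100 cm3 of soil
lhem = [log_transform(c) for c in hemi]
print(lhem)
```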

Additional variables can be calculated mathematically by means of statements placed
before the CARDS statement in the SAS program. For example, the equation


will compute the total number of plant-parasitic nematodes per 100 cm3 of soil
in each sample. The new variable TOTAL is then available for subsequent analyses.
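The equation for TOTAL is not legible in this copy, so the Python sketch below shows the same sum under stated assumptions. Apart from HEMI and ROT, the variable names are hypothetical stand-ins for the seven nematode counts defined in Table 4. The sketch also shows two ways of handling a missing count. One propagates it, as a SAS expression written with "+" does. The other ignores it, as the SAS SUM function does.

```python
# Hypothetical names for the seven nematode counts in Table 4;
# only HEMI and ROT are named in the text.
NEMATODE_VARS = ["HEMI", "MAC", "ROT", "HELI", "PRAT", "QUIN", "XIPH"]


def total_nematodes(record, skip_missing=False):
    """Total plant-parasitic nematodes per 100 cm3 of soil for one sample.

    By default any missing (None) count makes the total missing. With
    skip_missing=True, missing counts are ignored instead; if every count
    is missing, the result is still None.
    """
    counts = [record.get(name) for name in NEMATODE_VARS]
    if None in counts:
        if not skip_missing:
            return None
        counts = [c for c in counts if c is not None]
        if not counts:
            return None
    return sum(counts)


sample = {"HEMI": 40, "MAC": 10, "ROT": 120, "HELI": 5,
          "PRAT": 0, "QUIN": 0, "XIPH": 0}  # made-up counts
print(total_nematodes(sample))
```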

Occasionally it is useful to focus on only those samples in the data set that have
a specific characteristic. This can be accomplished with IF statements. For example,
suppose that an analysis is to be performed only on samples having H. mangiferae
counts greater than 100 nematodes per 100 cm3 of soil. This can be accomplished by
inserting the following statement between the INPUT and CARDS statements in the
program:


This statement instructs SAS to include only samples with HEMI values of more than
100 in any subsequent analyses performed by the program. An equivalent way of
accomplishing this is to ask that any samples with values less than or equal to
100 be ignored in the data analyses. The required SAS statement is


Either of these two statements gives the same result. IF statements are very useful
in searching for specific information in large data sets. A more extensive
description of the use of IF statements has been presented elsewhere (5).
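The equivalence of the two IF forms described above can be checked with a short Python sketch that uses made-up counts. One detail matters here. In SAS comparisons, a missing numeric value is treated as smaller than any number, so a sample with missing HEMI fails "greater than 100" and satisfies "less than or equal to 100". Both forms therefore drop it. The helper num() imitates that rule, and the function names are illustrative only.

```python
import math


def num(value):
    """Treat a missing numeric value (None) as smaller than any number."""
    return -math.inf if value is None else value


def keep_if(records, condition):
    """Subsetting IF: keep only the records for which `condition` is true."""
    return [r for r in records if condition(r)]


def delete_if(records, condition):
    """IF ... THEN DELETE: drop the records for which `condition` is true."""
    return [r for r in records if not condition(r)]


samples = [{"NO": i + 1, "HEMI": h}  # made-up counts, not survey data
           for i, h in enumerate([50, 100, 101, 820, None])]
high_a = keep_if(samples, lambda r: num(r["HEMI"]) > 100)
high_b = delete_if(samples, lambda r: num(r["HEMI"]) <= 100)
print([r["NO"] for r in high_a], [r["NO"] for r in high_b])
```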

The above examples have illustrated: 1) the types of data that can be included in a
survey, 2) a method for organizing survey data for analysis by computer, and 3) the
use of some of the more basic statements available in SAS for data analysis (6). The
salient feature of these procedures is that once the survey data are organized into
and stored in a data analysis system such as SAS, almost any of the analyses
available in that system can be performed on the data in the future.


1. Anonymous. 1978. NERDC user's manual utilities. Northeast Regional Data
Center, Gainesville. 138 pp.
2. Anonymous. 1979. NERDC user's manual general. Northeast Regional Data
Center, Gainesville. 150 pp.
3. Barker, K. R., and C. J. Nusbaum. 1971. Diagnostic and advisory programs.
In B. M. Zuckerman, W. F. Mai, and R. A. Rohde (eds.), Plant Parasitic
Nematodes. Vol. 1 Morphology, Anatomy, Taxonomy, and Ecology. Academic
Press, New York. pp. 281-301.
4. Davide, R. G. 1980. Influence of cultivar, age, soil texture, and pH on
Meloidogyne incognita and Radopholus similis on banana. Plant Disease
5. Helwig, J. T. 1978. SAS introductory guide. SAS Institute Inc., Cary, NC.
83 pp.
6. Helwig, J. T., and K. A. Council (eds.). 1979. SAS user's guide, 1979 edition.
SAS Institute Inc., Cary, NC. 494 pp.
7. Jenkins, W. R. 1964. A rapid centrifugal-flotation technique for separating
nematodes from soil. Plant Dis. Reptr. 48:692.
8. McSorley, R. 1981. The computer as a research tool for the analysis of
nematode survey data. J. Nematol. 13:449-450 (Abstr.).
9. McSorley, R., J. L. Parrado, and S. Goldweber. 1981. Plant-parasitic nematodes
associated with mango and relationship to tree condition. Nematropica
10. Milne, D. L., E. A. deVilliers, and L. C. Holtzhausen. 1971. Litchi tree
decline caused by nematodes. Phytophylactica 3:37-44.
11. Nie, N. H., C. H. Hull, J. G. Jenkins, K. Steinbrenner, and D. H. Bent. 1975.
Statistical package for the social sciences. Second edition. McGraw-Hill
Book Co., New York. 675 pp.
12. Vrain, T. C., and G. L. Rousselle. 1980. Distribution of plant-parasitic
nematodes in Quebec apple orchards. Plant Disease 64:582-583.

Table 1. Supplementary information collected during grove visits.

Grower Information
Grove Location

Horticultural Information
Planting Date/Plant Age
Planting Distances
Trench System
Yield/Plant Condition

Agronomic and Cultural Information
Type of Soil
Weed Control
Disease Control

Sample Collections
Date Collected
Field Condition
Description of Samples

Supplementary Laboratory Analyses
Soil Moisture
Soil Analysis (pH, N, P, K)

Table 2. Codes used in constructing computerized data

Rootstock Codes
T = Turpentine
S = Seedling
Variety Codes
T = Tommy Atkins
K = Kent
M = Mixture

Irrigation Codes
1 = Drip
2 = Sprinkler
3 = Overhead Sprinkler
4 = Drip & Sprinkler
5 = Drip & Overhead Sprinkler

Table 3. Structure of SAS data set for mango survey.

//MCSOR JOB (9999,9999,20,5,000),' MANGO',CLASS=2







1 11 1 T 1.8 18.6 70 10 80 10 0 0 0 2 1 2 1 2 1 1 1
1 11 1 T T 1.7 18.6 110 65 110 5 0 0 0 2.5 1 2 1 2 1 1 2


20 25 3 M T 1.7 . 0 5 25 0 0 0 0 4 4 4 1 2 1 1 123




Table 4. Definitions of input variables used in mango data set and values of
variables for the first sample (Table 3).

Variable   Value in       Definition of variable
           first sample

                          Grower code -- identifies grove location
                          Tree age in years
                          Irrigation code -- see Table 2
VART                      Variety code -- see Table 2
ROOT                      Rootstock code -- see Table 2
TC                        Rating of tree condition (average for five
                          trees in the sample)
MOIS                      Per cent soil moisture
HEMI                      Number of Hemicriconemoides mangiferae per
                          100 cm3 of soil
                          Number of Macroposthonia spp. per 100 cm3 of soil
ROT                       Number of Rotylenchulus reniformis larvae per
                          100 cm3 of soil
                          Number of Helicotylenchus spp. per 100 cm3 of soil
                          Number of Pratylenchus brachyurus per 100 cm3 of soil
                          Number of Quinisulcius acutus per 100 cm3 of soil
                          Number of Xiphinema sp. per 100 cm3 of soil
                          Rating of amount of ground covered by all
                          weeds together
                          Rating of amount of ground covered by ...
BP                        Rating of amount of ground covered by Bidens
                          pilosa L.
                          Rating of amount of ground covered by
                          Parthenium hysterophorus L.
                          Rating of amount of ground covered by
                          Lantana camara L.
                          Rating of amount of ground covered by
                          ... oleraceus L.
                          Rating of amount of ground covered by
                          ... virginicum L.
NO         1              Sample number (1 to 123)


Table 5. SAS statements used to obtain means and standard deviations of sorted and
unsorted data.



Table 6. Example of type of table that can be constructed from outputs
obtained by using sorted data with PROC MEANS.

Nematode counts per 100 cm3 soil

             Number of              Standard    Minimum    Maximum
Variable     samples       Mean     deviation   value      value

Rootstock = Unknown
HEMI            6           65.0      43.36        25         135
ROT             6           62.5      63.38        10         160

Rootstock = Seedling
HEMI           25           66.2     141.25         0         675
ROT            25          134.8     236.41         5        1105

Rootstock = Turpentine
HEMI           92           98.6     124.50         0         820
ROT            92          159.2     241.75         0        1475
