Group Title: Agronomy research report - University of Florida Institute of Food and Agricultural Sciences ; AY-85-3
Title: A Microcomputer as a terminal to use the statistical analysis programs (SAS) at the Northeast Regional Data Center (NERDC)
Permanent Link: http://ufdc.ufl.edu/UF00056051/00001
 Material Information
Title: A Microcomputer as a terminal to use the statistical analysis programs (SAS) at the Northeast Regional Data Center (NERDC)
Physical Description: 27 p. : ; 28 cm.
Language: English
Creator: Costello, Shawn Randolph, 1960-
Gallaher, Raymond N
Block, David H
University of Florida -- Agronomy Dept
Publisher: Department of Agronomy, Institute of Food and Agricultural Sciences, University of Florida
Place of Publication: Gainesville Fla
Publication Date: [1985?]
 Subjects
Subject: Agriculture -- Data processing -- Florida   ( lcsh )
Computer network protocols -- Florida   ( lcsh )
Genre: non-fiction   ( marcgt )
 Notes
Statement of Responsibility: by Shawn R. Costello, Raymond N. Gallaher, and David H. Block.
General Note: Caption title.
General Note: Agronomy research report - University of Florida Institute of Food and Agricultural Sciences ; AY-85-3
 Record Information
Bibliographic ID: UF00056051
Volume ID: VID00001
Source Institution: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: oclc - 62558880

Table of Contents
    Historic note
    Main
Full Text





HISTORIC NOTE


The publications in this collection do
not reflect current scientific knowledge
or recommendations. These texts
represent the historic publishing
record of the Institute of Food and
Agricultural Sciences and should be
used only to trace the historic work of
the Institute and its staff. Current IFAS
research may be found on the
Electronic Data Information Source
(EDIS)

site maintained by the Florida
Cooperative Extension Service.






Copyright 2005, Board of Trustees, University
of Florida









?" C' Agronomy Research Report AY-85-3





A Microcomputer as a Terminal to Use the Statistical
Analysis Programs (SAS) at the Northeast Regional
Data Center (NERDC)


BY

Shawn R. Costello, Raymond N. Gallaher, and David H. Block
Agricultural Technician II, Professor, and Chemist II, Department of
Agronomy, Institute of Food and Agricultural Sciences, University of
Florida, Gainesville, Florida. 32611.


Introduction to NERDC, TCP, and SAS

When the researcher is faced with large amounts of data to analyze,
sometimes a microcomputer just doesn't have the memory capabilities or the
programs required to handle it. In this case the microcomputer can be used as
an interactive terminal to gain access to the expanded facilities of a data
processing center; at the University of Florida this facility is the Northeast
Regional Data Center, better known as NERDC. Several programs are in use at
NERDC; for our purposes we will be using the local terminal control program,
TCP, to create and edit stored files and submit jobs to NERDC. Also available
at NERDC for general use are certain public programs which can be used by
anyone with access to them to perform a variety of functions. In this report
some of the basic programming statements of the SAS (statistical analysis
system) package will be explained so that a beginning user can utilize SAS to
analyze data. First, however, an explanation of the steps required to
communicate with NERDC from a microcomputer and some basic editing
commands in TCP will be necessary before a SAS program can be written.

Logging on to NERDC from the Radio Shack TRS-80 Model III

To log on to NERDC requires three simple things: a terminal, an account
number with a password, and a lot of patience. The terminal will be a TRS-80
Model III with a modem and a program called OMNITERM which allows
communication with NERDC. The account number will be an eight digit number,
perhaps with a sequence number added on the end after a comma. The password
must match with the account number or NERDC will deny you access to that
account. These two pieces of information should be all you need to log on. The
log-on procedure will now be gone through step by step:
1) Turn on the computer.
2) Insert a disk that contains the OMNITERM program on it into the lower
disk drive. OMNITERM is a terminal program that allows the microcomputer to
communicate with NERDC. Press the orange button.
3) Now set the modem switches to the following positions: ON, AUTO,
ORIG, and OFF. The modem links your terminal to NERDC through the phone
lines. Depress a line button to select an unused phone line.









4) Since the terminal will be connected to NERDC through the phone lines,
it is necessary to dial NERDC from the terminal. To do this first it is necessary
to invoke the OMNITERM program, so once "TRSDOS READY" has appeared on
the screen, type in OMNITERM DIAL and push ENTER.
5) The terminal, after a few moments, should display a screen which is
black except for a flashing cursor, indicating it is ready for you to try to dial.
6) NERDC's phone number is 2-5311, and to call NERDC you should type
*D25311X. If the OMNITERM program is working properly all the characters
should appear doubled on the screen as you type them.
7) Now you should hear some clicking noises as the modem dials NERDC's
phone number for you. The next few steps of the process should be done rapidly
or NERDC will hang up on you, so you should memorize them.
8) As the modem is dialing, push the @ (control) key twice. This will take
you to OMNITERM's main menu. As soon as the menu appears, type X. Now the
system menu should appear, and you should type L (for load settings).
9) A new line appears on the screen, asking you which settings you wish to
load, and you should type in NERDC and push the ENTER key. To repeat steps 7
through 9 quickly, as these are the ones which must be done rapidly, hit @
twice, then type X, then L, then NERDC and push ENTER. As soon as the main
menu appears again, press BREAK.
10) If all this has not taken too long, and NERDC hasn't hung up on you (if
it has just start over), the computer should be back to the terminal screen with
NERDC's phone number on it. At this point type in PPP and push ENTER.
NERDC will ask for your choice of terminal programs. Unfortunately, this often
looks like a lot of unintelligible garbage, so as soon as it's done simply type T
(for TCP, the terminal program we will be using) and push ENTER.
11) Sometimes in the process of selecting a program NERDC will cause the
lettering on your terminal screen to become overly large and slanted. To get rid
of this annoying lettering simply push the @ (control) key twice to go to the
main menu, and then push BREAK immediately to get back to NERDC. This
should solve your lettering problem.
12) After you have selected TCP as the program you will be using, NERDC
will want to know which account you are using so they can begin charging you
immediately. Therefore, the first thing you need to do after NERDC has
responded with PROCEED on your screen is to type your account number. To do
this, simply type

/ID xxxxxxxx,x

where the xxxxxxxx,x represents your eight digit account number followed
by a sequence number if you have one. Now NERDC will ask you for your
PASSWORD by typing:

HHHHHHHHH


SUSNERDC ENTER PASSWORD

You should type your PASSWORD after this and ENTER it. Now NERDC
will either reject you by stating PASSWORD DOES NOT MATCH or something
like it, or it will welcome you and give you the terminal number you are using
by stating YOU ARE ON TERMINAL NUMBER something IN THE MVS SYSTEM.
Now you are ready to begin building files, congratulations!
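Putting steps 10 through 12 together, the exchange with NERDC might look roughly
like this (the account number, sequence number, and terminal number are made up
for illustration):

PROCEED
/ID 12345678,2
(NERDC asks for your password; type it and push ENTER)
YOU ARE ON TERMINAL NUMBER 123 IN THE MVS SYSTEM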










Entering data and editing using TCP

TCP contains an editing program which allows the researcher to build and
edit files and to store them at NERDC. Commands in TCP are always preceded
by a slash (/). To create a work file at NERDC once the user has logged on is
very simple, simply start typing. As each line is entered by pushing the ENTER
key NERDC will temporarily save what has been typed as part of the work
file. The work file can hold approximately 450 lines of information at a time.
Once the researcher has typed in some data or some programming statements,
the work file can be saved as a permanent file in NERDC's memory under the
researcher's account number. To save what has been typed into the work file
simply give the TCP command:

/SAVE file name

The file name can be any previously unused combination of alphabetic
characters up to eight letters long. (Any character except the first one can be
a numeral.) The number and size of files that can be saved at any one time is
dependent on the memory size that has been allocated to the account number
being used. To get a listing of the files that have been saved under the
account number being used simply type:

/FLIST

To find out the cash balance available to be used on the account type:

/BAL

If a file has been saved previously and needs to have alterations made on
it, or needs to be called up so that it can be submitted as a job to be run at
NERDC, it can be loaded into the work file by typing

/FETCH file name

This will load the file into the work file, adding it to whatever is already
in the work file. If only the file to be fetched is wanted in the work file, the
work file should be erased before FETCHing the file desired. To erase the
work file type

/ERASE

Once the file to be worked on has been fetched into the work file, it is
possible to do extensive editing on it using TCP. To see the work file on the
screen so that it can be edited it is necessary to tell NERDC to display it. To
do this type

/LIST

The simple list command will result in the whole file, including data and
programming lines to be listed sequentially on the screen, non-stop. The
researcher will note at this point that NERDC has assigned a line number to
each individual line of his file. These line numbers allow the researcher to









move about in the file by telling NERDC which line number he is referring to
in his TCP editing commands. If only a part of the file need be shown on the
screen, the lines desired can be specified by typing

/LIST first line number last line number

For example, to see the 21st through the 50th line of a file on the
screen, simply type

/LIST 21 50

Now that the lines the researcher wishes to edit are shown on the screen,
a variety of commands are available to alter the file. Some of these commands
will be listed along with a description of what they do and an example.
1) /ALTER The ALTER command allows the researcher to change a
single line in the file. To use the /ALTER command the line number to be
altered must be designated immediately following the command. Then, the
phrase to be changed must be typed surrounded by single quotes, followed by
the new phrase to be inserted also surrounded by single quotes. For example

/ALTER 21 '7.5 1 2 3 7.5' '7.5 1 2 3 4'

will change line number 21 from reading 7.5 1 2 3 7.5, to reading 7.5 1 2
3 4. The entire line need not be changed, but the researcher should be careful
to define the phrase he wishes to change completely so that he alters only
what he wishes to. For example, the phrase

/ALTER 21 '7.5' '4'

would result in line 21 reading 4 1 2 3 7.5, rather than 7.5 1 2 3 4 in the
previous example. LISTing the line that has been ALTERed after altering it is
always recommended, just to be sure no undesired changes have been made
accidentally.
2) /COPY The COPY command allows the researcher to copy a
designated set of lines following a specific line number. For example, the
command

/COPY 21 37 45


will result in the lines numbered 37 through 45 being copied following line
number 21. After copying, all the lines following the inserted lines will be
renumbered automatically by NERDC. This command is especially convenient
when the same programming statements are being used over and over again.
3) /INSERT The INSERT command allows the researcher to insert new
lines of data or program statements following the specified line number. For
example, typing

/INSERT 21

will result in the screen responding with a flashing cursor. After this any
lines which are typed on the screen and entered will be added to the file
following line 21. The new lines and the following lines are automatically
renumbered by NERDC. When the researcher is done inserting lines into the











file, any TCP command (which will be preceded by a /) typed in will get the
terminal out of INSERT mode.
4) /MOVE The MOVE command will move any set of lines in the file to
a new location, following the specified number. For example

/MOVE 21 37 45

will result in line numbers 37 through 45 being moved to immediately
following line number 21. All lines following line number 21 will automatically
be renumbered.
5) /REPLACE The REPLACE command allows the researcher to either
delete lines from his file, or replace one line at a time. For example, typing

/REPLACE 21

and then typing in another TCP command, such as /LIST, will result in
line number 21 being deleted from the file and the lines following it being
renumbered. However, if the researcher types

/REPLACE 21

and then types in a new data line or programming statement, the new line
will be inserted in the file as the new line 21. Any number of lines can be
replaced at once, for example

/REPLACE 21 35

will delete the current lines 21 through 35 in the file, and the researcher
is then free to either insert new lines to replace them with, or give another
TCP command, resulting in their being permanently deleted from the file.
6) /SEARCH The SEARCH command allows the researcher to search
either the entire file or the range of line numbers designated for a specific
phrase surrounded by single quotes. For example, the command

/SEARCH 'PROC ANOVA'

will result in a listing on the screen of all the line numbers in the file
that contain the phrase PROC ANOVA. If the researcher wishes to see the
actual lines containing the phrase and not merely a listing of line numbers, he
can type

/SEARCH 'PROC ANOVA' V

The V (for visible) tells NERDC to print out all the lines containing the
phrase being searched for on the screen. This command is especially useful in
finding one's way around in a large file, as it is quicker than LISTing out a
whole file or even a number of lines. To search only part of the file, simply
specify the line numbers bracketing the search area before the phrase that is
to be searched; for example,


/SEARCH 21 75 'PROC ANOVA'









will SEARCH only between line number 21 and 75 for the phrase PROC
ANOVA.
These 6 simple commands will allow the researcher a large amount of
freedom in editing his files. Once the file has been edited to the researcher's
satisfaction, it can be RESAVEd under its original file name by typing

/RESAVE

If the researcher decides he doesn't like the changes he has made in his
file during a work session, he can simply erase the work file by typing
/ERASE, and the file will still be saved in NERDC's memory in its original
form. After SAVEing or RESAVEing a file it is always a good idea to use the
/ERASE command to erase the work file before FETCHing another file.
Otherwise, the fetched file will simply be added to the end of the old file,
which is usually undesirable.
If the researcher has finished with a file completely, he should eliminate
it from NERDC's memory banks in order to save file storage charges. To get
rid of a saved file simply type

/RELEASE file name

If a researcher wishes to change the name of a SAVEd file for some
reason, there is a RENAME command available on TCP. For example:

/RENAME OLDFILE NEWFILE

will result in the data file OLDFILE's name being changed to NEWFILE.
To end the session with NERDC, simply type

/END

NERDC should respond with how much CPU (central processing unit) time
and money has been used by this session, along with the balance in that
account.
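To tie these commands together, a typical editing session might look something
like this sketch (the file name CORN85 and the line numbers are made up for
illustration):

/FETCH CORN85
/LIST 1 20
/ALTER 12 '7.5' '4.5'
/LIST 12
/RESAVE
/ERASE
/END

The /ERASE keeps the next FETCHed file from being added onto the end of this
one, and /END closes out the session and reports the charges.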


Entering data to NERDC from the Radio Shack disk

If a researcher has a lot of data to enter it may be cheaper for him
to enter all the data on disk on the Radio Shack TRS 80 and transfer it to
NERDC's memory, rather than using expensive CPU time to type 5,000 numbers
into the work file. Putting the data on disk will also give the researcher
a permanent copy of the data in case (heaven forbid!) his files should be
lost at NERDC. Program statements as well as data can be transferred
through the modem from the TRS 80 to NERDC very rapidly and efficiently.
The procedure for transferring data will now be covered step-by-step.









The data can be stored in a SCRIPSIT file. Be sure when entering the
data that it is in the form that you want it, with all treatments and
variables properly identified (see the Chapter Introduction to SAS for
examples of how data should be entered so that SAS can analyze it).
Another important thing to remember when entering data to send to NERDC is
to leave one blank space at the beginning of each line. The blank space
will be used by NERDC to number the lines of the file, and if the space is
not there it will simply number directly over the first character in each
line! NERDC's work file will also only hold up to 450 lines of data at any
one time, so if you wish to transfer a file larger than 450 lines the file
should be split into two smaller files and then transferred separately.
The SCRIPSIT data file should be saved in a special way, as an ASCII file.
ASCII (American Standard Code for Information Interchange) is a universal
numerical code, and is the only form in which NERDC can read information.
To save the SCRIPSIT file as an ASCII file, simply type S,A filename when
saving the data file initially. Saving the file S,A will still allow you
to edit the file in SCRIPSIT, but it will also allow NERDC to read the
file once it has been transferred.
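For example, a data file to be called SOILDATA (a made-up name) would be
saved from SCRIPSIT by typing:

S,A SOILDATA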
Now that the data has been stored in a form that NERDC can read, the
steps required to transfer the data to NERDC will be covered.
1) Log on to NERDC (see the first chapter Logging on to NERDC from
the Radio Shack TRS 80 Model III for details). Once you are logged on to
NERDC, check the balance in the account and the file listing to see if
there is money and space enough to transfer the data file to your account.
2) Press the control key (@) twice to take you to the main menu. Once
there, look at the options available to you, and remember what you want to
do. You want to transfer a file from your disk, to the buffer in the
terminal, and then read it out of the buffer to NERDC.
3) In order to transfer your file from the disk to the buffer, the
INPUT to buffer option on the left-hand side must read ON. If it doesn't
just press I to turn it on. Even if the INPUT to buffer option is already
ON, you should still press I twice (once to turn it off and once to turn
it back on again) in order to clear out anything that might already be in
the buffer before transferring. To check to see if anything is in the
buffer simply look in the lower right hand corner of the main menu where
it should show how many bytes of the buffer memory are taken up.
4) Now press F to FILL buffer from disk. This option is on the right
hand side, and hitting F should result in an arrow asking you to enter the
name of the file to be loaded into the buffer. Type in the filename and
ENTER.
5) The file to be transferred should now be in the buffer. Look again
in the lower right hand corner of the menu and it should tell you how many
bytes of the buffer memory your file is taking up. Now you want to
transfer the file from the buffer to the work file at NERDC.
6) To do this, first turn OFF the INPUT to buffer option by hitting I
again. This option should now read OFF.
7) Then you want to turn ON the OUTPUT from buffer option directly
below the INPUT. Press the O for OUTPUT, and the screen should respond
with an arrow asking for a prompt string. The prompt string will be
inserted at the beginning of each line before NERDC reads the line. Our
copies of OMNITERM use the & as the prompt string; press & and ENTER.
8) After you enter the prompt string, the screen will respond with a
second question, asking for the time between characters. The value entered
can be anywhere from 1 to 250, and is very flexible. I usually use 100.










9) Now the terminal is ready to OUTPUT from the buffer into the NERDC
work file. To start the output process, simply press BREAK to get back to
the work file. Your program and data file should be seen printing slowly
across the screen, with each line preceded by the prompt string (&). At
this point you can do nothing but wait for the entire file to be printed
from the buffer into the work file, so this is a good time to go get a cup
of coffee. Once the whole program has printed the screen will respond with
* OUTPUT COMPLETE. Save your new file by first pressing ENTER once and
then typing /SAVE filename. The filename need not be the filename used on
your disk. It is always a good idea to /LIST your new file after
transferring to make sure no interference from the phone lines has shown
up in the file. This can happen very easily, and you may have to edit some
to get rid of the garbage.
For the rest of this session with NERDC the OUTPUT COMPLETE will
show up on the screen every time you ENTER a command. It will not,
however, have any effect on your files; it is simply annoying.
This procedure does not always go smoothly, so don't be upset if you
have to run through it a couple of times before you make a successful
transfer. Good luck!












JOB submission at NERDC

Ultimately, the researcher will wish to submit some of his files so that the
program he has written along with his data can be analyzed at NERDC. The
program he has written can be almost anything; the SAS statistical analysis
programming statements will be described in the following sections, but in order
for NERDC to keep track of the large number of jobs being submitted daily a
very specific set of statements is required to precede the actual program.
These four or five statements are known as the "job cards" and contain all the
information NERDC requires to run the job and, of course, bill the account
being used. The job cards are written in JCL or Job Control Language. The
format of these few lines is defined very exactly, and must be followed down to
every space and comma or the job will not run. For this reason, most
researchers choose to store their job statements in a separate file which can be
fetched whenever they wish to submit one of their programs. The format of the
job cards will be covered line by line and described in detail.
The first line of the four or five required to submit a job contains
information about the account being used and sets the parameters in computer
memory space and execution time that will be required to run the job. The
general format of the first line is as follows:

//jobname JOB (account parameters),'programmer',CLASS=X,REGION=XK









The // is required, followed by whatever job name the researcher chooses
to call his job. The job name will be printed on the first page of the program
output, so something descriptive of the program to be run is generally used. The
account parameters are enclosed by parentheses, and consist of a series of
numbers separated by commas. These numbers tell NERDC the account number,
how much execution time to allow, how many lines of output, etc. The only
number required by NERDC to run a job is the account number the job is to be
billed to. The format for this is (xxxx,xxxx) where the first xxxx is the first
four digits of the account number, then a comma, followed by the last four
digits of the account number. For almost all jobs the default values designated
by NERDC for execution time, number of lines, etc. will allow the job to be
run, so unless the job is unusually long, these parameters can be omitted. The
next character in the first line must be a comma, followed by the programmer's
name enclosed by single quotes. The programmer may select any name he
wishes. The next character must be another comma, followed by the phrase
CLASS=. The CLASS determines when the job is to be run and consequently how
much it will cost. The cheapest way to run jobs at NERDC is to run them
overnight; a job submitted during the day may be picked up at the NERDC
facilities the following morning. Overnight execution is equal to CLASS=1, and
results in the job being run at a 75% discount. If no CLASS= is specified the job
will be done quickly at top priority at the highest cost. The REGION= statement
may be added after a comma if a region size larger than 256K is required to
run the job. This may be true of some graphic jobs, but generally this statement
may be omitted.
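As a sketch, a first job line for an overnight run charged to a made-up account
number 1234-5678 (with SOILJOB as a made-up job name) might therefore read:

//SOILJOB JOB (1234,5678),'RESEARCHER',CLASS=1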
The next line of the job statements is the PASSWORD line, which has the
general format:

/*PASSWORD sequence#,password

After typing /*PASSWORD the sequence number (if there is one) to the
account being billed should be designated, followed by a comma, and then the
password used with the account. The password must match the eight digit
account number or the job will not run.
The third line in the job lines is the ROUTE statement, with the general
format:

/*ROUTE PRINT location

For all jobs except for graphics programs this line should read /*ROUTE
PRINT LOCAL, which will result in the output for the job being filed by job
number at the NERDC facilities across from the Hub. For graphics the output
should be ROUTEd to NER.R0 (the last character is a numeric zero, not a
capital alphabetic O), which will result in the output graphics being printed on
rolls of paper and filed by job number in a separate box at the NERDC facilities
across from the Hub.
The next line of the job statements will be used to tell NERDC what
program the researcher wants to use on this job. The package described in this
report is SAS (statistical analysis system), a statistical package which can
analyze a wide range of experimental data. To call up the SAS package the
following line should be typed in before the data and programming statements
are included:


// EXEC SAS









The final line which needs to be included in the job statements if they are
being kept in a separate file is an INCLUDE command with the general format:

/*INCLUDE filename

This command will call up the file containing the program the researcher
wishes to submit for execution to NERDC. Note that in this way the only phrase
that needs to be changed before submitting the JOB is the name of the file to
be included.
If the researcher wishes, after the INCLUDE program statement a
statement indicating the end of the job can be typed. This is considered good
form, and tells NERDC the job is concluded. The general form of this statement
is

/*EOJ

To summarize a typical set of JOB cards, then:

//ANALYSIS JOB (2222,2222),'RESEARCHER',CLASS=1
/*PASSWORD 2,PASS
/*ROUTE PRINT LOCAL
// EXEC SAS
/*INCLUDE PROGRAM
/*EOJ

The job statements are usually saved as a separate file called TOP or
something similar. To submit any saved file listed under the researcher's
account number, then, all he has to do is /ERASE the work file, /FETCH TOP,
change the name of the file to be INCLUDed in this submission, and type the
following TCP command:

/RJE

RJE stands for remote job entry, and will result in NERDC printing back a
series of lines to the terminal, telling the researcher that his job is being
accepted and the number that has been assigned to this job. The job number will
be the first phrase in the second line, and will be given as J&Sxxxx, where x
equals the digits of the job number. The output for the job will be filed at
NERDC's facilities across from the HUB by the last three digits of this four
digit number, so it is very important to write down the job number.
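Assuming the job cards are saved in the file TOP shown above and the program to
be run is in a saved file called SOIL82 (a made-up name), the whole submission
might look like this sketch (the line number in the /ALTER command is made up
and should be checked with /LIST first):

/ERASE
/FETCH TOP
/LIST
/ALTER 5 '/*INCLUDE PROGRAM' '/*INCLUDE SOIL82'
/RJE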
The researcher can check on the status of his job by typing in to NERDC:

/DJOB job name
or
/DJOB xxxx

where job name equals the job name specified in the job card and xxxx
equals the job number that NERDC is running the job under. If the job is
awaiting execution still, NERDC will tell you. If the job has already been run,
NERDC will answer back with JOB NOT FOUND.
If you realise after submitting a job for execution that there is a mistake
in the program or the data, it is possible to save some money by cancelling the
job before it executes at NERDC. To do this simply type










/CJOB xxxx

where xxxx equals the job number you wish to cancel.
Now that the basics of working with NERDC have been covered, the
researcher is ready to enter some data and write simple statistical programs
using the SAS statistical program package.













Introduction to SAS

SAS, which stands for Statistical Analysis System, is a set of
statistical subroutines available at NERDC for everyone's use. SAS can
handle everything from simply finding mean values to complicated linear
regressions and sophisticated graphics. After the first 6 or 7 lines
(discussed under job cards in the previous section), which were written
in JCL, all the rest of the data analysis program will be written in
the SAS programming language. A SAS statement must always be
ended with a semicolon ( ; ). Forgetting the semicolon is the most common
error made in writing SAS programs, and can be a very frustrating error
to find if your program is a long one, so be careful to always remember
the ";" before entering a statement. Using SAS allows the researcher the maximum
amount of flexibility in setting up his data and analyzing it in various
ways; by including one simple program statement SAS can sort and
reorganize very large data sets.
As a general overview a complete SAS program including data lines
is listed below. This program was written to perform analysis of
variance and Duncan's multiple-range test on some corn data.

DATA CORN; INPUT REP TRT STALKKG EARKG SHELLPER;
GRAINKG = EARKG * SHELLPER;
CARDS;
1 1 200.3 300.5 0.85

...(data lines)

4 4 152.4 250.7 0.87
PROC PRINT;
PROC ANOVA; CLASSES REP TRT;
MODEL STALKKG EARKG GRAINKG = REP TRT;
MEANS TRT / DUNCAN;









Data entry under SAS


The way SAS has been set up allows the researcher to enter any
number of variables associated with a specific treatment on one line. To
enter data under SAS a DATA statement is used. The DATA statement tells
the computer to expect data to follow. In the DATA statement the data
set you will be creating must have a name containing up to 8 characters,
of which the first character must be alphabetic, immediately following
the word DATA. For example, if your data set consists of soil analysis
data for 1982 you might want to name this data set SOIL82. The next few
lines following the DATA statement describe to the computer how the data
are arranged. For example, if your data come from soil samples taken on
several sampling dates at several different depths, which have been
analyzed for N, P, and K content, SAS must be told in which order all
these factors or variables will be entered. This is done by using an
INPUT statement. The INPUT statement simply lists the variables to be
entered in this DATA step in the order in which they will be entered.


Example

If your data look like this:

Replication 1:
year    date    depth     N      P      K
1982            0-5"      2.5    200     70
1982            5-10"     2.1    150    100
1982            10-20"    1.0    100    110
1982            0-5"      2.1    210     85
1982            5-10"     2.0    125     90
1982            10-20"    1.2    100    120

and your study has 6 replications containing exactly the same kind
of information, your data set will contain 6 x 6 or 36 observations, 36
lines of data. For each line of data all the parameters that identify
that piece of data or sample, such as what replication it's from, what
year, what depth and what date it was taken on, must be specified. Also
the variables measured, in this case N, P, and K content, must be
identified so that SAS can analyze them. This is why the INPUT statement
is very important; it is the program line that identifies and names all
the variables in the data set. For this example you might choose to name
the variables REP (replication), YEAR (year), DATE (sampling date),
DEPTH (depth of sampling), and simply N, P, and K for the concentrations
of the elements. The names you choose for your variables can be anything
so long as they are no more than 8 characters long and begin with a
letter, but they should be easy to remember and understandable to you,
as you will be using these names throughout the analysis. So, to
continue with the example, you have now typed the following:

DATA SOIL82;
INPUT REP YEAR DATE DEPTH N P K;









Note that the names are separated by spaces, and that a semicolon
follows each SAS statement. After the INPUT statement other statements
creating new variables or converting the raw data you will be entering
in this data set can be added. For example, if you wanted to calculate a
new variable which was the N:P ratio in the soil for each of the depths
and days sampled, the statement:

NPRATIO = N / P;

will divide N by P for each line of data and create the new
variable NPRATIO. (The symbols used for division, multiplication,
addition, and subtraction are /, *, +, and -, respectively.) NPRATIO can
then be analyzed statistically like any other variable later on in the
program. Before calculating NPRATIO it may be necessary to convert N and
P into the same units, either ppm or % for example. To convert all N
values from % to ppm before carrying out the calculation of the new
variable NPRATIO, a simple statement is inserted before the statement
that creates the variable NPRATIO:

N = N * 10000;

This statement redefines the variable N to be the original N value
multiplied by 10000, changing it from % to ppm. Now, looking at the
statements in the program so far

DATA SOIL82;
INPUT REP YEAR DATE DEPTH N P K;
N = N * 10000;
NPRATIO = N / P;

We have told SAS to create a data set SOIL82, which will include
the variables REP, YEAR, DATE, DEPTH, N, P, and K, and to convert the
variable N from its original value to its value multiplied by 10000.
Then SAS will create a new variable NPRATIO by dividing the value for N
by the value for P for each line of data and add it to the data set
SOIL82. Up to this point we have not entered a single line of data. Now
we have described the data to SAS and created a new variable from our
data, and are ready to enter-the data itself. To tell SAS that the next
lines consist of data, a CARDS statement is used. To continue with the
example:

DATA SOIL82;
INPUT REP YEAR DATE DEPTH N P K;
N = N * 10000;
NPRATIO = N / P;
CARDS;

Following this CARDS statement SAS will expect lines of data, and
will include them in DATA set SOIL82 in the order they are typed. A line
of data is not followed by a semicolon, unlike all other statements in
SAS. To indicate the end of a line of data simply enter it with the
ENTER key on the terminal. This will tell SAS that the line is ended and
to expect the next line of data. SAS will also match up the variable








names in the INPUT statement with the numbers in each line of data in
the order they are listed, for example

CARDS;
1 1982 135 1 2.5 200 80

tells SAS that for the first sample, REP = 1, YEAR = 1982, DATE =
135, DEPTH = 1, N = 2.5, P = 200, and K = 80. SAS will instantaneously
convert N from 2.5 to 25000 by multiplying it by 10000, and create the
new variable NPRATIO by dividing the value for N (25000) by the value
for P (200) to get the value 125.
Note that for non-numerical variables such as DEPTH the values
should be assigned a numerical "code"; 1 = 0-5", 2 = 5-10", 3 = 10-20",
etc. SAS can handle alphabetic data, such as names, etc. with a
special set of statements, but it is generally easier to assign a
numeric code value. This is also true for describing experimental
treatments such as tillage or sub-soiling. For example, no-tillage could
be represented by 1 while conventional is represented by 2, and
subsoiling represented by 1 while not subsoiled is represented by 0. A
combination of these treatments such as no-till + subsoil, no-till -
subsoil, conventional + subsoil, and conventional - subsoil, could be
identified in the INPUT statement as 2 variables: TILL and SUB, and
would be entered in the data lines in the form of the four treatment
combinations:

1 1 ... (equals no-tillage + subsoil)
1 0 ... (equals no-tillage - subsoil)
2 1 ... (equals conventional + subsoil)
2 0 ... (equals conventional - subsoil)

Replications must also always be designated by their own variable
names in the INPUT statement and entered in the data lines as separate
values. This is so that SAS can partition the variability in the
experiment into its separate components, variability due to
replications, tillage, and subsoiling, in the statistical analysis.
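As a hedged sketch of how such coded treatments and replications might be
entered (the data set name TILLAGE, the variable YLD, and the yield values
are made up for illustration):

DATA TILLAGE;
INPUT REP TILL SUB YLD;
CARDS;
1 1 1 6850
1 1 0 6420
1 2 1 6510
1 2 0 6100

...(data lines for the remaining replications)

PROC PRINT;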
Once all the lines of data are entered, the data set is complete.
SAS does not need to be told that this is the end of the data; simply
continue with the next SAS program statement you wish to use. In most
cases the researcher will wish to check the accuracy of the data entry,
assignment of the variables, and creation of any new variables. To do
this it would be nice to have a printout of the new data set SOIL82. In
SAS this is very simple; only one program statement is required: a PROC
PRINT statement. For example, following the last line of data entry for
data set SOIL82

6 1982 150 3 2.1 180 110
PROC PRINT DATA = SOIL82;

This statement (which, like all SAS program statements, must be
followed by a semicolon) will result in the data set SOIL82 being
printed out as part of your output. The data set name need not be
specified with the DATA = data set name part of the PROC PRINT
statement. If it is not specified which data set is to be printed SAS







will simply print the last data set created before the PROC PRINT
statement is executed. For example:

6 1982 150 3 2.1 180 110
PROC PRINT;

will result in data set SOIL82 being printed exactly as in the
previous example.
If errors are found in the SAS program or in the data entered, the
program and data can be edited using TCP as described earlier.













Manipulating data sets with SAS

One of the nice features of SAS is that it allows the researcher a
lot of flexibility in working with his data. Large data sets can be
sorted, added to, new variables created, etc. very easily. How to create
new variables has already been discussed in the previous section, but
now some of the other data manipulation options possible in SAS will be
illustrated by a series of examples using the same data set, SOIL82,
created in section one.
Example 1 Sorting data sets. Sometimes it would be desirable to
get a printout of the data set with the data presented in a certain
order. Suppose for the data set SOIL82 you wanted a copy of the data
ordered by depth of sampling, with all the values for depth 1 followed
by all the values for depth 2, and finally all the values for depth 3.
To do this you would use a PROC SORT statement to reorganize the data,
and then a PROC PRINT statement to print it out in the new order. The
data set will be sorted by the variable specified in a BY statement that
must immediately follow the PROC SORT statement from lowest to highest
value of the variable specified. For example, to sort and print data set
SOIL82 by depth of sampling would require the following 3 SAS
statements:

PROC SORT DATA = SOIL82;
BY DEPTH;
PROC PRINT;

As in the PROC PRINT statement, if the data set name is not given
in the PROC SORT statement the last data set named will be sorted. These
3 statements will result in the following output:







(Printout of data set SOIL82 sorted by DEPTH: one line per observation listing
REP, YEAR, DATE, DEPTH, N, P, K, and NPRATIO, with all the depth 1 lines
first, then depth 2, then depth 3.)

If the researcher wants the data set sorted by more than one
variable at a time, for example in order by depth of sampling and within
sampling depths in order by sampling date, more than one variable can be
included in the BY statement:

PROC SORT;
BY DEPTH DATE;

These two statements will result in the data set being sorted first
by depth and secondly by date. The data set can be sorted by any or all
the variables in the data set, including created variables such as
NPRATIO. Variables will always be sorted in ascending order, lowest to
highest.
Example 2 Duplicating a data set. Sometimes a researcher may want
all the information in a data set reproduced into another data set under
a different name. For example, if we wanted all the data in SOIL82,
which has the soil mineral analysis data in ppm, reproduced into another
data set where all the information could be converted into %, only two
SAS statements are needed. First a new data set must be created using a
DATA statement, and then a SET statement to "set" the data from the old
data set specified into the new:

DATA NEW; SET SOIL82;

These two statements result in all the information from data set
SOIL82 being transferred into the new data set NEW. Now a series of
conversion statements can be used to change the values of the mineral
analysis from ppm to %:


N = N / 10000;
P = P / 10000;
K = K / 10000;
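
Putting these statements together with a PROC PRINT as described earlier, the
whole duplication and conversion step might read:

DATA NEW; SET SOIL82;
N = N / 10000;
P = P / 10000;
K = K / 10000;
PROC PRINT DATA = NEW;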


Any number of data sets can be set into one data set. They do not
necessarily have to contain any variables in common as the data sets
will, if printed out, simply list the entire data sets in the order in
which they were listed in the SET statement. For example, the
statements:

DATA ALL; SET ONE TWO THREE;












will result in the information in data set ONE being listed, and
then the information in data set TWO, and finally the information in
data set THREE, all included in the new data set ALL.
Example 3 Merging two data sets. Sometimes it is convenient to
create separate data sets containing similar information, and then to
merge them into one large data set containing all the information
overlapped by what they have in common. To return to our original
example, suppose the experiment was continued into 1983 measuring the
same variables in the same way. A second data set, SOIL83, could be
created and then a larger data set containing the information for both
years could be created by merging SOIL82 and SOIL83. This new data set
should be given a new name; for our example we could call it TWOYEAR.
To merge 2 (or more) data sets they must contain at least one variable
in common, and variables that the data sets have in common should be
specified under the same variable names in the INPUT statement. The
first step in merging data sets is to sort the data sets to be merged by
the variable(s) they are to be merged by using a PROC SORT statement.
For example, to merge SOIL82 and SOIL83 by year, the following
statements are required:

PROC SORT DATA = SOIL82; BY YEAR;
PROC SORT DATA = SOIL83; BY YEAR;

Then, to create the new data set TWOYEAR containing all the
information from SOIL82 and SOIL83 a simple DATA statement is used along
with a MERGE statement:

DATA TWOYEAR;
MERGE SOIL82 SOIL83; BY YEAR;

These 3 statements tell SAS to create a new data set, TWOYEAR, by
merging SOIL82 and SOIL83 by the variable YEAR. Now there are 3 data
sets, SOIL82, SOIL83, and TWOYEAR. Any number of data sets can be listed
following the MERGE in the MERGE statement to be included in the new
data set. Any number of variables can be used to sort and merge the data
sets by. In this example the data set TWOYEAR if printed out would list
all the data lines for year 1982 followed by all the data lines for 1983
as it was merged by the variable YEAR.
MERGEing is usually more useful than SETting in creating new data
sets, since usually the researcher wants to organize his data in some
specific order that is logical to him. SAS, however, does not care how
the data is organized so long as each variable is identified properly.
In merging (and setting) data sets you should be careful that the same
variable is identified by the same name in each data set to be merged
(not N in one data set and NITROGEN in another for example) or SAS will
assume you are talking about 2 different things.
Now that the data is stored in SAS data sets containing the
information you want in the form and order you want it, you are ready to
begin statistical analysis.










Getting mean values

Using SAS to calculate mean values is perhaps one of its most
useful and least used functions. SAS can calculate the means for all the
variables desired in any way desired (by year, treatment, replication,
etc.) in milliseconds and also give the researcher other valuable
information such as estimates of variability, minimum and maximum
values, and the number of observations that make up each mean value.
There are two ways to obtain means using SAS; if other analysis of
variance is going to be done on the data, the means can be derived as
part of that procedure; if only the means are desired a PROC MEANS
statement can be used.
Suppose that mean values of N, P, and K concentrations are desired
for each sampling date, each sampling depth, and each replication in the
data set SOIL82. To get the mean values for N, P, and K concentrations
for each sampling date averaged over depths and replications only five
SAS statements are required:

PROC SORT DATA = SOIL82; BY DATE;
PROC MEANS; BY DATE; VAR N P K;

Before the mean values can be calculated a PROC SORT statement is
required to sort the data set by the variable for which the means are to
be calculated. In this example the means are to be calculated for each
sampling date, therefore the data set must first be sorted by DATE. (A
good rule of thumb is that when any statistical procedure is done BY
some variable, the data set must first be sorted by that variable using
a PROC SORT and BY statement.) The PROC MEANS statement tells SAS to
calculate mean values, the BY statement tells SAS to calculate the mean
for each sampling date, and the VAR statement tells SAS the variables
for which means are to be calculated. In this example N, P, and K were
specified, but up to any number of variable means can be calculated in
one PROC MEANS execution. If the VAR statement is not used, SAS will
automatically calculate mean values for all the variables in the data
set (including ones you don't need, such as YEAR, REP, etc.).
Mean values can be calculated in many different ways using separate
PROC MEANS statements. For example, if mean values for N, P, and K
concentrations were desired for each sampling date at each depth the
following sets of program statements would be used:

PROC SORT DATA = SOIL82; BY DATE DEPTH;
PROC MEANS; BY DATE DEPTH; VAR N P K;

The result of this MEANS statement is to average values over
replications for each sampling date and depth. Note that the data set
must first be sorted by both these variables in the same order.
Sometimes it is convenient to create a new data set containing only
the mean values averaged in a certain way. This is especially useful if
you want to graph out the results using the SAS GPLOT procedure (which
will be discussed later), as the graph is not so crowded with data
points if only the mean values are shown. The PROC MEANS statement can
be used to create a new data set containing only the mean values









calculated by using an OUTPUT OUT= statement. Suppose for the last
example we wanted to create a new data set MEANS containing the mean
values of N, P, and K concentrations for each sampling date and depth.
The following statements would be used:

PROC SORT DATA = SOIL82; BY DATE DEPTH;
PROC MEANS; BY DATE DEPTH; VAR N P K;
OUTPUT OUT=MEANS MEAN=MEANN MEANP MEANK;

The OUTPUT OUT=MEANS part of the added statement tells SAS to
create a new data set called MEANS. The MEAN=MEANN MEANP MEANK part of
the statement tells SAS that MEAN values only will be included, and that
the new variables in this data set will be MEANN (for mean N), MEANP,
and MEANK. The data set and variable names can be anything you want, but
again, they should be straightforward and easily understood. Results
from the PROC MEANS statement are automatically printed by SAS; a PROC
PRINT statement is not necessary. However, to print out the data set
created using the OUTPUT OUT= statement a PROC PRINT statement will be
required. SAS will store the new data set whether it is printed out or
not.
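For example, to print the data set created above, one more statement is all
that is needed:

PROC PRINT DATA = MEANS;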







The SAS ANOVA Procedure

The SAS analysis of variance procedure (ANOVA) is set up so that it can
handle a variety of experimental designs. However, it will not be able to handle
unbalanced data (missing values) or perform linear regression analysis. If the
researcher is faced with either of these two cases, he should use the SAS GLM
(General Linear Model) procedure rather than the ANOVA. The GLM procedure
is much more expensive to run than the ANOVA procedure, so ANOVA should be
used whenever possible.
The reason that the ANOVA procedure can handle a variety of
experimental designs is that the researcher must give SAS as part of the
programming statements the statistical model which SAS will use to run ANOVA.
This requires some basic knowledge of statistics. In this section examples of
programming statements which will analyze the most common types of
experiments will be given, but it is very important to realize that if your
experiment varies at all from these basic designs these statements will not give
a true analysis of the variation. A statistician should be consulted to see if
your experiment fits these models, and if not, an appropriate model statement
should be written for SAS.
The ANOVA procedure generally consists of a series of three statements:
1) The PROC ANOVA statement which tells SAS that ANOVA is to be
used.
2) A CLASSES statement which tells SAS the classification variables to
be analyzed in this experiment; in our example the classification variables would
be replications (REP), sampling dates (DATE), and sampling depths (DEPTH). The
results of the experiment, the variables measured (in our example N, P, and
K), are not classification variables. As more examples are given this
distinction should become clearer.
3) A MODEL statement which tells SAS how to analyze the experiment
and gives the independent variables. In our example the independent variables
are N, P, and K.









To illustrate how the ANOVA procedure is used examples of three types
of experiments (randomized complete block, factorial, and split-plot) and the
SAS statements which will analyze them will be given.
Example 1 Randomized Complete Block. A simple RCB consisting of two
sources of variation, blocks or replications and treatments, is perhaps the easiest
model to write in SAS. Suppose the researcher is investigating the effect of N
source on corn yield. The treatments consist of four N sources (NSOURCE), and
the researcher will be measuring several parameters, say for example N content
of the grain (GRAINN), N content of the last leaf (LEAFN), and dry weight yield
of grain (GRYLD). To analyze this simple experiment three statements are
required:

PROC ANOVA;
CLASSES REP NSOURCE;
MODEL LEAFN GRAINN GRYLD = REP NSOURCE;

The PROC ANOVA tells SAS to run the ANOVA procedure. The
CLASSES statement tells SAS that there are two sources of variation in this
model, REP (replications) and NSOURCE (treatments). The MODEL statement
gives the independent variables for which the ANOVA is to be run, LEAFN,
GRAINN, and GRYLD; and partitions the variation in this experiment into its
component parts. This MODEL statement allows for two variation components,
variation due to REP's, and variation due to NSOURCEs. Running this procedure
will result in the printout of three separate ANOVA tables; one for LEAFN, one
for GRAINN, and one for GRYLD.
These three statements will give the researcher the sums of squares for
each of the classification variables in the MODEL statement along with the
appropriate error sums of squares, the degrees of freedom associated with each
factor, the mean square, the F value for each component of variation along
with the probability of the component not being significant (small values where
SAS says PR > F indicate significance), an estimate of the coefficient of
variation, and more. It will take some time to become familiar with the output
from the ANOVA procedure, and the SAS book contains diagrams explaining
what each value means.
There are many options that can be used with the basic ANOVA
procedure to give the researcher even more information. For example, in the
case of the NSOURCE experiment the following addition would result in SAS
calculating the mean values for all the NSOURCEs:

MEANS NSOURCE;

If the means for NSOURCE for each replication are desired, the
following statement can be used:

MEANS NSOURCE*REP;

This will calculate the mean for each REP and NSOURCE combination. If
the experiment is one for which Duncan's Multiple Range would be appropriate,
ANOVA has an option used with the MEANS statement which will do those
calculations. The following statement will run it for all main effects (NSOURCE)
means given in the MODEL statement.


MEANS NSOURCE / DUNCAN;









Note that the DUNCAN option is preceded by a slash ( / ) and is part of
the MEANS statement. In our example three separate Duncans will be run, one
for treatment means for LEAFN, one for GRAINN, and one for GRYLD.
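Collecting the pieces of this example, the complete analysis section for the
N source experiment would read:

PROC ANOVA;
CLASSES REP NSOURCE;
MODEL LEAFN GRAINN GRYLD = REP NSOURCE;
MEANS NSOURCE / DUNCAN;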
Example 2 Factorial. Many of the experiments run consist of factorial
combinations of two or more factors. A good example of this would be an
experiment comparing two tillage treatments, conventional and no-tillage, both
with and without subsoiling. In entering the data from such an experiment coded
values should be used to represent the four possible combinations of factors. In
our example one factor could be called TILL for tillage, which would consist of
two levels, 1 for no-tillage and 2 for conventional. The second factor, subsoiling
(SUB) would also consist of two levels, 0 for no subsoiling and 1 for subsoiled
treatments. The four treatment combinations would then be:

1 1 ... (no-tillage + subsoil)
1 0 ... (no-tillage - subsoil)
2 1 ... (conventional + subsoil)
2 0 ... (conventional - subsoil)

This way SAS can separate the variation due to TILL from the variation
due to SUB, giving the researcher more information. If the treatments were
replicated REP would be another source of variation that must be considered. If
yield (YLD) was the only parameter measured, the following statements would
be used to analyze this experiment:

PROC ANOVA;
CLASSES REP TILL SUB;
MODEL YLD = TILL SUB REP TILL*SUB;

Note that these statements will result in one ANOVA being run
measuring the effect on YLD by four sources of variation; REPs, TILLage,
SUBsoiling, and the interaction between TILLage and SUBsoiling. If the
interaction is significant then the analysis should be run again using the SAS
GLM procedure with a CONTRAST option, which will perform orthogonal
contrasts on the data to measure the separate effects of TILL and SUB without
the complicating effects of the interaction. The DUNCAN option is not
appropriate for a factorial experiment.
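One hedged sketch of these contrasts, assuming the four treatment combinations
are recoded as a single classification variable TRT (a made-up name) with levels
1 through 4 in the order listed above, would be:

PROC GLM;
CLASSES REP TRT;
MODEL YLD = REP TRT;
CONTRAST 'TILLAGE' TRT 1 1 -1 -1;
CONTRAST 'SUBSOILING' TRT 1 -1 1 -1;
CONTRAST 'TILL X SUB' TRT 1 -1 -1 1;

A statistician should still be consulted to be sure the contrasts match the
questions being asked.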
Example 3 Split-plot. The split-plot is very common in many areas of
research, and requires only a minor variation in the statements required to
analyze it using the ANOVA procedure as compared to a simple RCB. Suppose
the researcher is comparing conventional and no-tillage as main plots, with
split-plots of four different N sources. As in the previous example, when
entering the data the two factors should be coded as separate effects; two
types of TILL, 1 for conventional and 2 for no-tillage, and four types of
NSOURCE. If the treatments are replicated, then REP would be another
classification variable to be included in the ANOVA. In this example the
independent variables to be measured were yield (YLD) and root resistance
(ROOTRES). The following statements would analyze this experiment using SAS:

PROC ANOVA;
CLASSES REP TILL NSOURCE;
MODEL YLD ROOTRES = REP TILL REP*TILL NSOURCE
TILL*NSOURCE;
TEST H = TILL E = REP*TILL;









Note that both the main plot and sub-plot treatments are classification
variables in the CLASSES statement. The MODEL statement takes into account
the variation due to REPs, main plots (TILL), the main plot error (REP*TILL),
sub-plots (NSOURCE), and the interaction of main and sub-plots (TILL*NSOURCE);
the remaining (residual) variation serves as the sub-plot error. An extra
statement is required to tell SAS to test the main plot effects using the mean
square for the main plot error; otherwise all effects would be tested against the
residual error, which here is the sub-plot error. The TEST statement
tells SAS to test the hypothesis (H) for the main plots (TILL) using the error
term (E) for main plots (REP*TILL). These four statements would result in two
separate ANOVAs, one for YLD and one for ROOTRES. The test of the main plots
using the main plot error will be printed separately below the rest of the tables,
so read the output carefully to be sure to find the correct main plot F test.
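If mean separations are also wanted, MEANS statements can be added to the
same PROC ANOVA step. A sketch is given below; note that the main plot means
must be compared using the main plot error (the E= option), just as in the TEST
statement, and a statistician should confirm that Duncan's test is appropriate
for the design at hand:

MEANS TILL / DUNCAN E=REP*TILL;
MEANS NSOURCE / DUNCAN;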
Almost any kind of experiment can be analyzed using the ANOVA
procedure, but it is very important to specify the correct statistical model in
the MODEL statement. For more complicated designs a statistician should
always be consulted to prevent inadvertent errors; SAS can only be as
accurate as the information it is given and cannot detect errors in the
model. For unbalanced data the GLM procedure should be used, which is
covered in the next section.







Using the SAS GLM Procedure

The GLM procedure in SAS is more flexible than the ANOVA procedure.
It can be used for the analysis of continuous variables rather than only those
which occur at discrete intervals selected by the researcher. The GLM
procedure is especially useful because it will conduct both simple and
multiple regression analysis. GLM will also handle orthogonal contrasts
in cases where a significant interaction is masking the effects of
treatments in an ANOVA procedure. In this section, examples of the program
statements for simple and multiple regression are given, followed by an
example of the orthogonal contrast procedure.
Example 1 Simple and multiple regression. The program statements
for simple and multiple regression are the same; only the number of
independent variables varies. As in the SAS ANOVA procedure, it is the
responsibility of the researcher to provide SAS with an accurate
statistical model to work with. If the researcher is simply interested
in testing the effects of a number of independent variables on a
dependent variable, two simple statements will analyze the variable by
multiple regression; their general format is:



PROC GLM;
MODEL dependent variable = independent variables;









The PROC GLM statement tells SAS to execute the GLM procedure, and
the MODEL statement indicates which variable is to be the dependent
variable and what independent variables are to be included in the
analysis. For example, suppose a researcher wanted to consider the
effect of certain environmental factors on corn yields over several
different locations. Among the factors he measured that needed to
be included in his model are soil N content (SOILN), soil P (SOILP) and
K (SOILK) contents, and soil moisture level (SOILH20). Corn yields (YLD)
were measured under each set of conditions. The program statements he
would use to run a multiple regression on this experiment would be:

PROC GLM;
MODEL YLD = SOILN SOILP SOILK SOILH20;

The results from this analysis would be a regular ANOVA table
dividing the variation in this study into three parts: variation due to
the Model, variation due to Error, and Total variation. Beneath the
ANOVA table the GLM procedure gives the researcher the R-square value
(the proportion of variation in the dependent variable that is
accounted for by the model), the c.v. (coefficient of variation), the
standard deviation of the dependent variable, and the overall mean of
the dependent variable. Below these estimates the GLM procedure gives
the Type I and Type IV sums of squares for each independent variable
included in the MODEL statement, along with the F value and the
probability level. These sums of squares indicate to the researcher
which of the independent variables have a significant effect on the
dependent variable under his model.
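Because the statements for simple and multiple regression have the same
form, a simple regression needs only one independent variable on the right-hand
side of the MODEL statement. For instance, regressing corn yield on soil N
content alone would require only:

PROC GLM;
MODEL YLD = SOILN;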
Example 2 Orthogonal contrasts. More traditional types of
experiments can also easily be run using the GLM procedure. Our
tillage/subsoiling study could have been run using GLM and exactly the
same statements used under the ANOVA procedure:

PROC GLM;
CLASSES REP TILL SUB;
MODEL YLD = REP TILL SUB TILL*SUB;

These statements will give the researcher the same information that
was obtained using the ANOVA procedure (at almost twice the cost in
computer time). However, suppose that the interaction between TILLage
and SUBsoiling was significant, possibly masking the significance of the
two factors if they were considered separately. The GLM procedure has a
CONTRAST option which allows the researcher to run orthogonal contrasts
on the two variables to discover whether they might be significant if
taken separately. Writing the statements necessary to run the CONTRAST
option involves supplying SAS with a matrix of coefficients for the
orthogonal contrasts, and it is highly recommended that a statistician
be consulted before attempting this procedure. The GLM procedure can
definitely be valuable in certain types of analysis, but, in general,
most experiments can be analyzed completely using the much less
expensive ANOVA procedure.
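As an illustration of what such statements look like, the following sketch
compares the two tillage treatments separately within the unsubsoiled and the
subsoiled plots of the tillage/subsoiling example above. The coefficients assume
the coding given earlier (TILL = 1 or 2, SUB = 0 or 1), under which SAS orders
the TILL*SUB levels as (1,0), (1,1), (2,0), (2,1); as stated above, the
coefficients for any real analysis should be checked with a statistician:

PROC GLM;
CLASSES REP TILL SUB;
MODEL YLD = REP TILL SUB TILL*SUB;
CONTRAST 'TILL AT SUB=0' TILL 1 -1 TILL*SUB 1 0 -1 0;
CONTRAST 'TILL AT SUB=1' TILL 1 -1 TILL*SUB 0 1 0 -1;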











The SAS Graphics program

The GPLOT graphics procedure is a relatively new one in SAS. SAS
GPLOT allows the researcher to graph data, or to use GPLOT along with the
GLM procedure to graph the results of a linear regression analysis. The GPLOT
procedure is extremely flexible and allows the researcher to control almost
every aspect of the resulting graphs, including the axes, the type of line to be
drawn, whether or not all the points will be connected, titles, footnotes and
other legends; it even has color capabilities. Different kinds of graphs can be
drawn, ranging from pie charts to bar charts to simple line graphs.
SAS GPLOT jobs are run on a special printer at NERDC, and therefore
must be ROUTEd to a different location on the job cards. Instead of /*ROUTE
PRINT LOCAL, this line should read /*ROUTE PRINT NER.R0 (that last
character is a numeric zero, not an alphabetic capital O). Also, whenever the
SAS GPLOT procedure is being used, the last line before the /*EOJ indicating
the end of the program should be: // EXEC PLOT. This informs NERDC that the
GPLOT procedure will be used during the execution of this program. These two
substitutions are very important to ensure execution of the program, and their
exact format must be used, because they are JCL or job control language
statements.
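To show where these two lines fall, the overall layout of a GPLOT job deck
is sketched below; the lines in parentheses stand for the job cards and SAS
statements described elsewhere in this report:

(job cards as shown earlier, except that the ROUTE line becomes)
/*ROUTE PRINT NER.R0
(the SAS program statements, including the GPLOT statements, go here)
// EXEC PLOT
/*EOJ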
There are many options available under the SAS GPLOT procedure, and
in order to exploit the full potential of this procedure the SAS GPLOT manual
should be consulted. However, a fairly simple SAS graphics program can be run
successfully with only a few program statements. In the
following few pages the basic statements required will be presented along with
some examples.
Titles The SAS GPLOT procedure allows the researcher to print any title
above the graph to be produced. For example, if the words Nitrogen Content
are to be printed above the graph, the following statement will do it:

TITLE 'Nitrogen Content';

Note that the title text is enclosed in single quotes, and that all the program
statements in the SAS GPLOT procedure must be followed by semicolons, just as
in any other SAS procedure.
A footnote reading Over All Treatments at the bottom of the graph can be
printed in the same way by typing:

FOOTNOTE 'Over All Treatments';

In order to distinguish different treatments from one another SAS utilizes
a SYMBOL statement. The SYMBOL statements assign a plotting symbol to each of
a series of numbered SYMBOL definitions, which are then used to designate the
different treatments. For example, if corn yields under four different tillage
treatments are to be graphed against increasing fertilizer rates, four SYMBOL
statements would be required. Each SYMBOL statement would designate a certain
symbol to stand for a certain tillage treatment. If tillage treatment number 1
was to be represented by a line with stars at the data points, the following
statement would assign the value of star to treatment 1:


SYMBOL1 V=STAR;










The V stands for "value", and the possible values using SAS GPLOT range
from all the alphabetic characters to a variety of special symbols such as
STAR, SQUARE, PLUS (+), DIAMOND, TRIANGLE, etc. So, for this example,
suppose the researcher chose to assign symbols to the different treatments as
follows:

SYMBOL1 V=STAR;
SYMBOL2 V=DIAMOND;
SYMBOL3 V=SQUARE;
SYMBOL4 V=TRIANGLE;

From now on in the program, no matter how many plots are run, the
different tillage treatments will be assigned these values, until a new series of
SYMBOL statements reassigns them. At this point nothing has yet been
plotted, but the descriptive titles and symbols have been given. To actually plot
the data a PROC GPLOT statement is used. Many of the same options that
were available for the statistical procedures are also available for the GPLOT
procedure. For example, the data set the data are to be drawn from can be
indicated using a DATA = data set name option immediately following
PROC GPLOT. Also, data can be plotted BY some variable in the data set; for
example, if the researcher wanted two separate graphs for two separate years'
data he could do this using a BY YEAR; statement. (Remember, whenever a BY
statement is used the data set must first have been SORTed by the variable in
the BY statement.) Finally, SAS must be told which variables will be plotted in
this PROC GPLOT procedure, and what symbols to use. For example, if yields
(YLD) were to be plotted against fertilizer rates (FERTRATE) for the four
different tillage treatments for two separate years, the following program
statements would be required:

PROC SORT DATA = TILLDATA; BY YEAR;
PROC GPLOT DATA = TILLDATA UNIFORM; BY YEAR;
PLOT YLD*FERTRATE = TRT / OVERLAY;

The SORT statement sorts the data set indicated by the variable YEAR.
The PROC GPLOT statement tells SAS to plot data from the indicated data set
and to print one graph for each YEAR in the data set. The UNIFORM option
used in the GPLOT statement will make SAS use the same vertical axis for each
graph plotted in the BY group. This is very convenient as it allows direct
comparison of the graphs from several years, even if the values of the variable
plotted vary widely from year to year (good growing seasons vs. bad growing
seasons, for example). Finally, the PLOT statement tells SAS to plot YLD
(vertical axis) vs. FERTRATE (horizontal axis). The = sign tells SAS which
SYMBOL statement to refer back to when plotting the data, in other words,
which symbol to use to represent the data points. By stating that the symbol to
be used = TRT (treatment) the researcher can force SAS to use SYMBOL1
when TRT equals 1, SYMBOL2 when TRT equals 2, SYMBOL3 when TRT equals
3, and SYMBOL4 when TRT equals 4. SAS will go back to the SYMBOL
statements previously given, and assign a STAR to TRT 1, a DIAMOND to TRT 2,
a SQUARE to TRT 3, and a TRIANGLE to TRT 4. The / OVERLAY informs
SAS that all four lines, TRTs 1, 2, 3, and 4, are to be overlaid on the same
graph. The result from these program statements will be two separate graphs,
one for each of the two years of the study, with four lines representing the four
treatments on each of them.
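Putting the pieces of this example together, the graphics portion of such a
program might read as follows; this sketch simply combines statements already
shown, reusing the title and footnote examples given above:

TITLE 'Nitrogen Content';
FOOTNOTE 'Over All Treatments';
SYMBOL1 V=STAR;
SYMBOL2 V=DIAMOND;
SYMBOL3 V=SQUARE;
SYMBOL4 V=TRIANGLE;
PROC SORT DATA = TILLDATA; BY YEAR;
PROC GPLOT DATA = TILLDATA UNIFORM; BY YEAR;
PLOT YLD*FERTRATE = TRT / OVERLAY;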










If the researcher would rather designate the symbols to be used in some
other way (suppose he does not have a single treatment variable TRT, but two
factors in a factorial design) the SYMBOL statement can be used in another way.
For example, suppose the researcher wants to graph two different genotypes of
peanut, one a runner type and the other a bunch type. The following two SYMBOL
statements would assign a B to represent data points for one line and an R to
represent data points on the other line:

SYMBOL1 V=B;
SYMBOL2 V=R;

Now, if the researcher wanted to plot yields of the two genotypes
(YLDRUN for yield of the runner type and YLDBUN for yield of the bunch type)
vs. fertilizer rate (FERTRATE) for one year he could use the following
statements:

PROC GPLOT;
PLOT YLDBUN*FERTRATE = 1
YLDRUN*FERTRATE = 2 / OVERLAY;

Note that SAS will represent the data points for YLDBUN with SYMBOL1
since YLDBUN*FERTRATE was to be plotted = 1, and SYMBOL1 has value V=B,
therefore all the YLDBUN data points will be printed as Bs. The same type of
logic results in all the data points for YLDRUN being printed as capital Rs.
Note that from the word PLOT to the word OVERLAY is all one SAS
programming statement, which will result in two lines (one for YLDRUN and one
for YLDBUN) being printed on one graph because of the OVERLAY option.
It is in the SYMBOL statement that the researcher has the option to
choose what kind of line the GPLOT procedure will draw. Several types of line
(dashed, dotted, solid, broken, etc.) are available and are designated by L =
followed by a number indicating the type of line. The numbers for each kind of
line are listed in the SAS GPLOT manual, but, for example, a solid line (the
default value if the L = option is not used) is indicated by L = 1, and a simple
dashed line would be L = 2. Another option used with the SYMBOL statement is
an interpolation option to specify whether the line is to be drawn to smoothly
connect all the points (I = SPLINE), whether the points are to be joined with
straight lines (I = JOIN), or even whether no line is to be drawn between the
points but only the data points graphed (I = NONE). The I = option also allows
SAS to graph regression equations. For example, if the researcher had a
quadratic regression, the option used in the SYMBOL statement would be
I = RQ, a linear regression would be I = RL, and a cubic would be I = RC. The
data points would not necessarily fall on the regression line drawn. When using
the I = RL, RQ, or RC options, confidence limits at a designated probability
level can also be drawn on the graph. The option I = RLCLM95 would draw a
linear regression line through the predicted mean (M) values along with a set of
95 % confidence limits. A series of examples should help illustrate some of the
flexibility allowed in the SYMBOL statement:

SYMBOL V=STAR L=2 I=SPLINE;

This statement would result in a dashed line (L = 2) with a STAR
indicating the data points, and the data points would be smoothly connected (I =
SPLINE).


SYMBOL V=X I=RLCLM99;










This statement results in a solid line (L = default) with an X indicating the
data points, and the line will be drawn as the best fitting linear regression with
99 % confidence limits surrounding it.

SYMBOL V=DIAMOND I=NONE;

This statement results in the data points being printed as diamonds, with
no lines connecting them.
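For example, to plot the yield data as plus signs with the best-fitting
straight line and 95 % confidence limits drawn through them, the SYMBOL and
PLOT statements could be combined as in this sketch, which reuses the YLD and
FERTRATE variables from the earlier example:

SYMBOL1 V=PLUS I=RLCLM95;
PROC GPLOT;
PLOT YLD*FERTRATE;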
Working with the GPLOT procedure can be frustrating at first, but
perseverance and some attention to the manual will pay off with impressive
results. The cost of the GPLOT procedure is calculated by the inches of paper
used and is approximately $0.90 per graph for graphs of a standard size run on
low priority. The rolls of graphs can be picked up at the NERDC center across
from the Hub where the other output is distributed, and are stored in an
upright box by job number.











The Last Word?

This manual is written in a cookbook fashion, for the
beginning user. It is hoped that through exploration of the SAS
books and NERDC user manuals the student will rapidly outgrow
these basic and simple procedures, and progress far beyond them
in his or her capabilities. If used effectively, a large
regional data processing facility such as NERDC can be
invaluable to the researcher. Initially, mistakes and their
consequent frustrating delays will probably abound, but it is
only through these mistakes that experience and knowledge are
gained. Persistence will finally be rewarded with the addition
of a powerful and incredibly flexible tool to the researcher's
resources, capable of saving hours of his or her valuable time.
In conclusion, Good Luck!



