KNOWLEDGE-BASED TECHNIQUES FOR PARAMETERIZING
SPATIAL BIOPHYSICAL MODELS
By
RAFAEL ANDRES FERREYRA
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2003
Copyright 2003
by
Rafael Andres Ferreyra
This document is dedicated to Lili, my wife.
ACKNOWLEDGMENTS
Many people and institutions contributed to make this dissertation possible. First and foremost, I thank Dr. Jim Jones, my advisor, for his infinite patience and balance in guiding me through my Ph.D. process and helping me grow while supporting my independence. He is an extremely busy man, but he always had time for me and always pushed me to excel and have ambitious goals. His enormous productivity and unvarying affability have been very inspiring.
I thank my other committee members for their help, patience, and sense of humor. Dr. Ken Boote provided much-appreciated advice at several critical junctures of my program. His fantastic course on plant physiology was a revelation for me. Dr. Doug Dankel is a great teacher, and is extraordinarily supportive and friendly. Dr. Wendy Graham, with infinite patience and a keen sense of humor, helped me develop some geostatistical and hydrologic sensibility, and saw me through the publication of my first dissertation-related paper. Dr. Gerrit Hoogenboom provided valuable advice and constructive criticism throughout the process, did not despair despite my initially scant publishing momentum, and was determined to keep me honest.
I have special thanks for some of my other great teachers at UF, who provided great help and inspiration. Robert McSorley stimulated my interest in agricultural ecology. Carl Barfield's grant writing course is so useful it should be a requirement for graduation. Stanley Latimer helped me organize the GPS work at my research sites, and trusted me with a lot of expensive equipment (I also thank the Trimble Corporation for providing
that equipment to UF in the first place). Anand Rangarajan introduced me to Bayesian Networks, and helped me get my IEEE grant. Sergei Pilyugin is a friendly and very effective teacher of differential equations. Charles Guy teaches a great course on scientific issues. Mickie Swisher's course on methods of scientific inquiry was valuable and fun. Paul Gader stimulated my interest in applications of fuzzy logic. Damon Andrew's badminton course helped me get through a difficult semester. Jon Dain and Franklin Paniagua taught a conflict resolution course that gave me a new perspective on natural resources management and methodologies for solving problems involving people. Ted Spiker welcomed me into his course on magazine and feature writing at the Journalism Department, and taught me much about writing. Finally, Patricia Craddock's professional editing course was very enlightening and valuable during the last weeks of my program. I thank all of these professors; they prove that nothing can replace direct contact with a great teacher.
I am very thankful to Guillermo Podestá of the University of Miami. Guillermo helped me a great deal in my early weather satellite days, hosting me several times at the Rosenstiel School of Marine and Atmospheric Science (RSMAS). He also encouraged my pursuit of further academic aspirations, and later helped catalyze my coming to UF. He is a great promoter of climate change research in Argentina and continues to invest in Argentine atmospheric and agricultural sciences, undaunted by obstacles.
I also thank Dave Letson of the University of Miami. Working with him on ENSO-related research was great, and I have learnt much from his crisp writing. My gratitude also goes to Jeff White for hosting Dr. Jones and his students at CIMMYT, and to Joe
Ritchie for honoring me with an invitation to his symposium in Detroit in 2000. I also wish to thank Bill Batchelor for his valuable career advice and friendly attitude.
I thank Pedro Murillo, Eduardo Toselli, and Miguel Gurassa for their friendship
and support during and after our years working together in Argentina. I learnt much from them. I also thank Julio Dardanelli for his friendship and guidance during my MS, and for contributing to my coming to UF. We have been able to keep our research collaboration going mostly because of his patience and accommodation of the time limitations imposed by my Ph.D. process.
I am exceedingly thankful to Hernan Apezteguia. He generously provided the data and motivation for Chapter 4. He is a wise and generous man, with a fantastic sense of humor. I hope we can continue our collaboration into the distant future.
Ludmila and Yakov Pachepsky deserve more thanks than I can express. Their generosity and friendship know no bounds. They are also fantastic scientists. They convinced me to pursue a Ph.D. and provided me with all sorts of logistic support and encouragement.
I am extremely grateful to AgConnections, Inc. of Murray, KY, and its two owners, Rick Murdock and Pete Clark. They have invested heavily in my work, and have honored me with their friendship and the opportunity to witness how a group of intelligent, enterprising people can prosper given enough effort and powerful ideas. I've seen them grow from three smart guys in a big room a few years ago to their current status as a major force of nature. I greatly thank Rick Murdock for his vision, encouragement, and generosity; I have learnt much from him. His family also provided me with a home away from home in my visits to Murray. I am also very grateful to Pete Clark for his
down-to-earth good advice and business perspective; to John Potts, whose sharp intellect and expert knowledge of agroecosystems I frequently tapped for this dissertation; to Joe Bullen, who measured water content in the Suggs 4 field under extreme environmental conditions more times than he would like to remember; and to Chad Wortham, who took care of the weather station in the field, measured soil water content, and was always helpful and in high spirits.
I thank Rick Murdock, Chad Wortham, Pete Clark, Jamie Stockdale, Kenny Kingins, Clay Bailey, Joe Bullen, and John Potts for helping me pound so many poorly designed plastic tubes through a nearly impenetrable barrier that soil scientists cynically call a "fragipan". I failed to see anything "fragile" about it. Rick's ability to nurse battered steel pipes back to health and improvise a ramming head from pieces of scrap metal is the material from which metalworking legends are wrought.
I am very thankful to Jerry McIntosh of the NRCS, who generously contributed a lot of time and expert knowledge to this work; to Ron Riffey and Tim Taylor who, while at the AGRIS Corporation, helped me assemble a proof-of-concept dataset and find my way to AgConnections; and to Chuck Cunnyngham, Mike Fouts, Duane Frederking, Bob Guse, and Russ Henry of Pioneer, who provided me with valuable information on the 3281W corn hybrid.
I thank the people who have helped me publish, especially Dr. Jones, who softly
but relentlessly pushed me to write; Dr. Graham, who patiently helped me with water and geostatistical issues; my departmental reviewers Tom Burks, Daniel Lee, Fred Royce, Carlos Messina, and Jawoo Koo; the anonymous reviewers who greatly improved my manuscripts; and Ludmila B. Pachepsky, who helped me start. I also greatly appreciate
the funding for publication costs received from the IFAS Agricultural Experiment Station Journal Series program.
Carlos Messina is a good friend and an extraordinary scientist. I have profited
greatly from the innumerable hours of scientific discussion we've shared during the last four years. Working and studying with him has been an honor. I am also deeply grateful for his support during the final, hectic hours of my Ph.D. process.
I thank the Royce family, Fred, Estela, Karina, Anelkis, and Sofia, for their
friendship and hospitality. The shared evenings at their home really helped me feel at home in Gainesville. I also wish to thank Jawoo Koo; over the last four years he showed his labmates that human kindness and consideration for others can be limitless.
I thank the other fine people of the Crop Systems Modeling Lab, Alagarswamy, Andre, Ayse, Cheryl, McNair, Oxana, Ramkrishnan, Shrikant, Valerie, and Wayne, for their friendship and numerous acts of kindness. Special thanks go to Ricardo Braga, a great friend and crop modeler, who made our early stay in Gainesville very enjoyable.
I also wish to thank Héctor, Claudia, Juan Manuel, and Lucila Crena. They have been our family in Gainesville.
Several organizations have provided much-appreciated funding for diverse aspects of my work: the United Soybean Board, the Honor Society of Phi Kappa Phi, the Scientific Honor Society of Sigma Xi, the Neural Networks Society of the Institute of Electrical and Electronics Engineers, and particularly AgConnections, Inc. I am especially grateful for the funding provided by the University of Florida's College of Agriculture and Life Sciences, CALS, through an Alumni Graduate Fellowship and travel grants. I also thank the Graduate Student Council at UF for its support through travel grants and
for organizing the annual Graduate Student Forum, which provides a valuable opportunity for graduate students to exercise their presentation skills.
One of the most extraordinary and pleasant things I have been exposed to during my program at UF is the inclusive attitude of CALS/IFAS. They funded my program although I was a student of the College of Engineering, and treated me as one of their own from the first day to the last. In particular, I thank Dr. Jane Luzar for being a relentless cheerleader and contributor for the ABE Town Hall, a student-organized discussion forum we created in our department; and I thank Dr. Mike Martin for participating in it. I thank Carlos Messina, Kelly Brock, Amy Dedrick, and everybody else who in one way or another contributed to getting the ABE Town Hall going. Special thanks go to the professors and graduate students from ABE and other departments throughout UF who came and participated. Hopefully the Town Hall concept will live long and prosper. I also hope our incipient Undergraduate Research Experience program through the Graduate Student Council will do the same.
I am very thankful to the Agricultural and Biological Engineering Department for its friendly, hospitable atmosphere, and for putting so many resources at my disposal. I thank the ABE Department's administrative cast. Jane Elholm, Mary Hall, Mary Harris, Dawn Mendoza, Betty Pearson, and Jeanette Wilson always solved my problems, or did things so well that there were no problems to solve. I also thank Dr. Ken Campbell, the ABE Graduate Coordinator, for his kindness, help, and good advice. I also wish to thank Maud Fraser at the UF International Center for her good advice and good humour.
I wish to thank Tommy Estevez at the UF Meat Processing Center for his kindness and style. He is practically the patron saint of the Argentine community in Gainesville. I
also thank Michelle at the Reitz Union Wendy's for her smile and upbeat attitude. Thanks also go to the Hare Krishnas for the Krishna Lunch at UF's Plaza of the Americas. I didn't go often, but I know people who wouldn't have made it through grad school without it.
I thank the United States Postal Service for making a major contribution to my quality of life through their great service. Manuscripts, important paperwork, bills, etc. always came and went flawlessly.
I am thankful for the UF libraries. They amaze me today as they did when I first saw them. I hope I never take something as valuable as that for granted.
Many thanks go to my parents, who supported us throughout the process, on occasion risking their health to visit us; and to my sister. My magnificent children, Nicolás and Tomás, have also been a constant source of wonder and encouragement.
Leaving the best for last, I thank my wife Lili. Without her love, sacrifice and determination, this would have been impossible.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS .......... iv
LIST OF TABLES .......... xv
LIST OF FIGURES .......... xvi
ABSTRACT .......... xxi
CHAPTER
1 INTRODUCTION .......... 1
Precision Agriculture and Crop Models .......... 1
Goal and Objectives .......... 8
Outline of the Dissertation .......... 9
2 SOURCES OF ERROR WHEN INVERSE MODELING IS APPLIED TO THE PARAMETERIZATION OF SPATIALLY-COUPLED CROP MODELS .......... 12
Introduction .......... 12
Materials and Methods .......... 15
Generating Synthetic Yield Maps: the Spatially-Coupled Model .......... 15
Generating Synthetic Yield Maps: Input Data .......... 20
Estimating Soil Parameters Using Inverse Modeling .......... 22
Evaluating With Independent Data .......... 24
Results and Discussion .......... 25
Exploring Similar Behavior Between Spatially-Coupled and Uncoupled Models .......... 25
Simulated Yield Profiles .......... 28
Inverse Modeling: Parameter Estimation Error .......... 30
Evaluating With Independent Data .......... 36
Conclusions .......... 40
3 PLANNING CROP SCOUTING PATHS WITH OPTIMIZATION ALGORITHMS AND A SELF-ORGANIZING FEATURE MAP .......... 42
Introduction .......... 42
Theory .......... 44
Sequential Approach: Sampling Locations .......... 44
Sequential Approach: the Traveling Salesman Problem (TSP) .......... 48
The Simultaneous Approach .......... 48
Materials and Methods .......... 50
Case Study Problem, Location, and Dataset .......... 50
The Sequential Approach .......... 51
The Simultaneous Approach .......... 52
Evaluation of Results .......... 53
Results and Discussion .......... 54
Sampling Location Layout .......... 54
Predictive Accuracy .......... 55
Tour Length .......... 60
Runtime .......... 61
Semivariograms .......... 62
Practical Considerations .......... 62
Conclusions .......... 63
4 REDUCING SOIL WATER SPATIAL SAMPLING DENSITY USING SCALED SEMIVARIOGRAMS AND SIMULATED ANNEALING .......... 64
Introduction .......... 64
Temporal Stability .......... 65
Simulated Annealing .......... 67
Materials and Methods .......... 68
Study Location .......... 68
Semivariogram Modeling .......... 69
The Density Reduction Problem .......... 70
Fitness Functions .......... 72
Scaled kriging variance (SKV) .......... 72
Scaled mean squared error (SMSE) .......... 73
Validation .......... 73
Results and Discussion .......... 75
Semivariograms .......... 75
Density Reduction .......... 75
Validation .......... 79
Residual Analysis of Validation Results and Tests of Kriging Assumptions .......... 83
Temporal Stability Analysis .......... 85
Sources of error and Nonstationarity .......... 87
Conclusions .......... 90
5 A FASTER ALGORITHM FOR CROP MODEL PARAMETERIZATION BY INVERSE MODELING: SIMULATED ANNEALING WITH DATA REUSE .......... 92
Introduction .......... 92
Materials and Methods .......... 95
Simulated Annealing Overview .......... 95
Crop Model Management .......... 97
Case Studies .......... 97
Results and Discussion .......... 100
Case Study 1 .......... 100
Case Study 2 .......... 104
Conclusions .......... 107
6 USING BAYESIAN NETWORKS TO HELP UNDERSTAND CAUSAL RELATIONSHIPS .......... 109
Introduction .......... 109
Bayesian Nets as Simple Expert Systems to Help Explain and Understand How Things Work .......... 110
A Simple Example .......... 113
Deductive Inference .......... 114
Abductive Inference .......... 120
Conclusions .......... 122
7 INTEGRATING MULTIPLE KNOWLEDGE SOURCES FOR PARAMETERIZING SPATIAL CROP MODELS WITH INVERSE MODELING .......... 123
Introduction .......... 123
Materials and Methods .......... 127
Case study: the Suggs 4 Field .......... 127
Elevation and topographic attributes .......... 128
Soil electroconductivity .......... 129
Soil data .......... 130
Yield maps .......... 131
Soil water data .......... 132
Parameter Estimation Process .......... 133
Knowledge elicitation .......... 133
Updating the soil map, selecting the simulation locations .......... 134
The IM framework .......... 136
The four parameter-evaluation criteria .......... 138
Objective function (aggregation of criterion results) .......... 141
Simulations .......... 143
The (uncoupled) spatial crop model .......... 143
The spatially-coupled crop model .......... 144
Weather data needed for crop simulations .......... 144
Initial conditions .......... 145
Genetic coefficients .......... 146
Analyses .......... 147
Results and Discussion .......... 148
Elevation, Wetness Index .......... 148
Electroconductivity (EC) .......... 150
Soils .......... 153
Yield Maps .......... 154
Updating the Soil Map, Selecting the Simulation Locations .......... 158
Soil Probe Observations .......... 161
The Simulation Domain .......... 162
Knowledge Elicitation: Populating the Neighborhood Criteria .......... 163
Knowledge Elicitation: Parameter Sensitivity and Parameter Selection .......... 168
Observed Yields in the IM Framework Domain .......... 174
Evaluation .......... 178
Simulations with a neighborhood criterion .......... 179
IM with coupled and uncoupled models .......... 180
Recommendations for Building Neighborhood Criteria .......... 186
Conclusions .......... 199
8 CONCLUSIONS .......... 201
APPENDIX
A THE SIMULATED ANNEALING ALGORITHMS USED IN CHAPTER 4 .......... 207
Sacks and Schiller Algorithm .......... 207
Spatial Simulated Annealing .......... 208
Acceptance Criterion .......... 208
Generation Mechanism .......... 208
Cooling Schedule .......... 209
B NEIGHBORHOOD CRITERIA DATA AND SOURCE CODE .......... 210
Depth Criterion .......... 210
Source Code for Soil Map Neighborhood and Yield History Criteria, OWA Operator .......... 214
LIST OF REFERENCES .......... 225
BIOGRAPHICAL SKETCH .......... 244
LIST OF TABLES
Table page
2-1. Soil parameters used for the spatially-coupled crop model. .......... 21
3-1. Standardized variograms of the 1999 and 2001 McCallon 1, and 1999 Suggs 4 maize data. .......... 62
4-1. Locations contained in the most relevant patterns mentioned in the text, together with their values of scaled mean squared error (SMSE) and scaled kriging variance (SKV) over the calibration and validation data sets. .......... 74
4-2. Mean total soil water content in the first meter of soil, phenological stage, and semivariogram model parameters per measurement date, and parameters for the scaled semivariogram model. .......... 75
4-3. Results of residual analysis. .......... 85
4-4. Spearman's rank correlation tests for temporal stability. .......... 86
5-1. Crop model parameters and ranges for case study 2. .......... 99
7-1. Different IM scenarios, showing number of instances of each criterion. .......... 147
7-2. Semivariogram parameters for the yield map data. .......... 154
7-3. Soil probe observations corresponding to the anomalies in Figure 7-18. .......... 160
7-4. Values adopted for the neighborhood criterion's constraint thresholds. .......... 164
7-5. Crop model parameters and ranges used in IM framework. .......... 172
7-6. Soil water holding characteristics obtained by applying the Saxton pedotransfer functions to textural fractions taken from the literature. .......... 173
B-1. Matrix encoding for the soil neighborhood criterion. .......... 210
LIST OF FIGURES
Figure pqge
21. Landscape model used for the spatiallycoupled model simulations...................15
22. Param eterization w eather cases........................................................................... 23
23. Values of CNE02 for different combinations of rainfall and curve number values in
cells 1 and 2 (CN;, CN2) of the spatiallycoupled model. ..................................28
24. Histogram of PESW in available initial condition set........................................ 29
25. Simulated yield profiles made with the spatiallycoupled and
uncoupled m odels............................................................................................. . . 29
26. CN2 values obtained by IM for the spatiallycoupled and uncoupled models.........31
27. FAW values obtained by IM for the spatiallycoupled and uncoupled models.......32 28. CN2 sensitivity coefficient for different landscape positions and models...........34
29. FA W sensitivity coefficient for different landscape positions and models......34 210. Yield RMSE for the twelve parameterization scenarios. .....................................35
211. Evaluation yield RMSE for the spatiallycoupled model when IM initial
conditions are unknown and evaluation initial conditions are known. ................38
212. Evaluation yield RMSE for the spatiallycoupled model when both the
IM and evaluation initial conditions are unknown...................................................38
213. Evaluation yield RMSE for the uncoupled model when IM initial
conditions are unknown and evaluation initial conditions are known. ................39
214. Evaluation yield RMSE for the uncoupled model when both the
IM and evaluation initial conditions are unknown...................................................39
31. Sampling locations and tour lengths for the 3 cases at a sampling density
of 2.5/ha (22points).......................................................................................... . 56
32. Evaluation of the sampling schemes produced by the three cases at different
sam pling densities. ............................................................................................. . 57
xvi
33. (A) Observed 1999 yield map; (B) MMKV+TSP estimate at 2.5 samples/ha
(22 points); C) MMKV + TSP estimate at 10 samples/ha (88 points)................58
34. (A) Comparison of the three cases' tour lengths. (B) Difference between
MMSD / MMKV case tour lengths and expertderived tour lengths...................60
41. Layout of the m icrow atershed............................................................................. 69
42. Progress of the five instances of the scenario defined by the scaled
mean squared error (SMSE) fitness function and the
Spatial Simulated Annealing (SSA) algorithm. ..................................................77
43. Results of the calibration and validation processes for each of the four
scenarios, shown as scaled kriging variance and scaled mean squared error...........78
44. Maps of interpolated water content for both validation dates.............................80
45. Map of relative prediction error of the best SMSEcalibrated scenario for
both validation dates........................................................................................... . . 82
46. Distribution, for each validation date, of relative prediction error in the
estimated locations not belonging to the optimal pattern for the optimal SKVcalibrated, and the optimal SMSEcalibrated scenarios......................................82
47. Residual analysis of validation results and tests of kriging assumptions
for both validation dates ...................................................................................... 84
48. Ranked intertemporal relative deviation from the mean (across the
microwatershed) spatial soil water content, 6........................................................88
51. A hypothetical field divided into three environments (soil types)......................94
52. Objective function used in case study 1.................................................................98
53. Objective function vs. number of algorithm iterations for six runs of the
simulated annealing algorithm................................................................................101
54. Unique objective function calculations (equivalent to crop model runs) for
18 scenarios (7 repetitions / scenario) in case study 1. ..........................................103
55. Total model runs vs. number of locations per environment for case study 2.........104
56. Error at each location of interest for the grid search and two
simulated annealing scenarios of case study 2.......................................................105
61. Scale used to translate between verbal quantifiers and probabilities.....................114
62. Causal model of soil roughness made using a Bayesian network..........................115
63. Conditional probability tables for the soil roughness model..................................116
64. Deductive inference on a Bayesian network. .........................................................118
65. Deductive inference in the Bayesian network........................................................119
66. Deductive inference in the Bayesian network........................................................120
67. Abductive inference in the Bayesian network........................................................121
71. Suggs 4 field...........................................................................................................129
72. Original division of the Suggs 4 field into soil types.............................................130
73. Custom-built pre-drilling tool for use with the three-prong TDR probe................133
74. IM framework for the proposed SCM parameter estimation process.........138
75. Dependence of the yield history criterion for year i (YHC_i) on the value of
parameter k..............................................................................................................139
76. Functions used for evaluating the neighborhood constraints.................................140
77. Monthly rainfall in Murray during 1970-99..........................................................144
78. Semivariogram estimated from elevation data.......................................................148
79. Wire frame elevation map of the Suggs 4 field......................................................149
710. Wetness index calculated for the Suggs 4 field......................................................150
711. Semivariogram estimated from surface and deep electroconductivity data...........151
712. Veris electroconductivity maps of the field. ..........................................................152
713. Semivariograms fitted to the observed yield data. .................................................155
714. Resampled 1997 maize yield data..........................................................................156
715. Resampled 1998 soybean yield data......................................................................156
716. Resampled 1999 maize data...................................................................................157
717. Normalized three-year (1997, 1998, 1999) yield map...........................................157
718. Summary of anomalies and candidate zones for additional soil map units
identified during discussion sessions with the domain experts..............................158
719. Field observations with a soil probe.......................................................................161
720. Set of 13 soil types used as the IM framework simulation domain. ......................163
721. Soil depth neighborhood criterion..........................................................................165
722. Wetness neighborhood criterion.............................................................................166
723. Plant density neighborhood criterion. ....................................................................167
724. Using diagrams to support the discussion and knowledge elicitation process.......169
725. Conceptual water balance model expressed as a causal map.................................170
726. Record of a discussion session with domain experts. ............................................171
727. Compact representation of the limiting factor data................................................172
728. Observed crop yield in the 13 locations of interest................................................174
729. Observed relative crop yield in the 13 locations of interest for the
two calibration years (1999 and 2001) and validation year (1997)........................175
730. Cumulative rainfall from Jan. 1 to Aug. 23 during 1997, 1999, and 2001............176
731. Rainfall during the crop season..............................................................................177
732. Evaluation locations, shown on the normalized yield map....................................178
733. Relative position on the landscape of the four locations used for evaluation........178
734. Five realizations of parameterization using the IM framework with only one
objective function input, the soil depth neighborhood criterion............................180
735. Parameter estimates of 2-year coupled and uncoupled model IM scenarios.........181
736. Errors and comparison of yields for 2-year coupled and uncoupled model
IM scenarios relative to observed values...............................................................181
737. Parameter estimates of 3-year coupled and uncoupled model IM scenarios.........182
738. Errors and comparison of yields for 3-year coupled and uncoupled model
IM scenarios relative to observed values...............................................................182
739. Simulated and observed soil water data for the 2001 crop season,
0-15 cm layer of the LoB soil location..................................................................189
740. Simulated and observed soil water data for the 2001 crop season,
15-30 cm layer of the LoB soil location................................................................189
741. Simulated and observed soil water data for the 2001 crop season,
30-45 cm layer of the LoB soil location................................................................190
742. Simulated and observed soil water data for the 2001 crop season,
45-60 cm layer of the LoB soil location................................................................190
743. Simulated and observed soil water data for the 2001 crop season,
0-15 cm layer of the CaA(W) soil location............................................................191
744. Simulated and observed soil water data for the 2001 crop season,
15-30 cm layer of the CaA(W) soil location..........................................................191
745. Simulated and observed soil water data for the 2001 crop season,
30-45 cm layer of the CaA(W) soil location..........................................................192
746. Simulated and observed soil water data for the 2001 crop season,
45-60 cm layer of the CaA(W) soil location..........................................................192
747. Simulated and observed soil water data for the 2001 crop season,
0-15 cm layer of the GrB(Ba) soil location...........................................................193
748. Simulated and observed soil water data for the 2001 crop season,
15-30 cm layer of the GrB(Ba) soil location.........................................................193
749. Simulated and observed soil water data for the 2001 crop season,
30-45 cm layer of the GrB(Ba) soil location.........................................................194
750. Simulated and observed soil water data for the 2001 crop season,
45-60 cm layer of the GrB(Ba) soil location.........................................................194
751. Simulated and observed soil water data for the 2001 crop season,
0-15 cm layer of the Hn soil location....................................................................195
752. Simulated and observed soil water data for the 2001 crop season,
15-30 cm layer of the Hn soil location..................................................................195
753. Simulated and observed soil water data for the 2001 crop season,
30-45 cm layer of the Hn soil location..................................................................196
754. Simulated and observed soil water data for the 2001 crop season,
45-60 cm layer of the Hn soil location..................................................................196
755. Simulated and observed cumulative transpiration for the 2001 crop season,
Hn soil location.......................................................................................................197
756. Simulated and observed cumulative transpiration for the 2001 crop season,
GrB(Ba) soil location..............................................................................................197
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
KNOWLEDGE-BASED TECHNIQUES FOR PARAMETERIZING SPATIAL BIOPHYSICAL MODELS
By
Rafael Andrés Ferreyra
December 2003
Chair: James W. Jones
Major Department: Agricultural and Biological Engineering
This study presents new approaches for practical problems related to using crop models in precision agriculture. Agriculture is becoming increasingly competitive and regulated. Farmers must maximize profits yet decrease their farms' environmental impact. Precision agriculture has been proposed as a way to improve farmers' income and minimize the environmental impact of farming by optimizing the applied levels of fertilizers and other crop inputs on a site-specific basis. However, for spatially variable prescriptions to be effective, farmers need to thoroughly understand how several interacting physical and biological factors contribute to cause spatial yield variability.
Crop simulation models are software programs that imitate plant growth and
development. They can help us understand spatial yield variability and how to manage it. However, crop models have expensive and impractical soil data requirements, especially for spatial applications. A technique called inverse modeling uses the crop models themselves to search for the model parameters that best fit observed results. This
technique is very convenient for practical applications in precision agriculture, but its current state of development does not ensure good predictive power.
Our objectives were
* To identify and quantitatively compare different sources of error in the use of
inverse modeling to parameterize spatially coupled and uncoupled crop models.
* To develop methods for optimizing spatial sampling schemes for representing the
spatiotemporal variability of yield and yield-limiting factors.
* To develop and evaluate a portable framework for eliciting knowledge from
experts and using that knowledge to parameterize a spatial crop model.
We found that crop yield spatiotemporal variability in a field can be represented using a limited number of sampling locations; that those locations can be found using efficient combinatorial optimization algorithms; and that in many applications crop model results in the sampling locations can be kept within acceptable error levels without needing the computationally intensive coupling (i.e., interchange of water) between simulation locations. This can be facilitated by imposing a set of spatial constraints on the system during the inverse modeling process. The constraints can be elicited from local domain experts.
CHAPTER 1
INTRODUCTION
Precision Agriculture and Crop Models
Present-day row crop (maize, soybeans, etc.) agriculture is beset by economic and environmental problems. Commodity prices decreased steadily through the 20th Century (USDA NASS, 1994), increasing the economic risk of agricultural production. Moreover, growing levels of environmental regulatory pressure have limited farmers' ability to manage risk. The limitations include water body contamination limits and total maximum daily loads (EPA, 2003), market limitations on the use of genetically modified organisms, and competition between agricultural and urban water use.
Precision, or site-specific, agriculture has been proposed as a way of improving farmers' income and reducing the environmental impact of agriculture by optimizing the applied doses of fertilizers and other inputs on a site-specific basis (NRC, 1997). Precision agriculture merges several enabling technologies such as global positioning systems (GPS), geographical information systems (GIS), and real-time variable-rate application technology (Morgan and Ess, 1997). Site-specific management requires equipment that can vary application rates in real time while moving through the field (Anderson and Humburg, 1997). Another necessary component for variable-rate application is the spatially-variable prescription (i.e., site-specific dosage of crop inputs) (Morgan and Ess, 1997).
Effective prescriptions require a thorough understanding of the causes of spatial yield variability, as well as objective methods for predicting crop yield responses to
changes in specific inputs. In the case of variable-rate application of fertilizers, prescriptions are usually driven by yield goals and soil test results, and are generally based on crop and soil nutrient budgeting (Hergert et al., 1997). However, the response functions that are used to make recommendations from these inputs are frequently based on soil test results originally aggregated over a whole field, and often organized at the state or regional level (Lowenberg-DeBoer and Swinton, 1997). Additionally, they may be biased toward over-application, due to assumptions made regarding farmers' preferences (Hergert et al., 1997). This may lead to unexpected responses of the crop to variable-rate application, with a consequent reduction in the perceived value of site-specific variable-rate technology.
Precision agriculture has, so far, done less to improve farm decision-makers'
understanding of the causes of spatial yield variability than to advance the machinery that monitors spatial yield variability and applies prescriptions. There have been many advances in techniques for measuring and representing spatial yield variability for many crops (Pierce et al., 1997). Yield monitors have been developed using real-time direct volume methods (Borgelt, 1993; Searcy et al., 1989); real-time direct weighing methods (Schrock et al., 1995; Wagner and Schrock, 1989); and indirect, pressure-plate methods (Birrell et al., 1996). Their accuracy has also been studied, both in laboratory and field settings (Al-Mahasneh and Colvin, 2000; Arslan and Colvin, 1999). Yield monitors and yield mapping methods have been developed for many crops, such as barley (Stafford et al., 1996); maize (Pérez Muñoz and Colvin, 1996; Pfeiffer et al., 1993); peanuts (Boydell et al., 1995); potato (Campbell et al., 1994; Rawlins et al., 1995); soybean and maize (Jaynes and Colvin, 1997); sugarbeets (Hofman et al., 1995); and
wheat (Miller et al., 1988). There is also a large body of research on variable-rate technology (Anderson and Humburg, 1997), including the development of appropriate spray nozzles (Miller and Smith, 1992); spinner-disc fertilizer applicators (Fulton et al., 2001); the special case of center pivots (Camp and Sadler, 1998); control requirements (Paice et al., 1996); and accuracy analysis (Goense, 1997; Way et al., 1992; Weber et al., 1993).
Many researchers have sought to understand spatially variable yield-limiting factors of major crops using statistical regression analysis. Several of these studies focused on maize (Braga, 2000; Everett and Pierce, 1996; Mallarino et al., 1996; Tomer et al., 1995), and others on maize and soybeans (Khakural et al., 1996; Cambardella et al., 1996; Kessler and Lowenberg-DeBoer, 1998; Sudduth et al., 1996). However, when these studies correlated yield with soil properties or terrain attributes, either they described a very limited fraction of yield variability (Everett and Pierce, 1996; Kessler and Lowenberg-DeBoer, 1998; Mallarino et al., 1996; Sudduth et al., 1996), or the relationships were not consistent across different years (Braga, 2000; Tomer et al., 1995).
These results suggest that purely statistical approaches may have descriptive value, but are not appropriate for predictive purposes. Interannual variability of weather, the spatial variability of soil properties, and the landscape-position dependence of other processes create a dynamic environment for plant growth. The problem of separating the effects of different environmental factors using statistical techniques is especially complex because crop yield in a field is controlled by numerous concurrent factors. Furthermore, plant susceptibility to a particular factor may depend on the crop's
developmental stage and on other environmental conditions, and in field experiments the number of variables can easily exceed the number of available data.
Crop simulation models have been used as a powerful analytic tool to understand environmental influences on crop yield. They provide a unique opportunity to account for many interacting yield-influencing factors in ways that are impossible with traditional agronomic experimentation. Crop models have often been used to analyze the causes of temporal, weather- and climate-related yield variability (Boote et al., 1996; Ferreyra et al., 2001; Parry et al., 1999), and have recently also been used for understanding spatial yield variability (Batchelor et al., 2002; Braga, 2000; Irmak et al., 2001; Paz and Batchelor, 2000; Paz et al., 1998; Sadler et al., 2000). Crop simulation models' ability to reproduce both temporal and spatial crop yield variability suggests that they may be ideal tools for diagnostic and prescriptive use in precision agriculture.
However, the quality of crop-model-based diagnostic analyses and prescriptions depends on how accurately the models' parameters are determined. Parameters are values that represent characteristics of the model and remain constant throughout a simulation (Jones and Luyten, 1998). Measuring soil water-holding parameters is time-consuming (Klute, 1986) and expensive (a typical value is US$20 for testing one soil water-holding limit on one soil sample; see A&L Labs, 2003), but given accurate field measurements, crop simulation results can capture variability well, as shown by Braga (2000) with the CERES-Maize model (Ritchie et al., 1998). Conversely, if parameter values are taken from coarse estimates such as soil survey data, the model may perform poorly, as found by Sadler et al. (2000) using the same model.
The enormous cost of sampling soil hydraulic parameters at an adequate spatial density for using crop models in precision agriculture (henceforth, we will refer to crop models used for simulating spatiotemporal variability as spatial crop models) motivated the search for ways to estimate the parameters instead of measuring them. Inverse modeling (IM) is an estimation method suitable for use with crop models. It uses the model itself and a search algorithm to propose parameter values. Welch et al. (1999a) used an IM-based method for estimating crop model genetic coefficients. The method exhaustively simulated all the parameter combinations in a discrete input space (the parameter space), and then examined the results to find the best parameter combination (or parameter set) for each crop variety. The "best" parameter set was defined as the one producing the lowest value of an objective function: the sum of squared residuals between simulated and observed data. Irmak et al. (2001) extended the grid search concept to the estimation of soil parameters.
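The exhaustive grid search described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of Welch et al. or Irmak et al.: `toy_yield_model` is a hypothetical stand-in for a crop model run, and the parameter names, grid values, and rainfall figures are invented for the example.

```python
from itertools import product


def toy_yield_model(depth_cm, drainage, rainfall_mm):
    """Hypothetical stand-in for one crop model run (not CERES or CROPGRO).

    Yield is limited either by the water left after drainage or by the
    water the soil profile can hold.
    """
    available_water = min(rainfall_mm * (1.0 - drainage), depth_cm * 1.5)
    return 20.0 * available_water  # kg/ha, arbitrary scaling


def grid_search(observed_yield, rainfall, depths, drainages):
    """Simulate every parameter combination and keep the lowest SSE."""
    best = None
    for depth, drainage in product(depths, drainages):
        # Objective function: sum of squared residuals over all years.
        sse = sum(
            (toy_yield_model(depth, drainage, rainfall[year]) - obs) ** 2
            for year, obs in observed_yield.items()
        )
        if best is None or sse < best[0]:
            best = (sse, depth, drainage)
    return best


# Synthetic "observations" generated with known parameters (assumed values).
rainfall = {"dry year": 150.0, "wet year": 400.0}
observed = {yr: toy_yield_model(100, 0.3, rain) for yr, rain in rainfall.items()}

best_sse, best_depth, best_drainage = grid_search(
    observed, rainfall,
    depths=[50, 75, 100, 125, 150],
    drainages=[0.1, 0.2, 0.3, 0.4, 0.5],
)
print(best_sse, best_depth, best_drainage)
```

The cost of this approach is one model run per parameter combination and year, which is why the size of the parameter space matters so much in the discussion that follows.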
Using inverse modeling to parameterize a crop model is not without problems. When the observed crop yield has been affected by a factor not considered by the crop model, the inverse modeling algorithm will attempt to explain the effect using soil properties. Most crop models do not account for yield-limiting factors such as pests, weeds, diseases, nutrients, and extreme pH. Thus, using IM to match predicted and observed yields may attribute yield losses to the wrong causes and produce incorrect parameter values.
Extending the models to simulate the effects of additional factors is possible: Fallick et al. (2002), Paz et al. (2001), and Irmak et al. (2002) extended crop models to include soil pH, soybean cyst nematodes, and weed effects. In most practical applications, however, quantitative knowledge about the effect of extraneous factors on yield is
imperfect (thus reducing confidence in the model results). Additionally, quantifying the factor itself (e.g., site-specific degree of nematode infestation) may be difficult or impractical. Uncertainty in observed yield, due to extraneous factors or yield monitor measurement errors (Lark et al., 1997; Whelan and McBratney, 1997), thus implies the possibility of error in the parameter estimates, with a consequent degradation in the quality of the model's predictions.
Moreover, to date there have been few efforts at using crop simulation models for describing how the spatial variability of soil water content influences crop yield. Considering that drought-related stresses typically limit crop growth in the world's major crop-producing regions, spatial water distribution is an important consideration when applying precision agriculture in those regions.
Paz and Batchelor (2000) used the CROPGRO model (Boote et al., 1998) to
analyze spatial soybean yield variability in a field, attributing it to the spatially variable effects of plant density, weeds, soybean cyst nematodes, and water stress. Previously, Paz et al. (1998) explained the influence of water stress in terms of rooting depth (loosely equivalent to a soil water holding capacity parameter), and a drainage parameter, either the saturated hydraulic conductivity (KSAT) or a soil drainage rate coefficient (SLDR), both of which control the rate at which saturated soil drains down to its drained upper limit. These authors noted that their modeling scheme's ability to reproduce observed data degraded in low-lying areas, possibly due to the model's inability to account for run-on or subsurface flow from neighboring areas. Indeed, given that these processes were not considered, the parameterization procedure tried to explain excess water across the field in terms of slow drainage or increased rooting depth. This might
affect the predictive capability of this procedure when the model is used in years not used for parameter calibration.
If spatial crop models accounted for three-dimensional water movement over the landscape, they might better explain the causes of water-stress-induced yield variability. This would require including a three-dimensional water balance simulation across the landscape in the spatial crop model. Such a complete model does not currently exist, and its parameterization by inverse modeling could prove to be computationally intractable.
We will use the term spatially-coupled crop model to denote a spatial crop model in which landscape units interchange water and nutrients. Each landscape unit (or cell) in such a model cannot be parameterized independently using IM, because variation of the parameters of one cell could affect the availability of water, and thus the yield, of another cell. In principle, such a model demands a simultaneous parameterization of all the cells. However, the parameter space size (i.e., the number of unique parameter combinations among which the optimum must be found) would then grow exponentially with the number of cells into which the field of interest was divided. It is also unclear whether a spatially-coupled model would amplify the uncertainty in the crop model parameter estimates.
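The exponential growth mentioned above is easy to quantify. The following sketch compares the number of parameter combinations an exhaustive search would face when cells are parameterized independently with the number it would face when all cells must be parameterized simultaneously. The function name and the example figures (two parameters with ten discrete levels each, thirteen cells) are assumptions chosen only for illustration.

```python
def parameter_space_size(levels_per_parameter, n_cells, coupled):
    """Count parameter combinations an exhaustive search must evaluate.

    levels_per_parameter: number of discrete levels of each parameter
                          within a single cell.
    coupled: if True, all cells are parameterized simultaneously, so the
             per-cell spaces multiply; if False, each cell is searched
             independently, so they add.
    """
    per_cell = 1
    for levels in levels_per_parameter:
        per_cell *= levels
    if coupled:
        return per_cell ** n_cells
    return per_cell * n_cells


# Illustrative (assumed) figures: 2 parameters x 10 levels, 13 cells.
uncoupled = parameter_space_size([10, 10], n_cells=13, coupled=False)
coupled = parameter_space_size([10, 10], n_cells=13, coupled=True)
print(f"uncoupled: {uncoupled} combinations")
print(f"coupled:   {coupled:.1e} combinations")
```

Under these assumed figures the uncoupled search needs on the order of a thousand evaluations, while the coupled search space is far too large to enumerate, which is the tractability concern raised in the text.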
An alternative scheme is to reduce the complexity of the problem by representing the spatial variability of crop yield with geostatistical techniques (Goovaerts, 1997), and applying a spatial interpolation algorithm to the yield values simulated independently in a limited number of locations. The parameterization of an uncoupled model (i.e., one in which the cells are not spatially coupled) would then be modified to improve its ability to
capture the behavior of the crop at these locations of interest. To avoid the confounding influence of the aforementioned extraneous factors, this process would have to rely on additional information.
Spatiotemporal yield variability can be simulated accurately when the spatial and temporal behavior of the primary yield-limiting factor (typically soil water) is simulated properly (Braga, 2000; Calmon et al., 1999; Ferreyra, 1998). Direct measurements of soil water content are not appropriate for practical, extensionist- or consultant-driven applications of crop models in farm decision support (R. Murdock, pers. comm.); but other sources of knowledge are available, typically in the form of expert opinion. Farmers, extension agents, crop consultants, and other experts such as Natural Resources Conservation Service (NRCS) soil scientists possess a wealth of knowledge about the dominant behavior of different parts of the field, including wetness, soil properties, weed pressure, and so on. Eliciting this knowledge is relatively inexpensive, but harnessing it in a way that can be valuable for parameterizing a spatial crop simulation model is a challenging endeavor; it is also the problem that motivates this dissertation.
Goal and Objectives
The goal of this study was to develop new approaches and practical solutions for the problem of spatial crop model parameterization for precision agriculture applications in which soil water availability is the primary yield-limiting factor. Its specific objectives are the following:
1. To identify and quantitatively compare different sources of error in the use of
inverse modeling to parameterize spatially coupled and uncoupled crop models.
2. To develop methods for optimizing spatial sampling schemes for representing the
spatiotemporal variability of yield and yield-limiting factors.
3. To develop and evaluate a portable framework for eliciting knowledge from experts
and using that knowledge to parameterize a spatial crop model.
Outline of the Dissertation
This dissertation contains six components that address the objectives listed above: Chapters 2 to 7. Each chapter is self-contained and can be read independently. Chapter 2 addresses the first objective; Chapters 3 and 4 address the second; Chapters 5 to 7 address the third.
Chapter 2 compares three different sources of error in the use of inverse modeling to parameterize a spatial crop model: a) spatially-coupled vs. uncoupled model; b) lack of knowledge about initial conditions; and c) biases in weather data used for parameterization. We used inverse modeling to parameterize a simple spatial crop model based on CROPGRO, exploring different scenarios built from combinations of different levels of the errors mentioned above. For the simulations we used weather and soils data from a water-limited environment in Córdoba, Argentina.
Chapter 3 develops solutions to the problem of concurrently obtaining an optimal spatial sampling scheme for a phenomenon of interest (e.g., yield) and an optimal closed scouting path that links the locations of the sampling scheme. This problem is characteristic of crop scouting, and is relevant for the observation of both crop yield and the level of yield-affecting factors in a field. Chapter 3 also explores the problem in the context of minimal data requirements.
Chapter 4 elaborates on the concept of spatial sampling scheme optimization, applying it to the soil water content domain. We compared different algorithms and objective functions for obtaining the best spatiotemporal predictive capability with a given number of locations, and explored the predictive limits of geostatistical techniques
in a landscape in which spatial water movement is believed to occur. For this chapter we used a soil water dataset from Córdoba, Argentina: 5 dates of soil water content observations in 57 locations on an isometric grid covering an 8.3-hectare microwatershed.
Chapter 5 revisits the search algorithm used by Irmak et al. (2001) for
inverse-modeling-based parameterization. These authors compared an exhaustive grid search method with the sophisticated adaptive simulated annealing algorithm proposed by Ingber (1993) and used by Braga (2000), Calmon et al. (1999), and Paz et al. (1998) for estimating crop model soil parameters. Irmak et al. (2001) claimed better performance using the grid search. We developed a hybrid between the simulated annealing algorithm and a grid search. Our algorithm is more efficient than the grid search and can provide tentative solutions that are progressively updated.
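One way to picture a hybrid of simulated annealing and grid search is an annealing walk restricted to the nodes of the discrete parameter grid, with every objective value cached so that no parameter combination is ever simulated twice. The sketch below follows that idea; it is an assumption-laden illustration and not the algorithm developed in Chapter 5. The function name, the cooling schedule, and the toy objective are all hypothetical.

```python
import math
import random


def anneal_on_grid(objective, grid_sizes, n_iter=2000, t0=1.0,
                   cooling=0.995, seed=0):
    """Simulated annealing over integer grid indices with a result cache."""
    rng = random.Random(seed)
    cache = {}  # state -> objective value; one entry per unique "model run"

    def cost(state):
        if state not in cache:
            cache[state] = objective(state)
        return cache[state]

    current = tuple(rng.randrange(n) for n in grid_sizes)
    best = current
    temperature = t0
    for _ in range(n_iter):
        # Propose a neighbor: move one step along one randomly chosen axis,
        # clamped so the walk never leaves the grid.
        axis = rng.randrange(len(grid_sizes))
        step = rng.choice((-1, 1))
        index = min(max(current[axis] + step, 0), grid_sizes[axis] - 1)
        candidate = current[:axis] + (index,) + current[axis + 1:]
        delta = cost(candidate) - cost(current)
        # Metropolis rule: always accept improvements, sometimes accept
        # deteriorations while the temperature is still high.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if cost(current) < cost(best):
                best = current  # tentative solution, updated as we go
        temperature *= cooling
    return best, cost(best), len(cache)


# Toy objective on a 20 x 20 grid with its minimum at indices (7, 3).
def toy_objective(state):
    return (state[0] - 7) ** 2 + (state[1] - 3) ** 2


best_state, best_cost, unique_runs = anneal_on_grid(toy_objective, (20, 20))
print(best_state, best_cost, unique_runs)
```

Counting cache entries rather than iterations gives the number of unique objective function calculations, which is the quantity that matters when each calculation is a crop model run.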
Chapter 6 introduces Bayesian networks as tools for representing knowledge about causal relationships and for combining different sources of knowledge into a common probabilistic framework. It shows a simple example of how we used Bayesian network technology to help understand the causes of, and predict, spatiotemporal yield variability in a field in Kentucky during discussions involving domain experts: the farmer, an NRCS soil scientist, and crop consultants.
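The smallest possible Bayesian network, one cause node linked to one effect node, is enough to illustrate the two directions of reasoning such a tool supports: predicting an effect from what is known about its cause, and diagnosing a cause from an observed effect. The sketch below is only an illustration; the variables ("wet soil", "low yield") and all probabilities are invented and do not come from the dissertation.

```python
def marginal_effect(p_cause, p_effect_given_cause, p_effect_given_no_cause):
    """Predictive direction: P(effect) by the law of total probability."""
    return (p_cause * p_effect_given_cause
            + (1.0 - p_cause) * p_effect_given_no_cause)


def posterior_cause(p_cause, p_effect_given_cause, p_effect_given_no_cause):
    """Diagnostic direction: P(cause | effect observed) by Bayes' rule."""
    p_effect = marginal_effect(p_cause, p_effect_given_cause,
                               p_effect_given_no_cause)
    return p_cause * p_effect_given_cause / p_effect


# Hypothetical numbers: prior belief that a zone is wet, and the chance of
# low yield in wet and non-wet zones.
p_wet = 0.3
p_low_given_wet = 0.8
p_low_given_not_wet = 0.2

p_low = marginal_effect(p_wet, p_low_given_wet, p_low_given_not_wet)
p_wet_given_low = posterior_cause(p_wet, p_low_given_wet, p_low_given_not_wet)
print(f"P(low yield) = {p_low:.2f}")
print(f"P(wet | low yield) = {p_wet_given_low:.2f}")
```

Observing low yield raises the belief that the zone is wet above its prior, which is the kind of update that lets expert statements and field observations be combined in one probabilistic framework.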
Chapter 7 builds on the preceding chapters. We used a two-tiered approach to the inverse modeling parameterization problem. The bottom tier is a network of spatial relationships of crop model inputs or outputs among the locations of interest, elicited from interaction with domain experts. The top tier is the (uncoupled) crop model, run for each location of interest. The spatially-coupled bottom tier constrains the behavior of the
uncoupled top tier; the problem remains computationally tractable, despite its large parameter space, because evaluating the spatial constraints is very fast compared to a single crop model run, and because we used the fast algorithm developed in Chapter 5.
In Chapter 7 we also explored a case study using the scheme mentioned above to parameterize and test a spatial crop model based on CERES-Maize in a field in Kentucky, USA. Available data included real-time kinematic GPS-derived elevation data, three years of corn yield maps, one year of wheat and soybean yield maps, SCS Soil Survey data, soil electroconductivity data, and in situ soil probe observations and soil water content time series, coupled with expert opinions.
Finally, in Chapter 7 we also applied the above-mentioned network of spatial
relationships to constrain the IM parameterization of a simple, spatially-coupled, CERES-Maize-based crop model in the same field in Kentucky.
CHAPTER 2
SOURCES OF ERROR WHEN INVERSE MODELING IS APPLIED TO THE
PARAMETERIZATION OF SPATIALLY-COUPLED CROP MODELS
Introduction
Agricultural production of crops such as corn, soybeans, and wheat currently poses economic and environmental problems. Inflation-adjusted agricultural commodity prices decreased steadily through the 20th Century (USDA NASS, 1994). This has made agricultural production less profitable and riskier. Moreover, increasing levels of environmental regulatory pressure limit farmers' risk-mitigating management options. These limits are manifested through groundwater-quality-driven limits on fertilizer and pesticide applications, through market limitations on the use of genetically modified organisms, and through competition between agricultural and urban water use.
Precision agriculture in general, and variable rate application technology in particular, has shown promise for addressing both economic and environmental concerns of agricultural production (National Research Council, 1997). From an economic perspective, farmers could conceivably maximize net returns by boosting yields in areas where crop growth can respond to additional inputs. Additionally, the use of fertilizers, pesticides, lime, etc. could be minimized in low-yielding areas where crop growth is limited by factors beyond the farmer's control. From an environmental viewpoint, it would be possible to approach the ideal situation in which all the inputs applied to the crop would actually be consumed by it, leaving none free to contaminate the environment (Pierce and Nowak, 1999).
Variable rate application technology is controlled by spatially variable prescriptions, i.e. site-specific dosages of crop inputs (Morgan and Ess, 1997). Making these prescriptions requires understanding the causes of spatial yield variability, as well as the sensitivity of crop yield to the application of specific inputs.
Crop models have been used as analytical tools to understand environmental influence on crop yield. Crop models provide a unique opportunity to account for numerous factors influencing yield in ways that are impossible with traditional agronomic experimentation. Models have often been used to analyze causes of temporal yield variability related to weather and climate (Boote et al., 1996; Messina et al., 1999; Rosenzweig and Iglesias, 1998), and have recently also been used for understanding spatial yield variability (Irmak et al., 2001; Paz and Batchelor, 2000; Paz et al., 1998). Crop simulation models' ability to reproduce both temporal and spatial crop yield variability suggests that they may be ideal tools for diagnostic and prescriptive use in precision agriculture.
However, to date there have been few efforts at using crop simulation models for describing how the spatial variability of soil water content influences crop yield. Considering that drought-related stresses typically limit crop growth in the world's major crop-producing regions, spatial water distribution is an important consideration when applying precision agriculture in those regions.
Paz and Batchelor (2000) used the CROPGRO model (Boote et al., 1998) to analyze spatial soybean yield variability in a field, attributing it to the spatially variable effects of plant density, weeds, soybean cyst nematodes, and water stress. Previously, Paz et al. (1998) explained the influence of water stress in terms of rooting depth (loosely equivalent to a soil water holding capacity parameter), and a drainage parameter, either the saturated hydraulic conductivity (KSAT) or a soil drainage rate coefficient (SLDR), both of which control the rate at which saturated soil drains down to its drained upper limit. These authors noted that their modeling scheme's ability to reproduce observed data degraded in low-lying areas, possibly due to the model's inability to account for runon or subsurface flow from neighboring areas. Indeed, given that these processes were not considered, the parameterization procedure tried to explain excess water across the field in terms of slow drainage or increased rooting depth. This might affect the predictive capability of this procedure when the model is used in years not used for parameter calibration.
Perhaps crop-modeling efforts in precision agriculture could explain more reliably the causes of water-stress-induced yield variability if spatial water movement were explicitly considered, i.e. if the model simulated the coupling, or interchange of water and possibly nutrients, between different landscape locations. This should include a three-dimensional water balance simulation across the landscape. Such a complete spatially-coupled crop model does not currently exist. However, a simple approximation can be used to test our working hypothesis, implicit in the literature to date, that an appropriate selection of parameters can allow an uncoupled model, i.e. one in which the different landscape units do not interchange water, to reproduce the spatiotemporal variability of simulated yield produced by a spatially-coupled model. The specific objectives of our study were:
1. To develop a simple spatially-coupled water balance model, and use it to generate synthetic crop yield maps.
2. To determine under which conditions, if any, spatially-coupled and uncoupled models might produce similar results.
3. To estimate, using spatially-coupled and uncoupled models with inverse modeling (IM) techniques, the soil parameters of the spatially-coupled model, and quantify the errors incurred.
4. To quantify the error of prediction incurred when using the different sets of soil parameters resulting from objective 3 to predict yields in years not used for parameterization.
Materials and Methods
Generating Synthetic Yield Maps: the Spatially-Coupled Model
We simulated a soybean crop across a spatially variable field using the CROPGRO-Soybean model (Boote et al., 1998). We chose soybeans because the CROPGRO model reproduces crop responses to multiple environmental factors such as temperature and day length, as well as the effects on plant growth of both insufficient and excessive soil water content. The latter two are of special interest in this study, since they can be expected to show spatial variability across agricultural fields.
[Figure: schematic of the toposequence, Cell 1 through Cell 10.]
Figure 2-1. Landscape model used for the spatially-coupled model simulations. Note that the cells only communicate via surface flow; subsurface lateral flow was assumed to be insignificant.
We made a simple spatial extension to CROPGRO as shown in Figure 2-1. We assumed an agricultural field with significant topographical variation along one dimension and little or no variation along the other. We approximated the field with a toposequence (Figure 2-1) composed of 10 cells that represent parallel whole-field-long swaths of some arbitrary width. This is equivalent to a sloping field having straight, parallel contour lines.
The model can be configured so the cells behave in one of two ways during rainfall events:
* Spatially-coupled: the surface of each cell receives input of water from rainfall and from cells uphill of it. Water outputs from the cell surface are infiltration and runoff to the cells downhill of it.
* Uncoupled: the surface of each cell only receives inputs from rainfall, with no contribution from neighboring cells. Runoff is assumed to be lost and does not contribute water to neighboring cells.
In both cases, we partitioned water inputs into infiltration and runoff using the SCS Curve Number Method (USDA SCS, 1972), which we chose due to its simplicity and popularity, and because it is already built into CROPGRO. A detailed explanation of the method is given below to make subsequent work clearer.
The curve number method can be derived starting from a mass balance equation applied to a storm event:

$$R = P_e - I \tag{2-1}$$

where $R$ (mm) is runoff, $I$ (mm) is the amount of infiltration and surface retention, and $P_e$ (mm) is a term called effective precipitation. In the SCS method this is defined as the amount of rainfall that can contribute to runoff, equal to the rainfall amount exceeding an initial abstraction $I_a$ (mm), which is the amount of precipitation necessary before runoff can begin. Thus,

$$P_e = P - I_a \tag{2-2}$$

where $P$ (mm) is precipitation. The critical assumption in the curve number method is

$$\frac{R}{P_e} = \frac{I}{S} \tag{2-3}$$
where $S$ is the potential maximum retention (in mm). Equation 2-3 stipulates that the ratio of runoff to the effective rainfall is the same as the ratio of actual retention to $S$ (Boughton, 1989). Rearranging Equation 2-1 yields

$$I = P_e - R \tag{2-4}$$

Substituting Equation 2-4 into Equation 2-3,

$$\frac{R}{P_e} = \frac{P_e - R}{S} \;\Rightarrow\; \frac{R}{P_e} + \frac{R}{S} = \frac{P_e}{S} \;\Rightarrow\; R\,\frac{S + P_e}{S\,P_e} = \frac{P_e}{S} \;\Rightarrow\; R = \frac{P_e^2}{S + P_e}$$

and replacing $P_e$ with Equation 2-2,

$$R = \frac{(P - I_a)^2}{(P - I_a) + S} \tag{2-5}$$
A second assumption of the method is that

$$I_a = 0.2\,S \tag{2-6}$$

Substituting Equation 2-6 into Equation 2-5, and considering the definition of $I_a$, yields the SCS runoff equation:

$$R = \begin{cases} \dfrac{(P - 0.2S)^2}{P + 0.8S} & \text{for } P > 0.2S \\[2ex] 0 & \text{for } P \le 0.2S \end{cases} \tag{2-7}$$
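The potential retention parameter $S$ is usually expressed in terms of a dimensionless runoff curve number $CN$ through the expression

$$S = 25.4\left(\frac{1000}{CN} - 10\right) \text{ (mm)} \tag{2-8}$$

Substituting Equation 2-8 into Equation 2-7 gives

$$R = \begin{cases} \dfrac{\left[P - 5.08\left(\dfrac{1000}{CN} - 10\right)\right]^2}{P + 20.32\left(\dfrac{1000}{CN} - 10\right)} & \text{for } P > I_a \text{ (mm)} \\[3ex] 0 & \text{for } P \le I_a \end{cases} \tag{2-9}$$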
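Equations 2-7 and 2-8 can be sketched in a few lines of Python. This is only an illustrative sketch, not the CROPGRO implementation; the function names `retention_from_cn` and `scs_runoff` are hypothetical names chosen for this example.

```python
def retention_from_cn(cn: float) -> float:
    """Potential maximum retention S (mm) from a curve number (Equation 2-8)."""
    return 25.4 * (1000.0 / cn - 10.0)


def scs_runoff(p: float, cn: float) -> float:
    """Storm runoff R (mm) for precipitation p (mm), following Equation 2-7."""
    s = retention_from_cn(cn)
    ia = 0.2 * s  # initial abstraction, Equation 2-6
    if p <= ia:
        return 0.0  # no runoff until the initial abstraction is satisfied
    return (p - ia) ** 2 / (p + 0.8 * s)


if __name__ == "__main__":
    # Runoff from a 50 mm storm for a few curve numbers.
    for cn in (75, 80, 90, 95):
        print(cn, round(scs_runoff(50.0, cn), 2))
```

Note that for CN = 100 the retention is zero and all rainfall becomes runoff, which matches the limiting case discussed later in the chapter.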
We ran the modified version of CROPGRO successively for the 10 cells, proceeding downward from the highest cell in the toposequence. The first (highest) cell never received runon, since it is assumed that there are no positions higher in the landscape that contribute water to it. In the spatially-coupled model, each one of the successively lower cells received daily runon equal to the total daily runoff in the cell immediately above it. The main assumptions are:
1. The only relevant processes in the landscape are rainfall $P$, infiltration $I$, and runoff (or runon) $R$.
2. The total runoff volume from cell $i-1$ equals the total volume of runon to cell $i$.
3. The runon to a cell can be considered to generate runoff as if it were additional rainfall (i.e. that we can use the SCS runoff equation to estimate the runoff from a cell, taking as rainfall input the sum of rainfall and runon in the cell).
4. The total runoff volume from the field is equivalent to the total runoff volume from the last cell.
5. The field is assumed to be partitioned into $N$ cells of equal area, with each cell having an area $A_i = A / N$, where $A$ is the total area of the field and $N$ is the total number of cells into which it is partitioned.
Assuming $P > I_a$, mass balance for an individual cell $i$ can be expressed as:

$$A_i P_i + A_{i-1} R_{i-1} = A_i I_i + A_i R_i + A_i I_{a,i} \tag{2-10}$$

where $P_i$ is the rainfall on the cell, $R_{i-1}$ is the runon to the cell, equal to the runoff from the previous cell $i-1$, $I_i$ is the infiltration, $I_{a,i}$ is the initial abstraction and $R_i$ is the runoff. This equation is expressed in terms of total volumes of water. However, if all the cells have an equal area $A_i$, Equation 2-10 can be divided by $A_i$ and the equation will then be expressed in volume of water per unit area. Infiltration can thus be expressed as follows:

$$I_i = P_i + R_{i-1} - R_i - I_{a,i} \tag{2-11}$$
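Runoff can be calculated directly using the SCS runoff Equation 2-9, modified to add runon to the precipitation on a cell (and assuming that the areas of all cells are equal):

$$R_i = \frac{(P_i + R_{i-1} - 0.2\,S_i)^2}{P_i + R_{i-1} + 0.8\,S_i} \tag{2-12}$$

where $S_i$ is the maximum retention term. Finally, substituting Equation 2-8 into Equation 2-12:

$$R_i = \begin{cases} \dfrac{\left[P_i + R_{i-1} - 5.08\left(\dfrac{1000}{CN_i} - 10\right)\right]^2}{P_i + R_{i-1} + 20.32\left(\dfrac{1000}{CN_i} - 10\right)} & \text{for } P_i > I_{a,i} \text{ (mm)} \\[3ex] 0 & \text{for } P_i \le I_{a,i} \end{cases} \tag{2-13}$$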
We used Equations 2-11 and 2-12 to calculate the runoff and infiltration for each cell. We began from the top of the slope because cell 1 receives no runon; consequently, the value of $R_1$ could be calculated directly using Equation 2-9. We moved downward from cell 1 to the last cell $N$, calculating, on a daily basis, $R_i$ and then $I_i$ for each cell. This assumes that the field is steep enough for all the runoff to leave the field in one day.
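The downhill pass described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shell program used in the study: `route_day` is a hypothetical name, the runoff threshold is applied to rainfall plus runon (following assumption 3), and the function reports the total water entering the soil of each cell, i.e. $I_i + I_{a,i}$, rather than $I_i$ alone.

```python
def retention_from_cn(cn):
    """Potential maximum retention S (mm), Equation 2-8."""
    return 25.4 * (1000.0 / cn - 10.0)


def route_day(rain, curve_numbers, coupled=True):
    """Route one day's rainfall (mm) down a toposequence.

    curve_numbers is ordered from the top cell to the bottom cell.
    Returns (runoff, soil_input), one value per cell, in mm.
    """
    runoff, soil_input = [], []
    runon = 0.0  # the top cell never receives runon
    for cn in curve_numbers:
        s = retention_from_cn(cn)
        water = rain + runon  # runon treated as additional rainfall
        if water > 0.2 * s:
            r = (water - 0.2 * s) ** 2 / (water + 0.8 * s)  # Equation 2-12
        else:
            r = 0.0
        runoff.append(r)
        soil_input.append(water - r)  # mass balance: I + Ia = P + runon - R
        # In the uncoupled model runoff is lost instead of passed downhill.
        runon = r if coupled else 0.0
    return runoff, soil_input


if __name__ == "__main__":
    cn2 = [93.0, 93.3, 94.0, 94.7, 95.0, 95.0, 95.0, 94.88, 94.76, 94.65]
    for label, flag in (("coupled", True), ("uncoupled", False)):
        _, water_in = route_day(30.0, cn2, coupled=flag)
        print(label, [round(w, 1) for w in water_in])
```

Running the function with `coupled=False` reproduces the uncoupled configuration, so the same routine can generate both sets of daily water inputs.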
We ran the model for 30 years of historic weather data. Assuming that the spatially-coupled crop model is a perfect predictor of crop yield, this procedure created a synthetic yield map for each weather year and parameter pattern. Yield varied spatially, i.e. across the cells, due to differences in soil properties from cell to cell and differences in spatial soil water distribution.
It was not necessary to modify the crop model itself to obtain the additional spatial functionality (daily runon is added to each cell's precipitation input), but it was convenient to write a simple shell program to invoke the successive model runs and manage the input and output data of the cells of the toposequence.
Generating Synthetic Yield Maps: Input Data
Our study site represents conditions near Córdoba (Argentina). Córdoba (31° 29' S, 64° 13' W) lies on the northwestern edge of the Pampas region of Argentina. Mean annual rainfall between 1966 and 1995 was 844 mm, mostly concentrated in the spring and summer. The soil is a Typic Haplustoll, a deep silty loam with high water holding capacity and no limitations to drainage (Dardanelli et al., 1997).
However, soils in the region have poor structural stability (Chagas et al., 1995) and are thus subject to crusting and high runoff during the high-intensity thunderstorms characteristic of the summer. The soil properties adopted for the simulations are shown in Table 2-1. These parameters were derived from expert opinion and field measurements in a microwatershed in the location of interest (H. Apezteguia, Pers. Comm.). Note the small variation of the parameters along the toposequence, which reflects a somewhat greater presence of clay particles downslope. We used CROPGRO genetic coefficients corresponding to a variety that has been used widely in the region: Asgrow 5406 (S. Meira and E. Guevara, Pers. Comm.). We used a planting date of November 10, the modal date for the region (J. Dardanelli, Pers. Comm.).
Soil water content at planting depends on factors such as tillage, weed management, the weather in the months leading up to planting, and the water extraction pattern of the previous crop. Soil water can strongly influence final crop yield, especially in areas like Córdoba, where drought periods are frequent during the growing season. We simulated the interannual variability of initial soil water content by sampling from a distribution of 2970 synthetic water content values at planting simulated for the region by Ferreyra et al. (2001). The mean value was approximately 100 mm of available water, representative of field measurements in the region (Ferreyra, 1998).
Table 2-1. Soil parameters used for the spatially-coupled crop model.
Parameter  Units         1      2      3      4      5      6      7      8      9     10
Depth [a]  cm          210    210    210    210    210    210    210    210    210    210
SLDR [b]   -          0.60   0.60   0.59   0.58   0.57   0.56   0.55   0.54   0.53   0.52
CN2 [c]    -         93.00  93.30  94.00  94.70  95.00  95.00  95.00  94.88  94.76  94.65
KSAT [d]   cm/day        4      4   4.22   4.22   4.22   4.22   4.22   4.21      4   3.72
LL01 [e]   cm3/cm3   0.122  0.122  0.122  0.122  0.122  0.122  0.122  0.126  0.130  0.134
LL02 [e]   cm3/cm3   0.113  0.113  0.113  0.113  0.113  0.113  0.113  0.117  0.121  0.125
LL03 [e]   cm3/cm3   0.103  0.103  0.103  0.103  0.103  0.103  0.103  0.107  0.111  0.115
LL04 [e]   cm3/cm3   0.097  0.097  0.097  0.097  0.097  0.097  0.097  0.101  0.105  0.109
LL05 [e]   cm3/cm3   0.097  0.097  0.097  0.097  0.097  0.097  0.097  0.101  0.105  0.109
LL06 [e]   cm3/cm3   0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
LL07 [e]   cm3/cm3   0.103  0.103  0.103  0.103  0.103  0.103  0.103  0.107  0.111  0.115
LL08 [e]   cm3/cm3   0.101  0.101  0.101  0.101  0.101  0.101  0.101  0.105  0.109  0.113
LL09 [e]   cm3/cm3   0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
LL10 [e]   cm3/cm3   0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
DUL01 [f]  cm3/cm3   0.321  0.321  0.321  0.321  0.321  0.321  0.321  0.325  0.329  0.333
DUL02 [f]  cm3/cm3   0.294  0.294  0.294  0.294  0.294  0.294  0.294  0.298  0.302  0.306
DUL03 [f]  cm3/cm3   0.267  0.267  0.267  0.267  0.267  0.267  0.267  0.271  0.275  0.279
DUL04 [f]  cm3/cm3   0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.254  0.258  0.262
DUL05 [f]  cm3/cm3   0.247  0.247  0.247  0.247  0.247  0.247  0.247  0.251  0.255  0.259
DUL06 [f]  cm3/cm3   0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL07 [f]  cm3/cm3   0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL08 [f]  cm3/cm3   0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL09 [f]  cm3/cm3   0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL10 [f]  cm3/cm3   0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
SAT01 [g]  cm3/cm3   0.488  0.488  0.488  0.488  0.488  0.488  0.488  0.492  0.496  0.500
SAT02 [g]  cm3/cm3   0.487  0.487  0.477  0.477  0.477  0.477  0.477  0.477  0.487  0.497
SAT03 [g]  cm3/cm3   0.476  0.476  0.466  0.466  0.466  0.466  0.466  0.466  0.476  0.486
SAT04 [g]  cm3/cm3   0.431  0.431  0.421  0.421  0.421  0.421  0.421  0.421  0.431  0.441
SAT05 [g]  cm3/cm3   0.403  0.403  0.393  0.393  0.393  0.393  0.393  0.393  0.403  0.413
SAT06 [g]  cm3/cm3   0.386  0.386  0.376  0.376  0.376  0.376  0.376  0.376  0.386  0.396
SAT07 [g]  cm3/cm3   0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT08 [g]  cm3/cm3   0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT09 [g]  cm3/cm3   0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT10 [g]  cm3/cm3   0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
Columns 1 to 10 are the cells of the toposequence.
a: Maximum rooting depth of the soil profile.
b: Soil drainage rate. Controls the rate at which a saturated layer drains to its DUL.
c: Nominal season-long CN value used in the SCS runoff curve number method.
d: Saturated hydraulic conductivity.
e: Lower limit of soil water holding capacity for the ten soil layers.
f: Drained upper limit of soil water holding capacity for the ten soil layers.
g: Saturation soil water content for the ten soil layers.
Estimating Soil Parameters Using Inverse Modeling
We estimated three soil parameters (detailed below) using IM, and compared the estimates with the known parameter values (Table 2-1). We traversed the toposequence downhill, estimating soil parameters for each cell. We searched exhaustively over the three-dimensional parameter space of each cell, as per Irmak et al. (2001). As in that study, each cell's optimal parameter combination was the one that minimized an objective function defined as the root mean squared error of that cell's yield prediction over several years. We detail the number of years and their selection below.
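The exhaustive search over a cell's parameter space can be illustrated with a small sketch. Everything here is hypothetical scaffolding: `toy_yield` is a made-up linear stand-in for a crop model run (it is not CROPGRO and has no agronomic meaning), and the grid values are arbitrary. Only the search logic, which keeps the parameter combination with the lowest RMSE across years, reflects the procedure described above.

```python
import itertools
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def grid_search(simulate, observed, grids):
    """Try every parameter combination; return (best_params, best_rmse).

    simulate(params, year) -> simulated yield for one cell and year
    observed: dict mapping year -> "observed" yield
    grids: dict mapping parameter name -> list of candidate values
    """
    names = list(grids)
    years = list(observed)
    target = [observed[y] for y in years]
    best_params, best_err = None, math.inf
    for combo in itertools.product(*(grids[n] for n in names)):
        params = dict(zip(names, combo))
        err = rmse([simulate(params, y) for y in years], target)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


# Hypothetical per-year weights for the toy stand-in model below.
WEIGHTS = {1: (1000.0, 10.0, 5.0), 2: (2000.0, 50.0, 1.0), 3: (1500.0, 30.0, 9.0)}


def toy_yield(params, year):
    """Made-up linear response used only to exercise the search."""
    a, b, c = WEIGHTS[year]
    return a * params["FAW"] + b * (100.0 - params["CN2"]) + c * params["KSAT"]


if __name__ == "__main__":
    truth = {"CN2": 94.0, "KSAT": 4.0, "FAW": 0.8}
    obs = {y: toy_yield(truth, y) for y in WEIGHTS}
    grids = {"CN2": [90.0, 92.0, 94.0, 96.0], "KSAT": [2.0, 4.0, 6.0], "FAW": [0.6, 0.8, 1.0]}
    print(grid_search(toy_yield, obs, grids))
```

The cost grows with the product of the grid sizes, which is why the number of parameters and the grid resolution have to be kept small when each evaluation is a full crop model run.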
We varied three parameters: the nominal season-long CN value used in the SCS runoff curve number method (USDA SCS, 1972), CN2; the saturated hydraulic conductivity of the bottom soil layer, KSAT; and the fraction of nominal maximum available water, FAW. The latter was used to modify the soil water holding characteristics of the whole profile using only one parameter: we defined FAW as the ratio between each soil layer's estimated maximum available water and the true maximum available water for that layer. The maximum available water is defined as (DUL - LL), where DUL and LL are the drained upper limit and lower limit of soil water holding capacity, respectively (Ritchie, 1981). We kept the LL of each soil layer at its true value, and modified the DUL according to the FAW value (FAW = 1 makes the DUL take its real value, FAW = 0.5 places the DUL halfway between its real value and the LL, etc.).
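The FAW definition amounts to a linear rescaling of each layer's DUL about its LL. A minimal sketch follows; `scale_dul` is a hypothetical helper name, and the layer values are the first three layers of cell 1 in Table 2-1.

```python
def scale_dul(ll, dul, faw):
    """Return the modified DUL of each layer for a given FAW.

    FAW = (DUL_est - LL) / (DUL_true - LL), so DUL_est = LL + FAW * (DUL_true - LL).
    LL is left at its true value.
    """
    return [l + faw * (d - l) for l, d in zip(ll, dul)]


if __name__ == "__main__":
    ll = [0.122, 0.113, 0.103]   # LL01 to LL03, cell 1
    dul = [0.321, 0.294, 0.267]  # DUL01 to DUL03, cell 1
    for faw in (0.5, 1.0, 1.2):
        print(faw, [round(v, 4) for v in scale_dul(ll, dul, faw)])
```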
We classified weather years according to water availability during the season, expressed as the sum of initial soil water content and rainfall during the season. We called this variable TSW (total seasonal water), and used it to rank the 30 available weather years. We sampled four years from each tercile of the TSW distribution: four "dry" years, four "normal" years, and four "wet" years, and used them to define parameterization cases.
We defined three different cases based on the TSW of the crop years used for parameterization. The first case was an unbiased benchmark, consisting of two years from each of the three TSW terciles, for a total of six years. The second case was biased toward "dry" years, and consisted of four "dry" years and two "normal" years. The third case was biased toward "wet" years, and consisted of four "wet" and two "normal" years. We chose to use six years of weather for each case, 2-3 times the number used for other recent studies (Batchelor et al., 2002; Irmak et al., 2001; Paz et al., 1998), to minimize the possibility of overfitting.
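The construction of the three weather cases can be sketched as below. The function names and the random sampling are assumptions made for illustration; the source does not state how the four years per tercile were drawn. The way the "normal" years are split between cases is chosen so that the unbiased case shares three years with each biased case.

```python
import random


def terciles(tsw_by_year):
    """Rank years by TSW and split them into (dry, normal, wet) thirds."""
    ranked = sorted(tsw_by_year, key=tsw_by_year.get)
    n = len(ranked) // 3
    return ranked[:n], ranked[n:2 * n], ranked[2 * n:]


def build_cases(tsw_by_year, seed=0):
    """Sample four years per tercile and assemble three six-year cases."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    dry, normal, wet = (rng.sample(t, 4) for t in terciles(tsw_by_year))
    return {
        "unbiased": dry[:2] + normal[:2] + wet[:2],
        "dry-biased": dry + [normal[0], normal[2]],
        "wet-biased": wet + [normal[1], normal[3]],
    }


if __name__ == "__main__":
    # Hypothetical TSW values (mm) for 1966-1995; not the study's data.
    tsw = {year: 500.0 + 17.0 * ((year * 7) % 30) for year in range(1966, 1996)}
    for name, years in build_cases(tsw).items():
        print(name, sorted(years))
```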
Figure 2-2 shows the weather years chosen for the three weather cases, and the TSW of each. Note how the total TSW range exceeds 400 mm, and how the unbiased case shares three years with each of the other cases.
[Figure: one row of points per weather case, plotted against TSW (soil water at planting + crop season rainfall, in mm; axis from 500 to 1000). Year labels as extracted: Wet-biased: 86, 78, 74, 84, 83, 80. Unbiased: 67, 90, 85, 86, 84, 83. Dry-biased: 73, 67, 90, 70, 91, 85.]
Figure 2-2. Parameterization weather cases. Each row of points describes a weather case. The filled circles are years from the lower TSW tercile, the open squares are from the middle TSW tercile, and the filled triangles are from the upper tercile. The label on each point shows the year, e.g. 85 corresponds to 1985.
An important source of crop modeling error may arise from lack of knowledge of initial soil water conditions. To explore this aspect we defined two initial condition cases: having perfect knowledge about the initial soil water conditions, and having no knowledge. In the latter we used the median (over the 30 available water years) soil water content value, approximately 100 mm.
In summary, there were three sources of uncertainty in parameter estimation:
* The imperfect crop model when using the uncoupled model.
* Biased weather cases in the IM process.
* The lack of knowledge about initial soil water conditions.
Combining the possible states of these sources of uncertainty led to 12 distinct IM parameter estimation scenarios, defined by 3 weather cases (benchmark, dry-biased, wet-biased) x 2 models (spatially-coupled, uncoupled) x 2 initial condition cases (initial conditions known or unknown). One of these scenarios, the one using the benchmark weather case, the spatially-coupled model, and known initial conditions, was expected to reproduce the real soil parameters most closely.
Evaluating With Independent Data
After estimating soil parameters for the 12 IM scenarios, we tested how well the corresponding parameter sets estimated yields for the 18 weather years not involved in each scenario's parameter estimation process. We tried this in two ways: having perfect knowledge about the initial conditions and with no knowledge thereof, similarly to the cases described above. We stratified the results by each year's TSW tercile. We did this to show whether the spatially-coupled and uncoupled models' predictive performance varied according to the relationship between the weather case used for parameter estimation and the weather (as described by the TSW tercile) used for evaluation.
Results and Discussion
Exploring Similar Behavior Between Spatially-Coupled and Uncoupled Models
The conditions under which the spatially-coupled and uncoupled models can behave similarly can be analyzed using the top two cells of the toposequence. Equation 2-13 describes runoff for an arbitrary cell in the toposequence of the spatially-coupled model, and the particular case of the topmost cell, which receives no runon, can be described using Equation 2-9. Replacing the expression of the runon entering cell 2 with Equation 2-9, i.e. the runoff from cell 1, and assuming that rainfall is constant throughout the toposequence and greater than the greater $I_a$ of the two cells, runoff from the second cell can be expressed as follows:

$$R_2 = \frac{\left[P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} - 0.2\,S_2\right]^2}{P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} + 0.8\,S_2} \tag{2-14}$$

where $S_1$ and $S_2$ are the retention parameters for the first (topmost) and second cells, respectively.
Equation 2-9 describes runoff in any cell of the uncoupled model. In order for the uncoupled model to substitute for the spatially-coupled model, for any realistic values of $S_1$ and $S_2$ in the spatially-coupled model there should exist a value $S_{EQ2}$ and its corresponding curve number $CN_{EQ2}$ that predict, using the uncoupled model, the same input of water into the soil of cell 2, $I + I_a$, as that in cell 2 of the spatially-coupled model. This should be valid for any realistic environmental conditions, i.e. rainfall. Based on Equation 2-11, and assuming that $P + R_1 > 0.2\,S_2$ and $P > 0.2\,S_{EQ2}$, the following should be true:

$$P + \frac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} - \frac{\left[P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} - 0.2\,S_2\right]^2}{P + \dfrac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1} + 0.8\,S_2} = P - \frac{(P - 0.2\,S_{EQ2})^2}{P + 0.8\,S_{EQ2}} \tag{2-15}$$
We solved Equation 2-15 for the $S_{EQ2}$ of the simpler problem in which $S_1 = S_2 = S$, i.e. $CN_1 = CN_2 = CN$, and obtained the following solutions:

$$S_{EQ2} = \frac{-8S^3 + 290PS^2 - 300P^2S + 1500P^3 + 2.8284\sqrt{(8S^3 - 545PS^2 - 150P^2S - 2250P^3)(S - 5P)^3}}{2\,(50P^2 + 30PS + 17S^2)} \tag{2-16a}$$

$$S_{EQ2} = \frac{-8S^3 + 290PS^2 - 300P^2S + 1500P^3 - 2.8284\sqrt{(8S^3 - 545PS^2 - 150P^2S - 2250P^3)(S - 5P)^3}}{2\,(50P^2 + 30PS + 17S^2)} \tag{2-16b}$$
Of the two solutions, only Equation 2-16b is valid. Both solutions produce the same water input, i.e. the right-hand side of Equation 2-15 evaluates to the same numerical value for both; however, the $P - 0.2\,S_{EQ2}$ term is negative for Equation 2-16a. Although it produces the same value as Equation 2-16b when squared, it has no physical meaning: as shown in Equation 2-9, runoff should be 0 when $P \le 0.2\,S_{EQ2}$.
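The closed-form solution can be checked numerically. The sketch below evaluates Equation 2-16b for one storm and confirms that the uncoupled cell with retention $S_{EQ2}$ receives the same soil water input as cell 2 of the spatially-coupled model (Equation 2-15). It assumes $S_1 = S_2 = S$ and $P > 0.2S$; the function names are hypothetical, and the exact factor $2\sqrt{2}$ is used in place of the rounded 2.8284.

```python
import math


def retention_from_cn(cn):
    """Potential maximum retention S (mm), Equation 2-8."""
    return 25.4 * (1000.0 / cn - 10.0)


def cn_from_retention(s):
    """Inverse of Equation 2-8."""
    return 1000.0 / (s / 25.4 + 10.0)


def runoff(water, s):
    """SCS runoff (Equation 2-7) for a total water input in mm."""
    return (water - 0.2 * s) ** 2 / (water + 0.8 * s) if water > 0.2 * s else 0.0


def s_eq2(p, s):
    """Equivalent retention for cell 2 of the uncoupled model (Equation 2-16b)."""
    denom = 2.0 * (50 * p**2 + 30 * p * s + 17 * s**2)
    rational = -8 * s**3 + 290 * p * s**2 - 300 * p**2 * s + 1500 * p**3
    radicand = (8 * s**3 - 545 * p * s**2 - 150 * p**2 * s - 2250 * p**3) * (s - 5 * p) ** 3
    return (rational - 2.0 * math.sqrt(2.0) * math.sqrt(radicand)) / denom


if __name__ == "__main__":
    p, cn = 50.0, 80.0
    s = retention_from_cn(cn)
    r1 = runoff(p, s)                # runoff from cell 1
    r2 = runoff(p + r1, s)           # runoff from cell 2, coupled model
    coupled_input = p + r1 - r2      # left-hand side of Equation 2-15
    seq = s_eq2(p, s)
    uncoupled_input = p - runoff(p, seq)  # right-hand side of Equation 2-15
    print(coupled_input, uncoupled_input, cn_from_retention(seq))
```

For this storm the equivalent curve number comes out below the true CN, consistent with the extra infiltration contributed by runon.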
Figure 2-3 shows the values of $CN_{EQ2}$ corresponding to the $S_{EQ2}$ of Equation 2-16b, for different combinations of $P$ and $CN$. Note how $CN_{EQ2}$ initially decreases with increasing rainfall, reflecting the additional contribution of runon to infiltration through a smaller CN. For greater values of $P$, $CN_{EQ2}$ increases asymptotically toward $CN$, and this trend begins for lower values of $P$ as $CN$ increases. This behavior can be understood by differentiating Equation 2-14 with respect to $P$. Defining the auxiliary terms shown by Equation 2-17, the derivative results in Equation 2-18 as follows:

$$B = \frac{P - 0.2\,S_1}{P + 0.8\,S_1}, \qquad C = \frac{(P - 0.2\,S_1)^2}{P + 0.8\,S_1}, \qquad D = \frac{P + C - 0.2\,S_2}{P + C + 0.8\,S_2} \tag{2-17}$$

$$\frac{\partial R_2}{\partial P} = \left[1 + 2B - B^2\right] \cdot \left[2D - D^2\right] \tag{2-18}$$
Equation 2-18 tends to 1 as $P$ tends to infinity, and as $S$ tends to 0, i.e. as $CN$ tends to 100. If the derivative tends to 1, then $CN_{EQ2}$ will tend to $CN$ because the effect of any runon from upslope (and thus, the effect of a spatially-coupled model) becomes irrelevant as the additional water is fully lost to runoff.
Equation 2-15 and Figure 2-3 relate to our objective of determining under which conditions, if any, the spatially-coupled and uncoupled models might produce similar results. The only case for which $CN_{EQ2}$ remains constant for different rainfall amounts is $CN_1 = CN_2 = 100$ (not shown in Figure 2-3), which is useless in an agricultural environment because it is associated with zero infiltration, as shown by Equation 2-7 for $S = 0$. In more practical scenarios, it would be impossible to exactly reproduce the water infiltration regime of a spatially-coupled model using an uncoupled model; a crop season includes many rainfall events, each with its own rainfall amount, and the number of storms and rainfall amounts varies from year to year.
The relevance in terms of crop yield of this water-specific conclusion will vary from year to year and will depend on the CN in question. For very wet years, as well as for soils associated with high curve numbers or intense, convective storms during the cropping season, it may be irrelevant. Conversely, it may be very important in water-limited environments with intermediate infiltration and storms with moderate rainfall amounts. Furthermore, it is possible that other errors such as initial conditions or weather biases might be more relevant to yield than model errors due to uncoupling. These aspects are further explored below.
[Figure: curves of $CN_{EQ2}$ (vertical axis, about 60 to 95) against rainfall (mm; horizontal axis, 0 to 100), one curve each for $CN_1 = CN_2 = 90$, 80, 75 and 70.]
Figure 2-3. Values of $CN_{EQ2}$ (curve number in toposequence cell 2 of the uncoupled model that produces equivalent infiltration to cell 2 of the spatially-coupled model) for different combinations of rainfall and curve number values in cells 1 and 2 ($CN_1$, $CN_2$) of the spatially-coupled model.
Simulated Yield Profiles
Figure 2-4 shows a histogram of the initial soil water content used in this study. Plant extractable soil water to a depth of 210 cm ranged from slightly over 30 mm to slightly under 190 mm, reflecting the effects of interannual weather variability during antecedent crop cycles. The former case could be expected following a crop that extracts water from very deep in the profile, such as sunflower, and a subsequent dry winter typical for the region; the latter can reflect conditions after a short-season maize (which can leave water deep in the profile), followed by good spring rains prior to planting.
[Figure: histogram with cumulative percentages; horizontal axis: Initial PESW (mm).]
Figure 2-4. Histogram of PESW in available initial condition set.
[Figure: boxplots of simulated yield by cell number (1 to 10); panels labeled "Coupled model" and "Uncoupled model", one row of panels per TSW tercile.]
Figure 2-5. Simulated yield profiles made with the spatially-coupled (left) and uncoupled (right) models. The results of the uncoupled model show the effect of spatial variability of soil properties; the spatially-coupled model results show the additional effect of spatial water movement.
Figure 2-5 shows simulated yield for the spatially-coupled (left column) and uncoupled (right column) models using the soil properties shown in Table 2-1. Each row corresponds to a tercile of the TSW distribution; the dry tercile is on the top, the wet tercile on the bottom. Each boxplot shows the results of 10 years of simulations. The 10 cells shown per panel are arranged in progressively lower landscape positions from left to right. The results of the uncoupled model show the influence of the spatial variability of soil properties shown in Table 2-1. The spatially-coupled model results additionally include the effects of spatial water movement. Note how the uncoupled model results change from cell to cell, especially in the middle and wet weather terciles. This is primarily a result of spatial CN2 variability; despite its apparently small variation throughout the toposequence, from 93 to 95, infiltration is greatly affected by these small variations in the upper CN2 range, i.e. lower $S$ range. This becomes clear by differentiating Equation 2-14 with respect to $CN_1$ or $CN_2$ (not shown).
Central Argentina is a water-limited environment for soybean growth; note how the yield at the top of the slope has a median value of slightly over 2000 kg/ha, and increases significantly downslope given that the lower cells have more water available from runon. The effect is less noticeable in dry years because there is less rain and, consequently, less runoff and runon.
Inverse Modeling: Parameter Estimation Error
Figures 2-6 and 2-7 show the results of using IM and a search algorithm to find the parameter combination that best fits the "observed" yield patterns of Figure 2-5. Only CN2 and FAW results are shown because large changes in KSAT did not produce changes in yield. This happens because precipitation rarely exceeds evapotranspiration during the growing season in Córdoba. When it does, the high water holding capacity of the Entic Haplustoll makes it highly improbable that the drainage implementation of the CROPGRO water balance module, which only begins moving water out of a layer when the layer's water content exceeds its DUL, could drain beyond a depth of 210 cm. This is consistent with the lack of a limiting horizon in Entic Haplustolls.
[Figure: grid of panels of CN2 (vertical axis, about 80 to 96) against CELL (1 to 10); columns labeled "Perfect IC knowledge" and "Only know IC average" under the heading INITIAL CONDITION KNOWLEDGE CASE; one row per weather case.]
Figure 2-6. CN2 values obtained by IM for the spatially-coupled (filled circles) and uncoupled (open circles) models. The left column shows the case of perfect knowledge of initial conditions, and the right shows the case in which only the average initial soil water content is known. The three rows correspond to the three calibration weather cases. The filled circles in the left column coincide with the actual parameter values.
There are twelve scenarios defined by the combinations of model, knowledge of initial conditions, and IM weather case. In all the scenarios the results provided by the spatially-coupled and uncoupled models coincide in the first, uppermost cell of the slope. This occurs because the uppermost cell in the spatially-coupled model receives no runon and thus behaves identically to the corresponding cell of the uncoupled model.
[Figure 2-7 plot: panels in three rows and two columns. Column titles: "Perfect IC knowledge" and "Only know IC average" (initial condition knowledge case). X-axis of each panel: cell (1 to 10).]
Figure 2-7. FAW values obtained by IM for the spatially-coupled (filled circles) and uncoupled (open circles) models. The left column shows the case of perfect knowledge of initial conditions, and the right shows the case in which only the average initial soil water content is known. The three rows correspond to the three IM weather cases.
The spatially-coupled model faithfully reproduced its own parameters at all landscape positions when the initial conditions were known, as shown by the CN2 values (filled circles of the left column of Figure 2-6) being equal to those of Table 2-1, and by the estimated FAW being 1 for all cells. However, the spatially-coupled model had some difficulty when initial conditions were unknown, especially with the FAW parameter calibrated in wetter years and wetter (downslope) cells in which water limitation was not a major problem.
We hypothesized that the erratic FAW estimates under unknown initial conditions corresponded to a lower sensitivity of the FAW parameter relative to CN2, as expressed by comparing the sensitivity coefficient (Hamby, 1994) of each parameter across different years and landscape positions. Since the results of a sensitivity analysis may be greatly dependent on the chosen base case (Atherton et al., 1975; Gardner et al., 1981), we used the base case defined by the parameter values shown in Table 2-1, so the results represent the behavior of the parameters around the optimal parameter estimate. The coefficient is defined as:
\[ \phi_i = \frac{\Delta Y}{\Delta X_i} \, \frac{X_i}{Y} \tag{2-19} \]
where $X_i$ is the base case value of the ith parameter, $\Delta X_i$ is a deviation of the parameter with respect to its base case, Y is the base case yield value, and $\Delta Y$ is the yield deviation corresponding to the $\Delta X_i$ deviation.
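As an illustration only, the sensitivity coefficient can be approximated by a finite difference around the base case. The sketch below is not the procedure used in this study: the function `toy_yield`, the 5% perturbation, and the numeric values are hypothetical stand-ins for a crop model run.

```python
def sensitivity_coefficient(model, params, name, rel_step=0.05):
    """Finite-difference estimate of (dY / dX_i) * (X_i / Y) at the base case."""
    base_x = params[name]
    base_y = model(params)
    perturbed = dict(params)
    perturbed[name] = base_x * (1.0 + rel_step)   # deviate one parameter only
    delta_x = perturbed[name] - base_x
    delta_y = model(perturbed) - base_y
    return (delta_y / delta_x) * (base_x / base_y)


def toy_yield(p):
    # Hypothetical stand-in for a crop model (kg/ha): yield falls with CN2
    # and rises only weakly with FAW.
    return 4000.0 - 20.0 * p["CN2"] + 5.0 * p["FAW"]


base_case = {"CN2": 90.0, "FAW": 1.0}
for name in base_case:
    print(name, sensitivity_coefficient(toy_yield, base_case, name))
```

Because the coefficient is dimensionless, values for parameters with different units (such as CN2 and FAW) can be compared directly.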
Figures 2-8 and 2-9 show sensitivity results at different landscape positions for CN2 and FAW, respectively, in the spatially-coupled and uncoupled models. Sensitivity is weather dependent, somewhat soil property dependent (see right column of Figure 2-8), and landscape position dependent. Note how the sensitivity of CN2 is two orders of magnitude greater than that of FAW. The high CN2 sensitivity was explained previously. The low FAW sensitivity is linked to the high water holding capacity of the soil, and the same applies to KSAT. Ferreyra et al. (2001) noted that when using the CERES model (Ritchie et al., 1998) in the Córdoba region, a large entry of water into the lower soil layers happened very infrequently. This occurs in CERES and CROPGRO because these models' water balance simulation does not move water downward from a layer until its soil water content has surpassed its drained upper limit, which rarely happens in soils with a high water holding capacity and high runoff. Consequently, FAW can affect neither the total amount of water available to the crop, nor the timing of its availability around a base case in which FAW = 1.
[Figure 2-8 plot: two panels, "Coupled model" and "Uncoupled model". X-axis: cell (1 to 10).]
Figure 2-8. CN2 sensitivity coefficient for different landscape positions and models. The points and whiskers show means and standard deviations of coefficients calculated for the 6 years of the unbiased scenario of Figure 2-2.
[Figure 2-9 plot: two panels, "Coupled model" and "Uncoupled model". X-axis: cell (1 to 10).]
Figure 2-9. FAW sensitivity coefficient for different landscape positions and models. The points and whiskers show means and standard deviations of coefficients calculated for the 6 years of the unbiased scenario of Figure 2-2.
The uncoupled model tried to compensate for its lack of run-on contributions by estimating progressively lower CN2 values downhill (Figure 2-6). According to Equation 2-9, this would result in lower runoff losses and hence more infiltration. However, the uncoupled model's CN2 compensation attempt was only partially successful (Figure 2-10); the error increased downhill. As demonstrated above, although CN2 could conceivably be modified for each cell in the uncoupled model so that the results of Equations 2-9 and 2-13 coincide for a given storm (or the yields of the two models coincide for a given year), the nonlinearity of Equation 2-13 with respect to P for different CN values makes it practically impossible for a single set of CN2 values in the uncoupled model to reproduce the results of the spatially-coupled model over several years.
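The argument can be illustrated numerically with a two-cell slope. The sketch below is an assumption-laden illustration, not the model code used here: it uses the textbook SCS curve number equation in metric form with an initial abstraction of 0.2 S, which may differ in detail from Equations 2-9 and 2-13, and the storm sizes and CN value are hypothetical. For each storm it finds, by bisection, the CN that an uncoupled second cell would need to match the infiltration of the coupled second cell.

```python
def scs_runoff(p_mm, cn):
    """Textbook SCS curve number runoff (mm); assumes initial abstraction 0.2 * S."""
    s = 25400.0 / cn - 254.0            # potential retention, mm
    ia = 0.2 * s
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm + 0.8 * s)


def coupled_infiltration_cell2(p_mm, cn):
    """Infiltration in the second cell when it receives rain plus run-on from cell 1."""
    water_in = p_mm + scs_runoff(p_mm, cn)
    return water_in - scs_runoff(water_in, cn)


def equivalent_cn(p_mm, target_infiltration, lo=30.0, hi=99.9, iters=60):
    """Bisection for the uncoupled CN whose infiltration matches the target for one storm."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        infiltration = p_mm - scs_runoff(p_mm, mid)
        if infiltration > target_infiltration:
            lo = mid                    # too much infiltration: a higher CN is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


cn_true = 90.0                          # hypothetical CN shared by both cells
cn_small_storm = equivalent_cn(30.0, coupled_infiltration_cell2(30.0, cn_true))
cn_large_storm = equivalent_cn(50.0, coupled_infiltration_cell2(50.0, cn_true))
print(cn_small_storm, cn_large_storm)
```

Under these assumptions, both compensating CN values fall below the true value, but they differ between the two storms, so no single CN reproduces the coupled behavior for both.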
[Figure 2-10 plot: panels in three rows and two columns. Column titles: "Perfect IC knowledge" and "Only know IC average" (initial condition knowledge case). X-axis of each panel: cell (1 to 10).]
Figure 2-10. Yield RMSE for the twelve parameterization scenarios (6 years per scenario). Filled circles represent the spatially-coupled model; open circles represent the uncoupled model.
The spatially-coupled model behaved differently: RMSE was 0 for all cells when initial conditions were known. This was expected because the IM algorithm converged to the original parameters. However, RMSE increased with uncertain initial conditions, especially for the wetter cases.
These results suggest a very limited capacity of the uncoupled model to predict reality, especially when the weather years used for parameter estimation are similar. However, the spatially-coupled model's error also increased under uncertain initial conditions. We explored this further with the evaluation data set.
Evaluating With Independent Data
Figures 2-11 to 2-14 show the RMSE for several different evaluation scenarios. The 18 evaluation years were split by TSW tercile, and the terciles' results are shown in separate columns. Each point represents 6 years.
These results correspond only to scenarios having uncertain initial conditions (i.e., the values shown in the right column of Figures 2-6 and 2-7), since measuring the initial conditions at the parameter estimation phase is currently not practical in precision agriculture modeling applications (R. Murdock, Pers. Comm.). For reference, however, the RMSE of the IM scenario using the spatially-coupled model and full knowledge of initial conditions, evaluated with full knowledge of initial conditions, is zero for the ten cells in all combinations of IM weather cases and evaluation TSW tercile.
When the evaluation initial conditions are known (Figure 2-11), the prediction RMSE of the spatially-coupled model is minimum for the dry-biased IM weather case, increasing towards the wet-biased IM case, especially for the cells at the bottom of the toposequence of the dry tercile. This happens because under very dry conditions, crop yield responds strongly to changes in infiltration, initial conditions are less variable, and the CN2 parameter is consequently estimated more accurately.
Conversely, the wet-biased weather case (bottom right of Figures 2-6 and 2-7) produces the poorest parameter estimates because water does not limit the crop's growth in some of the years, especially in the lower cells (see bottom left panel of Figure 2-5). The parameter estimation process thus fits parameters to explain the variability of initial conditions rather than the crop's response to weather; this results in spurious parameter values (bottom right panels of Figures 2-6 and 2-7).
The spatially-coupled model's prediction error in evaluation simulations increases when the initial conditions are unknown (Figure 2-12). The increase is most noteworthy when using parameters obtained with the dry-biased weather case. This is due to the impact of uncertainty in the knowledge of initial conditions, which explains why the runs corresponding to the central tercile of evaluation TSW are the most affected: in the dry tercile the variability of the initial conditions is lower, whereas in the wet tercile the impact of that variability is lower.
Figure 2-13 shows evaluation results for the uncoupled model under perfect knowledge of initial conditions. The patterns are similar to those shown in Figure 2-10, with error increasing downslope as the decreased curve number fails to properly capture the intra-annual and interannual variability of spatial water movement of the spatial model. However, for the wet-biased IM case, the prediction error downslope is actually less than in the spatially-coupled model (Figure 2-11), because errors in parameter estimation cannot compound downslope in the uncoupled model.
[Figure 2-11 plot: "Coupled model, perfect evaluation IC knowledge". Panel columns by evaluation TSW tercile: Dry, Medium, Wet. X-axis: cell (1 to 10). Y-axis ticks from 0 to 2500.]
Figure 2-11. Evaluation yield RMSE for the spatially-coupled model when IM initial conditions are unknown and evaluation initial conditions are known. Each point represents six years.
[Figure 2-12 plot: "Coupled model, only know average evaluation ICs". Panel columns by evaluation TSW tercile: Dry, Medium, Wet. X-axis: cell (1 to 10). Y-axis ticks from 0 to 2500.]
Figure 2-12. Evaluation yield RMSE for the spatially-coupled model when both the IM and evaluation initial conditions are unknown. Each point represents six years.
[Figure 2-13 plot: "Uncoupled model, perfect evaluation IC knowledge". Panel columns by evaluation TSW tercile: Dry, Medium, Wet. X-axis: cell (1 to 10). Y-axis ticks from 0 to 2500.]
Figure 2-13. Evaluation yield RMSE for the uncoupled model when IM initial conditions are unknown and evaluation initial conditions are known. Each point represents six years.
[Figure 2-14 plot: "Uncoupled model, only know average evaluation ICs". Panel columns by evaluation TSW tercile: Dry, Medium, Wet. X-axis: cell (1 to 10). Y-axis ticks from 0 to 2500.]
Figure 2-14. Evaluation yield RMSE for the uncoupled model when both the IM and evaluation initial conditions are unknown. Each point represents six years.
Figure 2-14 (uncoupled model, unknown initial conditions) shows results similar to those of Figure 2-13, except for the increased prediction RMSE of the dry-biased scenarios due to the uncertainty in initial conditions already mentioned for Figure 2-12. Ultimately, the results shown for the spatially-coupled model (Figure 2-12) and the uncoupled model (Figure 2-14) are very similar, suggesting that under uncertain initial conditions and biased IM weather cases, errors in the spatial coupling of the model are not the primary cause of yield prediction error.
The spatially-coupled model has several caveats. It does not consider subsurface flow; it calculates runoff using the SCS method (as does the uncoupled model), which has little or no physical basis (Boughton, 1989); it assumes that runoff leaves the field in one day; and it does not consider the increasing complexity of the runoff hydrograph downslope as runoff contributions from uphill cells arrive with different time lags. Also, since it adds all the runoff from a cell to the precipitation of its immediate downslope neighbor, it implicitly assumes that runoff travels downslope in sheet form. However, these simplifications do not negate the effects of the three aforementioned sources of error, and thus do not detract from the central findings of this study.
Conclusions
The literature to date on the inverse-modeling-based parameterization of spatial crop models is dominated by uncoupled models. We studied three possible sources of error for such models: model error from lack of spatial coupling and water transport among different landscape locations, parameter error from biased weather in the years of yield data used for the parameterization process, and errors due to lack of knowledge of initial soil water conditions. Each of these sources of error impacted spatiotemporal yield prediction capability.
With respect to model error, we showed analytically that the spatiotemporal infiltration behavior of a spatially-coupled water balance model cannot be reproduced by modifying the parameters of an uncoupled model. The corresponding yield prediction limitations of the uncoupled model were confirmed, using an example, both at the parameter estimation and evaluation stages.
In our example, however, parameter error due to weather biases and the error from lack of knowledge of initial conditions greatly impacted the predictive capability of the spatially-coupled model, and had less effect on its uncoupled counterpart.
Based on our analysis we concluded that the use of spatially-coupled crop models requires high-quality data. Practical precision agriculture applications are characterized by uncertain initial conditions and the possibility that the weather used for calibration is not representative. Under these circumstances, the use of a spatially-coupled model may not be justified, especially for low landscape positions.
CHAPTER 3
PLANNING CROP SCOUTING PATHS WITH OPTIMIZATION ALGORITHMS AND A SELF-ORGANIZING FEATURE MAP
Introduction
In the context of decades of falling commodity prices, climate change, and increasing environmental regulatory pressure, farmers need to sustain high crop yields and incomes year after year in order to survive. Effective risk mitigation requires that farmers make crop management decisions based on up-to-date information.
Crop scouting is a data-collection activity that is used to support crop management decisions such as when to make insecticide, fungicide, and herbicide applications. A crop scout typically walks through a field to get a general impression of its state, occasionally stopping to make more detailed measurements. The kind of information collected by crop scouts depends on the crop in question and the decisions to be made, but may include qualitative and quantitative assessment of the presence of insects, diseases, weeds, and water stress.
The advent of precision agriculture (Pierce and Nowak, 1999) and precision integrated pest management (Fleischer et al., 1999) has brought the possibility of site-specific applications. It is increasingly common for farmers to selectively apply pesticides, fertilizer, lime, and other products to areas in which the application will maximize profit. This in turn has led to the need for site-specific scouting.
Many farmers currently accumulate different types of spatial data in electronic format, using them in some form of geographical information system (GIS) to provide information about the spatial variability of factors that affect their crops. The amount of information available varies with the time since the farmer's adoption of spatial technologies, as well as his/her level of investment, but may range from having only a field boundary to multiple years of yield data, electrical conductivity, elevation, and multiple soil test datasets (NRC, 1997).
Scouting can be integrated into a spatial data management system, using the scouting maps together with existing information in a spatial database to generate application maps (Nelson et al., 1999). Currently many crop scouts in the U.S. record their data on preprinted paper forms that are later completed and faxed to the farmer. A crop scout will typically service a number of growers, up to a total area of about 8000 to 10000 hectares, charging a fee per unit area. Their responsibility varies between growers, from making all prescriptions and supervising subsequent applications, to merely communicating their recommendations. Some scouts focus exclusively on insects; others also include diseases and weeds. The price per unit area may vary by an order of magnitude depending on the level of services rendered.
A crop scout working in such a regime needs to optimize the use of his/her time;
spending too much time per unit area is uneconomical, and spending too little exposes the scout to making expensive errors. Moreover, if the intent is to describe spatial variability of the variable of interest for a precision agriculture / precision IPM application, the placement of the samples becomes especially relevant (Fleischer et al., 1999).
The path chosen to link sampling locations strongly influences the use of the scout's time. An important condition to be met by this path is that it must be closed, i.e., the scouting path must have the same starting and ending point. This condition eliminates the downtime required for the scout to return to his/her vehicle from a distant position in the field.
In general, the search for a convenient scouting path can be imagined as the combination of two activities:
• Determining the sampling locations, and
• Finding the shortest tour (closed path) linking all of the locations.
The optimal placement of sampling sites has been extensively treated in soil science (McBratney and Webster, 1981; Burgess and Webster, 1984; van Groenigen et al., 1999; Ferreyra et al., 2002). Optimizing the path through a set of scouting sites has not been given much attention in the agricultural literature, but is equivalent to a classic problem in computer science: the Traveling Salesman Problem (TSP).
The goal of this study is to develop objective methods for solving the two points shown above, i.e., sample placement and scouting path construction, to build scouting maps. Two possible approaches to these problems are:
1. Sequential, in which the optimal sampling locations are determined first, followed by a search process to solve the associated TSP, and
2. Simultaneous, in which sampling points and the tour are developed simultaneously.
Our specific objectives were to apply both approaches in a representative case study and to compare their performance in terms of predictive error and practical applicability, using runtime as the criterion to assess the latter.
Theory
Sequential Approach: Sampling Locations
The search for an optimal sampling location network depends on various factors. Although methods exist for determining the optimal number of samples required to represent a data set with a given error level (Gath and Geva, 1989), in agricultural practice the spatial sampling density is usually determined by the field's surface area and economic considerations. Moreover, there may be few layers of available data. An extreme scenario occurs when only the field boundary is available, digitized from a map or acquired via GPS. This data-poor scenario may be typical of many farming operations that are beginning to adopt information technology and precision farming.
In site-specific agriculture, given a set of sampling locations across a field (sampling scheme), it is desirable to spatially distribute the points in a way that allows the best prediction of data values at unsampled locations using the sampled data. This is an optimization problem.
Typically, optimization problems involve the search for an optimal combination of data (sampling locations, in our case) that minimizes (or maximizes) an objective function or OF (Winston, 1994). In the data-poor scenario described above, a suitable OF is the Minimization of the Mean of Shortest Distances (MMSD) criterion defined by van Groenigen and Stein (1998). The MMSD function is the expectation of the distance between an arbitrarily chosen point within the study region and the sampling location nearest to it. For large sampling regions, e.g., an infinite plane, this criterion produces an equilateral triangle grid. The criterion can be expressed as follows (van Groenigen and Stein, 1998):
\[ \phi_{MMSD}(S) = \frac{1}{M} \sum_{j=1}^{M} d(x_j, S) \tag{3-1} \]
where S is the sampling scheme (set of sampling locations), M is the total number of evaluation points composing the field, $x_j$ is the jth evaluation point, and $d(x_j, S)$ is the distance between point $x_j$ and the nearest sampling point. It is assumed that the evaluation points are distributed across the area of interest on a finely meshed grid (ten meters, for example).
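A minimal sketch of how the MMSD value of a sampling scheme could be computed is given below. It is an illustration, not the SANOS implementation; the square field, the 10 m raster, and the two example schemes are hypothetical.

```python
import numpy as np


def mmsd(eval_points, samples):
    """Mean, over evaluation points, of the distance to the nearest sampling location."""
    diff = eval_points[:, None, :] - samples[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))       # shape (M, number of samples)
    return dist.min(axis=1).mean()


# Hypothetical 100 m x 100 m field rasterized on a 10 m grid of evaluation points.
xs, ys = np.meshgrid(np.arange(5.0, 100.0, 10.0), np.arange(5.0, 100.0, 10.0))
eval_points = np.column_stack([xs.ravel(), ys.ravel()])

# Two hypothetical four-point schemes: one clustered in a corner, one spread out.
clustered = np.array([[5.0, 5.0], [15.0, 5.0], [5.0, 15.0], [15.0, 15.0]])
spread = np.array([[25.0, 25.0], [75.0, 25.0], [25.0, 75.0], [75.0, 75.0]])
print(mmsd(eval_points, clustered), mmsd(eval_points, spread))
```

The spread scheme yields the lower value, which is the behavior the criterion is meant to reward.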
The OF must be combined with a generation mechanism, i.e., a method to search iteratively for progressively better solutions to the problem. A powerful such method is Simulated Annealing (Aarts and Korst, 1990), a combinatorial optimization algorithm that has been applied successfully to replace exhaustive searches in large problems (Kirkpatrick et al., 1983) and is much less prone to becoming trapped in local optima of the OF than more traditional methods such as gradient descent. Using simulated annealing, the sampling scheme is iteratively perturbed by moving a randomly selected point in the scheme to a new random location, keeping the new scheme if it improves on the previous value of the OF, and rejecting it with an increasingly higher probability if it does not improve the OF value. Van Groenigen and Stein (1998) developed a variant of this method, called Spatial Simulated Annealing, which differs from the above primarily in that the distance that a point can be moved during a perturbation also decreases as the algorithm progresses.
A sampling scheme for the data-poor scenario can be made by coupling simulated annealing with the MMSD criterion using minimal data: a field boundary to make a raster map of the field interior (the evaluation points).
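The coupling of simulated annealing with the MMSD criterion can be sketched as follows. This is a plain simulated annealing loop written for illustration, not SANOS and not Spatial Simulated Annealing (the maximum move distance is not reduced here). The geometric cooling schedule, its constants, the restriction of candidate locations to raster cells, and the example field are all assumptions.

```python
import math
import random


def mmsd(eval_points, samples):
    """Mean distance from each evaluation point to its nearest sampling location."""
    return sum(min(math.dist(p, s) for s in samples) for p in eval_points) / len(eval_points)


def anneal(eval_points, n_samples, n_iter=2000, t0=5.0, cooling=0.998, seed=0):
    rng = random.Random(seed)
    scheme = rng.sample(eval_points, n_samples)        # random initial scheme
    cost = mmsd(eval_points, scheme)
    best, best_cost = list(scheme), cost
    temp = t0
    for _ in range(n_iter):
        candidate = list(scheme)
        # Perturbation: move one randomly selected point to a random raster cell.
        candidate[rng.randrange(n_samples)] = rng.choice(eval_points)
        cand_cost = mmsd(eval_points, candidate)
        # Keep improvements; accept deteriorations with a probability that
        # shrinks as the temperature is lowered.
        if cand_cost <= cost or rng.random() < math.exp((cost - cand_cost) / temp):
            scheme, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = list(scheme), cost
        temp *= cooling
    return best, best_cost


# Hypothetical 100 m x 100 m field on a 10 m raster (cell centers).
eval_points = [(x + 5.0, y + 5.0) for x in range(0, 100, 10) for y in range(0, 100, 10)]
best, best_cost = anneal(eval_points, n_samples=4)
print(best, best_cost)
```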
A different criterion may be used when additional information is available, such as a semivariogram of the spatial random variable of interest. For example, if the scout's goal is to estimate crop yield throughout the field from values of yield (or a proxy such as number of grains per unit area) measured at the sampling locations, geostatistics can provide a principled optimal solution based on minimizing kriging variance (van Groenigen et al., 1999). The starting point is a spatial covariance model: a description of how similarity among values of the variable sampled at different locations varies with the distance between the locations. Considering a stationary spatial random variable Z, its semivariance is defined as:
\[ \gamma(h) = \frac{1}{2} \, \mathrm{Var}\bigl( Z(u + h) - Z(u) \bigr) \tag{3-2} \]
where u is a location in space and h is a given displacement away from it (Deutsch and Journel, 1992).
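In practice the semivariance is estimated from data. The sketch below uses the classical estimator (half the mean squared difference of all pairs whose separation falls in a lag bin), which equals the definition above when the mean of Z is constant. It is an illustration only; the transect coordinates, values, and lag bins are hypothetical.

```python
import numpy as np


def empirical_semivariogram(coords, values, lag_edges):
    """Half the mean squared difference of value pairs, binned by separation distance."""
    i, j = np.triu_indices(len(values), k=1)            # all distinct pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1)   # pair separation
    sq_diff = (values[i] - values[j]) ** 2
    gamma = np.full(len(lag_edges) - 1, np.nan)         # NaN marks empty bins
    for k in range(len(lag_edges) - 1):
        in_bin = (h >= lag_edges[k]) & (h < lag_edges[k + 1])
        if in_bin.any():
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma


# Hypothetical transect: four observations spaced one distance unit apart.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
values = np.array([1.0, 2.0, 4.0, 7.0])
gamma = empirical_semivariogram(coords, values, np.array([0.5, 1.5, 2.5, 3.5]))
print(gamma)
```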
Ordinary Kriging (Goovaerts, 1997) is a popular method for spatial interpolation. For an arbitrary point u in the region of interest, the estimated value of the variable of interest Z is the weighted sum of the measured values of Z at the n sampling locations $u_i$. Thus,
\[ \hat{Z}(u) = \sum_{i=1}^{n} \lambda_i Z(u_i) \tag{3-3} \]
where $\lambda_i$ are the weights, determined using the semivariogram of Z and assuming a constant, albeit unknown, expectation $E[Z(u)] = m$.
The error or kriging variance (KV) for ordinary kriging is defined as
\[ \sigma_{OK}^2(u) = \sum_{i=1}^{n} \lambda_i \gamma(u_i, u) + \psi \tag{3-4} \]
where $\psi$ is a Lagrange multiplier as described by Webster and Oliver (1990). Kriging variance depends on the sampling scheme geometry and on the semivariogram, but is independent of actual data values (Goovaerts, 1997). It is zero at the sampling locations, and increases away from them. It can be used in an objective function; for example, van Groenigen (2000) proposed a method in which the mean KV value over the field is minimized, and another that minimizes the maximum KV (MMKV).
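The ordinary kriging estimate and its variance can be obtained by solving a small linear system for the weights and the Lagrange multiplier. The sketch below is a generic textbook formulation written for illustration, not the code used in this study; the function name, the linear zero-nugget semivariogram, and the two-point example are hypothetical.

```python
import numpy as np


def ordinary_kriging(coords, values, target, gamma):
    """Return the ordinary kriging estimate and kriging variance at `target`."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Left-hand side: semivariances between samples, bordered by the
    # unbiasedness constraint (weights sum to one).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = gamma(d)
    a[n, n] = 0.0
    # Right-hand side: semivariances between each sample and the target.
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    solution = np.linalg.solve(a, b)
    weights, multiplier = solution[:n], solution[n]
    estimate = weights @ values
    variance = weights @ b[:n] + multiplier
    return estimate, variance


def linear_semivariogram(h):
    # Hypothetical linear, zero-nugget model with unit slope.
    return 1.0 * h


coords = np.array([[0.0, 0.0], [2.0, 0.0]])
values = np.array([10.0, 20.0])
estimate, variance = ordinary_kriging(coords, values, np.array([1.0, 0.0]), linear_semivariogram)
print(estimate, variance)
```

Note that `values` enters only the estimate, not the variance, which is why the variance can be evaluated for a candidate sampling scheme before any data are collected.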
Sequential Approach: the Traveling Salesman Problem (TSP)
Imagine a salesman who has to travel across a network of cities, and that the distance between each pair of cities is known. The TSP consists of finding the shortest tour that will visit all the cities (once) and return to the starting point; it may seem simple, but no efficient exact solution to it is known. The TSP forms part of a family of problems known as NP-complete (Cormen et al., 2001); the runtimes of known exact solutions to NP-complete problems grow exponentially with the size of program input (the number of sampling locations, in this case). Thus, runtime increases dramatically with increasing input size. There is much ongoing research on the TSP, and numerous approximate solutions have been postulated for it (Golden et al., 1980).
The Simultaneous Approach
The Kohonen self-organizing feature map, or SOFM (Kohonen, 1982), is a form of neural network that can be used to transform high-dimensional signal pattern inputs (such as several layers of GIS data for an agricultural field) into a lower-dimensional representation such as a one-dimensional scouting path. In a scouting problem, the input data are vectors corresponding to the nodes of a grid (for example, with ten-meter isometric spacing) overlaid onto the field. These vectors are multidimensional; each dimension is an attribute of the corresponding location, such as x, y, and in data-rich scenarios, electrical conductivity, elevation, slope, past yield maps, etc. The output is the scouting path: the sequence of (sampling) locations to visit. These locations exist in the high-dimensional space, but are topologically ordered in a lower-dimensional form, i.e., a one-dimensional sequence (node 1, node 2, etc.). Since the input vector attributes are expressed in different units, the data are usually normalized in order to make the distance calculations meaningful.
The SOFM is based on the idea of competitive learning. Each output node is represented by a neuron (a vector having the dimensionality of the input data), and the neurons compete for the input data vectors. This competition is based on distance (measured in the high-dimensional space) between the input data and the neurons. The algorithm is iteratively presented a vector, selected randomly from the input data. The distance between the vector and each of the neurons is evaluated, and the neuron that is nearest to the input vector is declared the winner. The winning neuron is subsequently rewarded by being moved towards the input vector. The neurons that are near the winning neuron (this nearness is measured in the low-dimensional space) are also moved with it to some extent, depending on the value of a neighborhood function.
Haykin (1994) defined the function thus: let $d_{j,i}$ denote the lateral distance of neuron j from the winning neuron i, measured in the low-dimensional output space (such that adjacent neurons would have a distance of 1). Let $\pi_{j,i}$ denote the value of the neighborhood function centered on the winning neuron i; its value is maximum for $d_{j,i} = 0$, and must tend to zero as $d_{j,i}$ tends to infinity. A typical function used for this purpose is the following Gaussian:
\[ \pi_{j,i} = \exp\left( -\frac{d_{j,i}^2}{2\sigma(n)^2} \right) \tag{3-5} \]
where $\sigma(n)$ is the effective width of the topological neighborhood after n iterations of the process. In each iteration the neurons are updated as follows:
\[ w_j(n+1) = w_j(n) + \eta(n) \, \pi_{j,i(x)}(n) \, \bigl( x(n) - w_j(n) \bigr) \tag{3-6} \]
where $w_j(n)$ is the state of neuron j after n iterations, $\eta$ is the learning rate, and i(x) is the winning neuron corresponding to input vector x. This iterative update process is repeated thousands of times.
The learning-rate parameter $\eta$ used to update the weight vectors and the effective width of the neighborhood function $\sigma$ should decrease as the algorithm progresses, similar to the cooling process described earlier for simulated annealing. Ritter et al. (1992) proposed:
\[ \sigma(n) = \sigma_0 \exp\left( -\frac{n}{\tau_1} \right) \tag{3-7} \]
\[ \eta(n) = \eta_0 \exp\left( -\frac{n}{\tau_2} \right) \tag{3-8} \]
where $\sigma_0$ and $\eta_0$ are the function values when n = 0, and $\tau_1$, $\tau_2$ are time constants that determine the rate of decay.
The topological ordering property results from the update Equation 3-6, which forces the winning neuron to move toward the input vector x. It also moves the weight vectors of the nearby neurons contained within the neighborhood function. Thus, the one-dimensional incarnation of the SOFM can be visualized as an elastic band containing a sequence of nodes that exist in the high-dimensional input space (Haykin, 1994).
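The competitive learning loop described above can be sketched for a one-dimensional map as follows. This is a minimal illustration on an open chain of neurons, not the implementation used in this study; the function name, the parameter values, the random initialization from the input data, and the example raster are all assumptions. Only x and y are used as attributes, so no normalization is applied.

```python
import numpy as np


def train_sofm(data, n_neurons, n_iter=5000, eta0=1.0, sigma0=2.0,
               tau1=1500.0, tau2=1500.0, seed=0):
    """Train a 1-D self-organizing feature map; returns the ordered neuron positions."""
    rng = np.random.default_rng(seed)
    # Start the neurons at randomly chosen input vectors.
    w = data[rng.choice(len(data), n_neurons, replace=False)].astype(float)
    index = np.arange(n_neurons)
    for n in range(n_iter):
        x = data[rng.integers(len(data))]                  # random input vector
        winner = np.argmin(np.linalg.norm(w - x, axis=1))  # nearest neuron wins
        sigma = sigma0 * np.exp(-n / tau1)                 # shrinking neighborhood width
        eta = eta0 * np.exp(-n / tau2)                     # decaying learning rate
        d = np.abs(index - winner)                         # lateral distance along the chain
        neighborhood = np.exp(-d ** 2 / (2.0 * sigma ** 2))
        w += eta * neighborhood[:, None] * (x - w)         # move winner and its neighbors
    return w


# Hypothetical field: 10 m raster of x-y pairs covering 100 m x 60 m.
xs, ys = np.meshgrid(np.arange(5.0, 100.0, 10.0), np.arange(5.0, 60.0, 10.0))
field = np.column_stack([xs.ravel(), ys.ravel()])
path = train_sofm(field, n_neurons=8)
print(path.round(1))
```

The rows of `path`, read in order, are the sequence of locations along the chain.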
Materials and Methods
Case Study Problem, Location, and Dataset
The case study is a yield estimation problem in a data-poor scenario (as previously defined). This is not an obvious application of scouting, but it can be a valuable decision-support tool for farmers negotiating futures contracts. In such a situation, yield estimates would be made prior to crop maturity using a proxy such as grain number per unit area.
We used two versions of the sequential approach and one of the simultaneous approach (a total of three different methods, henceforth called "cases") to propose sampling schemes and tours at nine different sampling densities ranging from 0.57 samples / ha to 10 samples / ha.
Our study area was the McCallon 1 field near Murray, KY, USA (36° 32' N, 88° 27' W, elevation 222 m). Its surface area is 8.33 ha (22.1 acres). Soils in McCallon 1 are predominantly somewhat poorly drained Calloway soils (Glossaquic Fragiudalfs) and poorly drained Henry (Typic Fragiaqualfs) soils. Both have a fragipan. Available data were the field boundary, maize (Zea mays L.) yield maps for 1999 and 2001, and an additional yield map taken in the 1999 harvest at a nearby field called Suggs 4. The latter yield map was used to provide a semivariogram; we assumed its spatial covariance structure could be representative of the crop in other years and in similar fields such as McCallon 1, and thus be useable to drive the MMKV criterion in the absence of actual previous McCallon 1 yield data.
For each sampling scheme, maize yield data were obtained by averaging all the raw yield map data available within a 5 m radius of each sampling location. The resulting data were used to estimate the yield throughout the field on a 10 m grid of evaluation points, using ordinary point kriging as detailed further below.
The Sequential Approach
We used the SANOS program (van Groenigen and Stein, 1998) to determine the optimal sampling locations using both the MMSD and minimal maximum KV (MMKV) criteria. SANOS is a versatile program that can design sampling schemes in complex domains. It can accommodate a finite, discontinuous region composed of arbitrarily shaped subregions, and it can integrate existing sampling locations into its optimization.
The runtime of the spatial simulated annealing algorithm in SANOS is user-specified, but an optimal value can be calculated within SANOS using the optimal initial transition probability estimation method proposed by Aarts and Korst (1990).
For the TSP we used the 3-Opt algorithm as implemented by Syslo et al. (1983). Although it is not guaranteed to produce the optimal solution to any given TSP, this algorithm produces high-quality approximate solutions very rapidly, as shown by empirical studies such as that of Golden et al. (1980).
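The idea behind this family of local-search heuristics can be illustrated with 2-opt, a simpler relative of 3-Opt that repeatedly reverses a segment of the tour whenever doing so shortens it. The sketch below is not the Syslo et al. (1983) code and not 3-Opt itself; the function names and the four-point example are hypothetical.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting `points` in the order given by `tour`."""
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def two_opt(points, tour):
    """Reverse tour segments for as long as a reversal shortens the closed tour."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for a in range(1, len(tour) - 1):
            for b in range(a + 1, len(tour)):
                candidate = tour[:a] + tour[a:b + 1][::-1] + tour[b + 1:]
                if tour_length(points, candidate) < tour_length(points, tour) - 1e-12:
                    tour, improved = candidate, True
    return tour


# Hypothetical sampling locations: corners of a unit square, listed in an
# order that makes the initial tour cross itself.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
initial_tour = [0, 1, 2, 3]
improved_tour = two_opt(points, initial_tour)
print(tour_length(points, initial_tour), tour_length(points, improved_tour))
```

Like 3-Opt, this heuristic stops at a tour that no single move can improve, which is not necessarily the optimal tour.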
During the remainder of this study, we refer to two sequential cases: MMSD+TSP (using SANOS with the MMSD criterion and using the 3-Opt TSP solution), and MMKV+TSP (as above but using SANOS with the MMKV criterion). In both cases we used the field boundary to build a domain for SANOS, used SANOS to propose a sampling scheme, and then used the 3-Opt code to propose the scouting tour. We repeated the process for several sampling densities.
With respect to the spatial interpolation step, we used the Suggs 4 semivariogram for kriging in the MMKV+TSP case, and a linear, zero-nugget semivariogram (typically the default in many geostatistical packages) in the MMSD+TSP case.
The Simultaneous Approach
The third case under study was a variant of a 1-D SOFM. We altered the topological neighborhood function to force the SOFM to close on itself, i.e., make a closed tour, calculating the distance $d_{ij}$ between neurons i and j as follows:
    if abs(j - i) > int(N / 2)
        then d = N - abs(j - i)
        else d = abs(j - i)
    return d
where N is the total number of neurons. Thus, if i and j are the first and last neurons, $d_{ij}$ is now 1 instead of N - 1.
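The closed-tour distance described above can be illustrated with a minimal Python sketch. It is not the code used in this study; the function name `ring_distance` and the zero-based neuron indices are assumptions made for illustration.

```python
def ring_distance(i: int, j: int, n: int) -> int:
    """Topological distance between neurons i and j on a closed ring of n neurons."""
    d = abs(j - i)
    # Going "the other way round" is shorter when the direct gap exceeds half the ring.
    return n - d if d > n // 2 else d


# First and last neurons of a 10-neuron map are neighbors on the ring.
print(ring_distance(0, 9, 10))
```

With this distance, the neighborhood function treats the two ends of the neuron chain as adjacent, which is what closes the tour.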
The field boundary was converted into 10-meter raster data corresponding to the locations inside the field boundary. These 833 xy pairs were used as input to the SOFM algorithm. Considering the possibility of the results being parameter-dependent, we made 21 realizations of the SOFM, keeping three parameters constant ($\eta_0 = 1$, $\tau_0 = 20000$, $T = 20000$) and varying $\sigma_0$ from 0.5 to 1.0 in steps of 0.025. In order to represent typical SOFM solutions in the subsequent evaluation of the three cases, we picked the SOFM realization having the median MMSD value.
For the spatial interpolation step we used a linear semivariogram and ordinary kriging as in the MMSD+TSP case.
Evaluation of Results
The three cases applied at 9 different sampling densities were evaluated according to a) MMSD values, b) capability of predicting spatial yield variability as expressed by the root mean squared error (RMSE) between observed and estimated yield maps over the 10-meter grid of (833) evaluation points, c) predictive capability: relative error in predicting the field average yield calculated from the 833 evaluation points, and d) tour length. The observed values mentioned in point (b) were obtained by fitting semivariograms to the observed yield map data points and then using ordinary kriging to resample the observed yield map data onto the 833-point grid.
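Criteria (b) and (c) can be sketched in a few lines of Python. This is an illustration rather than the code used in this study: the function names are hypothetical, and the sign convention of the relative error (estimated minus observed) is an assumption.

```python
import math


def rmse(observed, estimated):
    """Root mean squared error between observed and estimated grid values."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def relative_error_of_mean(observed, estimated):
    """Relative error (%) of the estimated field mean; the sign convention is assumed."""
    mean_obs = sum(observed) / len(observed)
    mean_est = sum(estimated) / len(estimated)
    return 100.0 * (mean_est - mean_obs) / mean_obs
```

Note that a scheme can predict the field mean well while reproducing spatial variability poorly, because errors of opposite sign cancel in the mean but not in the RMSE.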
To explore the quality of the TSP solutions, we asked an expert crop consultant to plot what he felt was the shortest tour through the different sampling schemes, and compared his answers with the solutions provided by the 3-Opt algorithm.
A simulated annealing algorithm can be run for an arbitrary duration, or until a time-invariant solution is found. SANOS specifies runtime as proposed by Aarts and Korst (1990), assuring a slow "cooling" of the system and maximum rejection of local minima. As applied in SANOS using parameter values suggested by van Groenigen et al. (1999), the algorithm's runtime is about 2.5 and 4 hours for the MMSD and MMKV criteria, respectively, on a Pentium PC with a 1.3 GHz processor speed. This is excessively long for practical applications, so we explored how solutions changed if the TSP cases' runtime was decreased to a small fraction of the optimum (1 min).
Finally, we compared the semivariograms from the 1999 and 2001 observed yield maps in the McCallon 1 field with the one used as a proxy from the nearby Suggs 4 field.
Results and Discussion
Sampling Location Layout
Figures 3-1A, 3-1B and 3-1C show the sampling locations and scouting tours obtained for a sampling density of 2.5 samples/ha (1/ac) by the MMSD+TSP, MMKV+TSP, and SOFM cases, respectively. The points are spread quite evenly over the field, although MMKV+TSP allocates more points to the periphery than the other methods. The optimal tours vary greatly between cases, and are not easily predictable a priori by observing the sampling schemes.
Some spatial data-collection problems require sampling schemes with unevenly distributed sampling locations. These applications generally involve prior knowledge, such as the areas of the field more prone to fungal diseases. Another such application involves planning a soil sampling strategy that ensures all soil mapping units get sampled, even if they are very small. These kinds of stratified sampling are easily implemented in the MMSD+TSP and MMKV+TSP cases by modifying the generation mechanism to guarantee that sampling locations assigned a priori to a mapping unit remain within it.
However, implementing stratified sampling in the SOFM case is more complex due to its simultaneous sample placement and path determination. An a priori assignment of sampling locations to polygons in the SOFM could result in complicated, suboptimal path shapes, unless the neighborhood functions and other algorithm parameters could be updated dynamically during operation, as in the Kalman-filter-driven AutoSOFM algorithm (Haese and Goodhill, 2001).
Predictive Accuracy
Figure 3-2A shows how MMSD varied with sampling density in the three cases. The MMSD+TSP algorithm consistently produced the lowest MMSD values, although there was little variability among cases. The better performance of the MMSD+TSP case was expected, since MMSD is what was being optimized in it.
Observed mean corn yields and standard deviations in McCallon 1 were 8,406 and 1,222 kg/ha (CV = 14.5%) respectively in 1999, and 9,912 and 1,174 kg/ha (CV = 11.8%) respectively in 2001. Yields were higher and less variable in 2001, a more favorable weather year.
Figure 3-1. Sampling locations and tour lengths for the 3 cases at a sampling density of 2.5/ha (22 points): (A) MMSD+TSP: 1,433 m; (B) MMKV+TSP: 1,395 m; (C) SOFM: 1,424 m.
Figure 3-2. Evaluation of the sampling schemes produced by the three cases (MMSD+TSP, MMKV+TSP, SOFM) at different sampling densities (1/ha): (A) MMSD, (B) yield prediction RMSE for 1999, (C) yield prediction RMSE for 2001, (D) percent error predicting 1999 mean field yield, (E) percent error predicting 2001 mean field yield.
Figure 3-3. (A) Observed 1999 yield map; (B) MMKV+TSP estimate at 2.5 samples/ha (22 points); (C) MMKV+TSP estimate at 10 samples/ha (88 points).
Figures 3-2B and 3-2C show estimated yield RMSE vs. sampling density for 1999 and 2001. RMSE decreased with increasing density, and varied little between cases. None of the methods was clearly superior to the others for both years across the sampling density range. Figures 3-2D and 3-2E show relative error in field average interpolated yield vs. sampling density for 1999 and 2001. Error was generally low; average (across cases and years) relative error Er was below 5% (less than 28% of the CV) for all sampling densities, a great improvement with respect to estimating the field mean yield from one random location.
Figure 3-3A shows the observed yield map for 1999. Figure 3-3B shows the surface interpolated from 22 points (sampling density: 2.5/ha ~ 1/acre) obtained with the MMKV+TSP algorithm. Although this layout estimated mean field yield with an error of 0.5% in 1999 and 3.2% in 2001, it reproduced spatial variability relatively poorly (RMSE of 1,064 kg/ha in 1999 and 1,004 kg/ha in 2001). In contrast, Figure 3-3C shows the surface obtained with the same algorithm and 88 points (density: 10/ha ~ 4/acre). The relative error of prediction of the mean at this density was 0.7% in 1999 and 0.3% in 2001, and RMSE improved to 742 kg/ha in 1999 and 786 kg/ha in 2001.
Planning a sampling scheme and scouting path using multivariate data (of which the x,y pairs of the data-poor scenario are a special case) can be considered an attempt at accurately depicting the joint probability distribution of the data using a small sample. The SOFM tends to bias this representation by overrepresenting regions of low input density and underrepresenting regions of high input density (Haykin, 1994). This may actually be valuable in crop scouting, where small, distinct regions in the input data distribution may correspond to environmental conditions favoring pests, weeds, etc.
Tour Length
Figure 3-4A shows tour length vs. sampling density for the three cases. Tour lengths were quite similar, with the MMKV+TSP case producing the shortest tour at five densities, and MMSD+TSP producing the shortest at the remaining four. The SOFM produced tours that were, on average, 4% longer than the average of the other cases.
Figure 3-4B shows the difference between the tour lengths resulting from the TSP algorithm and the expert. Note that:
* The algorithm- and expert-derived tour lengths coincided at the lowest (5-point) sampling density,
* The algorithm-derived tours tended to be increasingly shorter than the expert-derived tours as sampling density increased, and
* The trend was stronger for MMKV than for MMSD.
Figure 3-4. (A) Comparison of the three cases' tour lengths. (B) Difference between MMSD / MMKV case tour lengths and expert-derived tour lengths. Note how the algorithm-derived tours tend to be shorter, and how the difference increases with sampling density. The horizontal axis in both panels is sample density (1/ha).
Runtime
Our SOFM algorithm runs quickly (2-3 minutes), stopping when it reaches an invariant scheme. Thus we did not try time-constrained SOFM runs. Likewise, the Syslo et al. (1983) implementation of the 3-Opt algorithm we used to solve the TSP component of the other cases ran remarkably fast. The spatial sampling design stage performed with SANOS was where time reduction was necessary. The "optimal" runtime was derived from van Groenigen et al.'s (1999) suggestion of using a conservative value (over 0.99) for a parameter (called $\alpha$) that sets how quickly the simulated annealing algorithm "cools".
When runtime of the TSP cases was constrained to only 1 minute by reducing $\alpha$, tour length decreased an average of 3.5% across cases and years, reflecting the more clumped structure of the suboptimal schemes. The RMSE remained essentially the same, as did the estimation of field mean, except for the MMKV cases, where estimation error was about twice that of the time-unconstrained cases. The latter effect results from the difference in time required per iteration between the MMSD and MMKV cases: MMKV iterations involve solving several systems of linear algebraic equations in order to determine the weights $\lambda$ shown in Equations 3-3 and 3-4. Conversely, MMSD iterations are relatively very quick, only requiring arithmetical comparisons. Under very constrained runtimes there may not be enough iterations of the MMKV algorithm to attain proper equilibrium of the simulated annealing algorithm at each acceptance probability level, whereupon the method's rejection of local minima could break down. Thus, in practical time-critical applications, the MMSD+TSP algorithm is preferable.
Semivariograms
Table 3-1 shows the parameters of the exponential models fitted to the three semivariograms derived from raw yield map data. The nugget (C0), sill (C0+C1) and effective range (r) values differ between the three semivariograms. It has been noted by van Groenigen (2000) that changes in variogram parameters can impact the results of a sampling scheme based on minimizing kriging variance. Thus, although MMKV+TSP is based on sound geostatistical principles whereas MMSD+TSP is empirical, the advantage of MMKV+TSP when using a proxy semivariogram is questionable.
Table 3-1. Standardized variograms of the 1999 and 2001 McCallon 1, and 1999 Suggs 4 maize data. Columns are the nugget effect or C0, sill or C0+C1, and effective range or r (m).
SV               C0     C0+C1   r (m)
McCallon 1 '99   0.35   0.75    117
McCallon 1 '01   0.45   0.556   75
Suggs 4 '99      0.2    0.8     66
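To make the role of the three parameters concrete, an exponential semivariogram can be sketched as below. The parameterization is an assumption: we take the effective range r to be the lag at which the model reaches about 95% of the partial sill, i.e. gamma(h) = C0 + C1 (1 - exp(-3h / r)), which is a common convention but is not stated in the text. The function name is illustrative.

```python
import math


def exponential_semivariogram(h, nugget, sill, effective_range):
    """Exponential model; assumes the 'practical range' convention (factor 3)."""
    if h <= 0:
        return 0.0  # by definition the semivariogram is zero at zero lag
    partial_sill = sill - nugget  # C1 = (C0 + C1) - C0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / effective_range))


# Suggs 4 '99 parameters from Table 3-1, evaluated at the effective range.
print(exponential_semivariogram(66.0, nugget=0.2, sill=0.8, effective_range=66.0))
```

Under this convention, the differences in Table 3-1 mean that the proxy model reaches its sill at a much shorter lag than the 1999 McCallon 1 model does.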
Practical Considerations
Aside from improvement with respect to expert-derived sampling schemes and paths, using one of the proposed methods to generate a scouting map has several advantages with respect to a paper-based approach:
* Office-based map generation.
* The map can be downloaded to a GPS-enabled handheld computer that can log results and ease the transfer of data back into a crop database.
* Digital scouting tools allow the farmer to keep a permanent site-specific record of crop state. Thus, scouting maps can be used to make application maps.
* Increased accountability of scouting performance: files contain timestamps, location, etc.
* Repeatedly scouting the same areas can allow comparison of the state of the field over time.
* Greater potential for delegation.
* It is possible to bias the sample-locating process to increase the chances of finding pests that are first detected in certain kinds of environments. It is thus possible to avoid (or prioritize) field edges, etc.
Potential drawbacks include:
* Additional hardware / software requirements.
* The requirement of following a set path may potentially be less cost-effective (more time-consuming) for the crop scout.
* The learning process of using new technology.
Conclusions
The methods shown herein provide a principled approach to the design of crop-scouting activities as a form of spatial sampling. The methods are sufficiently quick and accurate to be usable in practical applications.
The TSP methods (MMKV+TSP and MMSD+TSP) tended to make slightly shorter tours than the SOFM, although the three methods' tours were never longer than the expert-derived tours. The TSP methods also typically estimated yield slightly better than the SOFM. When runtime is unconstrained (and a semivariogram is available), the MMKV+TSP case seems most appropriate. Conversely, when runtime is strongly constrained, MMSD+TSP may be more dependable. In intermediate situations the three methods are practically equivalent.
CHAPTER 4
REDUCING SOIL WATER SPATIAL SAMPLING DENSITY USING SCALED SEMIVARIOGRAMS AND SIMULATED ANNEALING
Introduction
Estimating the spatial and temporal patterns of soil water content in agricultural areas is of great value in various activities such as predicting crop yields, assessing the fate of potentially contaminating crop inputs, and estimating soil erosion. The necessary data requirements must be met through spatial sampling, and the spatial density of the measurements will strongly influence the cost of the process, the quality of the results, and the feasibility of a long-term study. Careful design of the sampling scheme can save time and money, and spatial statistics may be used to optimize such a scheme (Van Groenigen and Stein, 1998). The density reduction of an existing spatial network is a related problem, relevant in many regions of the world where funding for environmental monitoring is decreasing.
The spatial dataset used in this study was taken from an ongoing experiment running since 1992 in an 8-hectare microwatershed. The spatial variability of soil water content was characterized by repeatedly sampling soil water at 57 locations distributed throughout the microwatershed, but it was impractical to sample all the points with the desired temporal frequency over an extended period of time. It became necessary to develop a methodology to reduce the number of sampling points while maintaining the ability to describe spatial soil water content across the field. Our goal was to identify a time-invariant relationship between the water content measurements across the 57 locations. The existence of such a relationship would allow us to infer the spatial pattern of soil water content over the entire microwatershed at future dates by sampling a reduced subset of locations.
Temporal Stability
Formally, the concept of a time-invariant relationship between water contents at different locations may hold only for covered plots draining freely; however, there is evidence that it has a wider range of application (Sisson, 1987). Vachaud et al. (1985) developed a technique for reducing spatial sampling density, based on the concept of temporal stability of soil water content. They defined this as a time-invariant association between spatial location and classical statistical parameters, emphasizing the persistence of the rank of soil water content measured at different locations in a network. This idea has been subsequently tested under different conditions by several authors, with contradictory results as shown below.
The work reported by Vachaud et al. (1985) involved data sets that showed no spatial correlation, perhaps due to the great heterogeneity of the soil properties in the study locations. This sample independence made it possible to study the temporal stability of the data using simple statistics. However, at many scales of interest, soils are not necessarily randomly distributed; Kachanoski and de Jong (1988) presented additional tests of temporal stability in the context of spatial associations in the data and scale dependency. Other researchers have observed temporal stability in soil water patterns: Goovaerts and Chiang (1993) in a long-term fallow plot, Kamgar et al. (1993) in bare soil laid out in furrows and beds, Zhang and Berndtsson (1988) on short-cut grass, Jaynes and Hunsaker (1989) in an irrigated wheat field, and Reichardt et al. (1993) under different conditions of land cover ranging from bare soil to a corn crop.
Other studies produced mixed results. Cassel et al. (2000) observed greater temporal stability of water content in deeper soil layers than in shallow layers under a wheat crop. This effect could be attributed to the impact of crop root water uptake. Grayson and Western (1998) studied three catchments having significant relief, and observed that although the overall spatial soil moisture patterns were not time stable, the measurements in a specific subset of the locations within the measurement network were time stable and could adequately represent mean soil moisture over their areas of interest. Grayson and Western (1998) denoted the locations in this subset as catchment average soil moisture monitoring (CASMM) sites. Comegna and Basile (1994) obtained opposite results: they observed both spatial associations between water content across locations, and a time-stable spatial structure for the water content in the top 90 cm of the soil profile. However, they were unable to find CASMM locations, and attributed this effect to the great homogeneity of the volcanic soil of their study site.
Other researchers have observed a lack of temporal stability. Van Wesenbeeck et al. (1988) observed that the spatial pattern of surface (0-0.2 m) soil water content below a corn crop was not stable over time, but was a function of crop growth stage and mean soil water content. Mohanty et al. (2000) observed time instability of soil moisture patterns in a gently sloping range field, and suggested that this might be the consequence of lateral base flow and aspect-driven accelerated or decelerated evapotranspiration and condensation. Indeed, Kachanoski and De Jong (1988) pointed out that soil water content at a point is the product of hydrologic processes operating at different spatial scales.
Variogram analysis has been used very effectively to study spatial associations (Vieira et al., 1983; McBratney and Webster, 1981; Burgess and Webster, 1980).
However, in the context of temporal analysis, seasonal differences in rainfall and water content in a field may render a variogram calculated at one time not representative of the conditions at another. Kachanoski and De Jong (1988) noted that time stability results in time independence of the normalized semivariogram but not necessarily the ordinary semivariogram. Vieira et al. (1991) proposed a variogram scaling technique, dividing the variogram of the observations taken at a particular date by their sample variance, and subsequently merging several dates' variograms into one. Comegna and Basile (1994) and Vieira et al. (1997) applied this concept to time stability analysis of soil water content.
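The variogram scaling idea can be sketched as follows. This is a minimal illustration and not the procedure's original code: the function names and the (lag, semivariance) data layout are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption.

```python
from statistics import variance


def scale_semivariogram(lag_gamma_pairs, observations):
    """Divide one date's experimental semivariogram by that date's sample variance."""
    date_variance = variance(observations)  # assumed: sample variance
    return [(lag, gamma / date_variance) for lag, gamma in lag_gamma_pairs]


def pool_scaled_semivariograms(dates):
    """Merge the scaled semivariograms of several dates into one point cloud.

    `dates` is a list of (lag_gamma_pairs, observations) tuples, one per date.
    A single continuous model would then be fitted to the pooled points.
    """
    pooled = []
    for lag_gamma_pairs, observations in dates:
        pooled.extend(scale_semivariogram(lag_gamma_pairs, observations))
    return sorted(pooled)
```

Because each date is divided by its own variance, wet and dry dates contribute dimensionless points of comparable magnitude to the merged semivariogram.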
Simulated Annealing
The spatial sampling density reduction problem requires selecting a subset of the original dataset that will, in combination with a spatial interpolation algorithm, produce the best possible estimate of the variable of interest at the points that will no longer be sampled. This is a nontrivial combinatorial problem when the number of locations involved is high. An optimization algorithm may be used to search for a solution, but the algorithm in question should converge to the global optimum. Simulated annealing (Aarts and Korst, 1990) is such a method; different forms of the algorithm originally proposed by Metropolis et al. (1953) have been recently applied to numerous problems such as modeling spatial variability of heavy metal concentration in soils (Lin and Chang, 2000), spatial variability of phosphorus content and texture (van Groenigen et al., 1999), soil pore structure modeling (Moran and McBratney, 1997), and soil parameter estimation for functional crop models (Calmon et al., 1999; Braga and Jones, 1998; Braga et al., 1998; Shen et al., 1998; Paz et al., 1998). Examples of combinatorial applications of simulated annealing range from printed circuit board design (Kirkpatrick et al., 1983) to the
selection of representative nodes in a meteorological network (Robledo, 1994) and the determination of optimal soil sampling strategies for precision agriculture research (Van Groenigen et al., 2000).
Our objectives in this study were a) to model the spatial variability of soil water
across several dates in an 8 ha microwatershed using measurements taken at 57 locations, and b) to define a reduced subset of 10 of the original measurement network locations which could be used to adequately predict the soil water content in the rest of the network.
Materials and Methods
Study Location
Our study area is an 8-hectare microwatershed (64° 13' W, 31° 29' S) located 25 km to the south of the city of Córdoba, Argentina. The conditions in this field are considered representative of approximately 20,000 hectares affected by water erosion in the region (Romero et al., 1995). A map of the microwatershed including elevation level curves is shown in Figure 4-1. Slope in the microwatershed varies from 0.8 to 1.2%, and runoff is discharged through a flume located at its southeastern corner (coordinates x = 0, y = 280 in Figure 4-1). The soil is a silty loam Typic Haplustoll. A modal horizon profile is A1 (0-14 cm), A2 (14-20 cm), Bw (20-40 cm), BC (40-60 cm), C (60-84 cm), and Ck (84+ cm). The soil is very deep, and the depth to the water table is approximately 20 meters. Soybeans (Glycine max (L.) Merrill) were grown on the microwatershed under conventional tillage every year since 1990. An Agripo maturity group VII variety was used in the 1991/92 season and an Asgrow maturity group VI variety was used thereafter.
Figure 4-1. Layout of the microwatershed, showing the sampling locations as numbered crosses. Coordinates are expressed in meters. Note the rotated map orientation.
The measurement layout consisted of a grid pattern of 57 locations covering the entire microwatershed, as shown in Figure 4-1. The grid had an isometric interval of 41.66 m between adjacent points. Gravimetric soil water content measurements were performed in each of the grid points at depths of 0-30, 30-65, and 65-100 cm, and these values were used to estimate the total soil water content to a depth of 1 m. This study used measurements performed on 2/7/1992, 2/24/1992, 3/20/1992, 1/25/1993, and 12/23/1993. The first three dates were used to develop and calibrate a model of spatial variability; the last two, belonging to other cropping seasons, were used for validation.
Semivariogram Modeling
We analyzed the spatial variability of the soil water content of each soil layer and of the total water content in the first meter of the soil profile using variogram modeling (McBratney and Webster, 1986; Trangmar et al., 1985; Vieira et al., 1983). We calculated an experimental semivariogram for each measurement date and fitted a continuous function to it. The optimal semivariogram type was selected, and the semivariograms were tested, using cross-validation. Apezteguia et al. (1999) reported these results.
For validation we used a scaled semivariogram (Vieira et al., 1997), built by dividing the experimental semivariograms of each of the three calibration dates by the sample variance of each date's data, and fitting a new continuous function to the union of all the scaled data. Its usage will be described under Validation below.
The Density Reduction Problem
The sampling density reduction problem consisted of choosing the subset of given cardinality of the 57 measurement locations which would best approximate the spatial distribution of soil water content at the calibration dates, using the data measured at the subset to estimate the data at the remaining points by means of a spatial interpolation algorithm. The optimization process was formulated as the minimization of an objective or fitness function J that will be discussed below. Long-term sampling cost considerations set the subset cardinality to 10.
Choosing an optimal subset of 10 points out of 57 is a complex combinatorial problem; using a brute-force method to evaluate all the possible combinations would be very time-consuming given that $C(57,10) = 4.318 \times 10^{10}$ and each iteration is computationally intensive. Instead, we approached the problem using two simulated annealing algorithms: the one described by Sacks and Schiller (1988), and the newer Spatial Simulated Annealing algorithm proposed by van Groenigen and Stein (1998).
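The size of the search space can be checked directly. The snippet below only assumes Python 3.8 or later, where `math.comb` is available.

```python
import math

# Number of distinct 10-point subsets of the 57 measurement locations.
n_subsets = math.comb(57, 10)
print(n_subsets)           # 43183019880
print(f"{n_subsets:.3e}")  # 4.318e+10
```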
The Sacks and Schiller algorithm (S&S) is designed to work on a discrete domain D (57 points, in our case). At any given time j, $S_j$ is a subset of D of the desired cardinality (10). The algorithm iteratively proposes and evaluates a new subset by replacing one point from the previous subset, and accepts or rejects the change according to a simple acceptance criterion. In each iteration the new pattern S' is proposed by randomly choosing an entering point $t \in (D - S_j)$, followed by the deterministic selection of the exiting point $s^* \in S_j$ that minimizes the fitness function J(S'), i.e. $J(S_j \cup t - s^*) = \min_{s \in S_j} J(S_j \cup t - s)$. After each such change, the new value of J, J(S'), may or may not have improved (decreased) with respect to the previous iteration. If it improved, then the new pattern is accepted with a probability of 1. If it did not improve, then the pattern is accepted with a probability given by a control parameter $\pi$, such that $0 < \pi < 1$; $\pi$ is a function that tends to decrease through the algorithm's execution, making it progressively more improbable that the algorithm accepts new patterns that do not improve the solution.
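One S&S iteration, as described above, can be sketched in Python. This is an illustration under stated assumptions, not the implementation given in the Appendix: the function names, the toy fitness function, and the geometric schedule used for $\pi$ in the usage loop are all hypothetical.

```python
import random


def sacks_schiller_step(subset, domain, fitness, pi, rng):
    """One iteration: random entering point, best exiting point, then accept/reject."""
    current = set(subset)
    entering = rng.choice(sorted(set(domain) - current))
    # Deterministic choice of the exiting point s*: keep the swap with the lowest J.
    candidates = [(current - {s}) | {entering} for s in sorted(current)]
    proposed = min(candidates, key=fitness)
    if fitness(proposed) < fitness(current):
        return proposed  # improvement: accepted with probability 1
    # No improvement: accepted only with probability pi.
    return proposed if rng.random() < pi else current


def toy_fitness(s):
    # Hypothetical stand-in for J; the study used kriging-based fitness functions.
    return abs(sum(s) - 290)


rng = random.Random(0)
domain = range(1, 58)  # the 57 locations
subset = set(rng.sample(list(domain), 10))
for k in range(200):
    pi = 0.5 * 0.99 ** k  # assumed decreasing schedule, for illustration only
    subset = sacks_schiller_step(subset, domain, toy_fitness, pi, rng)
```

Every call evaluates the fitness of each of the 10 possible swaps, which is why each iteration is costly when J involves kriging.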
The Spatial Simulated Annealing algorithm (SSA) is different from the previous algorithm in three fundamental aspects: i) it is designed for a continuous domain, ii) instead of only using the control parameter to set the acceptance probability, it also includes the difference in fitness between the new and old patterns, and iii) instead of replacing a point of the subset $S_j$ with another one belonging to $(D - S_j)$, it chooses a point s within the subset and moves it over space to a new location shifted with respect to the original in a random direction and by a random distance, the latter bounded from above by a function $h_{max}$ that tends to decrease as the algorithm execution progresses. Thus, s may initially be shifted large distances, but as the algorithm progresses the movements become progressively smaller and less probable. In a manner similar to S&S, the SSA control parameter decreases with time.
The nature of our problem did not allow the use of the SSA algorithm as originally described by van Groenigen and Stein (1998); it was only possible to consider locations from the discrete domain of 57 points due to the existence of several years of other data (crop biomass, yield, etc.) sampled only at those locations. We implemented a variation of the SSA algorithm that moved location s over space, but only to candidate locations on the grid. Otherwise, the algorithm is almost identical to the one presented by van Groenigen and Stein. A detailed description of both implemented algorithms (S&S and SSA) is provided in the Appendix.
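A grid-constrained SSA move of this kind might look like the sketch below. It is an illustration only; the actual implementation is the one described in the Appendix. In particular, snapping the shifted position to the nearest unoccupied grid location, the Metropolis-type acceptance rule exp(-dJ / c), and all function and parameter names are assumptions.

```python
import math
import random


def propose_grid_move(subset, coords, h_max, rng):
    """Shift one location in a random direction, then snap to the nearest free grid point."""
    s = rng.choice(sorted(subset))
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.uniform(0.0, h_max)  # shift length bounded above by h_max
    x, y = coords[s]
    target = (x + distance * math.cos(angle), y + distance * math.sin(angle))
    free = [p for p in coords if p not in subset]
    nearest = min(free, key=lambda p: math.dist(coords[p], target))
    return (set(subset) - {s}) | {nearest}


def accept(delta_j, c, rng):
    """Assumed Metropolis rule: accept improvements, else with probability exp(-delta_j / c)."""
    return delta_j <= 0 or rng.random() < math.exp(-delta_j / c)
```

Here `coords` maps each location number to its (x, y) position in meters, `h_max` shrinks as the run progresses, and `c` is the decreasing control parameter.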
Fitness Functions
In each iteration of the simulated annealing algorithms, we used a fitness function to describe the ability of the proposed pattern S' to predict the water content throughout D at all the dates of interest. The prediction of water content was performed with a spatial interpolation algorithm. We applied ordinary kriging (Deutsch and Journel, 1992), using each calibration date's calculated semivariogram. We performed the process with two different fitness functions, both of which simultaneously evaluated the performance of candidate subsets across all the calibration dates. The functions are described below.
Scaled kriging variance (SKV)
The scaled kriging variance function is defined as
$$SKV = \frac{1}{N} \sum_{j} \sum_{i} \frac{\sigma^2_{K,ij}}{\sigma^2(\theta_j)}$$
where $\sigma^2_{K,ij}$ is the kriging variance at point i on date j, $\sigma^2(\theta_j)$ is the variance of the water content observed across the microwatershed on date j, and N is the total number of date-space combinations, slightly less than 3 × 57 (calibration set) or 2 × 57 (validation set) due to the existence of missing data. The SKV function adds the kriging variance of water content across all the points i of the microwatershed and across all the dates of interest j, scaling it by the variance of the observed water content data across the microwatershed on the corresponding date. Provided that the intrinsic hypothesis of geostatistics is valid, the predictive accuracy of ordinary kriging can be expressed by the kriging variance (van Groenigen, 2000). Thus, finding a solution that minimizes the kriging variance (or the SKV function, in this case) can be expected to maximize predictive accuracy.
Scaled mean squared error (SMSE)
The scaled mean squared error function is defined as
$$SMSE = \frac{1}{N} \sum_{j} \sum_{i} \frac{(\hat{\theta}_{ij} - \theta_{ij})^2}{\sigma^2(\theta_j)}$$
where $\hat{\theta}_{ij}$ and $\theta_{ij}$ are the estimated and observed water contents at point i on date j.
This function adds the error of prediction of water content across all the points i of the microwatershed and across all the dates of interest j, scaling the square of each residual by the variance of the observed water content data across the microwatershed on the corresponding date. This allowed us to combine errors across different dates.
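Both fitness functions share the same structure: a per-point term divided by the variance of the corresponding date's observations, averaged over all available date-space combinations. A minimal sketch follows. The function names, the list-of-lists layout (one inner list per date), the use of `None` for missing observations, and the choice of the sample variance are assumptions made for illustration.

```python
from statistics import variance


def _scaled_mean(terms_by_date, observed_by_date):
    """Average of per-point terms, each divided by its date's observed variance."""
    total, n = 0.0, 0
    for terms, observed in zip(terms_by_date, observed_by_date):
        # Variance of the water content observed on this date (missing values skipped).
        date_variance = variance([o for o in observed if o is not None])
        for term, o in zip(terms, observed):
            if o is not None:  # missing date-space combinations do not count toward N
                total += term / date_variance
                n += 1
    return total / n


def skv(kriging_variances, observed):
    """Scaled kriging variance over all dates and points."""
    return _scaled_mean(kriging_variances, observed)


def smse(estimated, observed):
    """Scaled mean squared error over all dates and points."""
    squared = [[None if o is None else (e - o) ** 2 for e, o in zip(est, obs)]
               for est, obs in zip(estimated, observed)]
    return _scaled_mean(squared, observed)
```

Dividing by each date's variance makes the terms dimensionless, so that a wet date with large absolute variability does not dominate the sum.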
We explored the four possible scenarios defined by combinations of the two fitness functions (SKV, SMSE) and two algorithms (S&S, SSA). We ran five repetitions (instances) per scenario, differing in their initial conditions and in the random numbers used throughout the process. The corresponding parameters are shown in the Appendix.
Validation
The optimal subset of D was used to estimate the water content in the top one meter of soil on January 25, 1993 and December 23, 1993. These two dates had not been used in the calibration process. As in the calibration phase, we estimated water content in the 47 points not in the subset using ordinary kriging as a spatial interpolator. However, we used the scaled semivariogram, multiplying its nugget effect and scale by the variance of the data observed in the optimal 10-point subset under consideration.
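The back-scaling step can be sketched as follows. The function name is hypothetical, "scale" is read here as the partial sill of the fitted model, and the use of the sample variance is an assumption.

```python
from statistics import variance


def rescale_semivariogram(scaled_nugget, scaled_partial_sill, subset_values):
    """Convert a scaled (dimensionless) model back to the units of a given date.

    `subset_values` are the water contents measured at the reduced subset on that date.
    """
    subset_variance = variance(subset_values)  # assumed: sample variance
    return scaled_nugget * subset_variance, scaled_partial_sill * subset_variance
```

Only the 10 subset measurements are needed on a new date, which is what makes the scaled semivariogram usable once the other 47 locations are no longer sampled.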
We evaluated the optimal subsets obtained from the four algorithm / fitness function combinations (SKV-S&S, SKV-SSA, SMSE-S&S, SMSE-SSA). To observe the benefits of applying our proposed method, we also evaluated three regular grids (Table 4-1) and 132 randomly generated subsets. We calculated relative errors
E_r = \frac{\hat{\theta} - \theta}{\theta} \cdot 100\%
and standardized residuals for each location and validation date, tested for bias, and plotted the standardized residuals vs. the estimated total soil water content for each validation date to check for trends in the estimation. We also calculated the Shapiro-Wilk W statistic (Shapiro & Wilk, 1965) to verify whether the residuals were normally distributed, and tested for heteroscedasticity using regression between the mean estimated soil water content and the variance of 4-point clusters of adjacent points, following Goovaerts (1997). To verify compliance with kriging assumptions, we calculated histograms of the residuals divided by their respective kriging standard deviation, and checked for normality, zero mean, and unit variance.
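The error measures above can be sketched as follows. The sign convention (estimate minus observation) and the function names are assumptions of this sketch; the residuals are standardized here by their kriging standard deviation, which is the form used for the zero-mean and unit-variance check.

```python
import math
from statistics import mean, pvariance

def relative_error(estimated, observed):
    """Relative error in percent: (estimate - observation) / observation."""
    return (estimated - observed) / observed * 100.0

def standardized_residuals(estimated, observed, kriging_var):
    """Residuals divided by their respective kriging standard deviation."""
    return [(e - o) / math.sqrt(kv)
            for e, o, kv in zip(estimated, observed, kriging_var)]

def summarize(residuals):
    """Mean and variance; near 0 and 1 when the kriging assumptions hold."""
    return mean(residuals), pvariance(residuals)
```

A mean far from zero points to bias, and a variance far from one suggests that the kriging variance misstates the actual estimation error.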
To put the results into context, we also checked for temporal stability as defined by Vachaud et al. (1985), using temporal analysis of the differences between individual and spatial averages, and performing Spearman's rank correlation on the data of all possible pairs of the five available measurement dates.
Table 4-1. Locations contained in the most relevant patterns mentioned in the text, together with their values of scaled mean squared error (SMSE) and scaled kriging variance (SKV) over the calibration (subscript c) and validation (subscript v) data sets.
Pattern            S1  S2  S3  S4  S5  S6  S7  S8  S9  S10  SKV_c  SKV_v  SMSE_c  SMSE_v
Best SKV-based      9  11  13  22  25  34  37  43  49   55  .3276  .1441  .7559   .6336
Best SMSE-based     5   7  13  16  30  37  41  50  53   56  .3844  .4268  .3635   .5174
Regular grid #1     3   7  15  19  29  37  40  47  51   57  .3467  .1723  .7537   .7117
Regular grid #2     4   7   8  23  26  32  40  45  53   56  .3549  .2072  .6187   .5286
Regular grid #3     7   8  11  27  30  41  44  51  53   56  .3590  .2594  .7297   .6814
Best random grid    5   9  13  17  28  33  34  45  55   57  .3731  .3730  .5042   .5040
Results and Discussion
Semivariograms
For each date of interest, the semivariogram model that produced the best fit was an exponential model. The corresponding parameters are shown in Table 4-2. Because no data pairs had lag distances below 41.66 meters, it was difficult to assess the existence of a nugget effect. We assumed a zero nugget throughout, based on the great similarity among the repetitions of water content measurements that were pooled for each point at each measurement date (not shown). The scaled semivariogram model coalesced from the three calibration set semivariograms was
\gamma(h) = C_0 + C_1 \left[ 1 - \exp\left( -\frac{3h}{a} \right) \right]
with C_0 = 0 (nugget), C_1 = 1.082 (scale), and a = 176 m (effective range).
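A short sketch of the exponential model and of the rescaling applied to the scaled semivariogram follows. It assumes the GSLIB-style convention in which a is the effective range, so that the model reaches about 95% of its scale at h = a; the function names are hypothetical, and the scale value is the one listed for the scaled semivariogram in the table of semivariogram parameters.

```python
import math

def exponential_semivariogram(h, nugget, scale, effective_range):
    """gamma(h) = C0 + C1 * (1 - exp(-3h / a)), with gamma(0) = 0."""
    if h == 0.0:
        return 0.0
    return nugget + scale * (1.0 - math.exp(-3.0 * h / effective_range))

def rescaled_semivariogram(h, scaled_params, sample_variance):
    """Multiply the nugget and scale of the scaled model by a sample variance."""
    nugget, scale, effective_range = scaled_params
    return exponential_semivariogram(
        h, nugget * sample_variance, scale * sample_variance, effective_range)

# C0, C1, a (in m) of the scaled semivariogram, as tabulated in this section.
SCALED_SV = (0.0, 1.082, 176.0)
```

Because the scaled model is dimensionless, multiplying it by the variance observed on a given date returns a semivariogram in the units of that date's data.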
Table 4-2. Mean total soil water content in the first meter of soil, phenological stage, and semivariogram model parameters per measurement date, and parameters for the scaled semivariogram model.
                                                     Semivariogram parameters
Date          θ       Phenological stage             C0   C1      a
Feb 7 1992    217 mm  V4 (4th node on main stem)     0    520     110 m
Feb 24 1992   228 mm  V6 (6th node on main stem)     0    650     130 m
Mar 20 1992   253 mm  R5 (Beginning seed)            0    1115    130 m
Jan 25 1993   196 mm  R2 (Full flower)               0    270     135 m
Dec 23 1993   142 mm  Planting                       0    320     100 m
Scaled SV     N/A     N/A                            0    1.082   176 m
Density Reduction
Figure 4-2 shows the progress of the five instances of one of the scenarios (SMSE, SSA). Note how the length of the process varied among instances, but the final value of the fitness function they all arrived at was approximately the same. This behavior was consistent among all scenarios. However, in both the SKV- and SMSE-based scenarios, the five instances of the S&S algorithm reached their optimum value faster than any of the SSA instances (not shown), probably due to the S&S algorithm's deterministic selection of the exiting point of the subset. This characteristic makes the S&S algorithm susceptible to converging towards local minima when π has reached low values; the effect is countered by making it possible to automatically increase π, and thus the probability of escaping a local minimum, when the fitness function has not improved over a given number of iterations. This automatic increase in π typically resulted in noisy output that did not converge to the optimum; the optimum would be reached at some intermediate point, and the algorithm would oscillate thereafter until the cutoff limit of M iterations without changes in π was reached. In contrast, the SSA algorithm took longer to reach its optimum, but always converged toward it. This is consistent with the asymptotic convergence proven by Aarts and Korst (1990) for simulated annealing algorithms using the Metropolis (fully stochastic vs. the partially deterministic S&S) perturbation method. The parameterization of the SSA algorithm was also simpler than with the S&S algorithm (see the Appendix). Both the S&S and SSA algorithms found the same optimum pattern for the SMSE criterion. However, only the SSA algorithm arrived at the optimal pattern shown for the SKV criterion; the S&S solutions were slightly inferior.
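The fully stochastic search can be illustrated with a generic simulated annealing loop over subsets that uses the Metropolis acceptance rule. This is a sketch under stated assumptions, not the SSA or S&S implementation described in the Appendix: the swap-based perturbation, the geometric cooling schedule, and all parameter names and default values are hypothetical.

```python
import math
import random

def spatial_simulated_annealing(candidates, subset_size, fitness,
                                t0=1.0, cooling=0.95, steps_per_temp=20,
                                n_temps=60, seed=0):
    """Search for a low-fitness subset of candidates; returns (subset, fitness)."""
    rng = random.Random(seed)
    current = rng.sample(candidates, subset_size)
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    temp = t0
    for _ in range(n_temps):
        for _ in range(steps_per_temp):
            # Perturbation: swap one random member for one random non-member.
            outside = [c for c in candidates if c not in current]
            trial = current[:]
            trial[rng.randrange(subset_size)] = rng.choice(outside)
            trial_fit = fitness(trial)
            delta = trial_fit - current_fit
            # Metropolis rule: always accept improvements; accept a worse
            # pattern with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_fit = trial, trial_fit
                if current_fit < best_fit:
                    best, best_fit = current[:], current_fit
        temp *= cooling  # geometric cooling (assumed schedule)
    return sorted(best), best_fit
```

Running the function with several seeds mirrors the five instances per scenario: each run starts from a different random subset and follows a different random sequence.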
There was some variability (not shown in Figure 4-3) among the results of the five instances of the process for each of the four calibration scenarios, depending on the initial conditions of the process and/or the sequence of random numbers involved. This suggests that the chosen parameters may have quenched the system too rapidly, despite the fact that, in the case of the S&S algorithm, our chosen parameter set (see the Appendix) was more conservative and thus should converge more slowly than the one proposed by Sacks and Schiller (1988). This leads us to recommend repeating the process for different initial conditions / parameter values, especially if the S&S algorithm is being used.
[Figure 4-2: plot not reproduced. Horizontal axis: Iteration (0 to 800).]
Figure 4-2. Progress of the five instances of the scenario defined by the scaled mean squared error (SMSE) fitness function and the Spatial Simulated Annealing (SSA) algorithm. The missing points correspond to patterns (10-location subsets) that were penalized, by adding a large number to their fitness function, for not predicting the water content of all the remaining 47 locations in the microwatershed. This penalization kept the algorithms from artificially reducing the value of their fitness function by minimizing the number of error-contributing estimates. The algorithms could do this by clumping the locations and leaving parts of the microwatershed beyond the maximum kriging search radius.
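The penalization described in the caption can be sketched as follows. The penalty value and the names are hypothetical, since the source only says that a large number was added; the test for an unpredictable location is simplified here to having no sampled point within the search radius.

```python
import math

PENALTY = 1.0e6  # hypothetical "large number"; the actual value is not given

def penalized_fitness(subset, others, raw_fitness, search_radius=300.0):
    """Add PENALTY when any remaining location cannot be estimated.

    subset and others are lists of (x, y) coordinates in meters. A location
    in others counts as not estimable (simplifying assumption) when every
    sampled point lies beyond the kriging search radius.
    """
    for x, y in others:
        if all(math.hypot(x - sx, y - sy) > search_radius for sx, sy in subset):
            return raw_fitness + PENALTY
    return raw_fitness
```

Without such a term, a clumped pattern would contribute fewer error terms to the sum and could look better than a pattern that covers the whole microwatershed.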
[Figure 4-3: plot not reproduced. Panel titles: Calibration dataset, Validation dataset. Horizontal axis: Calibration criterion.]
Figure 4-3. Results of the calibration (left) and validation (right) processes for each of the four scenarios, shown as scaled kriging variance (SKV, top) and scaled mean squared error (SMSE, bottom). Left panes: results of the density reduction process on the calibration data set. Right panes: application of the optimal calibration-phase patterns to estimate the water content in the validation set (using the scaled semivariogram). Since calibration used 3 sets of measurements and validation only 2, the absolute values of the errors shown in the left and right panes should not be compared with one another. Results for three regular grids and 132 random patterns (median and range) were added for contrast.
The results of the density reduction process on the calibration data set are shown on the left half of Figure 4-3. Observe how the values of the scaled kriging variance were quite similar across the optima of the different calibration scenarios as well as the regular grids and the 132 random patterns. This is in great measure explained by two factors: i) we set a 300-meter maximum search radius in the GSLIB kb2d routine (Deutsch and Journel, 1992) used for kriging, and ii) in the case of the randomly generated patterns, we
PAGE 1
KNOWLEDGEBASED TECHNIQUES FOR PARAMETERIZING SPATIAL BIOPHYSICAL MODELS By RAFAEL ANDRES FERREYRA A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2003
PAGE 2
Copyright 2003 by Rafael Andres Ferreyra
PAGE 3
This document is dedicated to Lili, my wife.
PAGE 4
ACKNOWLEDGMENTS Many people and institutions contributed to make this dissertation possible. First and foremost, I thank Dr. Jim Jones, my advisor, for his infinite patience and balance in guiding me through my Ph.D. process and helping me grow while supporting my independence. He is an extremely busy man, but he always had time for me and always pushed me to excel and have ambitious goals. His enormous productivity and unvarying affability have been very inspiring. I thank my other committee members for their help, patience, and sense of humor. Dr. Ken Boote provided muchappreciated advice at several critical junctures of my program. His fantastic course on plant physiology was a revelation for me. Dr. Doug Dankel is a great teacher, and is extraordinarily supportive and friendly. Dr. Wendy Graham, with infinite patience and a keen sense of humor, helped me develop some geostatistical and hydrologic sensibility, and saw me through the publication of my first dissertationrelated paper. Dr. Gerrit Hoogenboom provided valuable advice and constructive criticism throughout the process, did not despair despite my initially scant publishing momentum, and was determined to keep me honest. I have special thanks for some of my other great teachers at UF, who provided great help and inspiration. Robert McSorley stimulated my interest in agricultural ecology. Carl Barfield's grant writing course is so useful it should be a requirement for graduation. Stanley Latimer helped me organize the GPS work at my research sites, and trusted me with a lot of expensive equipment (I also thank the Trimble Corporation for providing iv
PAGE 5
that equipment to UF in the first place). Anand Rangarajan introduced me to Bayesian Networks, and helped me get my IEEE grant. Sergei Pilyugin is a friendly and very effective teacher of differential equations. Charles Guy teaches a great course on scientific issues. Mickie Swisher's course on methods of scientific inquiry was valuable and fun. Paul Gader stimulated my interest in applications of fuzzy logic. Damon Andrew's badminton course helped me get through a difficult semester. Jon Dain and Franklin Paniagua taught a conflict resolution course that gave me a new perspective on natural resources management and methodologies for solving problems involving people. Ted Spiker welcomed me into his course on magazine and feature writing at the Journalism Department, and taught me much about writing. Finally, Patricia Craddock's professional editing course was very enlightening and valuable during the last weeks of my program. I thank all of these professors; they prove that nothing can replace direct contact with a great teacher. I am very thankful to Guillermo Podesta of the University of Miami. Guillermo helped me a great deal in my early weather satellite days, hosting me several times at the Rosenstiel School of Marine and Atmospheric Science (RSMAS). He also encouraged my pursuit of further academic aspirations, and later helped catalyze my coming to UF. He is a great promoter of climate change research in Argentina and continues to invest in argentine atmospheric and agricultural sciences, undaunted by obstacles. I also thank Dave Letson of the University of Miami. Working with him in ENSOrelated research was great, and I have learnt much from his crisp writing. My gratitude also goes to Jeff White for hosting Dr. Jones and his students at CIMMyT, and to Joe
PAGE 6
Ritchie for honoring me with an invitation to his symposium in Detroit in 2000. 1 also wish to thank Bill Batchelor for his valuable career advice and friendly attitude. I thank Pedro Murillo, Eduardo Toselli, and Miguel Gurassa for their friendship and support during and after our years working together in Argentina. I learnt much from them. I also thank Julio Dardanelli for his friendship and guidance during my MS, and for contributing to my coming to UF. We have been able to keep our research collaboration going mostly because of his patience and accommodation of the time limitations imposed by my Ph.D. process. I am exceedingly thankful to Hernan Apezteguia. He generously provided the data and motivation for Chapter 4. He is a wise and generous man, with a fantastic sense of humor. I hope we can continue our collaboration into the distant future. Ludmila and Yakov Pachepsky deserve more thanks than I can express. Their generosity and friendship know no bounds. They are also fantastic scientists. They convinced me to pursue a Ph.D. and provided me with all sorts of logistic support and encouragement. I am extremely grateful to AgConnections, Inc. of Murray, KY, and its two owners, Rick Murdock and Pete Clark. They have invested heavily in my work, and have honored me with their friendship and the opportunity to witness how a group of intelligent, enterprising people can prosper given enough effort and powerful ideas. I've seen them grow from three smart guys in a big room a few years ago to their current status as a major force of nature. I greatly thank Rick Murdock for his vision, encouragement, and generosity; I have learnt much from him. His family also provided me with a home away from home in my visits to Murray. I am also very grateful to Pete Clark for his vi
PAGE 7
downtoearth good advice and business perspective; to John Potts, whose sharp intellect and expert knowledge of agroecosystems I frequently tapped for this dissertation; to Joe Bullen, who measured water content in the Suggs 4 field under extreme environmental conditions more times than he would like to remember; and to Chad Wortham, who took care of the weather station in the field, measured soil water content, and was always helpful and in high spirits. I thank Rick Murdock, Chad Wortham, Pete Clark, Jamie Stockdale, Kenny Kingins, Clay Bailey, Joe Bullen and John Potts for helping me pound so many poorlydesigned plastic tubes through a nearly impenetrable barrier that soil scientists cynically call a "fragipan". I failed to see anything "fragile" about it. Rick's ability to nurse battered steel pipes back to health and improvise a ramming head from pieces of scrap metal is the material from which metalworking legends are wrought. I am very thankful to Jerry Mcintosh of the NRCS, who generously contributed a lot of time and expert knowledge to this work; to Ron Riffey and Tim Taylor who, while at the AGRIS Corporation, helped me assemble a proofofconcept dataset and find my way to AgConnections; and to Chuck Cunnyngham, Mike Fouts, Duane Frederking, Bob Guse, and Russ Henry of Pioneer, who provided me with valuable information on the 328 1W corn hybrid. I thank the people who have helped me publish, especially Dr. Jones, who softly but relentlessly pushed me to write; Dr. Graham, who patiently helped me with water and geostatistical issues; my departmental reviewers Tom Burks, Daniel Lee, Fred Royce, Carlos Messina, and Jawoo Koo; the anonymous reviewers who greatly improved my manuscripts; and Ludmila B. Pachepsky, who helped me start. I also greatly appreciate vii
PAGE 8
the funding for publication costs received from the IFAS Agricultural Experiment Station Journal Series program. Carlos Messina is a good friend and an extraordinary scientist. I have profited greatly from the innumerable hours of scientific discussion we've shared during the last four years. Working and studying with him has been an honor. I am also deeply grateful for his support during the final, hectic hours of my Ph.D. process. I thank The Royce family, Fred, Estela, Karina, Anelkis and Sofia, for their friendship and hospitality. The shared evenings at their home really helped me feel at home in Gainesville. I also wish to thank Jawoo Koo; over the last four years he showed his labmates that human kindness and consideration for others can be limitless. I thank the other fine people of the Crop Systems Modeling Lab, Alagarswamy, Andre, Ayse, Cheryl, McNair, Oxana, Ramkrishnan, Shrikant, Valerie, and Wayne, for their friendship and numerous acts of kindness. Special thanks go to Ricardo Braga, a great friend and crop modeler, who made our early stay in Gainesville very enjoyable. I also wish to thank Hector, Claudia, Juan Manuel, and Lucila Crena. They have been our family in Gainesville. Several organizations have provided muchappreciated funding for diverse aspects of my work: the United Soybean Board, the Honor Society of Phi Kappa Phi, the Scientific Honor Society of Sigma Xi, the Neural Networks Society of the Institute for Electric and Electronic Engineers, and particularly AgConnections, Inc. I am especially grateful for the funding provided by the University of Florida's College of Agriculture and Life Sciences, CALS, through an Alumni Graduate Fellowship and travel grants. I also thank the Graduate Student Council at UF for its support through travel grants and viii
PAGE 9
for organizing the annual Graduate Student Forum, which provides a valuable opportunity for graduate students to exercise their presentation skills. One of the most extraordinary and pleasant things I have been exposed to during my program at UF is the inclusive attitude of CALS/IFAS. They funded my program although I was a student of the College of Engineering; and treated me as one of their own from the first day to the last. In particular, I thank Dr. Jane Luzar for being a relentless cheerleader and contributor for the ABE Town Hall, a studentorganized discussion forum we created in our department; and I thank Dr. Mike Martin for participating in it. I thank Carlos Messina, Kelly Brock, Amy Dedrick, and everybody else who in one way or another contributed to getting the ABE Town Hall going. Special thanks go to the professors and graduate students from ABE and other departments throughout UF who came and participated. Hopefully the Town Hall concept will live long and prosper. I also hope our incipient Undergraduate Research Experience program through the Graduate Student Council will do the same. I am very thankful to the Agricultural and Biological Engineering Department for its friendly, hospitable atmosphere, and for putting so many resources at my disposal. I thank the ABE Department's administrative cast. Jane Elholm, Mary Hall, Mary Harris, Dawn Mendoza, Betty Pearson, and Jeanette Wilson always solved my problems, or did things so well that there were no problems to solve. I also thank Dr. Ken Campbell, the ABE Graduate Coordinator, for his kindness, help, and good advice. I also wish to thank Maud Fraser at the UF International Center for her good advice and good humour. I wish to thank Tommy Estevez at the UF Meat Processing Center for his kindness and style. He is practically the patron saint of the Argentine community in Gainesville. I ix
PAGE 10
also thank Michelle at the Reitz Union Wendy's for her smile and upbeat attitude. Thanks also go to the Hare Krishas for the Krishna Lunch at UF's Plaza of the Americas. I didn't go often, but I know people who wouldn't have made it through grad school without it. I thank the United States Postal Service for making a major contribution to my quality of life through their great service. Manuscripts, important paperwork, bills, etc. always came and went flawlessly. I am thankful for the UF libraries. They amaze me today as they did when I first saw them. I hope I never take something as valuable as that for granted. Many thanks go to my parents, who supported us throughout the process, on occasion risking their health to visit us; and to my sister. My magnificent children, Nicolas and Tomas, have also been a constant source of wonder and encouragement. Leaving the best for last, I thank my wife Lili. Without her love, sacrifice and determination, this would have been impossible.
PAGE 11
TABLE OF CONTENTS page ACKNOWLEDGMENTS iv LIST OF TABLES xv LIST OF FIGURES xvi ABSTRACT xxi CHAPTER 1 INTRODUCTION 1 Precision Agriculture and Crop Models 1 Goal and Objectives 8 Outline of the Dissertation 9 2 SOURCES OF ERROR WHEN INVERSE MODELING IS APPLIED TO THE PARAMETERIZATION OF SPATIALLYCOUPLED CROP MODELS 12 Introduction 12 Materials and Methods 15 Generating Synthetic Yield Maps: the SpatiallyCoupled Model 15 Generating Synthetic Yield Maps: Input Data 20 Estimating Soil Parameters Using Inverse Modeling 22 Evaluating With Independent Data 24 Results and Discussion 25 Exploring Similar Behavior Between SpatiallyCoupled andUncoupled Models 25 Simulated Yield Profiles 28 Inverse Modeling: Parameter Estimation Error 30 Evaluating With Independent Data 36 Conclusions 40 3 PLANNING CROP SCOUTING PATHS WITH OPTIMIZATION ALGORITHMS AND A SELFORGANIZING FEATURE MAP 42 Introduction 42 Theory 44 Sequential Approach: Sampling Locations 44 Sequential Approach: the Traveling Salesman Problem (TSP) 48 xi
PAGE 12
The Simultaneous Approach 48 Materials and Methods 50 Case Study Problem, Location, and Dataset 50 The Sequential Approach 51 The Simultaneous Approach 52 Evaluation of Results 53 Results and Discussion 54 Sampling Location Layout 54 Predictive Accuracy 55 Tour Length 60 Runtime 61 Semivariograms 62 Practical Considerations 62 Conclusions 63 4 REDUCING SOIL WATER SPATIAL SAMPLING DENSITY USING SCALED SEMIVARIOGRAMS AND SIMULATED ANNEALING 64 Introduction 64 Temporal Stability 65 Simulated Annealing 67 Materials and Methods 68 Study Location 68 Semivariogram Modeling 69 The Density Reduction Problem 70 Fitness Functions 72 Scaled kriging variance (SKV) 72 Scaled mean squared error (SMSE) 73 Validation 73 Results and Discussion 75 Semivariograms 75 Density Reduction 75 Validation 79 Residual Analysis of Validation Results and Tests of Kriging Assumptions 83 Temporal Stability Analysis 85 Sources of error and Nonstationarity 87 Conclusions 90 5 A FASTER ALGORITHM FOR CROP MODEL PARAMETERIZATION BY INVERSE MODELING: SIMULATED ANNEALING WITH DATA REUSE 92 Introduction 92 Materials and Methods 95 Simulated Annealing Overview 95 Crop Model Management 97 Case Studies 97 Results and Discussion 100 Case Study 1 100 xii
PAGE 13
Case Study 2 104 Conclusions 107 6 USING BAYESIAN NETWORKS TO HELP UNDERSTAND CAUSAL RELATIONSHIPS 109 Introduction 109 Bayesian Nets as Simple Expert Systems to Help Explain and Understand How Things Work 110 A Simple Example 113 Deductive Inference 114 Abductive Inference 120 Conclusions 122 7 INTEGRATING MULTIPLE KNOWLEDGE SOURCES FOR PARAMETERIZING SPATIAL CROP MODELS WITH INVERSE MODELING 123 Introduction 123 Materials and Methods 127 Case study: the Suggs 4 Field 127 Elevation and topographic attributes 128 Soil electroconductivity 129 Soil data 130 Yield maps 131 Soil water data 132 Parameter Estimation Process 133 Knowledge elicitation 133 Updating the soil map, selecting the simulation locations 134 The IM framework 136 The four parameterevaluation criteria 138 Objective function (aggregation of criterion results) 141 Simulations 143 The (uncoupled) spatial crop model 143 The spatiallycoupled crop model 1 44 Weather data needed for crop simulations 144 Initial conditions 145 Genetic coefficients 146 Analyses 147 Results and Discussion 148 Elevation, Wetness Index 148 Electroconductivity (EC) 150 Soils 153 Yield Maps 1 54 Updating the Soil Map, Selecting the Simulation Locations 158 Soil Probe Observations 161 The Simulation Domain 162 xiii
PAGE 14
Knowledge Elicitation: Populating the Neighborhood Criteria 163 Knowledge Elicitation: Parameter Sensitivity and Parameter Selection 168 Observed Yields in the IM Framework Domain 1 74 Evaluation 178 Simulations with a neighborhood criterion 179 IM with coupled and uncoupled models 1 80 Recommendations for Building Neighborhood Criteria 186 Conclusions 199 8 CONCLUSIONS 201 APPENDIX A THE SIMULATED ANNEALING ALGORITHMS USED IN CHAPTER 4 207 Sacks and Schiller Algorithm 207 Spatial Simulated Annealing 208 Acceptance Criterion 208 Generation Mechanism 208 Cooling Schedule 209 B NEIGHBORHOOD CRITERIA DATA AND SOURCE CODE 210 Depth Criterion 210 Source Code for Soil Map Neighborhood and Yield History Criteria, OWA Operator 214 LIST OF REFERENCES 225 BIOGRAPHICAL SKETCH 244 xiv
PAGE 15
LIST OF TABLES Table page 21 . Soil parameters used for the spatiallycoupled crop model 21 31 . Standardized variograms of the 1999 and 2001 McCallon 1, and 1999 Suggs 4 maize data 62 41. Locations contained in the most relevant patterns mentioned in the text, together with their values of scaled mean squared error (SMSE) and scaled kriging variance (SKV) over the calibration and validation data sets 74 42. Mean total soil water content in the first meter of soil, phenological stage, and semivariogram model parameters per measurement date, and parameters for the scaled semivariogram model 75 43. Results of residual analysis 85 44. Spearman's rank correlation tests for temporal stability 86 51 . Crop model parameters and ranges for case study 2 99 71 . Different IM scenarios, showing number of instances of each criterion 147 72. Semivariogram parameters for the yield map data 1 54 73. Soil probe observations corresponding to the anomalies in Figure 718 160 74. Values adopted for the neighborhood criterion's constraint thresholds 1 64 75. Crop model parameters and ranges used in IM framework 1 72 76. Soil water holding characteristics obtained by applying the Saxton pedotransfer functions to textural fractions taken from the literature 1 73 B1 . Matrix encoding for the soil neighborhood criterion 210 xv
PAGE 16
LIST OF FIGURES Figure Â£age 21 . Landscape model used for the spatiallycoupled model simulations 15 22. Parameterization weather cases 23 23. Values of CN E Q2 for different combinations of rainfall and curve number values in cells 1 and 2 (CNj, CN 2 ) of the spatiallycoupled model 28 24. Histogram of PESW in available initial condition set 29 25. Simulated yield profiles made with the spatiallycoupled and uncoupled models 29 26. CN2 values obtained by IM for the spatiallycoupled and uncoupled models 3 1 27. FAW values obtained by IM for the spatiallycoupled and uncoupled models 32 28. CN2 sensitivity coefficient for different landscape positions and models 34 29. FA W sensitivity coefficient for different landscape positions and models 34 210. Yield RMSE for the twelve parameterization scenarios 35 211. Evaluation yield RMSE for the spatiallycoupled model when IM initial conditions are unknown and evaluation initial conditions are known 38 212. Evaluation yield RMSE for the spatiallycoupled model when both the IM and evaluation initial conditions are unknown 38 213. Evaluation yield RMSE for the uncoupled model when IM initial conditions are unknown and evaluation initial conditions are known 39 214. Evaluation yield RMSE for the uncoupled model when both the IM and evaluation initial conditions are unknown 39 31 . Sampling locations and tour lengths for the 3 cases at a sampling density of2.5/ha (22points) 56 32. Evaluation of the sampling schemes produced by the three cases at different sampling densities 57 xvi
PAGE 17
33. (A) Observed 1999 yield map; (B) MMKV+TSP estimate at 2.5 samples/ha (22 points); C) MMKV + TSP estimate at 10 samples/ha (88 points) 58 34. (A) Comparison of the three cases' tour lengths. (B) Difference between MMSD / MMKV case tour lengths and expertderived tour lengths 60 41 . Layout of the microwatershed 69 42. Progress of the five instances of the scenario defined by the scaled mean squared error (SMSE) fitness function and the Spatial Simulated Annealing (SSA) algorithm 77 43. Results of the calibration and validation processes for each of the four scenarios, shown as scaled kriging variance and scaled mean squared error 78 44. Maps of interpolated water content for both validation dates 80 45. Map of relative prediction error of the best SMSEcalibrated scenario for both validation dates 82 46. Distribution, for each validation date, of relative prediction error in the estimated locations not belonging to the optimal pattern for the optimal SKVcalibrated, and the optimal SMSEcalibrated scenarios 82 47. Residual analysis of validation results and tests of kriging assumptions for both validation dates 84 48. Ranked intertemporal relative deviation from the mean (across the microwatershed) spatial soil water content, 8 ( 88 51 . A hypothetical field divided into three environments (soil types) 94 52. Objective function used in case study 1 98 53. Objective function vs. number of algorithm iterations for six runs of the simulated annealing algorithm 101 54. Unique objective function calculations (equivalent to crop model runs) for 18 scenarios (7 repetitions / scenario) in case study 1 103 55. Total model runs vs. number of locations per environment for case study 2 104 56. Error at each location of interest for the grid search and two simulated annealing scenarios of case study 2 105 61 . Scale used to translate between verbal quantifiers and probabilities 114 62. Causal model of soil roughness made using a Bayesian network 115 xvn
PAGE 18
63 . Conditional probability tables for the soil roughness model 116 64. Deductive inference on a Bayesian network 118 65 . Deductive inference in the Bayesian network 119 66. Deductive inference in the Bayesian network 1 20 67. Abductive inference in the Bayesian network 121 71. Suggs 4 field 129 72. Original division of the Suggs 4 field into soil types 130 73. Custombuilt predrilling tool for use with the threeprong TDR probe 1 33 74. IM framework for the proposed SCM parameter estimation process 138 75. Dependence of the yield history criterion for year i (YHC,) on the value of parameter k 139 76. Functions used for evaluating the neighborhood constraints 1 40 77. Monthly rainfall in Murray during 1 97099 1 44 78. Semivariogram estimated from elevation data 148 79. Wire frame elevation map of the Suggs 4 field 149 710. Wetness index calculated for the Suggs 4 field 150 711. Semivariogram estimated from surface and deep electroconductivity data 151 712. Veris electroconductivity maps of the field 152 713. Semivariograms fitted to the observed yield data 155 714. Resampled 1997 maize yield data 156 715. Resampled 1998 Soybean yield data 156 716. Resampled 1999 maize data 157 717. Normalized threeyear (1997, 1998, 1999) yield map 157 718. Summary of anomalies and candidate zones for additional soil map units identified during discussion sessions with the domain experts 158 719. Field observations with a soil probe 161 xviii
PAGE 19
720. Set of 13 soil types used as the IM framework simulation domain 163 721. Soil depth neighborhood criterion 165 722. Wetness neighborhood criterion 166 723. Plant density neighborhood criterion 167 724. Using diagrams to support the discussion and knowledge elicitation process 169 725. Conceptual water balance model expressed as a causal map 170 726. Record of a discussion session with domain experts 171 727. Compact representation of the limiting factor data 172 728. Observed crop yield in the 13 locations of interest 174 729. Observed relative crop yield in the 1 3 locations of interest for the two calibration years (1999 and 2001) and validation year (1997) 175 730. Cumulative rainfall from Jan. 1 to Aug. 23 during 1997, 1999, and 2001 176 73 1 . Rainfall during the crop season 177 732. Evaluation locations, shown on the normalized yield map 1 78 733. Relative position on the landscape of the four locations used for evaluation 178 734. Five realizations of parameterization using the IM framework with only one objective function input, the soil depth neighborhood criterion 180 735. Parameter estimates of 2year coupled and uncoupled model IM scenarios 181 736. Errors and comparison of yields for 2year coupled and uncoupled model IM scenarios relative to observed values 181 737. Parameter estimates of 3year coupled and uncoupled model IM scenarios 182 738. Errors and comparison of yields for 3year coupled and uncoupled model IM scenarios relative to observed values 182 739. Simulated and observed soil water data for the 2001 crop season, 01 5 cm layer of the LoB soil location 1 89 740. Simulated and observed soil water data for the 2001 crop season, 1530 cm layer of the LoB soil location 189 xix
7-41. Simulated and observed soil water data for the 2001 crop season, 30-45 cm layer of the LoB soil location
7-42. Simulated and observed soil water data for the 2001 crop season, 45-60 cm layer of the LoB soil location
7-43. Simulated and observed soil water data for the 2001 crop season, 0-15 cm layer of the CaA(W) soil location
7-44. Simulated and observed soil water data for the 2001 crop season, 15-30 cm layer of the CaA(W) soil location
7-45. Simulated and observed soil water data for the 2001 crop season, 30-45 cm layer of the CaA(W) soil location
7-46. Simulated and observed soil water data for the 2001 crop season, 45-60 cm layer of the CaA(W) soil location
7-47. Simulated and observed soil water data for the 2001 crop season, 0-15 cm layer of the GrB(Ba) soil location
7-48. Simulated and observed soil water data for the 2001 crop season, 15-30 cm layer of the GrB(Ba) soil location
7-49. Simulated and observed soil water data for the 2001 crop season, 30-45 cm layer of the GrB(Ba) soil location
7-50. Simulated and observed soil water data for the 2001 crop season, 45-60 cm layer of the GrB(Ba) soil location
7-51. Simulated and observed soil water data for the 2001 crop season, 0-15 cm layer of the Hn soil location
7-52. Simulated and observed soil water data for the 2001 crop season, 15-30 cm layer of the Hn soil location
7-53. Simulated and observed soil water data for the 2001 crop season, 30-45 cm layer of the Hn soil location
7-54. Simulated and observed soil water data for the 2001 crop season, 45-60 cm layer of the Hn soil location
7-55. Simulated and observed cumulative transpiration for the 2001 crop season, Hn soil location
7-56. Simulated and observed cumulative transpiration for the 2001 crop season, GrB(Ba) soil location
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

KNOWLEDGE-BASED TECHNIQUES FOR PARAMETERIZING SPATIAL BIOPHYSICAL MODELS

By
Rafael Andres Ferreyra
December 2003

Chair: James W. Jones
Major Department: Agricultural and Biological Engineering

This study presents new approaches for practical problems related to using crop models in precision agriculture. Agriculture is becoming increasingly competitive and regulated. Farmers must maximize profits yet decrease their farms' environmental impact. Precision agriculture has been proposed as a way to improve farmers' income and minimize the environmental impact of farming by optimizing the applied levels of fertilizers and other crop inputs on a site-specific basis. However, for spatially variable prescriptions to be effective, farmers need to thoroughly understand how several interacting physical and biological factors contribute to cause spatial yield variability.

Crop simulation models are software programs that imitate plant growth and development. They can help us understand spatial yield variability and how to manage it. However, crop models have expensive and impractical soil data requirements, especially for spatial applications. A technique called inverse modeling uses the crop models themselves to search for the model parameters that best fit observed results. This
technique is very convenient for practical applications in precision agriculture, but its current state of development does not ensure good predictive power. Our objectives were

• To identify and quantitatively compare different sources of error in the use of inverse modeling to parameterize spatially coupled and uncoupled crop models.
• To develop methods for optimizing spatial sampling schemes for representing the spatiotemporal variability of yield and yield-limiting factors.
• To develop and evaluate a portable framework for eliciting knowledge from experts and using that knowledge to parameterize a spatial crop model.

We found that crop yield spatiotemporal variability in a field can be represented using a limited number of sampling locations; that those locations can be found using efficient combinatorial optimization algorithms; and that in many applications crop model results in the sampling locations can be kept within acceptable error levels without needing the computationally intensive coupling (i.e., interchange of water) between simulation locations. This can be facilitated by imposing a set of spatial constraints on the system during the inverse modeling process. The constraints can be elicited from local domain experts.
CHAPTER 1
INTRODUCTION

Precision Agriculture and Crop Models

Present-day row crop (maize, soybeans, etc.) agriculture is beset by economic and environmental problems. Commodity prices decreased steadily through the 20th Century (USDA NASS, 1994), increasing the economic risk of agricultural production. Moreover, growing levels of environmental regulatory pressure have limited farmers' ability to manage risk. The limitations include water body contamination limits and total maximum daily loads (EPA, 2003), market limitations on the use of genetically modified organisms, and competition between agricultural and urban water use.

Precision, or site-specific, agriculture has been proposed as a way of improving farmers' income and reducing the environmental impact of agriculture by optimizing the applied doses of fertilizers and other inputs on a site-specific basis (NRC, 1997). Precision agriculture merges several enabling technologies such as global positioning systems (GPS), geographical information systems (GIS), and real-time variable-rate application technology (Morgan and Ess, 1997). Site-specific management requires equipment that can vary application rates in real time while moving through the field (Anderson and Humburg, 1997). Another necessary component for variable-rate application is the spatially-variable prescription (i.e., site-specific dosage of crop inputs) (Morgan and Ess, 1997).

Effective prescriptions require a thorough understanding of the causes of spatial yield variability, as well as objective methods for predicting crop yield responses to
changes in specific inputs. In the case of variable-rate application of fertilizers, prescriptions are usually driven by yield goals and soil test results, and are generally based on crop and soil nutrient budgeting (Hergert et al., 1997). However, the response functions that are used to make recommendations from these inputs are frequently based on soil test results originally aggregated over a whole field, and often organized at the state or regional level (Lowenberg-DeBoer and Swinton, 1997). Additionally, they may be biased toward overapplication, due to assumptions made regarding farmers' preferences (Hergert et al., 1997). This may lead to unexpected responses of the crop to variable-rate application, with the consequent reduction in the perceived value of site-specific variable-rate technology.

Precision agriculture has, so far, done less to improve farm decision-makers' understanding of the causes of spatial yield variability than to develop machinery for monitoring that variability and applying prescriptions. There have been many advances in techniques for measuring and representing spatial yield variability for many crops (Pierce et al., 1997). Yield monitors have been developed using real-time direct volume methods (Borgelt, 1993; Searcy et al., 1989); real-time direct weighing methods (Schrock et al., 1995; Wagner and Schrock, 1989); and indirect, pressure-plate methods (Birrell et al., 1996). Their accuracy has also been studied, both in laboratory and field settings (Al-Mahasneh and Colvin, 2000; Arslan and Colvin, 1999). Yield monitors and yield mapping methods have been developed for many crops, such as barley (Stafford et al., 1996); maize (Perez Munoz and Colvin, 1996; Pfeiffer et al., 1993); peanuts (Boydell et al., 1995); potato (Campbell et al., 1994; Rawlins et al., 1995); soybean and maize (Jaynes and Colvin, 1997); sugarbeets (Hofman et al., 1995); and
wheat (Miller et al., 1988). There is also a large body of research on variable-rate technology (Anderson and Humburg, 1997), including the development of appropriate spray nozzles (Miller and Smith, 1992); spinner-disc fertilizer applicators (Fulton et al., 2001); the special case of center pivots (Camp and Sadler, 1998); control requirements (Paice et al., 1996); and accuracy analysis (Goense, 1997; Way et al., 1992; Weber et al., 1993).

Many researchers have sought to understand spatially variable yield-limiting factors of major crops using statistical regression analysis. Several of these studies focused on maize (Braga, 2000; Everett and Pierce, 1996; Mallarino et al., 1996; Tomer et al., 1995) or on maize and soybeans (Khakural et al., 1996; Cambardella et al., 1996; Kessler and Lowenberg-DeBoer, 1998; Sudduth et al., 1996). However, when these studies correlated yield with soil properties or terrain attributes, either they described a very limited fraction of yield variability (Everett and Pierce, 1996; Kessler and Lowenberg-DeBoer, 1998; Mallarino et al., 1996; Sudduth et al., 1996), or the relationships were not consistent across different years (Braga, 2000; Tomer et al., 1995). These results suggest that purely statistical approaches may have descriptive value, but are not appropriate for predictive purposes.

Interannual variability of weather, the spatial variability of soil properties, and the landscape-position-dependence of other processes create a dynamic environment for plant growth. The problem of separating the effects of different environmental factors using statistical techniques is especially complex because crop yield in a field is controlled by numerous concurrent factors. Furthermore, plant susceptibility to a particular factor may depend on the crop's
developmental stage and on other environmental conditions, and in field experiments the number of variables can easily exceed the number of available observations.

Crop simulation models have been used as a powerful analytic tool to understand environmental influences on crop yield. They provide the unique opportunity to account for many interacting yield-influencing factors in ways that are impossible with traditional agronomic experimentation. Crop models have often been used to analyze the causes of temporal, weather- and climate-related yield variability (Boote et al., 1996; Ferreyra et al., 2001; Parry et al., 1999), and have recently also been used for understanding spatial yield variability (Batchelor et al., 2002; Braga, 2000; Irmak et al., 2001; Paz and Batchelor, 2000; Paz et al., 1998; Sadler et al., 2000).

Crop simulation models' ability to reproduce both temporal and spatial crop yield variability suggests that they may be ideal tools for diagnostic and prescriptive use in precision agriculture. However, the quality of crop-model-based diagnostic analyses and prescriptions depends on the accuracy with which the models' parameters (values that represent characteristics of the model and remain constant throughout a simulation; Jones and Luyten, 1998) are determined. Measuring soil water-holding parameters is time-consuming (Klute, 1986) and expensive (a typical value is US$ 20 for testing one soil water-holding limit on one soil sample; see A&L Labs, 2003), but given accurate field measurements, crop simulation results can capture variability well, as shown by Braga (2000) with the CERES-Maize model (Ritchie et al., 1998). Conversely, if parameter values are taken from coarse estimates such as soil survey data, the model may perform poorly, as found by Sadler et al. (2000) using the same model.
The enormous cost of sampling soil hydraulic parameters at an adequate spatial density for using crop models in precision agriculture (henceforth, we will refer to crop models used for simulating spatiotemporal variability as spatial crop models) motivated the search for alternatives for estimating the parameters instead of measuring them.

Inverse modeling (IM) is an estimation method suitable for use with crop models. It uses the model itself and a search algorithm to propose parameter values. Welch et al. (1999a) used an IM-based method for estimating crop model genetic coefficients. The method exhaustively simulated all the parameter combinations in a discrete input space (the parameter space), and then examined the results to find the best parameter combination (or parameter set) for each crop variety. The "best" parameter set was defined as the one producing the lowest value of an objective function: the sum of squared residuals between simulated and observed data. Irmak et al. (2001) expanded the grid search concept for estimating soil parameters.

Using inverse modeling to parameterize a crop model is not without problems. When the observed crop yield has been affected by a factor not considered by the crop model, the inverse modeling algorithm will attempt to explain the effect using soil properties. Most crop models do not account for yield-limiting factors such as pests, weeds, diseases, nutrients, and extreme pH. Thus, attributing yield losses by using IM to match predicted and observed yields may produce incorrect parameter values. Extending the models to simulate the effects of additional factors is possible: Fallick et al. (2002), Paz et al. (2001), and Irmak et al. (2002) extended crop models to include soil pH, soybean cyst nematodes, and weed effects. In most practical applications, however, quantitative knowledge about the effect of extraneous factors on yield is
imperfect (thus reducing confidence in the model results). Additionally, quantifying the factor itself (e.g., site-specific degree of nematode infestation) may be difficult or impractical. Uncertainty in observed yield, due to extraneous factors or yield monitor measurement errors (Lark et al., 1997; Whelan and McBratney, 1997), thus implies the possibility of error in the parameter estimates, with the consequent degradation in the quality of the model's predictions.

Moreover, to date there have been few efforts at using crop simulation models for describing how the spatial variability of soil water content influences crop yield. Considering that drought-related stresses typically limit crop growth in the world's major crop-producing regions, spatial water distribution is an important consideration when applying precision agriculture in those regions. Paz and Batchelor (2000) used the CROPGRO model (Boote et al., 1998) to analyze spatial soybean yield variability in a field, attributing it to the spatially variable effects of plant density, weeds, soybean cyst nematodes, and water stress. Previously, Paz et al. (1998) explained the influence of water stress in terms of rooting depth (loosely equivalent to a soil water-holding capacity parameter), and a drainage parameter, either the saturated hydraulic conductivity (KSAT) or a soil drainage rate coefficient (SLDR), both of which control the rate at which saturated soil drains fully down to its drained upper limit. These authors noted that their modeling scheme's ability to reproduce observed data degraded in low-lying areas, possibly due to the model's inability to account for runon or subsurface flow from neighboring areas. Indeed, given that these processes were not considered, the parameterization procedure tried to explain excess water across the field in terms of slow drainage or increased rooting depth. This might
affect the predictive capability of this procedure when the model is used in years not used for parameter calibration.

If spatial crop models accounted for three-dimensional water movement over the landscape, they could possibly better explain the causes of water-stress-induced yield variability. This would require including a three-dimensional water balance simulation across the landscape in the spatial crop model. Such a complete model does not currently exist, and its parameterization by inverse modeling could prove to be computationally intractable.

We will use the term spatially-coupled crop model to denote a spatial crop model in which landscape units interchange water and nutrients. Each landscape unit (or cell) in such a model cannot be parameterized independently using IM, because variation of the parameters of one cell could affect the availability of water, and thus the yield, of another cell. In principle, such a model demands a simultaneous parameterization of all the cells. However, the parameter space size (i.e., the number of unique parameter combinations among which the optimum must be found) would then grow exponentially with the number of cells into which the field of interest was divided. Also, it is unclear whether the uncertainty in the determination of crop model parameters would be amplified through a spatially-coupled model.

An alternative scheme is to reduce the complexity of the problem by representing the spatial variability of crop yield with geostatistical techniques (Goovaerts, 1997), and applying a spatial interpolation algorithm to the yield values simulated independently in a limited number of locations. The parameterization of an uncoupled model (i.e., one in which the cells are not spatially coupled) would then be modified to improve its ability to
capture the behavior of the crop at these locations of interest. To avoid the confounding influence of the aforementioned extraneous factors, this process would have to rely on additional information.

Spatiotemporal yield variability can be simulated accurately when the spatial and temporal behavior of the primary yield-limiting factor (typically soil water) is simulated properly (Braga, 2000; Calmon et al., 1999; Ferreyra, 1998). Direct measurements of soil water content are not appropriate for practical, extensionist- or consultant-driven applications of crop models in farm decision support (R. Murdock, pers. comm.); but other sources of knowledge are available, typically in the form of expert opinion. Farmers, extension agents, crop consultants, and other experts such as Natural Resources Conservation Service (NRCS) soil scientists possess a wealth of knowledge about the dominant behavior of different parts of the field, including wetness, soil properties, weed pressure, and so on. Eliciting this knowledge is relatively inexpensive, but harnessing it in a way that can be valuable for parameterizing a spatial crop simulation model is a challenging endeavor; it is also the problem that motivates this dissertation.

Goal and Objectives

The goal of this study was to develop new approaches and practical solutions for the problem of spatial crop model parameterization for precision agriculture applications in which soil water availability is the primary yield-limiting factor. Its specific objectives are the following:

1. To identify and quantitatively compare different sources of error in the use of inverse modeling to parameterize spatially coupled and uncoupled crop models.
2. To develop methods for optimizing spatial sampling schemes for representing the spatiotemporal variability of yield and yield-limiting factors.
3. To develop and evaluate a portable framework for eliciting knowledge from experts and using that knowledge to parameterize a spatial crop model.

Outline of the Dissertation

This dissertation contains six components that address the above-listed objectives: Chapters 2 to 7. Each chapter is self-contained and can be read independently. Chapter 2 addresses the first objective; Chapters 3 and 4 address the second; Chapters 5 to 7 address the third.

Chapter 2 compares three different sources of error in the use of inverse modeling to parameterize a spatial crop model: a) spatially-coupled vs. uncoupled model; b) lack of knowledge about initial conditions; and c) biases in weather data used for parameterization. We used inverse modeling to parameterize a simple spatial crop model based on CROPGRO, exploring different scenarios built from combinations of different levels of the errors mentioned above. For the simulations we used weather and soils data from a water-limited environment in Cordoba, Argentina.

Chapter 3 develops solutions to the problem of concurrently obtaining an optimal spatial sampling scheme for a phenomenon of interest (e.g., yield) and an optimal closed scouting path that links the locations of the sampling scheme. This problem is characteristic of crop scouting, and is relevant for the observation of both crop yield and the level of yield-affecting factors in a field. Chapter 3 also explores the problem in the context of minimal data requirements.

Chapter 4 elaborates on the concept of spatial sampling scheme optimization, applying it to the soil water content domain. We compared different algorithms and objective functions for obtaining the best spatiotemporal predictive capability with a given number of locations, and explored the predictive limits of geostatistical techniques
in a landscape in which spatial water movement is believed to occur. For this chapter we used a soil water dataset from Cordoba, Argentina: 5 dates of soil water content observations in 57 locations on an isometric grid covering an 8.3-hectare micro-watershed.

Chapter 5 revisits the search algorithm used by Irmak et al. (2001) for inverse-modeling-based parameterization. These authors compared an exhaustive grid search method with the sophisticated adaptive simulated annealing algorithm proposed by Ingber (1993) and used by Braga (2000), Calmon et al. (1999), and Paz et al. (1998) for estimating crop model soil parameters. Irmak et al. (2001) claimed better performance using the former. We developed a hybrid between the simulated annealing algorithm and a grid search. Our algorithm is more efficient than the grid search and capable of providing tentative solutions that can be progressively updated.

Chapter 6 introduces Bayesian networks as tools for representing knowledge about causal relationships and for combining different sources of knowledge into a common probabilistic framework. It shows a simple example of how we used Bayesian network technology to help understand the causes of, and predict, spatiotemporal yield variability in a field in Kentucky during discussions involving domain experts: the farmer, an NRCS soil scientist, and crop consultants.

Chapter 7 builds on the preceding chapters. We used a two-tiered approach to the inverse modeling parameterization problem. The lower tier is a network of spatial relationships of crop model inputs or outputs among the locations of interest, elicited from interaction with domain experts. The top tier is the (uncoupled) crop model, run for each location of interest. The spatially-coupled bottom tier constrains the behavior of the
uncoupled top tier; the problem remains computationally tractable, despite its large parameter space, because evaluating the spatial constraints is very fast compared to a single crop model run, and because we used the fast algorithm developed in Chapter 5. In Chapter 7 we also explored a case study using the scheme mentioned above to parameterize and test a spatial crop model based on CERES-Maize in a field in Kentucky, USA. Available data included real-time kinematic GPS-derived elevation data, three years of corn yield maps, one year of wheat and soybean yield maps, SCS Soil Survey data, soil electroconductivity data, and in situ soil probe observations and soil water content time series, coupled with expert opinions. Finally, in Chapter 7 we also applied the above-mentioned network of spatial relationships to constrain the IM parameterization of a simple, spatially-coupled, CERES-Maize-based crop model in the same field in Kentucky.
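The grid-search form of inverse modeling described earlier in this chapter (simulate every parameter combination in a discrete parameter space, then keep the set with the lowest sum of squared residuals) can be illustrated with a minimal sketch. The function `toy_yield_model`, its two parameters, and all numeric values are hypothetical stand-ins chosen only for illustration; they are not CROPGRO, CERES-Maize, or the procedure of Welch et al. (1999a) or Irmak et al. (2001).

```python
from itertools import product


def toy_yield_model(depth_cm, drainage, rain_mm):
    """Hypothetical stand-in for a crop model (NOT CROPGRO or CERES-Maize).

    Yield is proportional to the water retained after drainage, capped by a
    storage capacity that grows with rooting depth.
    """
    retained = rain_mm * (1.0 - drainage)
    capacity = 2.0 * depth_cm
    return 5.0 * min(retained, capacity)


def grid_search(observed, rain_by_year, depth_grid, drainage_grid):
    """Exhaustively evaluate every parameter set and return the best one.

    The objective function is the sum of squared residuals between simulated
    and observed yield over all years. Returns (best_params, best_sse).
    """
    best_params, best_sse = None, float("inf")
    for depth, drainage in product(depth_grid, drainage_grid):
        sse = sum(
            (toy_yield_model(depth, drainage, rain_by_year[year]) - obs) ** 2
            for year, obs in observed.items()
        )
        if sse < best_sse:
            best_params, best_sse = (depth, drainage), sse
    return best_params, best_sse


if __name__ == "__main__":
    # Synthetic "observations" generated from known parameters (illustrative).
    rain = {1999: 200.0, 2001: 400.0}
    obs = {year: toy_yield_model(100, 0.3, r) for year, r in rain.items()}
    print(grid_search(obs, rain, [50, 100, 150], [0.1, 0.3, 0.5]))
```

With one cell the search evaluates 3 x 3 = 9 parameter sets; parameterizing n coupled cells simultaneously on the same grids would require 9 to the power n evaluations, which is the exponential growth of the parameter space noted above.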
CHAPTER 2
SOURCES OF ERROR WHEN INVERSE MODELING IS APPLIED TO THE PARAMETERIZATION OF SPATIALLY-COUPLED CROP MODELS

Introduction

Agricultural production of crops such as corn, soybeans, and wheat currently poses economic and environmental problems. Inflation-adjusted agricultural commodity prices decreased steadily through the 20th Century (USDA NASS, 1994). This has made agricultural production less profitable and more risky. Moreover, increasing levels of environmental regulatory pressure limit farmers' risk-mitigating management options. These limits are manifested through groundwater-quality-driven limits on fertilizer and pesticide applications, through market limitations on the use of genetically modified organisms, and through competition between agricultural and urban water use.

Precision agriculture in general, and variable-rate application technology in particular, has shown promise for addressing both economic and environmental concerns of agricultural production (National Research Council, 1997). From an economic perspective, farmers could conceivably maximize net returns by boosting yields in areas where crop growth can respond to additional inputs. Additionally, the use of fertilizers, pesticides, lime, etc. could be minimized in low-yielding areas where crop growth is limited by factors beyond the farmer's control. From an environmental viewpoint, it would be possible to approach the ideal situation in which all the inputs applied to the crop would actually be consumed by it, leaving none free to contaminate the environment (Pierce and Nowak, 1999).
Variable-rate application technology is controlled by spatially variable prescriptions, i.e., site-specific dosages of crop inputs (Morgan and Ess, 1997). Making these prescriptions requires understanding the causes of spatial yield variability, as well as the sensitivity of crop yield to the application of specific inputs.

Crop models have been used as analytical tools to understand environmental influence on crop yield. Crop models provide a unique opportunity to account for numerous factors influencing yield in ways that are impossible with traditional agronomic experimentation. Models have often been used to analyze causes of temporal yield variability related to weather and climate (Boote et al., 1996; Messina et al., 1999; Rosenzweig and Iglesias, 1998), and have recently also been used for understanding spatial yield variability (Irmak et al., 2001; Paz and Batchelor, 2000; Paz et al., 1998).

Crop simulation models' ability to reproduce both temporal and spatial crop yield variability suggests that they may be ideal tools for diagnostic and prescriptive use in precision agriculture. However, to date there have been few efforts at using crop simulation models for describing how the spatial variability of soil water content influences crop yield. Considering that drought-related stresses typically limit crop growth in the world's major crop-producing regions, spatial water distribution is an important consideration when applying precision agriculture in those regions.

Paz and Batchelor (2000) used the CROPGRO model (Boote et al., 1998) to analyze spatial soybean yield variability in a field, attributing it to the spatially variable effects of plant density, weeds, soybean cyst nematodes, and water stress. Previously, Paz et al. (1998) explained the influence of water stress in terms of rooting depth (loosely
equivalent to a soil water-holding capacity parameter), and a drainage parameter, either the saturated hydraulic conductivity (KSAT) or a soil drainage rate coefficient (SLDR), both of which control the rate at which saturated soil drains fully down to its drained upper limit. These authors noted that their modeling scheme's ability to reproduce observed data degraded in low-lying areas, possibly due to the model's inability to account for runon or subsurface flow from neighboring areas. Indeed, given that these processes were not considered, the parameterization procedure tried to explain excess water across the field in terms of slow drainage or increased rooting depth. This might affect the predictive capability of this procedure when the model is used in years not used for parameter calibration.

Perhaps crop-modeling efforts in precision agriculture could explain more reliably the causes of water-stress-induced yield variability if spatial water movement were explicitly considered, i.e., if the model simulated the coupling, or interchange of water and possibly nutrients, between different landscape locations. This should include a three-dimensional water balance simulation across the landscape. Such a complete spatially-coupled crop model does not currently exist. However, a simple approximation can be used to test our working hypothesis, implicit in the literature to date, that an appropriate selection of parameters can allow an uncoupled model, i.e., one in which the different landscape units do not interchange water, to reproduce the spatiotemporal variability of simulated yield produced by a spatially-coupled model.

The specific objectives of our study were:

1. To develop a simple spatially-coupled water balance model, and use it to generate synthetic crop yield maps.
2. To determine under which conditions, if any, spatially-coupled and uncoupled models might produce similar results.
3. To estimate, using spatially-coupled and uncoupled models with inverse modeling (IM) techniques, the soil parameters of the spatially-coupled model, and quantify the errors incurred.
4. To quantify the error of prediction incurred when using the different sets of soil parameters resulting from objective 3 to predict yields in years not used for parameterization.

Materials and Methods

Generating Synthetic Yield Maps: the Spatially-Coupled Model

We simulated a soybean crop across a spatially variable field using the CROPGRO-Soybean model (Boote et al., 1998). We chose soybeans because the CROPGRO model reproduces crop responses to multiple environmental factors such as temperature and day length, as well as the effects on plant growth of both insufficient and excessive soil water content. The latter two are of special interest in this study, since they can be expected to show spatial variability across agricultural fields.

Figure 2-1. Landscape model used for the spatially-coupled model simulations. Note that the cells only communicate via surface flow; subsurface lateral flow was assumed to be insignificant.

We made a simple spatial extension to CROPGRO as shown in Figure 2-1. We assumed an agricultural field with significant topographical variation along one
dimension and little or no variation along the other. We approximated the field with a toposequence (Figure 2-1) composed of several (10) cells that represent parallel whole-field-long swaths of some arbitrary width. This is equivalent to a sloping field having straight, parallel contour lines. The model can be configured so the cells behave in one of two ways during rainfall events:

• Spatially-coupled: the surface of each cell receives input of water from rainfall and from cells uphill of it. Water outputs from the cell surface are infiltration and runoff to the cells downhill of it.
• Uncoupled: the surface of each cell only receives inputs from rainfall, with no contribution from neighboring cells. Runoff is assumed to be lost and does not contribute water to neighboring cells.

In both cases, we partitioned water inputs into infiltration and runoff using the SCS Curve Number Method (USDA SCS, 1972), which we chose due to its simplicity and popularity, and because it is already built into CROPGRO. A detailed explanation of the method follows to make subsequent work clearer.

The curve number method can be derived starting from a mass balance equation applied to a storm event

$$R = P_e - I \tag{2-1}$$

where $R$ (mm) is runoff, $I$ (mm) is the amount of infiltration and surface retention, and $P_e$ (mm) is a term called effective precipitation. In the SCS method this is defined as the amount of rainfall that can contribute to runoff, equal to the rainfall amount exceeding an initial abstraction $I_a$ (mm), which is the amount of precipitation necessary before runoff can begin. Thus,

$$P_e = P - I_a \tag{2-2}$$
where $P$ (mm) is precipitation. The critical assumption in the curve number method is

$$\frac{R}{P_e} = \frac{I}{S} \qquad (2\text{-}3)$$

where $S$ is the potential maximum retention (in mm). Equation 2-3 stipulates that the ratio of runoff to the effective rainfall is the same as the ratio of actual retention to $S$ (Boughton, 1989). Operating with Equation 2-1 yields

$$I = P_e - R \qquad (2\text{-}4)$$

Replacing Equation 2-4 into Equation 2-3,

$$\frac{R}{P_e} = \frac{P_e - R}{S} \;\Rightarrow\; R\left(\frac{1}{P_e} + \frac{1}{S}\right) = \frac{P_e}{S} \;\Rightarrow\; R = \frac{P_e^2}{S + P_e},$$

and replacing $P_e$ with Equation 2-2,

$$R = \frac{(P - I_a)^2}{P - I_a + S} \qquad (2\text{-}5)$$

A second assumption of the method is that

$$I_a = 0.2\,S \qquad (2\text{-}6)$$

Replacing Equation 2-6 into Equation 2-5, and considering the definition of $I_a$, yields the SCS runoff equation:

$$R = \begin{cases} \dfrac{(P - 0.2S)^2}{P + 0.8S} & \text{for } P > 0.2S \\[2ex] 0 & \text{for } P \le 0.2S \end{cases} \qquad (2\text{-}7)$$

The potential retention parameter $S$ is usually expressed in terms of a dimensionless runoff curve number $CN$ through the expression

$$S = 25.4\left(\frac{1000}{CN} - 10\right) \quad \text{(mm)} \qquad (2\text{-}8)$$
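As a worked illustration of Equations 2-6 to 2-8, consider a single storm on a cell with CN = 93, the curve number listed for the uppermost cell in Table 2-1. The 30 mm rainfall amount is an arbitrary value chosen here for the example and does not come from the study's weather data.

```latex
S   = 25.4\left(\frac{1000}{93} - 10\right) \approx 19.1\ \text{mm}, \qquad
I_a = 0.2\,S \approx 3.8\ \text{mm}

R = \frac{(P - 0.2S)^2}{P + 0.8S}
  = \frac{(30 - 3.8)^2}{30 + 15.3}
  \approx 15.1\ \text{mm}
```

Under these assumptions roughly half of the storm would leave the cell as runoff, which illustrates why curve numbers in the low 90s imply large surface water transfers.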
Substituting Equation 2-8 into Equation 2-7 gives

$$R = \begin{cases} \dfrac{\left[P - 5.08\left(\frac{1000}{CN} - 10\right)\right]^2}{P + 20.32\left(\frac{1000}{CN} - 10\right)} & \text{for } P > I_a \\[2ex] 0 & \text{for } P \le I_a \end{cases} \quad \text{(mm)} \qquad (2\text{-}9)$$

We ran the modified version of CROPGRO successively for the 10 cells, starting downward from the highest cell in the toposequence. The first (highest) cell never received runon, since it is assumed that there are no positions higher in the landscape that contribute water to it. In the spatially-coupled model, each of the successively lower cells received daily runon equal to the total daily runoff in the cell immediately above it. The main assumptions are:

1. The only relevant processes in the landscape are rainfall $P$, infiltration $I$, and runoff (or runon) $R$.
2. The total runoff volume from cell $i-1$ equals the total volume of runon to cell $i$.
3. The runon to a cell can be considered to generate runoff as if it were additional rainfall (i.e., we can use the SCS runoff equation to estimate the runoff from a cell, taking as rainfall input the sum of rainfall and runon in the cell).
4. The total runoff volume from the field is equivalent to the total runoff volume from the last cell.
5. The field is assumed to be partitioned into $N$ cells of equal area, with each cell having an area $A_p = A/N$, where $A$ is the total area of the field and $N$ is the total number of cells into which it is partitioned.

Assuming $P > I_a$, mass balance for an individual cell $i$ can be expressed as:

$$A_p P_i + A_p R_{i-1} = A_p I_i + A_p R_i + A_p I_{a,i} \qquad (2\text{-}10)$$

where $P_i$ is the rainfall on the cell, $R_{i-1}$ is the runon to the cell, equal to the runoff from the previous cell $i-1$, $I_i$ is the infiltration, $I_{a,i}$ is the initial abstraction and $R_i$ is the runoff. This equation is expressed in terms of total volumes of water. However, if all the cells have an equal area $A_p$, Equation 2-10 can be divided by $A_p$ and the equation will
then be expressed in volume of water per unit area. Infiltration can thus be expressed as follows:

$$I_i = P_i + R_{i-1} - R_i - I_{a,i} \qquad (2\text{-}11)$$

Runoff can be calculated directly using the SCS runoff Equation 2-9, modified to add runon to the precipitation on a cell (and assuming that the areas of all cells are equal):

$$R_i = \frac{(P_i + R_{i-1} - 0.2\,S_i)^2}{P_i + R_{i-1} + 0.8\,S_i} \qquad (2\text{-}12)$$

where $S_i$ is the maximum retention term. Finally, replacing Equation 2-8 into 2-12:

$$R_i = \begin{cases} \dfrac{\left[P_i + R_{i-1} - 5.08\left(\frac{1000}{CN_i} - 10\right)\right]^2}{P_i + R_{i-1} + 20.32\left(\frac{1000}{CN_i} - 10\right)} & \text{for } P > I_a \\[2ex] 0 & \text{for } P \le I_a \end{cases} \quad \text{(mm)} \qquad (2\text{-}13)$$

We used Equations 2-11 and 2-12 to calculate the runoff and infiltration for each cell. We began from the top of the slope because cell 1 receives no runon; consequently, the value of $R_1$ could be calculated directly using Equation 2-9. We moved downward from cell 1 to the last cell $N$, calculating, on a daily basis, $R_i$ and then $I_i$ for each cell. This assumes that the field is steep enough for all the runoff to leave the field in one day.

We ran the model for 30 years of historic weather data. Assuming that the spatially-coupled crop model is a perfect predictor of crop yield, this procedure created a synthetic yield map for each weather year and parameter pattern. Yield varied spatially, i.e. across the cells, due to differences in soil properties from cell to cell and differences in spatial soil water distribution.

It was not necessary to modify the crop model itself to obtain the additional spatial functionality (daily runon is added to each cell's precipitation input), but it was
convenient to write a simple shell program to invoke the successive model runs and manage the input and output data of the cells of the toposequence.

Generating Synthetic Yield Maps: Input Data

Our study site represents conditions near Cordoba (Argentina). Cordoba (31° 29' S, 64° 13' W) lies on the northwestern edge of the Pampas region of Argentina. Mean annual rainfall between 1966 and 1995 was 844 mm, mostly concentrated in the spring and summer. The soil is a Typic Haplustoll, a deep silty loam with high water holding capacity and no limitations to drainage (Dardanelli et al., 1997). However, soils in the region have poor structural stability (Chagas et al., 1995) and are thus subject to crusting and high runoff during the high-intensity thunderstorms characteristic of the summer.

The soil properties adopted for the simulations are shown in Table 2-1. These parameters were derived from expert opinion and field measurements in a microwatershed in the location of interest (H. Apezteguia, Pers. Comm.). Note how there is a small variation of the parameters along the toposequence, responding to a somewhat greater presence of clay particles downslope.

We used CROPGRO genetic coefficients corresponding to varieties that have been used widely in the region: Asgrow 5406 (S. Meira and E. Guevara, Pers. Comm.). We used a planting date of November 10, the modal date for the region (J. Dardanelli, Pers. Comm.).

Soil water content at planting depends on factors such as tillage, weed management, the weather in the months leading up to planting, and the water extraction pattern of the previous crop. Soil water can strongly influence final crop yield, especially in areas like Cordoba, where drought periods are frequent during the growing season. We simulated the interannual variability of initial soil water content by sampling from a distribution of 2970 synthetic water content values at planting simulated for the region by
Ferreyra et al. (2001). The mean value was approximately 100 mm of available water, representative of field measurements in the region (Ferreyra, 1998).

Table 2-1. Soil parameters used for the spatially-coupled crop model.

Cell                  1      2      3      4      5      6      7      8      9      10
Depth(a) (cm)         210    210    210    210    210    210    210    210    210    210
SLDR(b)               0.60   0.60   0.59   0.58   0.57   0.56   0.55   0.54   0.53   0.52
CN2(c)                93.00  93.30  94.00  94.70  95.00  95.00  95.00  94.88  94.76  94.65
KSAT(d) (cm/day)      4      4      4.22   4.22   4.22   4.22   4.22   4.21   4      3.72
LL02(e) (cm3/cm3)     0.113  0.113  0.113  0.113  0.113  0.113  0.113  0.117  0.121  0.125
LL03(e) (cm3/cm3)     0.103  0.103  0.103  0.103  0.103  0.103  0.103  0.107  0.111  0.115
LL04(e) (cm3/cm3)     0.097  0.097  0.097  0.097  0.097  0.097  0.097  0.101  0.105  0.109
LL05(e) (cm3/cm3)     0.097  0.097  0.097  0.097  0.097  0.097  0.097  0.101  0.105  0.109
LL06(e) (cm3/cm3)     0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
LL07(e) (cm3/cm3)     0.103  0.103  0.103  0.103  0.103  0.103  0.103  0.107  0.111  0.115
LL08(e) (cm3/cm3)     0.101  0.101  0.101  0.101  0.101  0.101  0.101  0.105  0.109  0.113
LL09(e) (cm3/cm3)     0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
LL10(e) (cm3/cm3)     0.099  0.099  0.099  0.099  0.099  0.099  0.099  0.103  0.107  0.111
DUL01(f) (cm3/cm3)    0.321  0.321  0.321  0.321  0.321  0.321  0.321  0.325  0.329  0.333
DUL02(f) (cm3/cm3)    0.294  0.294  0.294  0.294  0.294  0.294  0.294  0.298  0.302  0.306
DUL03(f) (cm3/cm3)    0.267  0.267  0.267  0.267  0.267  0.267  0.267  0.271  0.275  0.279
DUL04(f) (cm3/cm3)    0.250  0.250  0.250  0.250  0.250  0.250  0.250  0.254  0.258  0.262
DUL05(f) (cm3/cm3)    0.247  0.247  0.247  0.247  0.247  0.247  0.247  0.251  0.255  0.259
DUL06(f) (cm3/cm3)    0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL07(f) (cm3/cm3)    0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL08(f) (cm3/cm3)    0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL09(f) (cm3/cm3)    0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
DUL10(f) (cm3/cm3)    0.245  0.245  0.245  0.245  0.245  0.245  0.245  0.249  0.253  0.257
SAT01(g) (cm3/cm3)    0.488  0.488  0.488  0.488  0.488  0.488  0.488  0.492  0.496  0.500
SAT02(g) (cm3/cm3)    0.487  0.487  0.477  0.477  0.477  0.477  0.477  0.477  0.487  0.497
SAT03(g) (cm3/cm3)    0.476  0.476  0.466  0.466  0.466  0.466  0.466  0.466  0.476  0.486
SAT04(g) (cm3/cm3)    0.431  0.431  0.421  0.421  0.421  0.421  0.421  0.421  0.431  0.441
SAT05(g) (cm3/cm3)    0.403  0.403  0.393  0.393  0.393  0.393  0.393  0.393  0.403  0.413
SAT06(g) (cm3/cm3)    0.386  0.386  0.376  0.376  0.376  0.376  0.376  0.376  0.386  0.396
SAT07(g) (cm3/cm3)    0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT08(g) (cm3/cm3)    0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT09(g) (cm3/cm3)    0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395
SAT10(g) (cm3/cm3)    0.385  0.385  0.375  0.375  0.375  0.375  0.375  0.375  0.385  0.395

a: Maximum rooting depth of the soil profile.
b: Soil drainage rate. Controls the rate at which a saturated layer drains to its DUL.
c: Nominal season-long CN value used in the SCS runoff curve number method.
d: Saturated hydraulic conductivity.
e: Lower limit of soil water holding capacity for the ten soil layers.
f: Drained upper limit of soil water holding capacity for the ten soil layers.
g: Saturation soil water content for the ten soil layers.
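The daily routing of surface water described by Equations 2-8, 2-11 and 2-12 can be sketched in a few lines of Python using the CN2 values of Table 2-1. This is a minimal illustration and not the shell program used in the study: the function names are hypothetical, the curve number is held fixed at CN2 (CROPGRO itself is not called, and any day-to-day adjustment of the curve number inside the crop model is ignored), and the quantity returned as "retained" is the total water kept by the cell, i.e. infiltration plus initial abstraction.

```python
# CN2 values for cells 1-10 (top to bottom of the slope), from Table 2-1.
CN2 = [93.00, 93.30, 94.00, 94.70, 95.00, 95.00, 95.00, 94.88, 94.76, 94.65]


def retention_mm(cn):
    """Potential maximum retention S (mm) from a curve number (Equation 2-8)."""
    return 25.4 * (1000.0 / cn - 10.0)


def scs_runoff_mm(water_in, s):
    """SCS runoff (mm) for a daily surface water input (Equations 2-7 and 2-12)."""
    initial_abstraction = 0.2 * s
    if water_in <= initial_abstraction:
        return 0.0
    return (water_in - initial_abstraction) ** 2 / (water_in + 0.8 * s)


def route_day(rain_mm, curve_numbers, coupled=True):
    """Return a list of (runoff, retained) tuples, one per cell, for one day.

    Cells are processed from the top of the slope downward. In the coupled
    case each cell receives the runoff of the cell above it as runon; in the
    uncoupled case runoff is simply lost.
    """
    results = []
    runon = 0.0  # the uppermost cell never receives runon
    for cn in curve_numbers:
        water_in = rain_mm + (runon if coupled else 0.0)
        runoff = scs_runoff_mm(water_in, retention_mm(cn))
        results.append((runoff, water_in - runoff))
        runon = runoff
    return results


if __name__ == "__main__":
    # The 30 mm storm is an arbitrary illustrative value.
    for cell, (runoff, retained) in enumerate(route_day(30.0, CN2), start=1):
        print(f"cell {cell:2d}: runoff {runoff:7.2f} mm, retained {retained:6.2f} mm")
```

Calling `route_day(30.0, CN2, coupled=False)` gives the uncoupled counterpart, so the two configurations can be compared cell by cell for the same storm.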
Estimating Soil Parameters Using Inverse Modeling

We estimated three soil parameters (detailed below) using IM, and compared the estimates with the known parameter values (Table 2-1). We traversed the toposequence downhill, estimating soil parameters for each cell. We searched exhaustively over the 3-dimensional parameter space of each cell, as per Irmak et al. (2001). As in that study, each cell's optimal parameter combination was the one that minimized an objective function defined as the root mean squared error of that cell's yield prediction over several years. We detail the number of years and their selection below.

We varied three parameters: the nominal season-long CN value used in the SCS runoff curve number method (USDA SCS, 1972), CN2; the saturated hydraulic conductivity of the bottom soil layer, KSAT; and the fraction of nominal maximum available water, FAW. The latter was used to modify the soil water holding characteristics of the whole profile using only one parameter: we defined FAW as the ratio between each soil layer's estimated maximum available water and the true maximum available water for that layer. The maximum available water is defined as (DUL - LL), where DUL and LL are the drained upper limit and lower limit of soil water holding capacity, respectively (Ritchie, 1981). We kept the LL of each soil layer at its true value, and modified the DUL according to the FAW value (FAW = 1 makes the DUL take its real value, FAW = 0.5 takes DUL halfway between the real value of DUL and the LL, etc.).

We classified weather years according to water availability during the season, expressed as the sum of initial soil water content and rainfall during the season. We called this variable TSW (total seasonal water), and used it to rank the 30 available weather years. We sampled four years from each tercile of the TSW distribution: four
"dry" years, four "normal" years, and four "wet" years, and used them to define parameterization cases. We defined three different cases based on the TSW of the crop years used for parameterization. The first case was an unbiased benchmark, consisting of two years from each of the three TSW terciles, for a total of six years. The second case was biased toward "dry" years, and consisted of four "dry" years and two "normal" years. The third case was biased toward "wet" years, and consisted of four "wet" and two "normal" years. We chose to use six years of weather for each case, 2-3 times the number used for other recent studies (Batchelor et al., 2002; Irmak et al., 2001; Paz et al., 1998), to minimize the possibility of overfitting. Figure 2-2 shows the weather years chosen for the three weather cases, and the TSW of each. Note how the total TSW range exceeds 400 mm, and how the unbiased case shares three years with each of the other cases.

[Figure 2-2 plot: one row of points per weather case (wet-biased, unbiased, dry-biased), each point labeled with its year; x-axis: TSW (soil water at planting + crop season rainfall, in mm), 500 to 1000.]

Figure 2-2. Parameterization weather cases. Each row of points describes a weather case. The filled circles are years from the lower TSW tercile, the open squares are from the middle TSW tercile, and the filled triangles are from the upper tercile. The label on each point shows the year, e.g. 85 corresponds to 1985.
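The construction of the three weather cases can be sketched as follows. This is an illustrative outline only: the function name and the TSW values in the usage example are hypothetical, the random draw stands in for whatever sampling was actually done, and the way the two "normal" years of each case are picked is an assumption chosen so that the unbiased case shares three years with each biased case, as noted for Figure 2-2.

```python
import random


def build_weather_cases(tsw_by_year, seed=0):
    """Rank years by TSW, split them into terciles, and assemble the cases.

    tsw_by_year maps a year to its total seasonal water (mm). Returns the
    parameterization cases and, per tercile, the years left for evaluation.
    """
    rng = random.Random(seed)
    ranked = sorted(tsw_by_year, key=tsw_by_year.get)  # driest year first
    n = len(ranked) // 3
    terciles = {
        "dry": ranked[:n],
        "normal": ranked[n:2 * n],
        "wet": ranked[2 * n:],
    }
    # Four years are sampled from each tercile.
    sampled = {name: rng.sample(years, 4) for name, years in terciles.items()}
    dry, normal, wet = sampled["dry"], sampled["normal"], sampled["wet"]
    cases = {
        # Two years per tercile (assumed split of the "normal" years).
        "unbiased": dry[:2] + normal[1:3] + wet[:2],
        "dry_biased": dry + normal[:2],
        "wet_biased": wet + normal[2:],
    }
    # Years never sampled remain available for independent evaluation.
    used = {year for years in sampled.values() for year in years}
    evaluation = {
        name: [year for year in years if year not in used]
        for name, years in terciles.items()
    }
    return cases, evaluation


if __name__ == "__main__":
    # Hypothetical TSW values for 1966-1995, for illustration only.
    demo_rng = random.Random(42)
    tsw = {year: demo_rng.uniform(550.0, 1000.0) for year in range(1966, 1996)}
    cases, evaluation = build_weather_cases(tsw)
    for name, years in cases.items():
        print(name, sorted(years))
```

With 30 years and four sampled per tercile, six years per tercile (18 in total) are left untouched by parameterization.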
An important source of crop modeling error may arise from lack of knowledge of initial soil water conditions. To explore this aspect we defined two initial condition cases: having perfect knowledge about the initial soil water conditions, and having no knowledge. In the latter we used the median (over the 30 available water years) soil water content value, approximately 100 mm.

In summary, there were three sources of uncertainty in parameter estimation:

• The imperfect crop model when using the uncoupled model.
• Biased weather cases in the IM process.
• The lack of knowledge about initial soil water conditions.

Combining the possible states of these sources of uncertainty led to 12 distinct IM parameter estimation scenarios, defined by 3 weather cases (benchmark, dry-biased, wet-biased) x 2 models (spatially-coupled, uncoupled) x 2 initial condition cases (initial conditions known or unknown). One of these scenarios, the one using the benchmark weather case, the spatially-coupled model, and known initial conditions, was expected to reproduce the real soil parameters most closely.

Evaluating With Independent Data

After estimating soil parameters for the 12 IM scenarios, we tested how well the corresponding parameter sets estimated yields for the 18 weather years not involved in each scenario's parameter estimation process. We tried this in two ways: having perfect knowledge about the initial conditions and with no knowledge thereof, similarly to the cases described above. We stratified the results by each year's TSW tercile. We did this to show whether the spatially-coupled and uncoupled models' predictive performance varied according to the relationship between the weather case used for parameter estimation and the weather (as described by the TSW tercile) used for evaluation.
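The per-cell estimation step applied in each of these scenarios (an exhaustive search that minimizes yield RMSE, together with the FAW adjustment of DUL described under "Estimating Soil Parameters Using Inverse Modeling") can be outlined as below. This is a sketch under stated assumptions: `simulate` is a hypothetical callable standing in for a crop model run that returns a yield for a parameter set and a year, and the grid values used in any call are the user's choice, since the search resolution is not given here.

```python
import itertools
import math


def adjusted_dul(ll, dul_true, faw):
    """Scale a layer's available water (DUL - LL) by FAW, keeping LL fixed.

    FAW = 1 returns the true DUL; FAW = 0.5 returns the midpoint of LL and DUL.
    """
    return ll + faw * (dul_true - ll)


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def exhaustive_search(simulate, observed_by_year, grid):
    """Return (best_params, best_rmse) over the Cartesian product of grid.

    simulate(params, year) -> yield is a stand-in for a crop model run.
    grid maps each parameter name (e.g. "CN2", "KSAT", "FAW") to the list
    of candidate values to try. Ties keep the first combination found.
    """
    names = list(grid)
    years = list(observed_by_year)
    observed = [observed_by_year[year] for year in years]
    best_params, best_error = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        predicted = [simulate(params, year) for year in years]
        error = rmse(predicted, observed)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```

For the toposequence, the search would be repeated cell by cell moving downhill; in the spatially-coupled case the runon entering a cell depends on the parameters already estimated for the cells above it.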
Results and Discussion

Exploring Similar Behavior Between Spatially-Coupled and Uncoupled Models

The conditions under which the spatially-coupled and uncoupled models can behave similarly can be analyzed using the top two cells of the toposequence. Equation 2-13 describes runoff for an arbitrary cell in the toposequence of the spatially-coupled model, and the particular case of the topmost cell, which receives no runon, can be described using Equation 2-9. Replacing the expression of the runon entering cell 2 with Equation 2-9, i.e. the runoff from cell 1, and assuming that rainfall is constant throughout the toposequence and greater than the greater $I_a$ of the two cells, runoff from the second cell can be expressed as follows:

$$R_2 = \frac{\left[P + \dfrac{(P - 0.2S_1)^2}{P + 0.8S_1} - 0.2S_2\right]^2}{P + \dfrac{(P - 0.2S_1)^2}{P + 0.8S_1} + 0.8S_2} \qquad (2\text{-}14)$$

where $S_1$ and $S_2$ are the retention parameters for the first (topmost) and second cells, respectively.

Equation 2-9 describes runoff in any cell of the uncoupled model. In order for the uncoupled model to substitute for the spatially-coupled model, then for any realistic values of $S_1$ and $S_2$ in the spatially-coupled model there should exist a value $S_{EQ2}$ and its corresponding curve number $CN_{EQ2}$ that predict, using the uncoupled model, the same input of water into the soil of cell 2, $I + I_a$, as that in cell 2 of the spatially-coupled model. This should be valid for any realistic environmental conditions, i.e. rainfall. Based on Equation 2-11, and assuming that $P + R_1 > 0.2S_2$ and $P > 0.2S_{EQ2}$, the following should be true:
$$P + \frac{(P - 0.2S_1)^2}{P + 0.8S_1} - \frac{\left[P + \dfrac{(P - 0.2S_1)^2}{P + 0.8S_1} - 0.2S_2\right]^2}{P + \dfrac{(P - 0.2S_1)^2}{P + 0.8S_1} + 0.8S_2} = P - \frac{(P - 0.2S_{EQ2})^2}{P + 0.8S_{EQ2}} \qquad (2\text{-}15)$$

We solved Equation 2-15 for the $S_{EQ2}$ of the simpler problem in which $S_1 = S_2 = S$, i.e. $CN_1 = CN_2 = CN$, and obtained the following solutions:

$$S_{EQ2,a} = \frac{-8S^3 + 290PS^2 - 300P^2S + 1500P^3 + 2.8284\sqrt{(8S^3 - 545PS^2 - 150P^2S - 2250P^3)(S - 5P)^3}}{2\,(50P^2 + 30PS + 17S^2)} \qquad (2\text{-}16a)$$

$$S_{EQ2,b} = \frac{-8S^3 + 290PS^2 - 300P^2S + 1500P^3 - 2.8284\sqrt{(8S^3 - 545PS^2 - 150P^2S - 2250P^3)(S - 5P)^3}}{2\,(50P^2 + 30PS + 17S^2)} \qquad (2\text{-}16b)$$

Of the two solutions, only Equation 2-16b is valid. Given that both solutions produce the same water input results, the right-hand side of Equation 2-15 evaluates to the same numerical value for both solutions; however, the $P - 0.2S_{EQ2}$ term is negative for Equation 2-16a. Although it produces the same value as Equation 2-16b when squared, it does not have a physical meaning. As shown in Equation 2-9, runoff should be 0 when $P \le 0.2S_{EQ2}$.

Figure 2-3 shows the values of $CN_{EQ2}$ corresponding to the $S_{EQ2}$ of Equation 2-16b, for different combinations of $P$ and $CN$. Note how $CN_{EQ2}$ initially decreases with increasing rainfall, reflecting the additional contribution of runon to infiltration through a smaller $CN$. For greater values of $P$, $CN_{EQ2}$ increases asymptotically toward $CN$, and this trend begins for lower values of $P$ as $CN$ increases. This behavior can be understood by differentiating Equation 2-14 with respect to $P$. Defining the auxiliary terms shown by Equation 2-17, the derivative results in Equation 2-18 as follows:
$$B = \frac{P - 0.2S_1}{P + 0.8S_1}, \qquad C = \frac{(P - 0.2S_1)^2}{P + 0.8S_1}, \qquad D = \frac{P + C - 0.2S_2}{P + C + 0.8S_2} \qquad (2\text{-}17)$$

$$\frac{dR_2}{dP} = \left[1 + 2B - B^2\right]\left[2D - D^2\right] \qquad (2\text{-}18)$$

Equation 2-18 tends to 1 as $P$ tends to infinity, and as $S$ tends to 0, i.e. as $CN$ tends to 100. If the derivative tends to 1, then $CN_{EQ2}$ will tend to $CN$ because the effect of any runon from upslope (and thus, the effect of a spatially-coupled model) becomes irrelevant as the additional water is fully lost to runoff.

Equation 2-15 and Figure 2-3 relate to our objective of determining under which conditions, if any, the spatially-coupled and uncoupled model might produce similar results. The only case for which $CN_{EQ2}$ remains constant for different rainfall amounts is $CN_1 = CN_2 = 100$ (not shown in Figure 2-3), which is useless in an agricultural environment because it is associated with zero infiltration, as shown by Equation 2-7 for $S = 0$. In more practical scenarios, it would be impossible to exactly reproduce the water infiltration regime of a spatially-coupled model using an uncoupled model; a crop season includes many rainfall events, each with its own rainfall amount, and the number of storms and rainfall amounts varies from year to year.

The relevance in terms of crop yield of this water-specific conclusion will vary from year to year and will depend on the CN in question. For very wet years, as well as for soils associated with high curve numbers or intense, convective storms during the cropping season, it may be irrelevant. Contrarily, it may be very important in water-limited environments with intermediate infiltration and storms with moderate rainfall amounts. Furthermore, it is possible that other errors such as initial conditions or weather
biases might be more relevant to yield than model errors due to uncoupling. These aspects are further explored below.

[Figure 2-3 plot: one curve each for CN1 = CN2 = 70, 75, 80, 85 and 90; x-axis: Rainfall (mm), 0 to 100.]

Figure 2-3. Values of CN_EQ2 (curve number in toposequence cell 2 of the uncoupled model that produces equivalent infiltration to cell 2 of the spatially-coupled model) for different combinations of rainfall and curve number values in cells 1 and 2 (CN1, CN2) of the spatially-coupled model.

Simulated Yield Profiles

Figure 2-4 shows a histogram of the initial soil water content used in this study. Plant extractable soil water to a depth of 210 cm ranged from slightly over 30 mm to slightly under 190 mm, reflecting the effects of interannual weather variability during antecedent crop cycles. The former case could be expected following a crop that extracts water from very deep in the profile, such as sunflower, and a subsequent dry winter typical for the region; the latter can reflect conditions after a short-season maize (which can leave water deep in the profile), followed by good spring rains prior to planting.
[Figure 2-4 plot: histogram of initial PESW (mm) with cumulative percentages.]

Figure 2-4. Histogram of PESW in the available initial condition set.

[Figure 2-5 plot: boxplot panels for the coupled model (left) and the uncoupled model (right); x-axis: cell number, 1 to 10.]

Figure 2-5. Simulated yield profiles made with the spatially-coupled (left) and uncoupled (right) models. The results of the uncoupled model show the effect of spatial variability of soil properties; the spatially-coupled model results show the additional effect of spatial water movement.
Figure 2-5 shows simulated yield for the spatially-coupled (left column) and uncoupled (right column) models using the soil properties shown in Table 2-1. Each row corresponds to a tercile of the TSW distribution; the dry tercile is on the top, the wet tercile on the bottom. Each boxplot shows the results of 10 years of simulations. The 10 cells shown per boxplot are arranged in progressively lower landscape positions from left to right. The results of the uncoupled model show the influence of the spatial variability of soil properties shown in Table 2-1. The spatially-coupled model results additionally include the effects of spatial water movement.

Note how the uncoupled model results change from cell to cell, especially in the middle and wet weather terciles. This is primarily a result of spatial CN2 variability; despite its apparently small variation throughout the toposequence, from 93 to 95, infiltration is greatly affected by these small variations in the upper CN2 range, i.e. the lower S range. This becomes clear by differentiating Equation 2-14 with respect to CN1 or CN2 (not shown).

Central Argentina is a water-limited environment for soybean growth; note how the yield at the top of the slope has a median value of slightly over 2000 kg/ha, and increases significantly downslope given that the lower cells have more water available from runon. The effect is less noticeable in dry years because there is less rain and, consequently, less runoff and runon.

Inverse Modeling: Parameter Estimation Error

Figures 2-6 and 2-7 show the results of using IM and a search algorithm to find the parameter combination that best fits the "observed" yield patterns of Figure 2-5. Only CN2 and FAW results are shown because large changes in KSAT did not produce changes in yield. This happens because precipitation rarely exceeds evapotranspiration during the growing season in Cordoba. When it does, the high water holding capacity of the Entic
Haplustoll makes it highly improbable that the drainage implementation of the CROPGRO water balance module, which only begins moving water out of a layer when the layer's water content exceeds its DUL, could drain beyond a depth of 210 cm. This is consistent with the lack of a limiting horizon in Entic Haplustolls.

[Figure 2-6 plot: estimated CN2 by cell (1 to 10); columns: perfect IC knowledge, only IC average known; rows: the three IM weather cases.]

Figure 2-6. CN2 values obtained by IM for the spatially-coupled (filled circles) and uncoupled (open circles) models. The left column shows the case of perfect knowledge of initial conditions, and the right shows the case in which only the average initial soil water content is known. The three rows correspond to the three calibration weather cases. The filled circles in the left column coincide with the actual parameter values.

There are twelve scenarios defined by the combinations of model, knowledge of initial conditions, and IM weather case. In all the scenarios the results provided by the spatially-coupled and uncoupled models coincide in the first, uppermost cell of the slope.
This occurs because the uppermost cell in the spatially-coupled model receives no runon and thus behaves identically to the corresponding cell of the uncoupled model.

[Figure 2-7 plot: estimated FAW by cell; columns: perfect IC knowledge, only IC average known.]

Figure 2-7. FAW values obtained by IM for the spatially-coupled (filled circles) and uncoupled (open circles) models. The left column shows the case of perfect knowledge of initial conditions, and the right shows the case in which only the average initial soil water content is known. The three rows correspond to the three IM weather cases.

The spatially-coupled model faithfully reproduced its own parameters at all landscape positions when the initial conditions were known, as shown by the CN2 values (filled circles of the left column of Figure 2-6) being equal to those of Table 2-1, and by the estimated FAW being 1 for all cells. However, the spatially-coupled model had some difficulty when initial conditions were unknown, especially with the FAW parameter
calibrated in wetter years and wetter (downslope) cells in which water limitation was not a major problem.

We hypothesized that the erratic FAW estimates under unknown initial conditions corresponded to a lower sensitivity of the FAW parameter relative to CN2, as expressed by comparing the sensitivity coefficient (Hamby, 1994) of each parameter across different years and landscape positions. Since the results of a sensitivity analysis may be greatly dependent on the chosen base case (Atherton et al., 1975; Gardner et al., 1981), we used the base case defined by the parameter values shown in Table 2-1, so the results represent the behavior of the parameters around the optimal parameter estimate. The coefficient is defined as:

$$\phi_i = \frac{\Delta Y}{\Delta X_i} \cdot \frac{X_i}{Y} \qquad (2\text{-}19)$$

where $X_i$ is the base case value of the $i$-th parameter, $\Delta X_i$ is a deviation of the parameter with respect to its base case, $Y$ is the base case yield value, and $\Delta Y$ is the yield deviation corresponding to the $\Delta X_i$ deviation.

Figures 2-8 and 2-9 show sensitivity results at different landscape positions for CN2 and FAW, respectively, in the spatially-coupled and uncoupled models. Sensitivity is weather dependent, somewhat soil property dependent (see right column of Figure 2-8), and landscape position dependent. Note how the sensitivity of CN2 is two orders of magnitude greater than that of FAW. The high CN2 sensitivity was explained previously; the reasons for low FAW sensitivity are linked to a high water holding capacity of the soil; the same happens for KSAT. Ferreyra et al. (2001) noted that, when using the CERES model (Ritchie et al., 1998) in the Cordoba region, a large entry of water into the lower soil layers happened very infrequently. This occurs in CERES and CROPGRO
because these models' water balance simulation does not move water downward from a layer until its soil water content has surpassed its drained upper limit, which is difficult in soils with a high water holding capacity and high runoff. Consequently, FAW can affect neither the total amount of water available to the crop, nor the timing of its availability around a base case in which FAW = 1.

[Figure 2-8 plot: panels for the coupled and uncoupled models; x-axis: cell, 1 to 10.]

Figure 2-8. CN2 sensitivity coefficient for different landscape positions and models. The points and whiskers show means and standard deviations of coefficients calculated for the 6 years of the unbiased scenario of Figure 2-2.

Figure 2-9. FAW sensitivity coefficient for different landscape positions and models. The points and whiskers show means and standard deviations of coefficients calculated for the 6 years of the unbiased scenario of Figure 2-2.
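A sensitivity coefficient of the kind plotted in Figures 2-8 and 2-9, i.e. the ratio of the relative change in yield to the relative change in a parameter around a base case, can be computed with a one-sided finite difference as sketched below. The function name, the `model` callable (a stand-in for a crop model run that maps a parameter dictionary to a yield) and the 1% perturbation size are assumptions made for illustration; the size of the deviation used in the study is not stated here.

```python
def sensitivity_coefficient(model, base_params, name, rel_step=0.01):
    """Normalized sensitivity (dY / dX) * (X / Y) around a base case.

    model(params) -> yield is a hypothetical stand-in for a crop model run.
    The named parameter is perturbed upward by rel_step times its base value
    while all other parameters stay at their base case values.
    """
    x = base_params[name]
    y = model(base_params)
    dx = rel_step * x
    perturbed = dict(base_params)
    perturbed[name] = x + dx
    dy = model(perturbed) - y
    return (dy / dx) * (x / y)
```

Evaluating the coefficient once per weather year and cell, and then summarizing by mean and standard deviation, would give points and whiskers comparable in form to those described in the two figure captions.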
The uncoupled model tried to compensate for its lack of runon contributions by estimating progressively lower CN2 values downhill (Figure 2-6). According to Equation 2-9, this would result in lower runoff losses and hence more infiltration. However, the uncoupled model's CN2 compensation attempt was only partially successful (Figure 2-10); the error increased downhill. As demonstrated above, although CN2 could conceivably be modified for each cell in the uncoupled model so the results of Equations 2-9 and 2-13 coincide for a given storm (or the yields of the two models coincide for a given year), the nonlinearity of Equation 2-13 with respect to P for different CN values makes it practically impossible for a single set of CN2 values in the uncoupled model to reproduce the results of the spatially-coupled model over several years.

[Figure 2-10 plot: yield RMSE by cell (1 to 10); columns: perfect IC knowledge, only IC average known; rows: the three IM weather cases.]

Figure 2-10. Yield RMSE for the twelve parameterization scenarios (6 years per scenario). Filled circles represent the spatially-coupled model; open circles represent the uncoupled model.
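The storm dependence of the compensating curve number discussed above can be checked numerically for the two uppermost cells. Writing W for the water that enters the soil of cell 2 in the spatially-coupled model (rainfall plus runon minus runoff), equating it to the uncoupled water input P - R reduces to the quadratic S_EQ2^2 + (20W - 30P) S_EQ2 + 25WP = 0, whose smaller root is the physically meaningful one (P > 0.2 S_EQ2). The sketch below follows that route rather than the closed-form expression; the function names are hypothetical and the rainfall amounts in the usage example are arbitrary.

```python
import math


def retention_mm(cn):
    """Potential maximum retention S (mm) from a curve number (Equation 2-8)."""
    return 25.4 * (1000.0 / cn - 10.0)


def curve_number(s):
    """Curve number corresponding to a retention S in mm (Equation 2-8 inverted)."""
    return 25400.0 / (s + 254.0)


def scs_runoff_mm(water_in, s):
    """SCS runoff (mm) for a surface water input (Equation 2-7)."""
    if water_in <= 0.2 * s:
        return 0.0
    return (water_in - 0.2 * s) ** 2 / (water_in + 0.8 * s)


def equivalent_retention_mm(p, s1, s2):
    """Retention S_EQ2 that makes uncoupled cell 2 take in as much water as
    cell 2 of the coupled model for a single storm of p mm."""
    r1 = scs_runoff_mm(p, s1)          # runoff from cell 1 = runon to cell 2
    r2 = scs_runoff_mm(p + r1, s2)     # runoff from coupled cell 2
    w = p + r1 - r2                    # water entering the soil of coupled cell 2
    b = 30.0 * p - 20.0 * w
    disc = b * b - 100.0 * w * p
    if disc < 0.0:
        # Coupled cell 2 takes in more water than the rainfall alone provides.
        raise ValueError("no uncoupled curve number can match this storm")
    return (b - math.sqrt(disc)) / 2.0  # smaller root keeps p > 0.2 * S_EQ2


if __name__ == "__main__":
    s = retention_mm(93.0)  # CN of the uppermost cell in Table 2-1
    for rain in (10.0, 30.0, 60.0, 100.0):
        cn_eq2 = curve_number(equivalent_retention_mm(rain, s, s))
        print(f"P = {rain:5.1f} mm -> CN_EQ2 = {cn_eq2:.2f}")
```

Because the value returned changes with the storm size, no single curve number assigned to the uncoupled cell can match the coupled water input across a season of varied rainfall events.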
The spatially-coupled model behaved differently: RMSE was 0 for all cells when initial conditions were known. This was expected because the IM algorithm converged to the original parameters. However, RMSE increased with uncertain initial conditions, especially for the wetter cases.

These results suggest a very limited capacity of the uncoupled model to predict reality, especially when the weather years used for parameter estimation are similar. However, the spatially-coupled model's error also increased under uncertain initial conditions. We explored this further with the evaluation data set.

Evaluating With Independent Data

Figures 2-11 to 2-14 show the RMSE for several different evaluation scenarios. The 18 evaluation years were split by TSW tercile, and the terciles' results were shown in separate columns. Each point represents 6 years. These results only correspond to scenarios having uncertain initial conditions (i.e., the values shown in the right column of Figures 2-6 and 2-7) since measuring the initial conditions at the parameter estimation phase is currently not practical in precision agriculture modeling applications (R. Murdock, Pers. Comm.). For reference, however, the RMSE of the IM scenario using the spatially-coupled model and full knowledge of initial conditions, evaluated with full knowledge of initial conditions, is zero for the ten cells in all combinations of IM weather cases and evaluation TSW tercile.

When the evaluation initial conditions are known (Figure 2-11), the prediction RMSE of the spatially-coupled model is minimum for the dry-biased IM weather case, increasing towards the wet-biased IM case, especially for the cells at the bottom of the toposequence of the dry tercile. This happens because under very dry conditions, crop
yield responds strongly to changes in infiltration, initial conditions are less variable, and the CN2 parameter is consequently estimated more accurately. Conversely, the wet-biased weather case (bottom right of Figures 2-6 and 2-7) produces the poorest parameter estimates because water does not limit the crop's growth in some of the years, especially in the lower cells (see bottom left panel of Figure 2-5). The parameter estimation process thus fits parameters to explain the variability of initial conditions rather than the crop's response to weather; this results in spurious parameter values (bottom right panels of Figures 2-6 and 2-7).

The spatially-coupled model's prediction error in evaluation simulations increases when the initial conditions are unknown (Figure 2-12). The increase is most noteworthy when using parameters obtained with the dry-biased weather case. This is due to the impact of uncertainty in the knowledge of initial conditions, which explains why the runs corresponding to the central tercile of evaluation TSW are the most affected: in the case of the dry tercile, the variability of the initial conditions is lower; for the wet tercile, the impact of variability of initial conditions is lower.

Figure 2-13 shows evaluation results for the uncoupled model under perfect knowledge of initial conditions. The patterns are similar to those shown in Figure 2-10, with error increasing downslope as the decreased curve number fails to properly capture the intra-annual and interannual variability of spatial water movement of the spatial model. However, for the wet-biased IM case, the prediction error downslope is actually less than in the spatially-coupled model (Figure 2-11), because errors in parameter estimation cannot compound downslope in the uncoupled model.
Figure 2-11. Evaluation yield RMSE for the spatially-coupled model when IM initial conditions are unknown and evaluation initial conditions are known. Each point represents six years.

[Figure 2-12 plot: "Coupled model, only know average evaluation ICs"; panels for evaluation TSW terciles Dry, Medium, and Wet; x-axis: Cell (1 to 10); y-axis scale: 0 to 2500.]

Figure 2-12. Evaluation yield RMSE for the spatially-coupled model when both the IM and evaluation initial conditions are unknown. Each point represents six years.
[Figure 2-13 plot: "Uncoupled model, perfect evaluation IC knowledge"; panels for evaluation TSW terciles Dry, Medium, and Wet; x-axis: Cell (1 to 10); y-axis scale: 0 to 2500.]

Figure 2-13. Evaluation yield RMSE for the uncoupled model when IM initial conditions are unknown and evaluation initial conditions are known. Each point represents six years.

[Figure 2-14 plot: "Uncoupled model, only know average evaluation ICs"; panels for evaluation TSW terciles Dry, Medium, and Wet; x-axis: Cell (1 to 10); y-axis scale: 0 to 2500.]

Figure 2-14. Evaluation yield RMSE for the uncoupled model when both the IM and evaluation initial conditions are unknown. Each point represents six years.
Figure 2-14 (uncoupled model, unknown initial conditions) shows results similar to those of Figure 2-13, except for the increased prediction RMSE of the dry-biased scenarios due to the uncertainty in initial conditions already mentioned for Figure 2-12. Ultimately, the results shown for the spatially-coupled model (Figure 2-12) and the uncoupled model (Figure 2-14) are very similar, suggesting that under uncertain initial conditions and biased IM weather cases, errors in the spatial coupling of the model are not the primary cause of yield prediction error.

The spatially-coupled model has several caveats. It does not consider subsurface flow; it calculates runoff using the SCS method (as does the uncoupled model), which has little or no physical basis (Boughton, 1989); it assumes that runoff leaves the field in one day; it does not consider the increasing complexity of the runoff hydrograph downslope as runoff contributions from uphill cells arrive with different time lags; and, since it adds all the runoff from a cell to the precipitation of its immediate downslope neighbor, it implicitly assumes that runoff travels downslope in sheet form. However, these simplifications do not negate the effects of the three aforementioned sources of error, and thus do not detract from the central findings of this study.

Conclusions

The literature to date on the inverse-modeling-based parameterization of spatial crop models is dominated by uncoupled models. We studied three possible sources of error for such models: model error from lack of spatial coupling and water transport among different landscape locations, parameter error from biased weather in the years of yield data used for the parameterization process, and errors due to lack of knowledge of initial soil water conditions. Each of these sources of error impacted spatiotemporal yield prediction capability.
With respect to model error, we proved analytically that the spatiotemporal infiltration behavior of a spatially-coupled water balance model cannot be reproduced by modifying the parameters of an uncoupled model. The corresponding yield prediction limitations of the uncoupled model were confirmed, using an example, at both the parameter estimation and evaluation stages. In our example, however, parameter error due to weather biases and the error from lack of knowledge of initial conditions greatly impacted the predictive capability of the spatially-coupled model, and had less effect on its uncoupled counterpart.

Based on our analysis we concluded that the use of spatially-coupled crop models requires high-quality data. Practical precision agriculture applications are characterized by uncertain initial conditions and the possibility that the weather used for calibration is not representative. Under these circumstances, the use of a spatially-coupled model may not be justified, especially for low landscape positions.
CHAPTER 3
PLANNING CROP SCOUTING PATHS WITH OPTIMIZATION ALGORITHMS AND A SELF-ORGANIZING FEATURE MAP

Introduction

In the context of decades of falling commodity prices, climate change, and increasing environmental regulatory pressure, farmers need to sustain high crop yields and incomes year after year in order to survive. Effective risk mitigation requires that farmers make crop management decisions based on up-to-date information.

Crop scouting is a data-collection activity that is used to support crop management decisions such as when to make insecticide, fungicide, and herbicide applications. A crop scout typically walks through a field to get a general impression of its state, occasionally stopping to make more detailed measurements. The kind of information collected by crop scouts depends on the crop in question and the decisions to be made, but may include qualitative and quantitative assessment of the presence of insects, diseases, weeds, and water stress.

The advent of precision agriculture (Pierce and Nowak, 1999) and precision integrated pest management (Fleischer et al., 1999) has brought the possibility of site-specific applications. It is increasingly common for farmers to selectively apply pesticides, fertilizer, lime, and other products to areas in which the application will maximize profit. This in turn has led to the need for site-specific scouting.

Many farmers currently accumulate different types of spatial data in electronic format, using them in some form of geographical information system (GIS) to provide
information about the spatial variability of factors that affect their crops. The amount of information available varies with the time since the farmer's adoption of spatial technologies, as well as his/her level of investment, but may range from having only a field boundary to multiple years of yield data, electrical conductivity, elevation, and multiple soil test datasets (NRC, 1997). Scouting can be integrated into a spatial data management system, using the scouting maps together with existing information in a spatial database to generate application maps (Nelson et al., 1999).

Currently many crop scouts in the U.S. record their data on preprinted paper forms that are later completed and faxed to the farmer. A crop scout will typically service a number of growers, up to a total area of about 8000-10000 hectares, charging a fee per unit area. Their responsibility varies between growers, from making all prescriptions and supervising subsequent applications, to merely communicating their recommendations. Some scouts focus exclusively on insects; others also include diseases and weeds. The price per unit area may vary by an order of magnitude depending on the level of services rendered.

A crop scout working in such a regime needs to optimize the use of his/her time; spending too much time per unit area is uneconomical, and spending too little exposes the scout to making expensive errors. Moreover, if the intent is to describe spatial variability of the variable of interest for a precision agriculture / precision IPM application, the placement of the samples becomes especially relevant (Fleischer et al., 1999).

The path chosen to link sampling locations strongly influences the use of the scout's time. An important condition to be met by this path is that it must be closed, i.e., the scouting path must have the same starting and ending point. This condition eliminates
the downtime required for the scout to return to his/her vehicle from a distant position in the field.

In general, the search for a convenient scouting path can be imagined as the combination of two activities:

• Determining the sampling locations, and
• Finding the shortest tour (closed path) linking all of the locations.

The optimal placement of sampling sites has been extensively treated in soil science (McBratney and Webster, 1981; Burgess and Webster, 1984; van Groenigen et al., 1999; Ferreyra et al., 2002). Optimizing the path through a set of scouting sites has not been given much attention in the agricultural literature, but is equivalent to a classic problem in computer science: the Traveling Salesman Problem (TSP).

The goal of this study is to develop objective methods for solving the two problems listed above, i.e., sample placement and scouting path construction, to build scouting maps. Two possible approaches to these problems are:

1. Sequential, in which the optimal sampling locations are determined first, followed by a search process to solve the associated TSP, and
2. Simultaneous, in which sampling points and the tour are developed simultaneously.

Our specific objectives were to apply both approaches in a representative case study and to compare their performance in terms of predictive error and practical applicability, using runtime as the criterion to assess the latter.

Theory

Sequential Approach: Sampling Locations

The search for an optimal sampling location network depends on various factors. Although methods exist for determining the optimal number of samples required to represent a data set with a given error level (Gath and Geva, 1989), in agricultural
practice the spatial sampling density is usually determined by the field's surface area and economic considerations. Moreover, there may be few layers of available data. An extreme scenario occurs when only the field boundary is available, digitized from a map or acquired via GPS. This data-poor scenario may be typical of many farming operations that are beginning to adopt information technology and precision farming.

In site-specific agriculture, given a set of sampling locations across a field (sampling scheme), it is desirable to spatially distribute the points in a way that allows the best prediction of data values at unsampled locations using the sampled data. This is an optimization problem. Typically, optimization problems involve the search for an optimal combination of data (sampling locations, in our case) that minimizes (or maximizes) an objective function or OF (Winston, 1994).

In the data-poor scenario described above, a suitable OF to minimize is the Minimization of the Mean of Shortest Distances (MMSD) criterion defined by van Groenigen and Stein (1998). The MMSD function is the expectation of the distance between an arbitrarily chosen point within the study region and the sampling location nearest to it. For large sampling regions, e.g., an infinite plane, this criterion produces an equilateral triangle grid. The criterion can be expressed as follows (van Groenigen and Stein, 1998):

\[ \phi_{MMSD}(S) = \frac{1}{M} \sum_{j=1}^{M} d(x_j, S) \tag{3-1} \]

where S is the sampling scheme (set of sampling locations), M is the total number of evaluation points composing the field, x_j is the j-th evaluation point, and d(x_j, S) is the distance between point x_j and the nearest sampling point. It is assumed that the evaluation
points are distributed across the area of interest on a finely meshed grid (ten meters, for example).

The OF must be combined with a generation mechanism, i.e., a method to search iteratively for progressively better solutions to the problem. A powerful such method is Simulated Annealing (Aarts and Korst, 1990), a combinatorial optimization algorithm that has been applied successfully to replace exhaustive searches in large problems (Kirkpatrick et al., 1983) and is relatively insensitive to local optima in the OF, unlike more traditional methods such as gradient descent. Using simulated annealing, the sampling scheme is iteratively perturbed by moving a randomly selected point in the scheme to a new random location, keeping the new scheme if it improves on the previous value of the OF, and rejecting it with an increasingly higher probability if it does not improve the OF value. Van Groenigen and Stein (1998) developed a variant of this method, called Spatial Simulated Annealing, which differs from the above primarily in that the distance that a point can be moved during a perturbation also decreases as the algorithm progresses. A sampling scheme for the data-poor scenario can be made by coupling simulated annealing with the MMSD criterion using minimal data: a field boundary to make a raster map of the field interior (the evaluation points).

A different criterion may be used when additional information is available, such as a semivariogram of the spatial random variable of interest. For example, if the scout's goal is to estimate crop yield throughout the field from values of yield (or a proxy such as number of grains per unit area) measured at the sampling locations, geostatistics can provide a principled optimal solution based on minimizing kriging variance (van Groenigen et al., 1999). The starting point is a spatial covariance model: a description of
how similarity among values of the variable sampled at different locations varies with the distance between the locations. Considering a stationary spatial random variable Z, its semivariance is defined as:

\[ \gamma(h) = \frac{1}{2} \mathrm{Var}\left( Z(u + h) - Z(u) \right) \tag{3-2} \]

where u is a location in space and h is a given displacement away from it (Deutsch and Journel, 1992).

Ordinary Kriging (Goovaerts, 1997) is a popular method for spatial interpolation. For an arbitrary point u in the region of interest, the estimated value of the variable of interest Z is the weighted sum of the measured values of Z at the n sampling locations u_i. Thus,

\[ Z^{*}(u) = \sum_{i=1}^{n} \lambda_i Z(u_i) \tag{3-3} \]

where λ_i are the weights, determined using the semivariogram of Z and assuming a constant, albeit unknown, expectation E[Z(u)] = m. The error or kriging variance (KV) for ordinary kriging is defined as

\[ \sigma^{2}_{OK}(u) = \sum_{i=1}^{n} \lambda_i \gamma(u_i - u) + \psi \tag{3-4} \]

where ψ is a Lagrange multiplier as described by Webster and Oliver (1990). Kriging variance depends on the sampling scheme geometry and on the semivariogram, but is independent of actual data values (Goovaerts, 1997). It is zero at the sampling locations, and increases away from them. It can be used in an objective function; for example, van Groenigen (2000) proposed a method in which the mean KV value over the field is minimized, and another that minimizes the maximum KV (MMKV).
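
The ordinary kriging system behind the weights and the kriging variance can be illustrated with a minimal sketch. It is written under stated assumptions and is not the code used in this study: the function names (exp_semivariogram, solve, ordinary_kriging) are hypothetical, the exponential model uses the common "effective range" convention (factor of 3 in the exponent), and the system is written in semivariogram form with a single Lagrange multiplier.

```python
import math

def exp_semivariogram(h, nugget, sill, eff_range):
    """Exponential model (assumed form): gamma(0) = 0, then rises from the
    nugget towards the sill, reaching about 95% of it at eff_range."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / eff_range))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(samples, target, gamma):
    """Return (weights, Lagrange multiplier, kriging variance) at `target`.

    The system is: sum_j w_j * gamma(u_i - u_j) + psi = gamma(u_i - u) for
    every sample i, together with the unbiasedness constraint sum_j w_j = 1.
    """
    n = len(samples)
    a = [[gamma(math.dist(p, q)) for q in samples] + [1.0] for p in samples]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in samples] + [1.0]
    sol = solve(a, b)
    weights, psi = sol[:n], sol[n]
    variance = sum(w * g for w, g in zip(weights, b[:n])) + psi
    return weights, psi, variance
```

Note that only the sample geometry and the semivariogram enter the calculation; no yield values are needed, which is what makes the kriging variance usable as an objective function before any sampling takes place.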
Sequential Approach: the Traveling Salesman Problem (TSP)

Imagine a salesman who has to travel across a network of cities, and that the distance between each pair of cities is known. The TSP consists of finding the shortest tour that will visit all the cities (once) and return to the starting point; it may seem simple, but no efficient solution to it is known. The TSP forms part of a family of problems known as NP-complete (Cormen et al., 2001); the runtimes of known exact solutions to NP-complete problems are exponential functions of the size of program input (the number of sampling locations, in this case). Thus, runtime increases dramatically with increasing input size. There is much ongoing research on the TSP, and numerous approximate solutions have been proposed for it (Golden et al., 1980).

The Simultaneous Approach

The Kohonen self-organizing feature map, or SOFM (Kohonen, 1982), is a form of neural network that can be used to transform high-dimensional signal pattern inputs (such as several layers of GIS data for an agricultural field) into a lower-dimensional representation such as a one-dimensional scouting path. In a scouting problem, the input data are vectors corresponding to the nodes of a grid (for example, with ten-meter isometric spacing) overlaid onto the field. These vectors are multidimensional; each dimension is an attribute of the corresponding location, such as x, y, and in data-rich scenarios, electroconductivity, elevation, slope, past yield maps, etc. The output is the scouting path: the sequence of (sampling) locations to visit. These locations exist in the high-dimensional space, but are topologically ordered in a lower-dimensional form, i.e., a one-dimensional sequence (node 1, node 2, etc.). Since the input vector attributes are expressed in different units, the data are usually normalized in order to make the distance calculations meaningful.
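
The text does not specify which normalization is meant, so the following is only one plausible choice: a z-score scaling of each attribute, shown as a short sketch. The function name zscore_columns is hypothetical.

```python
import statistics

def zscore_columns(vectors):
    """Scale each attribute (column) to zero mean and unit standard deviation.

    Assumption: z-score scaling; other schemes (e.g. min-max) would serve the
    same purpose of making attributes in different units comparable.
    """
    columns = list(zip(*vectors))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stdevs))
        for row in vectors
    ]
```

After such a scaling, a difference of one standard deviation in elevation counts as much in the distance calculation as a difference of one standard deviation in x or y, instead of being swamped by the much larger coordinate values.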
The SOFM is based on the idea of competitive learning. Each output node is represented by a neuron (a vector having the dimensionality of the input data), and the neurons compete for the input data vectors. This competition is based on distance (measured in the high-dimensional space) between the input data and the neurons. The algorithm is iteratively presented a vector, selected randomly from the input data. The distance between the vector and each of the neurons is evaluated, and the neuron that is nearest to the input vector is declared the winner. The winning neuron is subsequently rewarded by being moved towards the input vector. The neurons that are near the winning neuron (this nearness is measured in the low-dimensional space) are also moved with it to some extent, depending on the value of a neighborhood function.

Haykin (1994) defined the function thus: let d_{j,i} denote the lateral distance of neuron j from the winning neuron i, measured in the low-dimensional output space (such that adjacent neurons would have a distance of 1). Let π_{j,i} denote the value of the neighborhood function centered on the winning neuron i; its value is maximum for d_{j,i} = 0, and must tend to zero as d_{j,i} tends to infinity. A typical function used for this purpose is the following Gaussian:

\[ \pi_{j,i} = \exp\left( -\frac{d_{j,i}^{2}}{2\sigma^{2}(n)} \right) \tag{3-5} \]

where σ(n) is the effective width of the topological neighborhood after n iterations of the process. In each iteration the neurons are updated as follows:

\[ w_j(n+1) = w_j(n) + \eta(n)\, \pi_{j,i(x)}(n) \left( x(n) - w_j(n) \right) \tag{3-6} \]
where w_j(n) is the state of neuron j after n iterations, η is the learning rate, and i(x) is the winning neuron corresponding to input vector x. This iterative update process is repeated thousands of times.

The learning-rate parameter η used to update the weight vectors and the effective width of the neighborhood function σ should decrease as the algorithm progresses, similar to the cooling process described earlier for simulated annealing. Ritter et al. (1992) proposed:
We used two versions of the sequential approach and one of the simultaneous approach (a total of three different methods, henceforth called "cases") to propose sampling schemes and tours at nine different sampling densities ranging from 0.57 samples/ha to 10 samples/ha.

Our study area was the McCallon 1 field near Murray, KY, USA (36° 32' N, 88° 27' W, elevation 222 m). Its surface area is 8.33 ha (22.1 acres). Soils in McCallon 1 are predominantly somewhat poorly drained Calloway soils (Glossaquic Fragiudalfs) and poorly drained Henry (Typic Fragiaqualfs) soils. Both have a fragipan.

Available data were the field boundary, maize (Zea mays L.) yield maps for 1999 and 2001, and an additional yield map taken in the 1999 harvest at a nearby field called Suggs 4. The latter yield map was used to provide a semivariogram; we assumed its spatial covariance structure could be representative of the crop in other years and in similar fields such as McCallon 1, and thus be usable to drive the MMKV criterion in the absence of actual previous McCallon 1 yield data.

For each sampling scheme, maize yield data were obtained by averaging all the raw yield map data available within a 5 m radius of each sampling location. The resulting data were used to estimate the yield throughout the field on a 10 m grid of evaluation points, using ordinary point kriging as detailed further below.

The Sequential Approach

We used the SANOS program (van Groenigen and Stein, 1998) to determine the optimal sampling locations using both the MMSD and minimal maximum KV (MMKV) criteria. SANOS is a versatile program that can design sampling schemes in complex domains. It can accommodate a finite, discontinuous region composed of arbitrarily shaped subregions, and it can integrate existing sampling locations into its optimization.
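
To make the MMSD criterion and the annealing perturbation described in the Theory section concrete, the following is a minimal sketch under stated assumptions. It is not SANOS and does not reproduce its cooling schedule: the function names (mmsd, anneal_mmsd), the geometric cooling factor alpha, the starting temperature t0, and the restriction of candidate locations to the evaluation grid are all illustrative choices.

```python
import math
import random

def mmsd(samples, eval_points):
    """Mean distance from each evaluation point to its nearest sampling location."""
    return sum(min(math.dist(x, s) for s in samples) for x in eval_points) / len(eval_points)

def anneal_mmsd(eval_points, n_samples, n_iter=1000, t0=5.0, alpha=0.995,
                max_step=50.0, min_step=10.0, seed=0):
    """Spatial-simulated-annealing-style search for a low-MMSD sampling scheme."""
    rng = random.Random(seed)
    current = rng.sample(eval_points, n_samples)
    cost = mmsd(current, eval_points)
    best, best_cost = list(current), cost
    temp, step = t0, max_step
    for _ in range(n_iter):
        k = rng.randrange(n_samples)
        # Candidate moves: evaluation points within `step` of the chosen sample
        # (never empty, because the sample itself sits on an evaluation point).
        near = [p for p in eval_points if math.dist(p, current[k]) <= step]
        trial = list(current)
        trial[k] = rng.choice(near)
        trial_cost = mmsd(trial, eval_points)
        delta = trial_cost - cost
        # Always keep improvements; accept deteriorations with a probability
        # that shrinks as the system "cools".
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cost = trial, trial_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temp *= alpha                        # cooling
        step = max(min_step, step * alpha)   # shrinking perturbation distance
    return best, best_cost
```

Here eval_points would be the raster of field-interior locations built from the field boundary, for example the nodes of a 10 m grid; restricting moves to those nodes is a simple way of keeping every sampling location inside the field.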
The runtime of the spatial simulated annealing algorithm in SANOS is user-specified, but an optimal value can be calculated within SANOS using the optimal initial transition probability estimation method proposed by Aarts and Korst (1990).

For the TSP we used the 3-Opt algorithm as implemented by Syslo et al. (1983). Although it is not guaranteed to produce the optimal solution to any given TSP, this algorithm produces high-quality approximate solutions very rapidly, as shown by empirical studies such as that of Golden et al. (1980).

During the remainder of this study, we refer to two sequential cases: MMSD+TSP (using SANOS with the MMSD criterion and using the 3-Opt TSP solution), and MMKV+TSP (as above but using SANOS with the MMKV criterion). In both cases we used the field boundary to build a domain for SANOS, used SANOS to propose a sampling scheme, and then used the 3-Opt code to propose the scouting tour. We repeated the process for several sampling densities. With respect to the spatial interpolation step, we used the Suggs 4 semivariogram for kriging in the MMKV+TSP case, and a linear, zero-nugget semivariogram (typically the default in many geostatistical packages) in the MMSD+TSP case.

The Simultaneous Approach

The third case under study was a variant of a 1-D SOFM. We altered the topological neighborhood function to force the SOFM to close on itself, i.e., make a closed tour, calculating the distance d_{j,i} as follows:

    if abs(j - i) > int(N/2)
        then d = N - abs(j - i)
        else d = abs(j - i)
    return d
where N is the total number of neurons. Thus, if i, j are the first and last neurons, d_{j,i} is now 1 instead of N - 1.

The field boundary was converted into 10-meter raster data corresponding to the locations inside the field boundary. These 833 x-y pairs were used as input to the SOFM algorithm. Considering the possibility of the results being parameter-dependent, we made 21 realizations of the SOFM, keeping three parameters constant (η_0 = 1, τ_σ = 20000, τ_η = 20000) and varying σ_0 from 0.5 to 1.0 in steps of 0.025. In order to represent typical SOFM solutions in the subsequent evaluation of the three cases, we picked the SOFM realization having the median MMSD value. For the spatial interpolation step we used a linear semivariogram and ordinary kriging as in the MMSD+TSP case.

Evaluation of Results

The three cases applied at 9 different sampling densities were evaluated according to a) MMSD values, b) capability of predicting spatial yield variability as expressed by the root mean squared error (RMSE) between observed and estimated yield maps over the 10-meter grid of (833) evaluation points, c) predictive capability: relative error in predicting the field average yield calculated from the 833 evaluation points, and d) tour length. The observed values mentioned in point (b) were obtained by fitting semivariograms to the observed yield map data points and then using ordinary kriging to resample the observed yield map data onto the 833-point grid.

To explore the quality of the TSP solutions, we asked an expert crop consultant to plot what he felt was the shortest tour through the different sampling schemes, and compared his answers with the solutions provided by the 3-Opt algorithm.
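
Returning to the simultaneous approach, the closed-ring SOFM described earlier in this section can be sketched as follows. This is an illustration under stated assumptions rather than the program used in the study: the function names (ring_distance, train_ring_sofm) are hypothetical, the learning rate and neighborhood width are assumed to decay exponentially with time constants tau_eta and tau_sigma, and the default parameter values are chosen only to keep the example fast.

```python
import math
import random

def ring_distance(j, i, n):
    """Lateral distance between neurons j and i on a closed ring of n neurons."""
    d = abs(j - i)
    return n - d if d > n // 2 else d

def train_ring_sofm(points, n_neurons, n_iter=5000, eta0=1.0, sigma0=1.0,
                    tau_eta=2000.0, tau_sigma=2000.0, seed=0):
    """Train a 1-D SOFM whose neighborhood wraps around, yielding a closed tour.

    Assumption: eta and sigma decay exponentially with the iteration count.
    """
    rng = random.Random(seed)
    neurons = [list(rng.choice(points)) for _ in range(n_neurons)]
    for n in range(n_iter):
        x = rng.choice(points)  # randomly presented input vector
        winner = min(range(n_neurons), key=lambda j: math.dist(neurons[j], x))
        eta = eta0 * math.exp(-n / tau_eta)
        sigma = max(sigma0 * math.exp(-n / tau_sigma), 1e-6)
        for j in range(n_neurons):
            d = ring_distance(j, winner, n_neurons)
            h = math.exp(-d * d / (2.0 * sigma * sigma))  # Gaussian neighborhood
            for k in range(len(x)):
                neurons[j][k] += eta * h * (x[k] - neurons[j][k])
    return [tuple(w) for w in neurons]
```

The scouting tour is read off by visiting the returned neurons in index order and then returning from the last neuron to the first; because the neighborhood wraps around, the first and last neurons are trained as neighbors and tend to end up close together in the field.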
A simulated annealing algorithm can be run for an arbitrary duration, or until a time-invariant solution is found. SANOS specifies runtime as proposed by Aarts and Korst (1990), assuring a slow "cooling" of the system and maximum rejection of local minima. As applied in SANOS using parameter values suggested by van Groenigen et al. (1999), the algorithm's runtime is about 2.5 and 4 hours for the MMSD and MMKV criteria, respectively, on a Pentium PC with a 1.3 GHz processor speed. This is excessively long for practical applications, so we explored how solutions changed if the TSP cases' runtime was decreased to a small fraction of the optimum (1 min).

Finally, we compared the semivariograms from the 1999 and 2001 observed yield maps in the McCallon 1 field with the one used as a proxy from the nearby Suggs 4 field.

Results and Discussion

Sampling Location Layout

Figures 3-1A, 3-1B and 3-1C show the sampling locations and scouting tours obtained for a sampling density of 2.5 samples/ha (1/ac) by the MMSD+TSP, MMKV+TSP, and SOFM cases, respectively. The points are spread quite evenly over the field, although MMKV+TSP allocates more points to the periphery than the other methods. The optimal tours vary greatly between cases, and are not easily predictable a priori by observing the sampling schemes.

Some spatial data-collection problems require sampling schemes with unevenly distributed sampling locations. These applications generally involve prior knowledge, such as the areas of the field more prone to fungal diseases. Another such application involves planning a soil sampling strategy that ensures all soil mapping units get sampled, even if they are very small. These kinds of stratified sampling are easily implemented in the MMSD+TSP and MMKV+TSP cases by modifying the generation
mechanism to guarantee that sampling locations assigned a priori to a mapping unit remain within it. However, implementing stratified sampling in the SOFM case is more complex due to its simultaneous sample placement and path determination. An a priori assignment of sampling locations to polygons in the SOFM could result in complicated, suboptimal path shapes, unless the neighborhood functions and other algorithm parameters could be updated dynamically during operation, as in the Kalman-filter-driven AutoSOFM algorithm (Haese and Goodhill, 2001).

Predictive Accuracy

Figure 3-2A shows how MMSD varied with sampling density in the three cases. The MMSD+TSP algorithm consistently produced the lowest MMSD values, although there was little variability among cases. The better performance of the MMSD+TSP case was expected, since MMSD is what was being optimized in it.

Observed mean corn yields and standard deviations in McCallon 1 were 8,406 and 1,222 kg/ha (CV = 14.5%) respectively in 1999, and 9,912 and 1,174 kg/ha (CV = 11.8%) respectively in 2001. Yields were higher and less variable in 2001, a more favorable weather year.
[Figure 3-1 maps: three panels (A, B, C) showing sampling locations and tours on map coordinates (x: 370000 to 370350; y: 4043950 to 4044150).]

Figure 3-1. Sampling locations and tour lengths for the 3 cases at a sampling density of 2.5/ha (22 points): (A) MMSD+TSP: 1,433 m; (B) MMKV+TSP: 1,395 m; (C) SOFM: 1,424 m.
[Figure 3-2 plots: five panels (A to E) versus sampling density (1/ha, 0 to 10), with series for MMSD+TSP, MMKV+TSP, and SOFM.]

Figure 3-2. Evaluation of the sampling schemes produced by the three cases at different sampling densities: (A) MMSD, (B) yield prediction RMSE for 1999, (C) yield prediction RMSE for 2001, (D) percent error predicting 1999 mean field yield, (E) percent error predicting 2001 mean field yield.
[Figure 3-3 maps: three panels (A, B, C) of yield on map coordinates (x: 370000 to 370400; y: 4043950 to 4044150), each with a scale from 2000 to 11000.]

Figure 3-3. (A) Observed 1999 yield map; (B) MMKV+TSP estimate at 2.5 samples/ha (22 points); (C) MMKV+TSP estimate at 10 samples/ha (88 points).

Figures 3-2B and 3-2C show estimated yield RMSE vs. sampling density for 1999 and 2001. RMSE decreased with increasing density, and varied little between cases.
None of the methods was clearly superior to the others for both years across the sampling density range.

Figures 3-2D and 3-2E show relative error in field average interpolated yield vs. sampling density for 1999 and 2001. Error was generally low; average (across cases and years) relative error ε_r was below 5% (less than 28% of the CV) for all sampling densities, a great improvement with respect to estimating the field mean yield from one random location.

Figure 3-3A shows the observed yield map for 1999. Figure 3-3B shows the surface interpolated from 22 points (sampling density: 2.5/ha ≈ 1/acre) obtained with the MMKV+TSP algorithm. Although this layout estimated mean field yield with an error of 0.5% in 1999 and 3.2% in 2001, it reproduced spatial variability relatively poorly (RMSE of 1,064 kg/ha in 1999 and 1,004 kg/ha in 2001). In contrast, Figure 3-3C shows the surface obtained with the same algorithm and 88 points (density: 10/ha ≈ 4/acre). The relative error of prediction of the mean at this density was 0.7% in 1999 and 0.3% in 2001, and RMSE improved to 742 kg/ha in 1999 and 786 kg/ha in 2001.

Planning a sampling scheme and scouting path using multivariate data (of which the x,y pairs of the data-poor scenario are a special case) can be considered an attempt at accurately depicting the joint probability distribution of the data using a small sample. The SOFM tends to bias this representation by overrepresenting regions of low input density and underrepresenting regions of high input density (Haykin, 1994). This may actually be valuable in crop scouting, where small, distinct regions in the input data distribution may correspond to environmental conditions favoring pests, weeds, etc.
Tour Length

Figure 3-4A shows tour length vs. sampling density for the three cases. Tour lengths were quite similar, with the MMKV+TSP case producing the shortest tour at five densities, and MMSD+TSP producing the shortest at the remaining four. The SOFM produced tours that were, on average, 4% longer than the average of the other cases.

Figure 3-4B shows the difference between the tour lengths resulting from the TSP algorithm and the expert. Note that:

• The algorithm-derived and expert-derived tour lengths coincided at the lowest (5-point) sampling density,
• The algorithm-derived tours tended to be increasingly shorter than the expert-derived tours as sampling density increased, and
• The trend was stronger for MMKV than for MMSD.

[Figure 3-4 plots: (A) tour length (0 to 3000) versus sample density (1/ha) for MMSD+TSP, MMKV+TSP, and SOFM; (B) tour-length difference versus sample density (1/ha) for MMSD and MMKV.]

Figure 3-4. (A) Comparison of the three cases' tour lengths. (B) Difference between MMSD / MMKV case tour lengths and expert-derived tour lengths. Note how the algorithm-derived tours tend to be shorter, and how the difference increases with sampling density.
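
The tour lengths compared above are sums of straight-line segment lengths around a closed path. As a hedged illustration of how a local-search heuristic shortens such a tour, the sketch below uses 2-opt, a simpler relative of the 3-Opt algorithm; it is not the Syslo et al. (1983) implementation used in this study, and the function names (tour_length, two_opt) are hypothetical.

```python
import math

def tour_length(points, order):
    """Length of the closed tour that visits `points` in the given order."""
    return sum(
        math.dist(points[order[k]], points[order[(k + 1) % len(order)]])
        for k in range(len(order))
    )

def two_opt(points, order):
    """Reverse tour segments for as long as doing so shortens the closed tour."""
    order = list(order)
    n = len(order)
    improved = True
    while improved:
        improved = False
        for a in range(n - 1):
            for b in range(a + 2, n):
                if a == 0 and b == n - 1:
                    continue  # these two edges share a node; nothing to swap
                p, q = points[order[a]], points[order[a + 1]]
                r, s = points[order[b]], points[order[(b + 1) % n]]
                # Gain from replacing edges (p, q) and (r, s) with (p, r) and (q, s).
                gain = math.dist(p, q) + math.dist(r, s) - math.dist(p, r) - math.dist(q, s)
                if gain > 1e-12:
                    order[a + 1:b + 1] = reversed(order[a + 1:b + 1])
                    improved = True
    return order
```

A tour that crosses itself always admits such an improving swap, which is one reason why algorithm-derived tours can be shorter than hand-drawn ones once the number of sampling locations grows.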
Runtime

Our SOFM algorithm runs quickly (2-3 minutes), stopping when it reaches an invariant scheme. Thus we did not try time-constrained SOFM runs. Likewise, the Syslo et al. (1983) implementation of the 3-Opt algorithm we used to solve the TSP component of the other cases ran remarkably fast. The spatial sampling design stage performed with SANOS was where time reduction was necessary. The "optimal" value was derived from van Groenigen et al.'s (1999) suggestion of using a conservative value (over 0.99) for a parameter (called α) that sets how quickly the simulated annealing algorithm "cools".

When runtime of the TSP cases was constrained to only 1 minute by reducing α, tour length decreased an average of 3.5% across cases and years, reflecting the more clumped structure of the suboptimal schemes. The RMSE remained essentially the same, as did the estimation of field mean, except for the MMKV cases, where estimation error was about twice that of the time-unconstrained cases. The latter effect results from the time required per iteration of the MMSD and MMKV cases: MMKV iterations involve solving several systems of linear algebraic equations in order to determine the weights λ_i shown in Equations 3-3 and 3-4. Conversely, MMSD iterations are relatively very quick, only requiring arithmetical comparisons. Under very constrained runtimes there may not be enough iterations of the MMKV algorithm to attain proper equilibrium of the simulated annealing algorithm at each acceptance probability level, whereupon the method's rejection of local minima could break down. Thus, in practical time-critical applications, the MMSD+TSP algorithm is preferable.
Semivariograms

Table 3-1 shows the parameters of the exponential models fitted to the three semivariograms derived from raw yield map data. The nugget (C0), sill (C0+C1), and effective range (r) values differ among the three semivariograms. Van Groenigen (2000) noted that changes in variogram parameters can affect the results of a sampling scheme based on minimizing kriging variance. Thus, although MMKV+TSP is based on sound geostatistical principles whereas MMSD+TSP is empirical, the advantage of MMKV+TSP when using a proxy semivariogram is questionable.

Table 3-1. Standardized variograms of the 1999 and 2001 McCallon 1, and 1999 Suggs 4 maize data. Columns are the nugget effect or C0, sill or C0+C1, and effective range or r.

Semivariogram     C0      C0+C1    r
McCallon 1 '99    0.35    0.75     117
McCallon 1 '01    0.45    0.556    75
Suggs 4 '99       0.2     0.8      66

Practical Considerations

Aside from improvement with respect to expert-derived sampling schemes and paths, using one of the proposed methods to generate a scouting map has several advantages with respect to a paper-based approach:

• Office-based map generation.
• The map can be downloaded to a GPS-enabled handheld computer that can log results and ease the transfer of data back into a crop database.
• Digital scouting tools allow the farmer to keep a permanent site-specific record of crop state. Thus, scouting maps can be used to make application maps.
• Increased accountability of scouting performance: files contain timestamps, location, etc.
• Repeatedly scouting the same areas can allow comparison of the state of the field over time.
• Greater potential for delegation.
• It is possible to bias the sample-locating process to increase the chances of finding pests that are first detected in certain kinds of environments. It is thus possible to avoid (or prioritize) field edges, etc.

Potential drawbacks include:

• Additional hardware / software requirements.
• The requirement of following a set path may be less cost-effective (more time-consuming) for the crop scout.
• The learning process of using new technology.

Conclusions

The methods shown herein provide a principled approach to the design of crop-scouting activities as a form of spatial sampling. The methods are sufficiently quick and accurate to be usable in practical applications. The TSP methods (MMKV+TSP and MMSD+TSP) tended to make slightly shorter tours than the SOFM, although the three methods' tours were never longer than the expert-derived tours. The TSP methods also typically estimated yield slightly better than the SOFM. When runtime is unconstrained (and a semivariogram is available), the MMKV+TSP case seems most appropriate. Conversely, when runtime is strongly constrained, MMSD+TSP may be more dependable. In intermediate situations the three methods are practically equivalent.
CHAPTER 4
REDUCING SOIL WATER SPATIAL SAMPLING DENSITY USING SCALED SEMIVARIOGRAMS AND SIMULATED ANNEALING

Introduction

Estimating the spatial and temporal patterns of soil water content in agricultural areas is of great value in activities such as predicting crop yields, assessing the fate of potentially contaminating crop inputs, and estimating soil erosion. The necessary data requirements must be met through spatial sampling, and the spatial density of the measurements will strongly influence the cost of the process, the quality of the results, and the feasibility of a long-term study. Careful design of the sampling scheme can save time and money, and spatial statistics may be used to optimize such a scheme (Van Groenigen and Stein, 1998). The density reduction of an existing spatial network is a related problem, relevant in many regions of the world where funding for environmental monitoring is decreasing.

The spatial dataset used in this study was taken from an ongoing experiment running since 1992 in an 8-hectare microwatershed. The spatial variability of soil water content was characterized by repeatedly sampling soil water at 57 locations distributed throughout the microwatershed, but it was impractical to sample all the points with the desired temporal frequency over an extended period of time. It became necessary to develop a methodology to reduce the number of sampling points while maintaining the ability to describe spatial soil water content across the field. Our goal was to identify a time-invariant relationship between the water content measurements across the 57
locations. The existence of such a relationship would allow us to infer the spatial pattern of soil water content over the entire microwatershed at future dates by sampling a reduced subset of locations.

Temporal Stability

Formally, the concept of a time-invariant relationship between water contents at different locations may hold only for covered plots draining freely; however, there is evidence that it has a wider range of application (Sisson, 1987). Vachaud et al. (1985) developed a technique for reducing spatial sampling density, based on the concept of temporal stability of soil water content. They defined this as a time-invariant association between spatial location and classical statistical parameters, emphasizing the persistence of the rank of soil water content measured at different locations in a network. This idea has subsequently been tested under different conditions by several authors, with contradictory results, as shown below.

The work reported by Vachaud et al. (1985) involved data sets that showed no spatial correlation, perhaps due to the great heterogeneity of the soil properties in the study locations. This sample independence made it possible to study the temporal stability of the data using simple statistics. However, at many scales of interest, soils are not necessarily randomly distributed; Kachanoski and de Jong (1988) presented additional tests of temporal stability in the context of spatial associations in the data and scale dependency.

Other researchers have observed temporal stability in soil water patterns: Goovaerts and Chiang (1993) in a long-term fallow plot, Kamgar et al. (1993) in bare soil laid out in furrows and beds, Zhang and Berndtsson (1988) on short-cut grass, Jaynes and Hunsaker (1989) in an irrigated wheat field, and Reichardt et al. (1993) under different conditions of land cover ranging from bare soil to a corn crop.
Other studies produced mixed results. Cassel et al. (2000) observed greater temporal stability of water content in deeper soil layers than in shallow layers under a wheat crop. This effect could be attributed to the impact of crop root water uptake. Grayson and Western (1998) studied three catchments having significant relief, and observed that although the overall spatial soil moisture patterns were not time stable, the measurements in a specific subset of the locations within the measurement network were time stable and could adequately represent mean soil moisture over their areas of interest. Grayson and Western (1998) denoted the locations in this subset as catchment average soil moisture monitoring (CASMM) sites. Comegna and Basile (1994) obtained opposite results: they observed both spatial associations between water content across locations, and a time-stable spatial structure for the water content in the top 90 cm of the soil profile. However, they were unable to find CASMM locations, and attributed this effect to the great homogeneity of the volcanic soil of their study site.

Other researchers have observed a lack of temporal stability. Van Wesenbeeck et al. (1988) observed that the spatial pattern of surface (0-0.2 m) soil water content below a corn crop was not stable over time, but was a function of crop growth stage and mean soil water content. Mohanty et al. (2000) observed time instability of soil moisture patterns in a gently sloping range field, and suggested that this might be the consequence of lateral base flow and aspect-driven accelerated or decelerated evapotranspiration and condensation. Indeed, Kachanoski and De Jong (1988) pointed out that soil water content at a point is the product of hydrologic processes operating at different spatial scales.

Variogram analysis has been used very effectively to study spatial associations (Vieira et al., 1983; McBratney and Webster, 1981; Burgess and Webster, 1980).
However, in the context of temporal analysis, seasonal differences in rainfall and water content in a field may render a variogram calculated at one time not representative of the conditions at another. Kachanoski and De Jong (1988) noted that time stability results in time independence of the normalized semivariogram but not necessarily of the ordinary semivariogram. Vieira et al. (1991) proposed a variogram scaling technique, dividing the variogram of the observations taken at a particular date by their sample variance, and subsequently merging several dates' variograms into one. Comegna and Basile (1994) and Vieira et al. (1997) applied this concept to time stability analysis of soil water content.

Simulated Annealing

The spatial sampling density reduction problem requires selecting a subset of the original dataset that will, in combination with a spatial interpolation algorithm, produce the best possible estimate of the variable of interest at the points that will no longer be sampled. This is a nontrivial combinatorial problem when the number of locations involved is high. An optimization algorithm may be used to search for a solution, but the algorithm in question should converge to the global optimum. Simulated annealing (Aarts and Korst, 1990) is such a method; different forms of the algorithm originally proposed by Metropolis et al. (1953) have recently been applied to numerous problems such as modeling spatial variability of heavy metal concentration in soils (Lin and Chang, 2000), spatial variability of phosphorus content and texture (van Groenigen et al., 1999), soil pore structure modeling (Moran and McBratney, 1997), and soil parameter estimation for functional crop models (Calmon et al., 1999; Braga and Jones, 1998; Braga et al., 1998; Shen et al., 1998; Paz et al., 1998). Examples of combinatorial applications of simulated annealing range from printed circuit board design (Kirkpatrick et al., 1983) to the
selection of representative nodes in a meteorological network (Robledo, 1994) and the determination of optimal soil sampling strategies for precision agriculture research (Van Groenigen et al., 2000).

Our objectives in this study were a) to model the spatial variability of soil water across several dates in an 8-ha microwatershed using measurements taken at 57 locations, and b) to define a reduced subset of 10 of the original measurement network locations that could be used to adequately predict the soil water content in the rest of the network.

Materials and Methods

Study Location

Our study area is an 8-hectare microwatershed (64° 13' W, 31° 29' S) located 25 km to the south of the city of Cordoba, Argentina. The conditions in this field are considered representative of approximately 20000 hectares affected by water erosion in the region (Romero et al., 1995). A map of the microwatershed including elevation contour lines is shown in Figure 4-1. Slope in the microwatershed varies from 0.8 to 1.2%, and runoff is discharged through a flume located at its southeastern corner (coordinates x = 0, y = 280 in Figure 4-1).

The soil is a silty loam Typic Haplustoll. A modal horizon profile is A1 (0-14 cm), A2 (14-20 cm), Bw (20-40 cm), BC (40-60 cm), C (60-84 cm), and Ck (84+ cm). The soil is very deep, and the depth to the water table is approximately 20 meters.

Soybeans (Glycine max (L.) Merrill) were grown on the microwatershed under conventional tillage every year since 1990. An Agripo maturity group VII variety was used in the 1991/92 season and an Asgrow maturity group VI variety was used thereafter.
Figure 4-1. Layout of the microwatershed, showing the sampling locations as numbered crosses. Coordinates are expressed in meters. Note the rotated map orientation.

The measurement layout consisted of a grid pattern of 57 locations covering the entire microwatershed, as shown in Figure 4-1. The grid had an isometric interval of 41.66 m between adjacent points. Gravimetric soil water content measurements were performed at each of the grid points at depths of 0-30, 30-65, and 65-100 cm, and these values were used to estimate the total soil water content to a depth of 1 m.

This study used measurements performed on 2/7/1992, 2/24/1992, 3/20/1992, 1/25/1993, and 12/23/1993. The first three dates were used to develop and calibrate a model of spatial variability; the last two, belonging to other cropping seasons, were used for validation.

Semivariogram Modeling

We analyzed the spatial variability of the soil water content of each soil layer and of the total water content in the first meter of the soil profile using variogram modeling
(McBratney and Webster, 1986; Trangmar et al., 1985; Vieira et al., 1983). We calculated an experimental semivariogram for each measurement date and fitted a continuous function to it. The optimal semivariogram type was selected, and the semivariograms were tested, using cross-validation. Apezteguia et al. (1999) reported these results.

For validation we used a scaled semivariogram (Vieira et al., 1997), built by dividing the experimental semivariograms of each of the three calibration dates by the sample variance of each date's data, and fitting a new continuous function to the union of all the scaled data. Its usage will be described under Validation below.

The Density Reduction Problem

The sampling density reduction problem consisted of choosing the subset of given cardinality of the 57 measurement locations that would best approximate the spatial distribution of soil water content at the calibration dates, using the data measured at the subset to estimate the data at the remaining points by means of a spatial interpolation algorithm. The optimization process was formulated as the minimization of an objective or fitness function J that will be discussed below. Long-term sampling cost considerations set the subset cardinality to 10.

Choosing an optimal subset of 10 points out of 57 is a complex combinatorial problem; using a brute-force method to evaluate all the possible combinations would be very time-consuming, given that $C(57,10) \approx 4.318 \times 10^{10}$ and each evaluation is computationally intensive. Instead, we approached the problem using two simulated annealing algorithms: the one described by Sacks and Schiller (1988), and the newer Spatial Simulated Annealing algorithm proposed by van Groenigen and Stein (1998).
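The size of the search space quoted above can be checked directly. The short sketch below computes the number of 10-point subsets of 57 locations and, purely for illustration, converts it into a brute-force runtime using a hypothetical cost of 0.01 s per fitness evaluation (this cost is an assumption, not a measured value).

```python
import math

n_locations = 57
subset_size = 10

# Exact number of distinct 10-point subsets of the 57 locations.
n_subsets = math.comb(n_locations, subset_size)
print(f"{n_subsets:,} candidate subsets (about {n_subsets:.3e})")

# Hypothetical cost of one fitness evaluation, for illustration only.
seconds_per_evaluation = 0.01
years = n_subsets * seconds_per_evaluation / (3600 * 24 * 365)
print(f"Exhaustive search would take roughly {years:.1f} years at this rate")
```

Even with this optimistic evaluation cost, an exhaustive search would take over a decade, which motivates the use of a stochastic search method.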
The Sacks and Schiller algorithm (S&S) is designed to work on a discrete domain $D$ (57 points, in our case). At any given time $j$, $S_j$ is a subset of $D$ of the desired cardinality (10). The algorithm iteratively proposes and evaluates a new subset by replacing one point from the previous subset, and accepts or rejects the change according to a simple acceptance criterion. In each iteration the new pattern $S'$ is proposed by randomly choosing an entering point $t \in (D - S_j)$, followed by the deterministic selection of the replaced exiting point $s^* \in S_j$ that minimizes the fitness function $J(S')$, i.e., $J(S_j \cup t - s^*) = \min_{s \in S_j} J(S_j \cup t - s)$. After each such change, the new value of $J$, $J(S')$, may or may not have improved (decreased) with respect to the previous iteration. If it improved, then the new pattern is accepted with a probability of 1. If it did not improve, then the pattern is accepted with a probability given by a control parameter π, such that 0 < π < 1, where π is a function that tends to decrease through the algorithm's execution, making it progressively more improbable that the algorithm accepts new patterns that do not improve the solution.

The Spatial Simulated Annealing algorithm (SSA) differs from the previous algorithm in three fundamental aspects: i) it is designed for a continuous domain, ii) instead of only using the control parameter to set the acceptance probability, it also includes the difference in fitness between the new and old patterns, and iii) instead of replacing a point of the subset $S_j$ with another one belonging to $(D - S_j)$, it chooses a point $s$ within the subset and moves it over space to a new location shifted with respect to the original in a random direction and by a random distance, the latter bounded from above by a function $h_{max}$ that tends to decrease as the algorithm execution progresses. Thus, $s$ may initially be shifted large distances, but as the algorithm progresses the
movements become progressively smaller and less probable. In a manner similar to S&S, the SSA control parameter decreases with time.

The nature of our problem did not allow the use of the SSA algorithm as originally described by van Groenigen and Stein (1998); it was only possible to consider locations from the discrete domain of 57 points due to the existence of several years of other data (crop biomass, yield, etc.) sampled only at those locations. We implemented a variation of the SSA algorithm that moved location s over space, but only to candidate locations on the grid. Otherwise, the algorithm is almost identical to the one presented by van Groenigen and Stein. A detailed description of both implemented algorithms (S&S and SSA) is provided in the Appendix.

Fitness Functions

In each iteration of the simulated annealing algorithms, we used a fitness function to describe the ability of the proposed pattern S' to predict the water content throughout D at all the dates of interest. The prediction of water content was performed with a spatial interpolation algorithm. We applied ordinary kriging (Deutsch and Journel, 1992), using each calibration date's calculated semivariogram.

We performed the process with two different fitness functions, both of which simultaneously evaluated the performance of candidate subsets across all the calibration dates. The functions are described below.

Scaled kriging variance (SKV)

The scaled kriging variance function is defined as $SKV = \frac{1}{N} \sum_j \sum_i \frac{\sigma^2_{K,ij}}{s^2_j}$, where $\sigma^2_{K,ij}$ is the kriging variance at point i on date j, $s^2_j$ is the variance of the water content observed across the microwatershed on date j, and N is the total number of date-space combinations, slightly less than 3 · 57 (calibration set) or 2 · 57 (validation set) due to the existence of missing data. The SKV function adds the kriging variance of water content across all the points i of the
microwatershed and across all the dates of interest j, scaling it by the variance of the observed water content data across the microwatershed on the corresponding date. Provided that the intrinsic hypothesis of geostatistics is valid, the predictive accuracy of ordinary kriging can be expressed by the kriging variance (van Groenigen, 2000). Thus, finding a solution that minimizes the kriging variance (or the SKV function, in this case) can be expected to maximize predictive accuracy.

Scaled mean squared error (SMSE)

The scaled mean squared error function is defined as $SMSE = \frac{1}{N} \sum_j \sum_i \frac{(\theta_{ij} - \hat{\theta}_{ij})^2}{s^2_j}$, where $\theta_{ij}$ and $\hat{\theta}_{ij}$ are the observed and predicted water contents at point i on date j, and $s^2_j$ is the variance of the water content observed on date j. This function adds the error of prediction of water content across all the points i of the microwatershed and across all the dates of interest j, scaling the square of each residual by the variance of the observed water content data across the microwatershed on the corresponding date. This allowed us to combine errors across different dates.

We explored the four possible scenarios defined by combinations of the two fitness functions (SKV, SMSE) and two algorithms (S&S, SSA). We ran five repetitions (instances) per scenario, differing in their initial conditions and in the random numbers used throughout the process. The corresponding parameters are shown in the Appendix.

Validation

The optimal subset of D was used to estimate the water content in the top one meter of soil on January 25, 1993 and December 23, 1993. These two dates had not been used in the calibration process. As in the calibration phase, we estimated water content in the 47 points not in the subset using ordinary kriging as a spatial interpolator. However, we used the scaled semivariogram, multiplying its nugget effect and scale by the variance of the data observed in the optimal 10-point subset under consideration.
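The rescaling step used in validation can be sketched as follows. The sketch assumes the exponential model is written with the effective-range convention, $\gamma(h) = C_0 + C_1 [1 - \exp(-3h/a)]$, uses the sample (n - 1) variance, and takes the scaled model parameters reported for this study (nugget 0, scale 1.082, effective range 176 m). The function names and the water content values are made up for illustration.

```python
import math
import statistics

def exponential_semivariogram(h, nugget, scale, effective_range):
    """Exponential model (assumed convention): gamma reaches about 95% of
    the scale at a lag equal to the effective range."""
    if h <= 0.0:
        return 0.0
    return nugget + scale * (1.0 - math.exp(-3.0 * h / effective_range))

def rescale_parameters(nugget, scale, subset_values):
    """Multiply the dimensionless nugget and scale by the variance of the
    values observed at the reduced subset (sample variance: an assumption)."""
    variance = statistics.variance(subset_values)
    return nugget * variance, scale * variance

# Scaled (dimensionless) model parameters reported for this study.
SCALED_NUGGET, SCALED_SCALE, EFFECTIVE_RANGE_M = 0.0, 1.082, 176.0

# Made-up water contents (mm) at a hypothetical 10-point subset.
subset_mm = [190.0, 205.0, 198.0, 210.0, 185.0,
             200.0, 195.0, 215.0, 188.0, 202.0]

nugget_mm2, scale_mm2 = rescale_parameters(SCALED_NUGGET, SCALED_SCALE, subset_mm)
gamma_at_range = exponential_semivariogram(
    EFFECTIVE_RANGE_M, nugget_mm2, scale_mm2, EFFECTIVE_RANGE_M)
print(round(scale_mm2, 2), round(gamma_at_range, 2))
```

The design point is that only the 10 sampled values are needed at a future date: their variance restores the units of the dimensionless model before kriging the unsampled locations.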
We evaluated the optimal subsets obtained from the four algorithm / fitness function combinations (SKV-S&S, SKV-SSA, SMSE-S&S, SMSE-SSA). In order to observe the benefits of applying our proposed method, we also evaluated three regular grids (Table 4-1) and 132 randomly generated subsets.

We calculated relative errors $\varepsilon_r = 100\% \cdot (\hat{\theta} - \theta) / \theta$ and standardized residuals for each location and validation date, tested them for bias, and plotted the standardized residuals vs. the estimated total soil water content for each validation date to check for trends in the estimation. We also calculated the Shapiro-Wilk W statistic (Shapiro & Wilk, 1965) to verify whether the residuals were normally distributed, and tested for heteroscedasticity using regression between the mean estimated soil water content and the variance of 4-point clusters of adjacent points, following Goovaerts (1997). In order to verify compliance with kriging assumptions, we calculated histograms of the residuals divided by their respective kriging standard deviation, and checked for normality, zero mean, and unit variance.

To put the results into context, we also checked for temporal stability as defined by Vachaud et al. (1985), using temporal analysis of the differences between individual and spatial averages, and performing Spearman's rank correlation on the data of all possible pairs of the five available measurement dates.

Table 4-1. Locations contained in the most relevant patterns mentioned in the text, together with their values of scaled mean squared error (SMSE) and scaled kriging variance (SKV) over the calibration (subscript c) and validation (subscript v) data sets.
Pattern            s1  s2  s3  s4  s5  s6  s7  s8  s9  s10   SKVc    SKVv    SMSEc   SMSEv
Best SKV-based      9  11  13  22  25  34  37  43  49  55    .3276   .1441   .7559   .6336
Best SMSE-based     5   7  13  16  30  37  41  50  53  56    .3844   .4268   .3635   .5174
Regular grid #1     3   7  15  19  29  37  40  47  51  57    .3467   .1723   .7537   .7117
Regular grid #2     4   7   8  23  26  32  40  45  53  56    .3549   .2072   .6187   .5286
Regular grid #3     7   8  11  27  30  41  44  51  53  56    .3590   .2594   .7297   .6814
Best random grid    5   9  13  17  28  33  34  45  55  57    .3731   .3730   .5042   .5040
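The temporal stability check mentioned in the Validation section can be illustrated with a small sketch. It assumes the usual reading of the Vachaud et al. (1985) approach: relative differences of each location with respect to the spatial mean on a given date, and Spearman's rank correlation between dates. The rank correlation uses the closed form that is valid only when there are no ties; the function names and the five water content values per date are made up.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_no_ties(x, y):
    """Spearman rank correlation, r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def relative_differences(values):
    """Difference of each location from the spatial mean, relative to that mean."""
    mean = sum(values) / len(values)
    return [(v - mean) / mean for v in values]

# Made-up water contents (mm) at five hypothetical locations on two dates.
date_a = [210.0, 225.0, 198.0, 240.0, 215.0]
date_b = [150.0, 160.0, 139.0, 171.0, 149.0]

rs = spearman_no_ties(date_a, date_b)
delta_a = relative_differences(date_a)
print(round(rs, 3), [round(d, 3) for d in delta_a])
```

A rank correlation close to 1 between two dates indicates that wet locations tend to stay wet and dry locations tend to stay dry, which is the persistence of rank that temporal stability refers to.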
Results and Discussion

Semivariograms

For each date of interest, the semivariogram model that produced the best fit was an exponential model. The corresponding parameters are shown in Table 4-2. Due to the unavailability of data pairs with lag distances below 41.66 meters, it was difficult to assess the existence of a nugget effect. We assumed a zero nugget throughout, based on the great similarity among the repetitions of water content measurements that were pooled for each point at each measurement date (not shown). The scaled semivariogram model coalesced from the three calibration set semivariograms was $\gamma(h) = C_0 + C_1 [1 - \exp(-3h/a)]$, with $C_0 = 0$ (nugget), $C_1 = 1.082$ (scale), and $a = 176$ m (effective range).

Table 4-2. Mean total soil water content in the first meter of soil, phenological stage, and semivariogram model parameters per measurement date, and parameters for the scaled semivariogram model.

                                                            Semivariogram parameters
Date           Mean θ    Phenological stage                 C0    C1       a
Feb 7 1992     217 mm    V4 (4th node on main stem)         0     520      110 m
Feb 24 1992    228 mm    V6 (6th node on main stem)         0     650      130 m
Mar 20 1992    253 mm    R5 (Beginning seed)                0     1115     130 m
Jan 25 1993    196 mm    R2 (Full flower)                   0     270      135 m
Dec 23 1993    142 mm    Planting                           0     320      100 m
Scaled SV      N/A       N/A                                0     1.082    176 m

Density Reduction

Figure 4-2 shows the progress of the five instances of one of the scenarios (SMSE, SSA). Note how the length of the process varied among instances, but the final value of the fitness function they all arrived at was approximately the same. This behavior was consistent among all scenarios. However, in both the SKV- and SMSE-based scenarios
the five instances of the S&S algorithm reached their optimum value faster than any of the SSA instances (not shown), probably due to the S&S algorithm's deterministic selection of the exiting point of the subset. This characteristic makes the S&S algorithm susceptible to converging towards local minima when π has reached low values; the effect is countered by making it possible to automatically increase π, and thus the probability of escaping a local minimum, when the fitness function has not improved over a given number of iterations. This automatic increase in π typically resulted in noisy output that did not converge to the optimum; the optimum would be reached at some intermediate point, and the algorithm would oscillate thereafter until the cutoff limit of M iterations without changes in π was reached. In contrast, the SSA algorithm took longer to reach its optimum, but always converged toward it. This is consistent with the asymptotic convergence proven by Aarts and Korst (1990) for simulated annealing algorithms using the Metropolis perturbation method (fully stochastic, as opposed to the partially deterministic S&S). The parameterization of the SSA algorithm was also simpler than that of the S&S algorithm (see the Appendix).

Both the S&S and SSA algorithms found the same optimum pattern for the SMSE criterion. However, only the SSA algorithm arrived at the optimal pattern shown for the SKV criterion; the S&S solutions were slightly inferior. There was some variability (not shown in Figure 4-3) among the results of the five instances of the process for each of the four calibration scenarios, depending on the initial conditions of the process and/or the sequence of random numbers involved. This suggests that the chosen parameters may have quenched the system too rapidly, despite the fact that, in the case of the S&S algorithm, our chosen parameter set (see the Appendix) was more conservative and thus should converge more slowly than the one proposed by Sacks
and Schiller (1988). This leads us to recommend repeating the process for different initial conditions / parameter values, especially if the S&S algorithm is being used.
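The swap-and-accept rule described for the S&S algorithm, together with the recommendation to repeat runs from different initial conditions, can be sketched as below. This is a simplified illustration, not the implementation given in the Appendix: it omits the automatic increase of π, assumes a geometric decay of π, and uses a toy one-dimensional coverage cost in place of the SKV or SMSE fitness functions. All names and parameter values are hypothetical.

```python
import random

GRID = list(range(12))  # toy 1-D "locations"; stands in for the 57-point grid

def coverage_cost(subset):
    """Toy fitness: squared distance from each grid point to the nearest
    selected point, summed over the grid (lower is better)."""
    return sum(min(abs(p - s) for s in subset) ** 2 for p in GRID)

def anneal_swap(points, k, fitness, n_iter=300, pi0=0.5, decay=0.98, seed=0):
    """S&S-style search: random entering point, deterministic exiting point."""
    rng = random.Random(seed)
    current = set(rng.sample(points, k))
    current_j = fitness(current)
    best, best_j = set(current), current_j
    pi = pi0  # probability of accepting a non-improving pattern
    for _ in range(n_iter):
        entering = rng.choice([p for p in points if p not in current])
        # Deterministic step: pick the exiting point that minimizes fitness.
        new_j, exiting = min(
            (fitness((current - {s}) | {entering}), s) for s in sorted(current))
        if new_j < current_j or rng.random() < pi:
            current = (current - {exiting}) | {entering}
            current_j = new_j
            if current_j < best_j:
                best, best_j = set(current), current_j
        pi *= decay  # assumed geometric decrease of the control parameter
    return best, best_j

# Repeat the search from several initial conditions and keep the best result.
results = [anneal_swap(GRID, 3, coverage_cost, seed=s) for s in range(5)]
best_subset, best_cost = min(results, key=lambda r: r[1])
print(sorted(best_subset), best_cost)
```

Keeping the best pattern seen in each run, and the best across runs, guards against the oscillation and run-to-run variability noted above.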
Figure 4-3. Results of the calibration (left) and validation (right) processes for each of the four scenarios, shown as scaled kriging variance (SKV, top) and scaled mean squared error (SMSE, bottom). Left panels: results of the density reduction process on the calibration data set. Right panels: application of the optimal calibration-phase patterns to estimate the water content in the validation set (using the scaled semivariogram). Since calibration used 3 sets of measurements and validation only 2, the absolute values of the errors shown in the left and right panels should not be compared with one another. Results for three regular grids and 132 random patterns (median and range) were added for contrast.

The results of the density reduction process on the calibration data set are shown on the left half of Figure 4-3. Observe how the values of the scaled kriging variance were quite similar across the optima of the different calibration scenarios as well as the regular grids and the 132 random patterns. This is in great measure explained by two factors: i) we set a 300-meter maximum search radius in the GSLIB kb2d routine (Deutsch and Journel, 1992) used for kriging, and ii) in the case of the randomly generated patterns, we
did not consider patterns that did not contain estimates for all of the 47 points belonging to $(D - S)$. The kb2d routine will not predict values for points located further than the maximum search radius from any existing data point, so patterns with missing estimates would have very poorly distributed points and consequently poor (high) SKV values.

During the calibration process, we penalized (by adding a large number to it) the fitness function of patterns that did not predict all of the 47 points belonging to $(D - S)$. This was necessary in order to keep the algorithms from reducing the value of their fitness function by minimizing the number of estimates that contributed to it. Thus, very concentrated arrangements of points were avoided, and kriging variances were correspondingly low.

As shown in Figure 4-3, SMSE results were more variable. The SMSE of prediction of scenarios in which SKV was optimized was over 50% greater than for scenarios where SMSE was optimized. The three regular grids produced results similar to those of the SKV scenarios, and the 132 random patterns produced highly variable results, always with at least 39% more error over the calibration set than the SMSE-calibrated scenarios. These results will be discussed below together with those of the validation step.

Validation

The right half of Figure 4-3 shows the results of applying the optimal patterns obtained in the calibration phase to estimate water content in the validation set using the scaled semivariogram. Note that the left half of Figure 4-3 was created using data from three dates and the right half was made using only two, so the absolute values of the errors shown in each half should not be compared with one another.
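The penalty applied during calibration to patterns with missing estimates can be sketched as follows, here combined with a scaled mean squared error. The data layout (dictionaries keyed by location and date, with `None` marking a point the interpolator could not estimate), the function name, and the penalty value are assumptions made for illustration; the source does not state the value of the "large number" that was used.

```python
PENALTY = 1.0e6  # hypothetical "large number"; not the value used in the study

def penalized_smse(observed, predicted, date_variance):
    """Scaled mean squared error over the (location, date) pairs that have a
    prediction, plus a penalty if any expected prediction is missing."""
    total, n_used, n_missing = 0.0, 0, 0
    for key, theta in observed.items():
        estimate = predicted.get(key)
        if estimate is None:          # no estimate within the search radius
            n_missing += 1
            continue
        _, date = key
        total += (theta - estimate) ** 2 / date_variance[date]
        n_used += 1
    score = total / n_used if n_used else 0.0
    if n_missing:
        score += PENALTY
    return score

# Made-up values (mm) for three locations on one date.
observed = {(1, "d1"): 200.0, (2, "d1"): 210.0, (3, "d1"): 190.0}
variance = {"d1": 100.0}
complete = {(1, "d1"): 205.0, (2, "d1"): 210.0, (3, "d1"): 195.0}
incomplete = {(1, "d1"): 205.0, (2, "d1"): 210.0, (3, "d1"): None}

score_complete = penalized_smse(observed, complete, variance)
score_incomplete = penalized_smse(observed, incomplete, variance)
print(score_complete, score_incomplete)
```

In this toy case the incomplete pattern would score 0.125 without the penalty, better than the complete pattern's 0.167, simply because one poorly predicted point was dropped. The penalty removes that incentive.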
Figure 4-4. Maps of interpolated water content for both validation dates (left: January 25, 1993; right: December 23, 1993). The top row shows the observed data, and the three rows below it show the predictions corresponding to the best SMSE scenario, the best SKV scenario, and the best regular grid (#1). For both dates, the best SMSE scenario reproduced the observed spatial variability more accurately than the others, especially the wetter southeast sector of the field. The ten locations composing each scenario's pattern are marked.
As expected, the observed SKV behavior is similar to that of the calibration set: lower for the SKV-calibrated scenarios than for the SMSE-calibrated scenarios. The three regular grids had somewhat higher values than the SKV-calibrated scenarios, and the random patterns produced very variable results, with SKV values even lower than those produced by the optimal SKV-calibrated patterns. Twelve of the random patterns had better values of SKV in the validation set than the optimum. However, they tended to have high values of SMSE in both data sets, and had higher values of SKV in the calibration set than the optimal SKV-calibrated pattern.

Figure 4-4 shows maps of interpolated water content for the two validation dates. The top row shows the observed data, and the three rows below it show the results for the best SMSE scenario, the best SKV scenario, and the best regular grid. For both dates, the best SMSE scenario reproduces the observed spatial variability more accurately than the others, especially the wetter southeast sector of the field (due to the simultaneous presence of points 5, 7, and 13 in the pattern). These maps point out the inherent limitations of kriging from a very small subset of data: reproducing spatial water content variability in 8 hectares with only 10 points can capture limited detail. However, the SMSE-calibrated method attained a relatively low error of prediction.

Figure 4-5 shows the map of relative error of prediction $\varepsilon_r$ of the best SMSE-calibrated scenario for both validation dates, and Figure 4-6 shows the distribution at each validation date of $\varepsilon_r$ in the estimated points not belonging to the optimal pattern for both the optimal SKV-calibrated and the optimal SMSE-calibrated scenarios. Values of $\varepsilon_r$ were mostly low across both dates: for the best SMSE scenario, on Jan 25, 1993, 50% of the relative errors fell within ±5%, 82% within ±10%, and 6.5% fell outside
PAGE 104
±15%. On Dec 23, 1993, 35% fell within ±5%, 72% within ±10%, and 23% fell outside ±15%. There was better prediction accuracy on January 25 than on December 23, and the SMSE-calibrated scenario had a lower fraction of large errors (defined as having $|e_r|$ > 15%) than the SKV-calibrated scenario on both dates.

Figure 4-5. Map of relative prediction error of the best SMSE-calibrated scenario for both validation dates (January 25, 1993 and December 23, 1993).

Figure 4-6. Distribution, for each validation date, of relative prediction error in the estimated locations not belonging to the optimal pattern for the optimal SKV-calibrated and the optimal SMSE-calibrated scenarios.

The method's better performance on January 25 may be related to its higher mean soil water content relative to December 23. As shown in Table 4-1, both dates had lower mean soil water content than the three dates used for calibration. However, the December 23 mean field water content is especially low, 142 mm / m on average. This suggests that
the predictive capability of the method may degrade for soil moisture scenarios beyond the method's range of calibration. However, extraneous factors may also be contributing to the error: the three points shown in Figure 4-6 with $e_r$ > +15% (large overprediction of water content) in the SMSE-calibrated scenario on December 23 correspond to extremely dry points (44, 48, 57) near the field border that had water contents in the first meter of $\theta_v \approx 0.115$, the permanent wilting point for a Haplustoll in this area. This water extraction pattern near the field boundary is consistent with the presence of weeds during the winter.

Residual Analysis of Validation Results and Tests of Kriging Assumptions

Figures 4-7A and 4-7B show scatterplots of standardized residuals vs. estimated soil water content for each validation date, Figure 4-7C shows the variance for both validation dates of the same residuals in groups of 4 values set up for a test of heteroscedasticity, and Figure 4-7D shows a histogram of the residuals standardized by their kriging standard deviation for both validation dates. Table 4-3 summarizes the results of the significance tests performed on these data.

At the 95% confidence level, the residuals standardized by the kriging standard deviation for December 23 had a mean that was significantly different from 0, and a variance significantly different from 1. The standardized residuals on January 25 had a statistically significant trend with respect to the estimated water content. The former implies that the kriging assumptions were not fully respected for December 23, and the latter that on January 25, the proposed model does not capture all the phenomena causing spatial variability of water content. This is also true for December 23, given its comparatively larger fraction of high errors when compared with January 25.
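The checks on residuals standardized by the kriging standard deviation can be illustrated with a minimal sketch. The specific tests used for the table are not stated in this section, so the one-sample t statistic (zero mean) and the chi-square statistic (unit variance) below are assumptions, as are the function name and the made-up input values. The statistics would still have to be compared against critical values at the chosen confidence level.

```python
import math
import statistics

def kriging_residual_checks(observed, predicted, kriging_sd):
    """Summary statistics for residuals standardized by kriging std. dev.

    Hypothetical helper: the t and chi-square statistics are assumed
    choices for testing zero mean and unit variance, respectively.
    """
    z = [(o - p) / s for o, p, s in zip(observed, predicted, kriging_sd)]
    n = len(z)
    mean = statistics.fmean(z)
    var = statistics.variance(z)            # sample variance (n - 1 denominator)
    t_stat = mean / math.sqrt(var / n)      # H0: mean of z equals 0
    chi2_stat = (n - 1) * var               # H0: variance of z equals 1
    return {"mean": mean, "variance": var, "t": t_stat,
            "chi2": chi2_stat, "df": n - 1}

# Made-up water contents (mm in the first meter) at five validation points
observed = [200.0, 190.0, 210.0, 180.0, 205.0]
predicted = [195.0, 195.0, 200.0, 190.0, 200.0]
kriging_sd = [10.0, 10.0, 10.0, 10.0, 10.0]

checks = kriging_residual_checks(observed, predicted, kriging_sd)
print(checks)
```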
Figure 4-7. Residual analysis of validation results and tests of kriging assumptions for both validation dates. Figures 4-7A and 4-7B show standardized residuals vs. estimated soil water content. Figure 4-7C shows the variance of the residuals (in groups of 4) set up for testing heteroscedasticity. Figure 4-7D shows a histogram of the residuals standardized by their kriging standard deviation.
Table 4-3. Results of residual analysis.

Standardized residuals
  Bias: °070 1 105 0.339 1.103 0.671 0.050
  Trend vs. estimated: 0.161 0.009 0.401 0.095 0.006 0.546 *
  Normality: 0.972 0.958 < 0.477 < 0.173
  Heteroscedasticity: .0012 .0058 0.035 0.080 0.919 0.824
Kriging standard deviation standardized residuals
  7 0.055
  Zero mean: 0.389 0.722 0.042 *
  1.035
  Unit variance: 1215 0.343 0.024 *
  Normality: 0.976 0.974 < 0.432 < 0.454

Temporal Stability Analysis

Vachaud et al. (1985) plotted the ranked means and variances of $\delta_{ij}$, the relative difference between the observed water content $\theta_{ij}$ at each location i at time j, and the field-wide mean observed water content $\bar{\theta}_j$:

$$\delta_{ij} = \frac{\theta_{ij} - \bar{\theta}_j}{\bar{\theta}_j}$$

The mean of this value over time for a given location was labeled $\bar{\delta}_i$ (mean relative difference) and its standard deviation, $\sigma(\delta_{ij})$. Figure 4-8 presents the measured data for the five dates pooled together for each location, with the $\bar{\delta}_i$ values ranked in ascending order from left to right. Note how certain locations systematically either overestimate ($\bar{\delta}_i - \sigma(\delta_{ij}) > 0$) or underestimate ($\bar{\delta}_i + \sigma(\delta_{ij}) < 0$) the microwatershed average soil water content, irrespective
of the observation date. This figure also puts the points comprising the optimal SKV-based subset and the optimal SMSE-based subset into the context of the temporal stability of the whole microwatershed. Note how the SMSE-based solution captures the full range of variation of the mean relative differences.

Table 4-4 shows the results of the Spearman rank correlation test among the five dates taken two at a time. Correlation is significant through all combinations of two dates, indicating temporal stability of the soil water patterns. Unsurprisingly, correlation is somewhat higher among the three dates corresponding to the same cropping season.

Table 4-4. Spearman's rank correlation tests for temporal stability. Each cell represents the results of the test performed between the data of the dates shown for the corresponding row and column. The top number is the rank correlation coefficient, and the bottom number is the p-value. Note that all of the results are significant, indicating temporal stability. The average total soil water content (over the first meter of soil) at each date is shown for reference.

Dates:          Feb 7       Feb 24      Mar 20      Jan 25      Dec 23
                1992        1992        1992        1993        1993
Mean (mm)       217.2       228.0       254.0       195.3       145.3

Feb 7 1992      1           0.692       0.520       0.550       0.397
                            p < 0.001   p < 0.001   p < 0.001   p < 0.001
Feb 24 1992     0.692       1           0.506       0.358       0.436
                p < 0.001               p < 0.001   p < 0.001   p < 0.001
Mar 20 1992     0.520       0.506       1           0.400       0.354
                p < 0.001   p < 0.001               p < 0.001   p < 0.001
Jan 25 1993     0.550       0.358       0.400       1           0.356
                p < 0.001   p < 0.001   p < 0.001               p < 0.001
Dec 23 1993     0.397       0.436       0.354       0.356       1
                p < 0.001   p < 0.001   p < 0.001   p < 0.001

Sources of Error and Nonstationarity

The fact that the SMSE-based solution appears better than the SKV-based solution is meaningful. Formally, given enough realizations the SKV-based method would be
expected to also minimize the SMSE. This is the rationale behind optimal sampling network construction algorithms that minimize kriging variance (van Groenigen et al., 1999; McBratney et al., 1981). However, as pointed out by Deutsch and Journel (1992), SKV is not necessarily the best measure of local accuracy because it does not take the actual data values into account. Our result may reflect two possibilities: either there are not enough joint realizations of the random variable of interest and the error of the particular available realizations deviated from its expected value (given by the kriging variance), or else the random variable is spatially nonstationary. The former cannot be ruled out and is perhaps inevitable in a study with a limited number of measurement dates. The latter is probable; as shown above, there was a violation of one of the kriging assumptions on the second validation date.

Figure 4-8. Ranked intertemporal relative deviation from the mean (across the microwatershed) spatial soil water content, $\bar{\delta}_i$. The measured data for the 5 dates are pooled together for each location, with the $\bar{\delta}_i$ values ranked in ascending order from left to right. The locations comprising the optimal SKV-based subset and the optimal SMSE-based subset are marked with arrows.
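The mean relative difference of Vachaud et al. (1985) and its ranking by location can be sketched in a few lines. This is only an illustration: the function name, the `theta[date][location]` layout, and the water content values are assumptions made for the example and are not the data of this study.

```python
import statistics

def mean_relative_differences(theta):
    """Rank locations by mean relative difference.

    theta[j][i] is the water content at date j and location i
    (hypothetical layout). Returns (location, mean, std) tuples
    sorted by ascending mean relative difference.
    """
    n_dates = len(theta)
    n_locs = len(theta[0])
    # Field-wide mean water content for each date j
    field_means = [statistics.fmean(row) for row in theta]
    results = []
    for i in range(n_locs):
        # Relative difference of location i from the field mean, per date
        deltas = [(theta[j][i] - field_means[j]) / field_means[j]
                  for j in range(n_dates)]
        results.append((i, statistics.fmean(deltas), statistics.stdev(deltas)))
    return sorted(results, key=lambda r: r[1])

# Made-up values (mm in the first meter): 3 dates x 4 locations
theta = [
    [180.0, 200.0, 220.0, 200.0],
    [150.0, 170.0, 200.0, 160.0],
    [210.0, 230.0, 260.0, 220.0],
]
ranked = mean_relative_differences(theta)
for loc, mrd, sd in ranked:
    print(f"location {loc}: mean rel. diff. = {mrd:+.3f}, sd = {sd:.3f}")
```

A location whose mean minus one standard deviation stays above zero is consistently wetter than the field average; one whose mean plus one standard deviation stays below zero is consistently drier.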
The violation of spatial stationarity is further demonstrated because the spatial pattern of soil water over the watershed is temporally stable. As seen in Figure 4-8, some points such as #5 are consistently wetter than the rest, and some points such as #41 are consistently drier than the rest. The SKV-based scenarios ignored this, and produced an optimal subset that covered a limited fraction of the total range of mean relative difference. In contrast, the SMSE-based scenario covered the extremes, capturing a range 40% greater than in the SKV case. It incorporated the points that most strongly violated the stationarity criterion and used them to reduce its error of prediction.

The abovementioned nonstationarity is a result of different processes at work throughout the microwatershed. Reynolds (1970) enumerated several static and dynamic factors that affect spatial variability of soil water content, and Mohanty et al. (2000) equated this to factors governing time stability. Kachanoski and De Jong (1988) contended that spatial variability of hydrological processes degrades time stability of soil moisture patterns in landscapes with topographic redistribution of soil water. This argument was supported by results from Grayson and Western (1998) and Mohanty et al. (2000), who worked with catchments involving topographically routed lateral redistribution of soil moisture. In our case, the rank correlation coefficient values of the tests for temporal stability shown in Table 4-4 were high enough to indicate temporal stability, but this stability does not extend to the entire field: several points have highly variable values of $\delta_{ij}$, resulting from a variety of processes. Point 57, for example, is on a header row and has a highly variable soybean stand quality as a result of the maneuvering of farm equipment; point 33 can form part of a waterway during intense rainstorms, etc.
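The rank correlation between two measurement dates mentioned above can be computed as the Pearson correlation of the ranks. The sketch below is a minimal illustration with tie handling by average ranks; the function names and sample values are assumptions, and the p-values reported in the table are not reproduced here.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Made-up water contents at the same five locations on two dates
date_a = [180.0, 200.0, 220.0, 200.0, 150.0]
date_b = [150.0, 170.0, 200.0, 160.0, 140.0]
rho = spearman_rho(date_a, date_b)
print(round(rho, 3))
```

A coefficient near 1 means that locations keep their wet-to-dry ordering from one date to the other, which is the sense of temporal stability used here.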
Topography is influential in the microwatershed under study: elevation decreases eastward, and there is also a small embankment on its eastern edge. Contributing area increases and slope decreases eastward, consistent with the observed tendency for ponding near the eastern edge of the microwatershed. It is therefore not surprising that the southeastern sector of the field has a highly variable water content, and that consequently the SMSE-based algorithm concentrated several points (#5, #7, #13) there to maximize its predictive accuracy.

Although our results are encouraging, purely statistical models may be unable to fully capture soil water variability, especially following rainfall events that generate significant surface and/or subsurface flow. Indeed, Kachanoski and De Jong (1988) observed that soil drying did not alter the spatial pattern of soil water content, but time stability during recharge was scale dependent, correlating with surface curvature at spatial scales below a certain distance threshold (40 meters). This altered the spatial pattern of soil water content during recharge at large (fine) scales but not at small (coarse) scales.

A convenient method of expressing topographical influence on soil water distribution is by using topographic indices such as the one proposed by Beven and Kirkby (1979). Their index represents the effects of variable contributing upslope areas and the slope at the point of interest. Nyberg (1996) found strong correlation between this topographic index and soil water content. Crave and Gascuel-Odoux (1997) successfully linked water content with a topographic index referring to downslope conditions, defined as the elevation difference between the point of interest and the outlet of the water pathway. In contrast, Ladson and Moore (1992) concluded that temporally varying soil
water content was not well predicted by simple static topographic attributes over a gently sloping catchment in the Konza Prairie, Kansas.

Subtracting a trend from the data and kriging only the residual component can be an effective means of minimizing the effect of nonstationarity (Goovaerts, 1997). However, in this case the trend itself is temporally and spatially variable given its dependence on hydrological processes. A simple physically based tool such as the Beven and Kirkby model (Beven et al., 1995; Beven and Kirkby, 1979), extended with a crop simulation model such as CROPGRO (Boote et al., 1998), SUCROS (van Laar et al., 1992) or GLYCIM (Acock and Trent, 1991) to account for the effects of crop cover, may be a valuable tool for generating temporally variable trend surfaces.

Conclusions

We combined the scaled semivariogram technique with two simulated annealing algorithms to reduce the number of locations necessary to describe water content in our 8-hectare study area from 57 down to 10 points. The scaled semivariogram allowed us to incorporate data from several dates, both to reflect time-independent behavior of water content and to compensate for the relatively small size of the individual datasets. Of the two simulated annealing algorithms, Spatial Simulated Annealing (van Groenigen and Stein, 1998) produced more consistent results than the Sacks and Schiller method (Sacks and Schiller, 1988), although the solutions provided by both algorithms were quite similar. Running multiple instances of the optimization process is recommended, especially if using the Sacks and Schiller method.

Our proposed method predicted water content across the validation set with relatively low errors: over 70% of all the predicted water contents had an error within ±10%, acceptable for the application it was designed for. The method also captured the
spatial variability of water better than regular grids or randomly generated patterns. However, the SMSE (scaled mean squared error) based scenarios performed better than the scenarios using an SKV (kriging variance) criterion.

We detected temporal stability in the dataset. This phenomenon implies the existence of spatial nonstationarity of the water content across the field, leading to the violation of kriging assumptions and the degradation of the quality of kriging estimates. However, the SMSE-based optimization scenarios incorporated temporally stable extreme (wet and dry) points into the optimal subset, using them to capture the nonstationary behavior. Our method may be improved by combining a spatial water movement model with a crop simulation model to provide a temporally variable trend that can be used to eliminate possible spatial nonstationarity.
CHAPTER 5
A FASTER ALGORITHM FOR CROP MODEL PARAMETERIZATION BY INVERSE MODELING: SIMULATED ANNEALING WITH DATA REUSE

Introduction

Crop simulation models have been proposed as valuable tools for understanding the causes of spatiotemporal yield variability and developing optimized forms of management (Batchelor et al., 2002). A major challenge for such endeavors is the determination of the necessary soil parameters: in practical precision agriculture applications these inputs cannot be measured due to cost constraints, and must consequently be estimated.

Welch et al. (1999a) proposed an inverse-modeling-based method for estimating crop model genetic coefficients. The method exhaustively simulated all the parameter combinations in a discrete input space, and then examined the results to find the best set of parameters for each crop variety. The "best" parameter combination was defined as the one producing the minimum value of an objective function (OF) defined as the sum of squared residuals between simulated and observed data.

Irmak et al. (2001) expanded on the grid search concept, using it to estimate soil properties. They compared grid search results with those of adaptive simulated annealing (Ingber, 1993), a sophisticated search method which had been used previously by other authors to parameterize crop models (Braga, 2000; Calmon et al., 1999; Paz et al., 2001). Irmak et al. (2001) presented an example problem from a field in Iowa involving five parameters and two years of observed crop yield data, solving it as follows:
• Discretize the input space into 74,536 combinations, describing the range of variation of one parameter with seven points, another parameter with eight, and the remaining three with 11 points each.
• Run the CROPGRO model (Boote et al., 1998) for each parameter combination and two weather years.
• Apply landscape-position-dependent rules to reduce the number of parameters to estimate, based on expected parameter sensitivity. For example, in the second case study presented, three parameters were estimated for a soil described as not having drainage problems: the SCS CN2 curve number (USDA, 1972), SLPF (a soil fertility factor) and SLB (maximum rooting depth). If, on the other hand, the soil had drainage problems (but did not have tile drainage), only SLPF and KSAT (saturated hydraulic conductivity) would be estimated, assuming that the profile's capacity to get rid of excess water, as expressed by KSAT, would be a more sensitive parameter than a runoff parameter or the rooting depth.
• The rule-compatible parameter combination having the lowest root-mean-squared error (RMSE) between CROPGRO-predicted and observed yield across the two years was chosen as the optimum.
• Run the process independently for each of the 11 distinct environments (soil types) in the field.

Irmak et al. (2001) reported that runtime for the grid search implementation described above was less than half that of adaptive simulated annealing. Although the grid search method requires simulating the whole input space in order to provide a parameter estimate, and can thus be considered very inefficient, these results can be understood in the precision agriculture context in which parameters are sought for multiple locations. Consider the hypothetical field shown in Figure 5-1, divided into a series of distinct environments (for example, soil map units) $E_1$, $E_2$, and $E_3$. Each environment i contains one or more locations of interest $L_{ij}$ where model parameterization is desired.
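The discretization and search steps listed above can be sketched as follows. This is a hedged illustration rather than the implementation of Irmak et al. (2001): `toy_model`, the two small parameter grids, the location labels, and the "observed" yields are hypothetical stand-ins so that the example runs instantly. The point it shows is that the expensive model runs happen once per parameter combination, after which every location of interest is served by a search of the stored results.

```python
import itertools
import math

# Points per parameter in the example problem: 7, 8, and three with 11 each
points_per_parameter = [7, 8, 11, 11, 11]
n_combinations = math.prod(points_per_parameter)
print(n_combinations)  # 74536

def toy_model(params, year):
    """Hypothetical stand-in for one crop model run; returns a 'yield'."""
    a, b = params
    return 1000.0 * a + 50.0 * b * (year + 1)

def grid_search(grids, observed_by_location, n_years=2):
    # Run the (expensive) model once for every combination and year
    simulated = {
        combo: [toy_model(combo, yr) for yr in range(n_years)]
        for combo in itertools.product(*grids)
    }
    # Search the stored results for each location; no further model runs
    best = {}
    for loc, observed in observed_by_location.items():
        def rmse(combo):
            sim = simulated[combo]
            return math.sqrt(
                sum((s - o) ** 2 for s, o in zip(sim, observed)) / n_years
            )
        best[loc] = min(simulated, key=rmse)
    return best, len(simulated)

# Two made-up parameters (3 x 4 grid) and two made-up locations
grids = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
observed_by_location = {"L11": [2050.0, 2100.0], "L12": [3150.0, 3300.0]}
best, n_runs = grid_search(grids, observed_by_location)
print(best, n_runs)
```

Adding more locations to `observed_by_location` increases only the cheap search step, which is why the cost of the grid search is largely independent of the number of locations within an environment.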
Locations within a distinct environment are considered to have similar but not identical soil properties, so the same discretized parameter space is used for all of them. A complete set of crop model runs is performed only once per
environment, and a rapid search within the results yields parameter estimates for all the locations of interest within each environment.

Figure 5-1. A hypothetical field divided into three environments (soil types) $E_1$, $E_2$, $E_3$. Each environment i contains a number of locations $L_{ij}$ of interest. The grid search algorithm runs the model to simulate the whole parameter space for each environment i, and the corresponding set of answers is searched for the minimum-RMSE parameter combination for each location of interest $L_{ij}$.

In contrast, the adaptive simulated annealing algorithm used by Irmak et al. (2001) was run independently for each location within an environment. The parameter estimation process for any given location could not reuse the OF values calculated previously for other locations. This could result in multiple (time-costly) crop model runs for the same parameter combinations at different locations, which is inefficient given that soil parameters are usually expected to be similar across locations within a soil type. Furthermore, the simulated annealing algorithm was run on a continuous parameter space, and so estimated parameters with a different (lower) level of uncertainty than the grid search. In light of these implementation differences, a direct performance comparison of the two algorithms is not appropriate. However, there is opportunity for
further speed improvement by combining the two methods, incorporating data reuse and a discrete parameter space into a simulated annealing algorithm. Considering the previously mentioned assumption of similarity among the soil parameters of nearby locations in a mapping unit, our working hypotheses were that:
• The runtime of a simulated annealing algorithm used for crop model parameterization would be greatly reduced if it were allowed to reuse simulation results across locations within an environment, and
• A simulated annealing algorithm thus modified could have similar accuracy and a significantly lower runtime than the grid search algorithm for a wide range of numbers of locations per environment.

The objective of this study was to implement and test (relative to a pure grid search) a simulated annealing algorithm modified to work on a discrete parameter space and reuse previously simulated results.

Materials and Methods

Simulated Annealing Overview

Simulated annealing (SA) is a combinatorial optimization algorithm derived from the work of Metropolis et al. (1953). Our study implemented a slightly modified version of the algorithm used by Ferreyra et al. (2000), which in turn was derived from the spatial simulated annealing algorithm of van Groenigen and Stein (1998). Simulated annealing algorithms have four main components: the fitness function, acceptance criterion, generation mechanism, and cooling schedule (Aarts and Korst, 1990). They are all briefly described below.

Fitness function: the objective function (OF) to be minimized or maximized.

Acceptance criterion: Let $x^j$ and $x'$ be two vectors located in the parameter space such that $x'$ is generated by perturbing $x^j$. Let the corresponding fitness function values be
$J(x^j)$ and $J(x')$ respectively. The acceptance criterion determines whether $x'$ replaces $x^j$ or not. In a minimization problem, the acceptance probability is defined as:

$$P(x^j \to x') = \begin{cases} 1 & \text{if } J(x') \le J(x^j) \\ \exp\left(\dfrac{J(x^j) - J(x')}{c}\right) & \text{if } J(x') > J(x^j) \end{cases} \qquad (5\text{-}1)$$

where c is a positive control parameter (or function) that decreases as the algorithm progresses.

Generation mechanism: The pattern $x'$ is generated from $x^j$ by adding a vector u to $x^j$, where u is a randomly generated vector such that $x' = x^j + u$ corresponds to a valid point in the discrete parameter space, and the length of the projection of u along the k-th coordinate axis does not exceed h times the total data range along that axis. Parameter h typically takes an initial value of 1, and may decrease with time.

Cooling schedule: The cooling schedule changes the value of c and h as the algorithm progresses. A number of iterations of the algorithm are performed at each level of c and h, after which a transition occurs and the parameters are updated:

$$c^{i+1} = \alpha_c\, c^i, \qquad h^{i+1} = \alpha_h\, h^i \qquad (5\text{-}2)$$

with $\alpha_c$ and $\alpha_h$ having values slightly less than 1. The SA algorithm stops when c becomes less than a preestablished final value $c^f$. The total number of c and h transitions is derived from Equation 5-2, according to:

$$n = \operatorname{round}\left(\log_{\alpha_c} \frac{c^f}{c^0}\right) \qquad (5\text{-}3)$$

The total number of iterations N is linked to n by the expression

$$N = (n + 1)\, m \qquad (5\text{-}4)$$
where m is the number of iterations performed at each value of c. It is desirable that m be a large number. This gives the system the opportunity to reach equilibrium at each level of c, which maximizes the probability of reaching an optimal solution (Ingber, 1993). Aarts and Korst (1990) provided a probabilistic mechanism for calculating an optimal value of m, but in this study m was used as an independent variable because of the emphasis placed on speed and control.

Crop Model Management

Successive iterations of the SA process began with a parameter set provided by the generating mechanism. Since the proposed parameter values could have been presented to the crop model in previous iterations, a simple data structure for recording model runs was devised: a multidimensional array containing a Boolean variable (or "flag") was created for each possible parameter combination. A 0 value of this flag indicated that the corresponding parameter combination had not been previously explored; the crop model was run for that combination, its simulated yield values were stored in multidimensional arrays of real numbers, and the flag was set to 1. Conversely, if the flag value was 1, the parameter combination had been previously simulated, and the yield data were retrieved from the arrays instead of being simulated by the crop model.

Case Studies

The performance of the grid search and simulated annealing methods was compared in two case studies:

The first is a synthetic one-environment, one-location, two-parameter case built on a 201x201 parameter space. The purpose of this study was to explore how different SA parameter values and the size of the parameter space change the solution to a complex problem having numerous local optima. The effect of parameter space size was explored
by subsampling the original 201x201 grid to obtain smaller (e.g. 101x101, 51x51) spaces. The problem used the OF shown in Figure 5-2, built with a Bessel function as follows:

$$z = J_0\left(\sqrt{x^2 + y^2} \cdot 14.93057\right) \qquad (5\text{-}5)$$

The x and y ranges ($-1.75 < x < 0.25$; $-0.25 < y < 1.75$) were selected so the maximum OF value was located away from the center of the parameter space. The intent of this displacement was to generate a less favorable situation when $\alpha_h < 1$. In such cases, all of the parameter space locations cannot be reached from all the other locations, and the center of the domain tends to be visited more often than the periphery. Positioning the optimum away from the center of the domain makes the problem more difficult to solve.

Figure 5-2. Objective function used in case study 1.
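The pieces described so far (acceptance criterion, generation mechanism, cooling schedule, and the record of previously simulated combinations) can be put together in a minimal sketch applied to the objective function of case study 1. Several details are assumptions made for illustration and are not taken from the original implementation: the dictionary used in place of the flag and result arrays, the way candidate cells are drawn, the negative lower bounds of the x and y ranges, the numerical evaluation of $J_0$ by the midpoint rule, and all function names. Because case study 1 is a maximization problem, the acceptance test is written in its maximization form.

```python
import math
import random

def bessel_j0(x, n=64):
    """J0(x) via the midpoint rule on (1/pi) * integral_0^pi cos(x sin t) dt."""
    return sum(math.cos(x * math.sin((k + 0.5) * math.pi / n))
               for k in range(n)) / n

def objective(ix, iy, size=201):
    """Case study 1 OF on a size x size grid (assumed x, y ranges)."""
    x = -1.75 + 2.0 * ix / (size - 1)
    y = -0.25 + 2.0 * iy / (size - 1)
    return bessel_j0(14.93057 * math.hypot(x, y))

def anneal(size=201, alpha_c=0.995, alpha_h=1.0, c0=1.0, cf=1e-4, m=5, seed=0):
    rng = random.Random(seed)
    cache = {}  # (ix, iy) -> OF value; stands in for the flag + result arrays

    def evaluate(cell):
        if cell not in cache:      # not yet explored: "run the model" and store
            cache[cell] = objective(cell[0], cell[1], size)
        return cache[cell]         # already explored: reuse the stored value

    current = (rng.randrange(size), rng.randrange(size))
    best = current
    c, h, iterations = c0, 1.0, 0
    while c >= cf:                 # stop once c drops below its final value
        for _ in range(m):         # m iterations at each level of c and h
            step = max(1, round(h * (size - 1)))
            cand = tuple(
                rng.randint(max(0, v - step), min(size - 1, v + step))
                for v in current
            )
            gain = evaluate(cand) - evaluate(current)
            # Maximization form of the acceptance criterion
            if gain >= 0 or rng.random() < math.exp(gain / c):
                current = cand
            if evaluate(current) > evaluate(best):
                best = current
            iterations += 1
        c *= alpha_c               # cooling schedule
        h *= alpha_h
    return best, evaluate(best), iterations, len(cache)

best, value, iterations, model_runs = anneal()
print(best, round(value, 4), iterations, model_runs)
```

The last two returned values separate algorithm iterations from unique OF evaluations; the gap between them is the number of times a stored result was reused instead of recomputed.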
The second case study consisted of a one-environment, 13-location, four-parameter, two-year problem with 78,608 ($17^3 \times 16$) parameter combinations of the maize (Zea mays L.) model CERES-Maize (Ritchie et al., 1998). The purpose of this case study was to test the SA algorithm with a crop model and real data, and explore how its performance varied with a growing number of locations per environment. This was a minimization problem; the OF was the RMSE between predicted and observed yield of the 1999 and 2001 corn harvests in the Suggs 4 field, located near Murray, KY, USA (36° 32' N, 88° 27' W, elev. 222 m). Relative errors were used in order to keep the OF scaled approximately between 0 and 1 and to use the same values of $c^0$ used in case study 1. Otherwise, the behavior of the acceptance criterion shown in Equation 5-1 would change significantly unless $c^0$ were altered to accommodate the different scale. The crop model parameters and their limits are shown in Table 5-1.

Table 5-1. Crop model parameters and ranges for case study 2.

Parameter   Definition                                             Units         Minimum   Maximum   N° Points
KSAT        Saturated hydraulic conductivity, bottom soil layer    cm d^-1       0.0001    671       16
CN2         SCS runoff curve number                                              72        92        17
SDEP        Soil depth                                             cm            45        165       17
PDEN        Plant density                                          plants m^-2   3         8         17

In both case studies, the parameterization process was run for multiple scenarios defined by all the possible combinations of the SA parameter values shown in Table 5-2. For example, case study 1 had 6 x 7 x 1 x 1 x 3 x 7 x 5 = 4,410 scenarios. Each scenario was run seven times with different initial conditions.

It is important to clarify the difference between a) the crop model (or objective function) parameters, i.e. the value combinations that form the parameter space for the
optimization algorithm, and b) the SA parameters shown in Table 5-2, which control the duration and convergence properties of the SA algorithm.

Table 5-2. Simulated annealing scenario parameters.

SA Parameter                  Values
$\alpha_c$                    0.995, 0.99, 0.95, 0.9, 0.85, 0.8
$\alpha_h$                    1, 0.995, 0.99, 0.95, 0.9, 0.85, 0.8
$c^0$                         1
$h^0$                         1
$c^f$                         0.01, 0.001, 0.0001
m                             1, 2, 5, 10, 20, 50, 100
Space size (case study 1)     201^2, 101^2, 67^2, 51^2, 25^2, 19^2
Locations (case study 2)      1-13

Results and Discussion

Case Study 1

The results of this case study show how SA can converge to a local optimum if not allowed a sufficient number of iterations, especially if $\alpha_h < 1$. Figure 5-3 shows the results of six runs: three for $\alpha_h = 1$ and three for $\alpha_h = 0.995$. In the former, all parameter combinations were always reachable from all the others, whereas in the latter, parameter changes were bounded by a progressively smaller radius (Equation 5-2). Note how, after roughly 2000 iterations spent mostly at low OF values, the $\alpha_h = 1$ runs all reached the close vicinity of the global optimum (where the OF = 1). Conversely, the three $\alpha_h = 0.995$ scenarios frequently converged to local optima. The best convergence to the global optimum in the 201x201 space occurred with $\alpha_c = 0.995$, $\alpha_h = 1$, $c^f = 0.0001$, and $m \ge 5$, which produced approximately a quarter of the number of OF calculations required by a grid search.
Using $\alpha_h < 1$ was successful for solving other combinatorial optimization problems (van Groenigen and Stein, 1998; Ferreyra, 2002). Its poor performance in this study may be due to the limits imposed on the total number of iterations as determined by m and $\alpha_c$. Attaining equilibrium at each level of c is important in SA yet is not possible when m is too small (Aarts and Korst, 1990). The $\alpha_h = 1$ scenarios did well because the region around the global optimum was as probable as any other. In contrast, when $\alpha_h < 1$ and m = 5 the region was not visited often enough to converge to the global optimum.

Figure 5-3. Objective function vs. number of algorithm iterations for six runs of the simulated annealing algorithm. The left column corresponds to $\alpha_h = 1$, the right to $\alpha_h = 0.995$. In all cases $\alpha_c = 0.995$, $c^f = 0.0001$, and m = 5.

Minimizing the fraction of N (total iterations) corresponding to unique objective function calls (or model runs), i.e. maximizing the number of database hits, is very
102 desirable. Complex crop models such as CROPGRO tend to run slowly because they simulate many complex physical processes. For example, CROPGRO's ETPHOT gas exchange routine (Boote and Pickering, 1994) requires the iterative solution of a system of linear equations to simultaneously determine transpiration, leafscale energy balance, and photosynthesis. A proposed revised soil temperature simulation routine (Andales et al., 2000) has similar requirements. In contrast, the parameterization's database management overhead only requires integer arithmetic and very fast indexing operations in data structures contained wholly in the computer's memory. The SA process does require real arithmetic for the calculation of the OF, the acceptance criterion, and the cooling schedule. However, one CROPGRO run may use as much time as the whole SA process plus thousands of database hits. Thus, this study only compared runtimes in terms of model runs. Figure 54 shows the number of unique model runs required by the SA algorithm for three different parameter space sizes and three different values of m. The box plots on the left column correspond to a h = 1, the ones on the right to a h = 0.995. Each row of plots shows a different m; the bottom row (m = 100) represents 20 times more algorithm iterations than the top row (m = 5). The three box plots per graph show different parameter space sizes (the leftmost is the original 201x201 grid). All the scenarios have a c = 0.995 and c f = 0.0001. Each of the rows (showing six scenarios and seven repetitions per scenario) corresponded to the same number of iterations per row (9190, 36760, and 183800 for m = 5, 20 and 100, respectively), but the number of actual OF calculations (equivalent to crop model runs) varied greatly depending on the parameter space size and the value of
α_h. The dependence on parameter space size is explained by the number of available parameter combinations, much lower for a 51 x 51 space than for a 201 x 201 space, and by the probability of visiting a previously used cell (and retrieving data from the database instead of running the model), which is higher in the smaller space. The differences between results for different α_h values have similar causes: the total number of model runs was higher for α_h = 1 because every cell in the parameter space was always a valid destination, so the probability of revisiting a cell was relatively low. Conversely, when α_h < 1, the set of valid destinations shrank as h decreased, and the probability of a database hit (revisiting a previously simulated cell) increased.

[Figure 5-4: box plots of unique model runs (vertical axis, 0 to 40000) vs. parameter space size (201 x 201, 101 x 101, 51 x 51); left column α_h = 1.000, right column α_h = 0.995.]

Figure 5-4. Unique objective function calculations (equivalent to crop model runs) for 18 scenarios (7 repetitions / scenario) in case study 1. The dot shows the median value for the 7 repetitions; the whiskers show the extremes.
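The iteration totals quoted for the three rows of Figure 5-4 are consistent with a geometric cooling schedule. The sketch below is only a plausibility check under two assumptions that are not stated in this section: that c starts at c_0 = 1, and that c is multiplied by α_c after every m iterations until it drops to c_f or below. The function and argument names are illustrative.

```python
import math


def sa_total_iterations(m, alpha_c=0.995, c_f=0.0001, c0=1.0):
    """Total SA iterations for a geometric cooling schedule.

    Assumes (hypothetically) that c starts at c0 and is multiplied by
    alpha_c after every m iterations, stopping once c <= c_f.
    """
    # Number of distinct levels of c visited before c falls to c_f or below.
    levels = math.ceil(math.log(c_f / c0) / math.log(alpha_c))
    return m * levels


if __name__ == "__main__":
    grid_size = 201 * 201  # cells in the largest parameter space
    for m in (5, 20, 100):
        print(m, sa_total_iterations(m), sa_total_iterations(m) > grid_size)
```

Under these assumptions the schedule has 1838 levels of c, which gives 9190, 36760, and 183800 iterations for m = 5, 20, and 100. Only the m = 100 total exceeds the 40,401 cells of the 201 x 201 grid.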
Figure 5-4 shows how the simulated annealing approach as used by Braga (2000), Paz et al. (2001), and Irmak et al. (2001) can run slower than the grid search even for one location: if a high value of m is adopted (for example, m = 100, corresponding to 183,800 iterations) to ensure convergence to the global optimum, the number of iterations (which corresponded to crop model runs in the above-mentioned studies) greatly exceeds the size of the parameter space (at most 201 x 201 = 40,401).

Case Study 2

Case study 1 provided the SA algorithm with nearly worst-case conditions; its well-behaved scenarios (α_c = 0.995, α_h = 1.000, c_f = 0.0001, and m > 5) were expected to behave well in case study 2, even though the number of iterations allowed would be a smaller fraction of the parameter space size.

[Figure 5-5: total model runs (vertical axis, up to 80000) vs. locations per environment (0 to 15) for three series: α_h = 1.000, α_h = 0.995, and Grid.]

Figure 5-5. Total model runs vs. number of locations per environment for case study 2. Both simulated annealing scenarios had c_f = 0.0001 and m = 5.

Figure 5-5 shows how the number of unique model runs grew with the number of locations of interest within the environment. The top line corresponds to the grid search, in which CERES was run for the whole parameter space irrespective of the number of
locations of interest within the environment. The other curves correspond to the SA algorithm with different α_h values. Both SA scenarios tended asymptotically towards the parameter space size, but the α_h = 0.995 scenario did so more slowly than the α_h = 1 scenario. This occurred because the algorithm converged faster with decreasing values of α_h.

[Figure 5-6: error (vertical axis, 0.00 to 0.25) at locations L1 to L13 for three series: α_h = 1.000, α_h = 0.995, and Grid.]

Figure 5-6. Error at each location of interest for the grid search and two simulated annealing scenarios of case study 2. The simulated annealing scenarios have α_c = 0.995, c_f = 0.0001, and m = 5. As in case study 1, the α_h = 0.995 scenario produced the greater error.

Figure 5-6 shows the median and extreme final OF values for seven runs in each of 13 locations in the Suggs 4 field. Note how the α_h = 0.995 scenario had consistently higher and more variable OF values than the α_h = 1 scenario, which in turn had very similar values to the
grid search case. The minor differences occurring between the latter two methods in some locations can generally be fixed at the end of the SA algorithm using a gradient-descent-like process consisting of a few dozen extra iterations with h ≪ 1 and c = 0. This ensures that the algorithm converges to the maximum OF value in the vicinity of the current location. Using this final step in conjunction with a low value of m or α_h does not compensate for the lack of iterations, however, because there is a high probability that the algorithm would converge to a local optimum. Despite the observation by Welch (1999b) that crop models tend to have a smooth response to parameter variation, Royce et al. (2001) showed that local optima may occur frequently.

To add perspective to the results of Figure 5-5 for α_h = 1, the case studies shown by Irmak et al. (2001) had 9 to 48 total locations distributed among up to 11 soil types, and Paz et al. (1998) used 100 locations divided among seven soil types. Using the proposed algorithm in precision agriculture projects such as these would result in runtime savings of 25% to 75%. Additionally, landscape-position-dependent rules can easily be added to the SA generation mechanism to exclude invalid parameter combinations and reduce the effective parameter space size, as suggested by Irmak et al. (2001).

Unlike a grid search, the SA algorithm can be used to explore very large parameter spaces. Most crop modeling applications in precision agriculture studies to date have assumed that there is no coupling among the processes occurring in different landscape positions, and that consequently all the locations in the landscape can be simulated independently. In a situation with coupling due to spatial water movement, for example, the parameters at one location can affect the simulation results (and consequently, the parameter estimates) at another. The database method is inapplicable in this circumstance
because the optimization problem size increases exponentially with the number of locations of interest. An approach using SA can still produce a good approximate answer, however. As an example of a large combinatorial problem solved with SA, the soil water monitoring network optimization problem solved by Ferreyra et al. (2002) had 4.318 x 10^10 parameter combinations and consistently converged to the same good solution.

Finally, the proposed algorithm can help introduce crop-modeling-based tools into practical industry applications. Figure 5-3 shows how the SA algorithm can generate good "first cut" solutions in only a few hundred iterations, making it possible to answer "what-if" management optimization questions in a matter of seconds rather than hours.

Conclusions

This study shows that the runtime of a simulated-annealing-based crop model parameterization process is greatly reduced through the reuse of simulation results across successive iterations of the SA algorithm and across locations within an environment.

The performance of the modified simulated annealing algorithm depended on its parameter values. However, a conservative parameter combination was found (α_c = 0.995, α_h = 1.000, c_f = 0.0001, and m > 5) that ran much faster than a grid search, with its runtime tending asymptotically to that of the grid search as the number of locations of interest grew, while converging to objective function values (and the corresponding parameter combinations) practically identical to the global optima determined using the grid search method.

Adoption of the proposed algorithm can produce runtime reductions on the order of 25% to 75%, depending on the geometry of the simulation domain. Additionally, it can be
used to parameterize coupled spatial crop models in which parameter values at one location can affect parameter values at other locations, a task not possible using a grid search. Finally, the SA algorithm can very quickly produce approximate answers useful in practical applications.
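The reuse of simulation results summarized in these conclusions can be illustrated with a small sketch. It is not the implementation used in this study: the quadratic `toy_model` stands in for a crop model run, the objective is minimized, c is assumed to start at 1, and only the α_h = 1 case (every cell is always a valid destination) is shown. All class, function, and argument names are hypothetical.

```python
import math
import random


class CachedObjective:
    """Stores each model result so that revisited cells become database hits."""

    def __init__(self, model):
        self.model = model
        self.database = {}  # cell -> objective value from a unique model run
        self.hits = 0       # evaluations answered from the database

    def __call__(self, cell):
        if cell in self.database:
            self.hits += 1
        else:
            self.database[cell] = self.model(cell)  # the expensive step
        return self.database[cell]

    @property
    def unique_runs(self):
        return len(self.database)


def toy_model(cell):
    """Hypothetical stand-in for a crop model run on one parameter cell."""
    x, y = cell
    return (x - 30) ** 2 + (y - 12) ** 2


def anneal(objective, size=51, m=5, alpha_c=0.995, c_f=0.0001, c0=1.0, seed=0):
    """Simulated annealing on a size x size grid of parameter cells."""
    rng = random.Random(seed)
    current = (rng.randrange(size), rng.randrange(size))
    current_value = objective(current)
    best, best_value = current, current_value
    c, iterations = c0, 0
    while c > c_f:
        for _ in range(m):
            # alpha_h = 1: any cell in the space may be proposed.
            candidate = (rng.randrange(size), rng.randrange(size))
            value = objective(candidate)
            delta = value - current_value
            if delta <= 0 or rng.random() < math.exp(-delta / c):
                current, current_value = candidate, value
            if current_value < best_value:
                best, best_value = current, current_value
            iterations += 1
        c *= alpha_c  # geometric cooling
    return best, iterations


if __name__ == "__main__":
    objective = CachedObjective(toy_model)
    best, iterations = anneal(objective)
    print(best, iterations, objective.unique_runs, objective.hits)
```

Because the 51 x 51 space holds only 2601 cells, most of the evaluations in this toy run are necessarily database hits, which mirrors the dependence on parameter space size discussed for Figure 5-4.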
CHAPTER 6
USING BAYESIAN NETWORKS TO HELP UNDERSTAND CAUSAL RELATIONSHIPS

Introduction

Agriculture is a complex endeavor: inflation-adjusted commodity prices decreased steadily during the 20th century (USDA NASS, 1994), forcing farmers to attain progressively better crop yields in order to survive. Concurrently, environmental regulations, labor constraints, and interannual climate variability also drive farmers toward risk management and the optimization of management decisions such as planting dates, seed treatments, fertilization rates, variety selection, etc. Understanding the causes of yield variability and crop response to management is necessary before management decisions can be optimized.

Agricultural systems are very complex, in great part because of the multiple yield-affecting interactions between crops, their environment, and management (Lawrence et al., 2000). This complexity may obscure decision-makers' understanding of some cause-effect relationships in agricultural fields, such as the link between applied fertilizer amounts and crop yield.

Farmers frequently seek the help of crop consultants and extension professionals for understanding the consequences of different management options. Farmers are also increasingly adopting information technology (Schmidt et al., 1994), using personal computers for numerous tasks such as accounting, managing inventory, researching prices, checking weather forecasts, reading news, buying and selling products, etc. Large corporate investments in web-delivered services and content, such as Farm Assist
(www.farmassist.com), also suggest a growing use of computers for decision support by farmers.

Crop consultants and extension professionals possess a great deal of knowledge about agricultural systems, but when the system is complex it may be difficult to transmit that knowledge, and evaluating the success of the communication process may not be easy (Lawrence et al., 2000). Many scholars and various active learning theories have stressed the importance of experience and reflection in learning and practice (Kotval, 2003); however, a thorough hands-on field exploration of the agricultural management options available to specific farmers is not cost-effective.

It would be desirable to have a computer-assisted medium through which the crop consultant / extension professional and the farmer could discuss cause-effect relationships and jointly build a simple model of the agricultural system that behaves realistically while preserving a clear, easily understood representation of the causal relationships involved. This type of tool should be able to represent the expert knowledge of both the consultant and the farmer, but the model should evolve gradually, keeping pace with the discussion process.

The objectives of this paper are to provide an overview of a powerful yet simple probabilistic causal modeling technology, to discuss how it can be used to help understand cause-effect relationships, and to present a detailed example, a method for quantifying probabilities, and an online source of additional information.

Bayesian Nets as Simple Expert Systems to Help Explain and Understand How Things Work

Expert knowledge has often been represented and used to make inferences by means of computer programs called expert systems. Most expert systems have been built
using a knowledge base, typically consisting of a large set of "if-then" rules (Metaxiotis et al., 2002). Building, debugging, modifying, and maintaining this type of model can be very complex, and is not easily mastered by the non-specialist (Darwiche, 2000).

Probability theory provides an alternative technology for building expert systems: Bayesian networks or BNs (Pearl, 1988). Formally, a Bayesian network can be defined as "a specification of a joint probability distribution of several variables in terms of conditional distributions for each variable" (Nadkarni and Shenoy, 2001). We will clarify the meaning of this definition below.

A BN model is represented at two levels, qualitative and quantitative. Qualitatively, a BN is a directed acyclic graph (a network of nodes and arcs arranged so that it contains no loops) in which nodes represent variables and the directed arcs linking the nodes describe probabilistic relationships embedded in the model (Nadkarni and Shenoy, 2001). Assume two variables X and Y, each with a set of possible values or states (its state space) consisting of mutually exclusive and exhaustive values of the variable. If there is an arc pointing from X to Y, we say that X is a parent of Y. Nodes that have no parents are called root nodes.

The quantitative component of a BN specifies the probabilities for the root and non-root nodes. Each root node has a simple distribution of probabilities for its different states. These prior probabilities represent existing prior knowledge about the state of the variable. The probabilities for the states of the other nodes, i.e. those with parents, may depend on the state of each parent, and must be specified in those terms. This type of probability distribution is represented with a conditional probability table, examples of which are shown further below.
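The definition quoted above can be written compactly. The expression below is the standard factorization used for Bayesian networks, given here as a reading aid; the notation pa(X_i) for the set of parents of X_i is introduced for this illustration and is not the author's.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

For a root node the parent set is empty, so its factor is simply the prior P(X_i). For the two-variable case described above, with an arc from X to Y, the product reduces to P(X, Y) = P(X) P(Y | X).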
The joint probability distribution in the BN definition above is the end result of combining the conditional probability tables with the prior probabilities. A BN is the specification or "roadmap" of how the joint probability distribution is built.

Bayesian networks can be used to make inferences about system behavior. Probabilistic inference refers to the process of computing the probability distributions of a set of variables of interest after obtaining some observations of other variables in the model and propagating that information through the network (Nadkarni and Shenoy, 2001). There are two possible kinds of inference in a BN, deductive and abductive. In deductive inference, the properties of effects are inferred from knowledge about their causes. In abductive inference, knowledge about the effects is used to propose the most likely distribution of the causes.

Bayesian networks or their qualitative component, causal diagrams, have been used primarily in medical applications (Sierra et al., 2000) such as epidemiological modeling (Greenland et al., 1999), the automated discovery of adverse drug reactions (Orre et al., 2000), cardiac diagnosis (Nikovsky, 2000), and predicting obesity risk (Bunn et al., 1999). Other applications include data mining (Heckerman, 1997), finance (Gemela, 2001), data fusion for desertification studies (Stassopoulou et al., 1998), environmental impact studies (Marcot et al., 2001), and enhancing the functionality of Microsoft software (Helm, 1996). Agricultural applications in the literature include predicting crop yields and pest effects (Kristensen and Rasmussen, 2002), yield response to fungicides (Tari, 1996), and agricultural image processing (Onyango et al., 1997).

There are several programs available to help in the qualitative and quantitative phases of BN development. Software such as Netica (Norsys, 1998) allows a user to
graphically edit the network's qualitative structure, create and populate the probability tables, and perform both deductive and abductive inference. The results are shown graphically as bar charts.

A Simple Example

We conducted on-farm research on an agricultural operation near Murray, KY during the 2000-2002 cropping seasons. Our work included building BNs to help understand crop yield variability over space and time for decision-making in precision agriculture (Morgan and Ess, 1997). Working with an NRCS soil scientist, crop consultants, and the farmer, we built several Bayesian networks to aid in discussing and understanding the physical and biological processes that were taking place in the field.

Our example focuses on one of the processes we discussed: surface water flow over the field following a storm. We agreed that the rate of runoff water flow over the surface is related, among other things, to the roughness of the soil surface. We made a simple model of soil surface roughness through discussion with the local experts, and used it to discuss possible management alternatives.

Obtaining trustworthy probability estimates from experts may seem very difficult. However, inference results are more affected by the qualitative structure of the BN than by uncertainty in the probability estimates (Darwiche and Goldszmidt, 1994). It is more important to obtain a consistent way of mapping experts' perceptions to probabilities than it is to ensure that the probability values adhere strictly to reality. To achieve this consistency we used (and suggest) a modified form of a scale proposed by Renooij and Witteman (1999), shown in Figure 6-1.
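One way to apply such a scale consistently is to keep it in a lookup table and convert each elicited row of verbal quantifiers into probabilities. The sketch below uses the seven anchor values of Figure 6-1 (with 100 for Certain and 0 for Impossible). Rescaling a row so that it sums to one is an assumption added here for illustration, not a step described in the text, and the names `VERBAL_SCALE` and `row_from_words` are hypothetical.

```python
# Verbal quantifiers and their probabilities (percent), after Figure 6-1.
VERBAL_SCALE = {
    "Certain": 100,
    "Probable": 85,
    "Expected": 75,
    "Fifty-fifty": 50,
    "Uncertain": 25,
    "Improbable": 15,
    "Impossible": 0,
}


def row_from_words(words):
    """Turn one elicited row of verbal quantifiers into probabilities.

    Assumption for this sketch: the row is rescaled so that it sums to 1,
    because independently elicited quantifiers need not add up to 100%.
    """
    raw = [VERBAL_SCALE[word] for word in words]
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one state must be possible")
    return [value / total for value in raw]


if __name__ == "__main__":
    # A hypothetical expert statement about three states of one variable.
    print(row_from_words(["Expected", "Uncertain", "Impossible"]))
```

Recording the verbal answers themselves, and deriving the numbers from them, keeps the mapping identical across experts and across discussion sessions.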
Verbal quantifier    Probability (%)
Certain              100
Probable              85
Expected              75
Fifty-fifty           50
Uncertain             25
Improbable            15
Impossible             0

Figure 6-1. Scale used to translate between verbal quantifiers and probabilities (expressed as percentages). Adapted from Renooij and Witteman (1999).

Deductive Inference

Figure 6-2 shows a simple BN we built using Netica v. 1.12 software (Norsys, 1998) to predict soil surface roughness based on crop residue accumulation. Each box represents a variable and shows the possible states the variable can take (a design decision made during the discussion process) together with their corresponding probabilities in bar chart form. The root nodes (independent variables) are Tillage (the type of tillage: Minimum till or No till), Last_Crop (the crop previously grown in the field: soybeans or corn), Yield_of_last_crop (how much that crop yielded: low, medium, or high), and Time_of_the_year (the time of interest: Preplanting, Crop growth, or Post harvest). After identifying the root nodes, we used Last_Crop and Yield_of_last_crop as parents of Plant_material (how much plant material was produced), which we used together with Tillage to condition Residue (amount of residue available at the surface),
which was used in turn with Time_of_the_year to condition Roughness, our dependent variable of interest. Note how the arrows in Figure 6-2 lead from causes to effects.

After the qualitative definition we quantified probabilities. We set up the individual conditional probability tables, and then used the network to make inferences, exploring the network's results for different inputs. If results were unexpected or unreasonable, we reexamined and discussed the conditional probability tables, making adjustments if necessary, testing again, etc.

[Figure 6-2 node states and probabilities (%):
Tillage: Minimum till 25.0, No till 75.0
Last_Crop: Soybean 50.0, Corn 50.0
Time_of_the_year: Preplanting 33.0, Crop growth 34.0, Post harvest 33.0
Yield_of_last_crop: Low 25.0, Medium 50.0, High 25.0
Plant_material: Low 33.8, Medium 40.6, High 25.6
Residue: Low 19.6, Medium 36.9, High 43.5
Roughness: Very Low 5.10, Low 12.6, Medium 25.6, High 28.0, Very High 28.7]

Figure 6-2. Causal model of soil roughness made using a Bayesian network.

In Figure 6-2, the prior probabilities of the root nodes shown in the bar charts reflect the experts' initial ideas regarding the variables' probability distributions. The variables that do have parents, i.e. Plant_material, Residue, and Roughness, have conditional probability tables (Figure 6-3); the probabilities that are shown in these nodes' bar charts are the posterior probabilities, i.e. the probabilities calculated using the
probability tables and knowledge about the state (or probability distribution) of the parent variables.

A (Plant_material):
Yield_of_last_crop  Last_Crop  Low     Medium  High
Low                 Soybean    75.000  25.000   0.000
Low                 Corn       50.000  25.000  25.000
Medium              Soybean    35.000  50.000  15.000
Medium              Corn       25.000  50.000  25.000
High                Soybean    25.000  50.000  25.000
High                Corn        0.000  25.000  75.000

B (Residue):
Plant_material  Tillage       Low     Medium  High
Low             Minimum_till  90.000  10.000   0.000
Low             No_till       25.000  50.000  25.000
Medium          Minimum_till  50.000  30.000  20.000
Medium          No_till        0.000  50.000  50.000
High            Minimum_till  10.000  50.000  40.000
High            No_till        0.000  10.000  90.000

C (Roughness):
Time_of_the_year  Residue  Very_low  Low     Medium  High    Very_High
Preplanting       Low      20.000    35.000  40.000   5.000   0.000
Preplanting       Medium   10.000    20.000  35.000  25.000  10.000
Preplanting       High      0.000     5.000  25.000  40.000  30.000
Crop_growth       Low      20.000    35.000  40.000   5.000   0.000
Crop_growth       Medium   10.000    20.000  35.000  25.000  10.000
Crop_growth       High      0.000     5.000  25.000  40.000  30.000
Post_harvest      Low       0.000    25.000  40.000  30.000   5.000
Post_harvest      Medium    0.000     0.000  15.000  50.000  35.000
Post_harvest      High      0.000     0.000   0.000  10.000  90.000

Figure 6-3. Conditional probability tables for the soil roughness model: panels A, B, and C are for the Plant_material, Residue, and Roughness nodes, respectively.

Figure 6-3 shows the conditional probability tables defined for the model: 6-3A is for the Plant_material node, 6-3B is for the Residue node, and 6-3C is for the Roughness node. Let us examine Figure 6-3A in detail; the two left columns show the different combinations of states of the parent nodes, and the three columns on the right show the probability of each possible state of the Plant_material node given the states of its parents shown at left. For example, if the antecedent crop was soybean and its yield was low, there is a 75% probability that Plant_material was Low, 25% that it was Medium, and 0%
that it was High. If the antecedent crop was corn and its yield was Medium, the probabilities would have been 25%, 50%, and 25%.

Careful qualitative modeling results in conditional probability tables that are easy to understand and, consequently, easy to populate with probabilities elicited from the discussion process. Note how the column headers of Figure 6-3 correspond to simple, easily understood verbal quantifiers: "Low", "Medium", "High", etc. Using these quantifiers is convenient and can simplify interaction with farmers and other domain experts, but consistency must be maintained at all times.

Conditional independence (Jensen, 2001) is an important aspect of Bayesian networks. Two variables A and C are conditionally independent given variable B if P(A | B) = P(A | B, C), i.e. if, given the state of B, knowing the state of C does not change the probabilities we assign to the states of A. It is assumed that BN variables having the same parent are conditionally independent given that parent. This assumption is not important when experimenting with small networks as discussion support tools, but violating it may lead to incorrect inferences, especially in large networks. Introducing additional variables to simplify the network topology can help maintain conditional independence (Kwoh and Gillies, 1996).

The structure of the BN shown in Figure 6-2 is very simple; we deliberately kept the number of parents and children of each node to a minimum. This resulted in easily understood variables and also minimized the size of the conditional probability tables. Experts also have an easier task estimating probabilities conditioned on two variables (e.g. Roughness conditioned on Time_of_the_year and Residue) than conditioned on four;
compare Figure 6-2 with a network in which Roughness directly had the four root variables as its parents.

As shown, the bar chart of the Roughness box of Figure 6-2 represents the probability distribution of the Roughness variable given no prior knowledge about the specific state of the independent variables except their probability distributions. Variants including prior knowledge are shown in Figures 6-4 to 6-6.

[Figure 6-4 node states and probabilities (%):
Tillage: Minimum till 100, No till 0
Last_Crop: Soybean 100, Corn 0
Time_of_the_year: Preplanting 0, Crop growth 0, Post harvest 100
Yield_of_last_crop: Low 0, Medium 100, High 0
Plant_material: Low 35.0, Medium 50.0, High 15.0
Residue: Low 58.0, Medium 26.0, High 16.0
Roughness: Very Low 0, Low 14.5, Medium 27.1, High 32.0, Very High 26.4]

Figure 6-4. Deductive inference on a Bayesian network. Note how some variables (shown in a darkened box) are forced to known states, and how the network makes a deductive inference about the distribution of the Roughness variable conditioned on the forced inputs.

Figure 6-4 shows how the probability distribution of Roughness changes when the user specifies additional prior knowledge by forcing (through a mouse click) the independent variables to known states. Note how these forced variables (which now represent input data) appear darkened in the figure. In Figure 6-4 the specified states are
Minimum till, Soybean, and a Medium yield. The model predicts that in the post-harvest period Roughness will have a 58.4% probability of being high or very high, and a 14.5% probability of being low or very low.

Figure 6-5 differs from Figure 6-4 in that the tillage has been changed to no till. Note how the expected amount of residue increases as a result of a more residue-friendly management, and the distribution of Roughness changes accordingly: now the probability of high or very high Roughness is 87.8%, and that of low or very low Roughness only 2.19%.

[Figure 6-5 node states and probabilities (%):
Tillage: Minimum till 0, No till 100
Last_Crop: Soybean 100, Corn 0
Time_of_the_year: Preplanting 0, Crop growth 0, Post harvest 100
Yield_of_last_crop: Low 0, Medium 100, High 0
Plant_material: Low 35.0, Medium 50.0, High 15.0
Residue: Low 8.75, Medium 44.0, High 47.2
Roughness: Very Low 0, Low 2.19, Medium 10.1, High 29.4, Very High 58.4]

Figure 6-5. Deductive inference in the Bayesian network. Roughness tends to increase with respect to Figure 6-4 because Tillage is now in the No till state.

Figure 6-6 differs from Figure 6-5 in that the antecedent crop is now corn, which produces more biomass (Plant_material) than soybeans. Consequently, the corresponding posterior probabilities in the child nodes change: the probability of very high Roughness is now 62.7%, versus 58.4% in Figure 6-5.
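When every root node is forced to a known state, the deductive inferences of Figures 6-4 to 6-6 reduce to two weighted sums over the conditional probability tables of Figure 6-3. The sketch below transcribes those tables as fractions and propagates the evidence by hand. It illustrates the arithmetic only and makes no use of Netica; the dictionary layout and the function name `roughness_given` are assumptions made for this example.

```python
LEVELS = ("Low", "Medium", "High")

# P(Plant_material | Yield_of_last_crop, Last_Crop), from Figure 6-3A.
PLANT_MATERIAL = {
    ("Low", "Soybean"): (0.75, 0.25, 0.00),
    ("Low", "Corn"): (0.50, 0.25, 0.25),
    ("Medium", "Soybean"): (0.35, 0.50, 0.15),
    ("Medium", "Corn"): (0.25, 0.50, 0.25),
    ("High", "Soybean"): (0.25, 0.50, 0.25),
    ("High", "Corn"): (0.00, 0.25, 0.75),
}

# P(Residue | Plant_material, Tillage), from Figure 6-3B.
RESIDUE = {
    ("Low", "Minimum till"): (0.90, 0.10, 0.00),
    ("Low", "No till"): (0.25, 0.50, 0.25),
    ("Medium", "Minimum till"): (0.50, 0.30, 0.20),
    ("Medium", "No till"): (0.00, 0.50, 0.50),
    ("High", "Minimum till"): (0.10, 0.50, 0.40),
    ("High", "No till"): (0.00, 0.10, 0.90),
}

# P(Roughness | Time_of_the_year, Residue), from Figure 6-3C.
# Tuple order: Very low, Low, Medium, High, Very high.
_EARLY = {
    "Low": (0.20, 0.35, 0.40, 0.05, 0.00),
    "Medium": (0.10, 0.20, 0.35, 0.25, 0.10),
    "High": (0.00, 0.05, 0.25, 0.40, 0.30),
}
ROUGHNESS = {
    "Preplanting": _EARLY,
    "Crop growth": _EARLY,  # same rows as Preplanting in Figure 6-3C
    "Post harvest": {
        "Low": (0.00, 0.25, 0.40, 0.30, 0.05),
        "Medium": (0.00, 0.00, 0.15, 0.50, 0.35),
        "High": (0.00, 0.00, 0.00, 0.10, 0.90),
    },
}


def roughness_given(tillage, last_crop, last_yield, time_of_year):
    """Return (Residue, Roughness) distributions with all roots known."""
    plant = PLANT_MATERIAL[(last_yield, last_crop)]
    # Marginalize Plant_material out to obtain P(Residue | evidence).
    residue = [
        sum(plant[i] * RESIDUE[(LEVELS[i], tillage)][j] for i in range(3))
        for j in range(3)
    ]
    # Marginalize Residue out to obtain P(Roughness | evidence).
    roughness = [
        sum(residue[j] * ROUGHNESS[time_of_year][LEVELS[j]][k] for j in range(3))
        for k in range(5)
    ]
    return residue, roughness


if __name__ == "__main__":
    for crop in ("Soybean", "Corn"):
        _, rough = roughness_given("No till", crop, "Medium", "Post harvest")
        print(crop, [round(100 * p, 2) for p in rough])
```

With No till, a Medium yield, and Post harvest, the probability of very high Roughness works out to about 58.4% after soybean and 62.7% after corn, in agreement with Figures 6-5 and 6-6.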
[Figure 6-6 node states and probabilities (%):
Tillage: Minimum till 0, No till 100
Last_Crop: Soybean 0, Corn 100
Time_of_the_year: Preplanting 0, Crop growth 0, Post harvest 100
Yield_of_last_crop: Low 0, Medium 100, High 0
Plant_material: Low 25.0, Medium 50.0, High 25.0
Residue: Low 6.25, Medium 40.0, High 53.7
Roughness: Very Low 0, Low 1.56, Medium 8.50, High 27.2, Very High 62.7]

Figure 6-6. Deductive inference in the Bayesian network. Roughness increases because the corn crop produces more residue than soybeans.

Abductive Inference

Figure 6-7 shows an example of abductive inference: the effect, Roughness, is known (forced to a given distribution) and the BN is used to make inferences about the probability distribution of a cause, Last_Crop. The arrow directions do not change with respect to Figures 6-2 to 6-6 because the causal relationships assumed when defining the model are unchanged; only the way in which the network is being used is different. Abductive inference is powerful; it is often used in medical and other diagnostic BN applications, and sets Bayesian networks apart from regular expert systems: BNs can be used either forward or backward, whereas rule-based expert systems usually only make inferences in one direction because if-then rules cannot be easily inverted: "if A then B" is not equivalent to "if B then A"!
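The backward use of the network rests on Bayes' rule. As a reading aid, and with C standing for a cause with states c and E for an observed effect (symbols introduced here, not the author's), it can be written as:

```latex
P(C = c \mid E = e) \;=\;
  \frac{P(E = e \mid C = c)\, P(C = c)}
       {\sum_{c'} P(E = e \mid C = c')\, P(C = c')}
```

The likelihood P(E = e | C = c) is exactly what a deductive (forward) pass through the network computes, so the same conditional probability tables serve both directions.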
The model developed in this example is imperfect, a gross simplification of the real natural system. Crop residue accumulation depends on numerous factors not considered here, such as soil texture, landscape position, weather, etc. These factors came up in the discussions once the basic structure of the network had been set up; the group discussed their influence and the convenience of including them in the model. Adding variables is simple given the visual network editing tools included in Netica and similar programs, so new iterations of the design process could have followed, adding variables, creating new conditional probability tables, and testing the results.

[Figure 6-7 node states and probabilities (%):
Tillage: Minimum till 0, No till 100
Last_Crop: Soybean 73.9, Corn 26.1
Time_of_the_year: Preplanting 0, Crop growth 0, Post harvest 100
Yield_of_last_crop: Low 0, Medium 0, High 100
Plant_material: Low 38.0, Medium 48.9, High 13.0
Residue: Low 21.7, Medium 78.3, High 0
Roughness: Very Low 0, Low 0, Medium 100, High 0, Very High 0]

Figure 6-7. Abductive inference in the Bayesian network; note how Roughness is known, but Last_Crop is not.
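The Last_Crop probabilities of Figure 6-7 can be checked by hand with Bayes' rule. The sketch below keeps only the slices of the Figure 6-3 tables that the evidence selects (High yield, No till, Post harvest, Roughness = Medium) and uses the 50/50 prior for Last_Crop shown in Figure 6-2. Variable and function names are illustrative, and this is not the algorithm Netica itself uses.

```python
# Prior for the unknown cause, from the Last_Crop node of Figure 6-2.
PRIOR_LAST_CROP = {"Soybean": 0.5, "Corn": 0.5}

# P(Plant_material = Low/Medium/High | Yield = High, Last_Crop), Figure 6-3A.
PLANT_GIVEN_HIGH_YIELD = {
    "Soybean": (0.25, 0.50, 0.25),
    "Corn": (0.00, 0.25, 0.75),
}

# P(Residue = Low/Medium/High | Plant_material, Tillage = No till), Figure 6-3B.
# Rows are ordered by Plant_material: Low, Medium, High.
RESIDUE_NO_TILL = (
    (0.25, 0.50, 0.25),
    (0.00, 0.50, 0.50),
    (0.00, 0.10, 0.90),
)

# P(Roughness = Medium | Post harvest, Residue = Low/Medium/High), Figure 6-3C.
MEDIUM_ROUGHNESS_POST_HARVEST = (0.40, 0.15, 0.00)


def likelihood_of_medium_roughness(crop):
    """Forward pass: P(Roughness = Medium | crop and the other evidence)."""
    plant = PLANT_GIVEN_HIGH_YIELD[crop]
    residue = [
        sum(plant[i] * RESIDUE_NO_TILL[i][j] for i in range(3)) for j in range(3)
    ]
    return sum(residue[j] * MEDIUM_ROUGHNESS_POST_HARVEST[j] for j in range(3))


def posterior_last_crop():
    """Backward step: Bayes' rule over the two possible antecedent crops."""
    joint = {
        crop: prior * likelihood_of_medium_roughness(crop)
        for crop, prior in PRIOR_LAST_CROP.items()
    }
    total = sum(joint.values())
    return {crop: value / total for crop, value in joint.items()}


if __name__ == "__main__":
    for crop, p in posterior_last_crop().items():
        print(crop, round(100 * p, 1))
```

The likelihood of Medium Roughness is 0.085 after soybean and 0.030 after corn, so the posterior is roughly 73.9% Soybean and 26.1% Corn, matching the Last_Crop node in Figure 6-7.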
Conclusions

Bayesian networks provide a powerful tool for discussing complex concepts. With some practice, crop consultants, extension professionals, and clients with minimal experience using personal computers can easily understand the probabilistic ideas behind Bayesian networks, as well as the use of interactive software tools for their construction. Effective model definition improves with experience; the crop consultants / extension professionals with whom we interacted rapidly became skilled enough to define and populate simple models in 12 hours.

Tools for Bayesian network modeling are readily available on the market. For example, a powerful trial version of the Netica software used for this study can be downloaded from the manufacturer's website (www.norsys.com).

Finally, the techniques briefly described here are not limited to the discussion of agricultural systems management. Any extension activity that requires discussion of cause-effect relationships, such as health care, safety, and mechanics-related topics, can benefit from Bayesian-network-supported dialogue.
CHAPTER 7
INTEGRATING MULTIPLE KNOWLEDGE SOURCES FOR PARAMETERIZED SPATIAL CROP MODELS WITH INVERSE MODELING

Introduction

In Chapter 1 we discussed how the development of a practical tool for the accurate simulation of spatio-temporal crop yield variability (henceforth, a spatial crop model or SCM) could provide a valuable analytical and decision-support tool for precision agriculture. In Chapter 2 we showed different sources of error in SCMs: model error (spatially-coupled vs. uncoupled forms of SCM), initial conditions, and parameter error.

Crop model results are indeed sensitive to their parameters, some of which affect model outputs significantly more than others (Favis-Mortlock and Smith, 1990; Leenhardt et al., 1994). When the inputs to which crop models are very sensitive are measured carefully, model predictions can be very accurate (Braga, 2000). However, in a precision agriculture context, input parameters usually cannot be measured throughout the field, and estimation procedures must be used.

A popular, open-loop method of estimating soil parameters involves pedotransfer functions (Bouma, 1989; Wosten et al., 2001), which are used to estimate critical but unavailable variables such as soil water holding limits and saturated hydraulic conductivity from more readily available data. These functions are typically implemented with regression models (Gupta and Larson, 1979; Rawls et al., 1982; Saxton et al., 1986), although other approaches are being increasingly used, including fractals (Gimenez et al., 1997; Perrier et al., 1996; Rawls and Brakensiek, 1995), classification and regression trees
(Rawls and Pachepsky, 2002), and neural networks (Minasny et al., 1999; Schaap and Leij, 1998; Schaap et al., 1998).

The data used as pedotransfer function inputs are usually textural fractions, organic matter content, and bulk density, although recent research has sought to improve the predictive quality of pedotransfer functions by incorporating other data such as soil structure (Pachepsky and Rawls, 2003) and topographic variables (Pachepsky et al., 2001; Rawls and Pachepsky, 2002). Many of these input variables are not always available, especially at the spatial sampling density needed in a precision agriculture context. In many practical cases, the only existing data source is the soil survey (SS), which associates the description of a "representative" soil profile to a mapping unit that may represent a large fraction of the field. The description is typically expressed in terms of textural class per soil horizon.

Applying the pedotransfer function approach to SS data and assigning the estimated soil parameters to an entire SS mapping unit may introduce errors into SCM results due to a) the spatial heterogeneity and scale-dependent behavior of actual spatially distributed soil properties vs. their representation by a single lumped value per grid cell, obtained from laboratory measurements performed on soil cores (Brakensiek et al., 1981; Gijsman et al., 2002); b) biases and errors specific to the chosen pedotransfer function (Gijsman et al., 2002); and c) errors due to imprecise soil type delimitation throughout the landscape. Thus, the open-loop approach described above may be inappropriate in practical SCM applications in precision agriculture.

An alternative, closed-loop approach is to abduct parameter values with an inverse modeling (IM) scheme. Some form of optimization algorithm such as simulated
annealing (Kirkpatrick et al., 1983; Metropolis et al., 1953) is used to propose an optimal combination of parameters for each cell that minimizes an objective function such as the RMSE between simulated and observed crop yield. Paz et al. (2001) and Irmak et al. (2001) showed examples of this approach. However, in these studies the only link between the parameter estimates and reality was the optimal match between simulated and observed yield across two or three crop seasons. The effect of yield-modifying processes not contemplated by the model (such as topographic water redistribution) could thus be spuriously explained by the IM process through the model parameters, as shown in Chapter 2. Balancing the need to identify and represent these processes in the SCM with practical limitations in runtime and data collection requirements poses a great challenge for spatial crop modeling.

Other data assimilation techniques, such as Kalman filtering (Wikle and Cressie, 1999) and Bayesian parameter estimation (Omlin and Reichert, 1999; Qian et al., 2003), have been applied to the spatiotemporal modeling of environmental systems, especially in meteorologic and oceanographic studies (Dowd and Meyer, 2003; Natvik et al., 2001; Vallino, 2000). Bayesian methods have also been applied to the parameterization of agronomic and hydrologic models (Durand et al., 2002; Makowski et al., 2002; Thiemann et al., 2001). These methods make use of existing a priori data and observations to update the model's parameters and/or state.

For SCMs to be practical as decision-support tools in industry, crop consultants must be able to parameterize them easily and rapidly (J.R. Murdock, pers. comm.), having a minimum of quantitative information regarding model parameters and their probability distributions. This may preclude the use of complex spatially-coupled models
parameterized by inverse modeling, due to the computational complexity of solving the underlying combinatorial optimization problem.

Soil properties frequently exhibit spatial associations (Goovaerts, 1997), although IM schemes used to date for soil parameter estimation in SCMs have ignored this spatial context except by considering landscape position as a way of selecting parameters to optimize (Irmak et al., 2001). This spatial context can operate at the parameter level rather than the process level. Instead of the aforementioned tightly coupled landscape cells that interchange water and nutrients on a daily basis in a spatial crop model, crop simulation could be performed in individual cells as per the approach of Paz et al. (2001); the spatial context could work by subjecting the parameters to a number of quantitative or qualitative constraints. Such a scheme could have lighter computational and quantitative-data-requirement burdens than the other aforementioned data assimilation techniques. We called this parameter-level model a spatial parameter model (SPM).

Expert opinion is an important source of information regarding agricultural crops' spatiotemporal behavior. Farmers, crop consultants, and soil scientists possess a wealth of knowledge about agricultural systems that can be harnessed to provide parameter constraints. However, much of an expert's knowledge is in qualitative form, and, irrespective of its qualitative or quantitative nature, the expert knowledge must be elicited from its owner, who may be unable to formalize it without external help. There exist numerous techniques for this knowledge elicitation (Diaper, 1989). One such technique is Bayesian networks (Pearl, 1988), the use of which was exemplified in Chapter 6.

We hypothesized that, using practically available data and expert opinion, the spatial context in an agricultural field can be modeled and used to constrain the parameter
estimation process of an SCM in a way that improves its predictive ability while preserving its compatibility with practical applications. In this study we developed methods that will allow testing of this hypothesis.

The objectives of this study were to qualitatively and quantitatively define a spatial parameter model (SPM) using available sources of knowledge, to apply it to the parameterization by IM of a simple spatial crop model (SCM), and to evaluate the SCM in a representative case study, including a comparison with the spatially-coupled model developed in Chapter 2.

Materials and Methods

This section is structured in four parts:
• Description of the case study, including the data available for supporting the construction of the SPM.
• Description of the parameter estimation process, including the IM framework, the SPM and the criteria used therein, and how those criteria were populated with data.
• Description of the uncoupled SCM (used with the IM framework) and of the spatially-coupled SCM (derived from Chapter 2 and used for comparison with the IM framework results); description of the data used to drive them.
• Description of the analyses we performed on the data.

Case study: the Suggs 4 Field

We studied a field near Murray, Kentucky: Suggs 4 (36° 32.3' N, 88° 27.5' W, elev. 222 m). This field is part of Ponderosa Farms, owned and farmed by Rick Murdock. It has been managed using no-till and approximately the same two-year rotation scheme for over two decades. The rotation consists of maize (Zea mays L.) grown in the first year (planted around April 10), and wheat (Triticum aestivum L.) and soybeans (Glycine max L. Merr.) grown in the second year.
Elevation and topographic attributes

We used GPS-measured elevation to describe the landscape. AgConnections, Inc., surveyed the Suggs 4 field in November of 2000 using a pair of survey-grade real-time kinematic (RTK) GPS units. One of these was used as a base station and the other was mounted on an all-terrain vehicle. The RTK approach is recognized as a very accurate GPS-based technique for obtaining elevation data for use in site-specific agriculture (Renschler et al., 2002). Clark and Lee (1998) obtained vertical errors of 49 cm using this technique. However, Wilson et al. (1998) warned that relatively small elevation errors across different landscape positions could result in large variations of hydrologically relevant topographical attributes calculated from those elevation data, and recommended using the maximum practical sampling density. Consistent with this prescription, the data used for this study were collected on-the-go with the vehicle-mounted GPS unit, on 15-meter swaths. There were 7030 sampling locations, shown in Figure 7-1. The gaps shown were caused by problems in the measurement of soil electroconductivity (EC), which was measured concurrently with the RTK elevation. When an EC measurement was not valid, typically due to wheat residue collecting on the electrodes, the whole record (including elevation) corresponding to that measurement was lost.

We used GS+ for Windows, version 5.0 (Gamma Design Software, 2001) to estimate a semivariogram for the elevation data. We then used ordinary point kriging to resample the original data onto a 10-meter isometric grid. These data were in turn used to calculate a wetness index, or WI (Beven and Kirkby, 1979), using a topographic index calculation utility distributed with TOPMODEL (Beven et al., 1995).
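
As a rough illustration of the wetness index mentioned above, the Beven and Kirkby (1979) index for a grid cell is ln(a / tan β), where a is the upslope contributing area per unit contour length and β is the local slope angle. The Python sketch below is not the TOPMODEL utility used in this study; the function name and the minimum-slope floor are assumptions introduced only for illustration.

```python
import math


def wetness_index(specific_area_m, slope_rad, min_slope_rad=1e-3):
    """Topographic wetness index ln(a / tan(beta)) for a single grid cell.

    specific_area_m: upslope contributing area per unit contour length (m).
    slope_rad: local slope angle in radians.
    min_slope_rad: hypothetical floor that avoids division by zero on flat
        cells; it is an assumption of this sketch, not a value from the study.
    """
    slope = max(slope_rad, min_slope_rad)
    return math.log(specific_area_m / math.tan(slope))


# A gently sloping cell with a large contributing area is "wetter" than a
# steep cell with a small one.
wet_cell = wetness_index(500.0, math.atan(0.01))
dry_cell = wetness_index(20.0, math.atan(0.10))
```

Cells with high index values are the ones the text later describes as prone to saturation because of high contributing areas and/or low slopes.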
Figure 7-1: Suggs 4 field. The dots show the sampling locations for RTK elevation and Veris electroconductivity data.

Soil electroconductivity

Soil electroconductivity (EC) has been used in several studies to predict soil properties (Johnson et al., 2001). A set of EC data was collected by AgConnections, Inc. concurrently with the elevation data set in the winter of 2000. The equipment used was a Veris 5600. This device has two sets of coulter-shaped electrodes that produce two different attributes for each sample location: the "surface" measurement, which is meant to represent soil electroconductivity from 0 to 60 cm, and the "deep" measurement, which is representative of the characteristics to a depth of 150 cm. We used GS+ 5.0 to estimate semivariograms for the EC data. We then used ordinary block kriging to resample the two original data layers onto a 10-meter isometric grid. These data were later used to assist discussion with domain experts.
Soil data

Soils in the region were formed in loess approximately 120 cm thick, and tend to have a fragipan at a landscape-position-dependent depth (USDA SCS, 1973). The fragipan limits root growth and causes perched water tables during the late winter and early spring, when there is steady precipitation, low evapotranspiration due to low temperatures and solar radiation, and low (or nonexistent) plant water demand.

Figure 7-2: Original division of the Suggs 4 field into soil types, adapted from soil survey data (USDA SCS, 1973). Each oval identifies its corresponding map unit. The parenthesized elements set apart map units having the same soil type.
The field tends to slope downward from northwest to southeast, with an elevation range of approximately five meters. The soil types in the field (Figure 7-2) generally correspond to different landscape positions (USDA SCS, 1973). The ridgetop contains Loring B soils; the well-eroded slope on the northwest border of the field has Loring C2. These soils are described as being moderately well drained, somewhat eroded, and having a fragipan. The sideslopes have Grenada soils. These are relatively deep and moderately well-drained soils, somewhat eroded, but considered the best soils in the region by our domain experts. Fairly shallow and fairly poorly drained Calloway soils occupy lower positions. This sequence of soils is typical of the area, which corresponds to the Grenada-Calloway Association (USDA SCS, 1973).

Yield maps

Maize yield data were available for the 1997, 1999, and 2001 cropping seasons. Soybean data were also available for 1998. The farmer collected yield maps using a scale-calibrated yield monitor. In 1997-1999 he used a GreenStar yield monitor; in 2001 he used an Ag Leader 3000. Yield map lag times were corrected using the yield import tool found in AgLink Professional, version 5.5 (AGRIS Corporation, 1998). We used GS+ 5.0 to estimate a semivariogram for each yield map. We then used ordinary block kriging to resample the lag-corrected data onto a 10-meter isometric grid.

These yield maps were used as the observed reference against which to compare model simulations, and also to generate a multiyear, aggregated yield product called a normalized yield (NY) map. This map was obtained by registering the different maps to a common grid, expressing the yield of each map's cells as a percentage of the corresponding year's average yield for the field, and averaging the coregistered cell
values across years. The NY map was built using the 1997 and 1999 maize and the 1998 soybean data, and was used together with individual year yield maps to assist discussion with domain experts.

Soil water data

We collected a spatiotemporal soil water content dataset. It was used for evaluating the parameter sets obtained in the parameterization process rather than for parameterization proper. The data were collected in the field on 11 dates between March 27 and September 17, 2001, enveloping the maize cropping season. We used a Trime FM3 time domain reflectometry (TDR) system with plastic access tubes. Data were measured at 15-cm depth intervals at each location. The measurements for the top depth range, 0-15 cm, were taken with a three-prong FM1 electrode, whereas the deeper measurements were taken using access tubes. The measurements were replicated three times.

When the soil dried, it became so hard that the three-prong electrode could not be inserted without danger of damaging the device. We solved this problem with a custom-built predrilling device (Figure 7-3) that used sharpened stainless steel bars, of similar diameter and identical spacing to those of the FM1 electrodes, to predrill holes into which we inserted the TDR electrodes.

The tubes were installed according to the recommended practice of the manufacturer, iteratively a) inserting the tube into the ground a few inches by using a mallet (or a modified post driver, in the higher-clay soils) to beat a ramming head affixed both to the tube and to a pipe running the length of the tube (to rest on the metal cutting edge at the bottom of the tube), and b) using an auger to extract the material collected inside the pipe.
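
Returning briefly to the normalized yield (NY) map defined in the Yield maps subsection, the calculation can be sketched in a few lines of Python. The sketch assumes that each year's map has already been kriged onto the common grid and is held as a NumPy array; the function name and the treatment of missing cells as NaN are assumptions made for illustration, not part of the original workflow.

```python
import numpy as np


def normalized_yield(yield_maps):
    """Average, across years, of each cell's yield as a percentage of the
    field mean for that year.

    yield_maps: sequence of 2-D arrays on a common grid (one per year/crop).
    Missing cells are assumed to be NaN and are ignored (an assumption).
    """
    layers = []
    for ymap in yield_maps:
        ymap = np.asarray(ymap, dtype=float)
        # Express every cell as a percentage of that year's field average.
        layers.append(100.0 * ymap / np.nanmean(ymap))
    # Average the coregistered cell values across years.
    return np.nanmean(np.stack(layers), axis=0)


# Two hypothetical 2 x 2 maps (e.g., maize and soybean, different units).
maize = [[8.0, 12.0], [10.0, 10.0]]
soybean = [[2.0, 4.0], [3.0, 3.0]]
ny = normalized_yield([maize, soybean])
```

Because each year is first divided by its own field mean, crops with very different absolute yields (maize and soybean here) contribute on the same relative scale.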
Parameter Estimation Process

Knowledge elicitation

Integrating multiple sources of knowledge presents general problems, such as putting all the knowledge in a form that can be used in a common framework. Eliciting complex biophysical information from precision farming domain experts also poses a set of problems, not least among them the need to develop a common language between the crop system modeler and the experts.

I obtained the cooperation of several domain experts: Rick Murdock (farmer and crop consultant), Pete Clark (crop consultant), and John Potts (agronomist), all with AgConnections, Inc., and Jerry Mcintosh, a soil scientist with the NRCS (U.S. Natural
Resources Conservation Service). I made four weeklong visits to the headquarters of AgConnections during March, May, July, and August 2002, meeting with the experts at least twice per visit, for several hours at a time. The knowledge elicitation activities performed during these meetings included the following:
• Delimiting and qualitatively modeling the physical system of interest using causal diagrams. The process included whiteboard discussions and interactive work using Netica Bayesian network software version 1.12 (Norsys, 1998) and Microsoft PowerPoint presentation software (Microsoft Corp., 1999).
• Identifying relevant parameters. This diagram was built collectively with the domain experts and provided an effective way of explaining and discussing how a crop model operates, and of intuitively communicating ideas such as the link between limiting factors and parameter sensitivity.
• Adopting a system for translating between qualitative and quantitative probability scales. We adopted a slightly modified version of the scale proposed by Renooij and Witteman (1999).
• Eliciting probabilities about spatial parameter relationships, expected sensitivity of the crop model to specific parameters, and other quantitative questions about the system.

To maximize the reliability of the causal diagrams arising from our process, we strove to achieve consensus among the multiple experts for both the qualitative and quantitative aspects of the models, following Nadkarni and Shenoy (2001), and revisited the diagrams during successive visits to confirm the validity of (or to update) the maps as our understanding increased.

Updating the soil map, selecting the simulation locations

As stated before, soil survey maps provide a limited depiction of the spatial variability of soil properties. Traditionally, soil mapping unit borders have been inferred visually using aerial photos, or else drawn on a map by a soil scientist familiar with the area but with limited resources with which to take dense samples (J. Mcintosh, pers.
comm.). This can result in the omission of soil mapping units or in delimitation errors (Nolin and Lamontagne, 1991).

One of the objectives arising from our meetings with domain experts was to update the soil map as a useful step before selecting sites for simulations. This update was done by examining other sources of data and looking for information inconsistent with the labeled soil type. We used the following (previously described) sources of data:
• Elevation (EL): defines landscape position.
• Wetness index (WI): identifies areas prone to saturation due to their high contributing areas and/or low slopes.
• Veris electroconductivity (EC): an indicator of soil erosion and wetness.
• Normalized yield (NY): an indicator of temporal stability.
• Original soil survey map (SS) (USDA, 1973).

During this process the participants delimited areas on the maps and built hypotheses about the soil type corresponding to the observed combination of EL, WI, EC, NY, and SS. These hypotheses were discussed within the group, and were later tested in the field. For the latter we extracted cores to a depth of 122 cm with a soil probe and discussed the extracted material.

In Chapter 3 we developed automatic spatial sampling methods that minimize some proxy for the prediction error of spatial interpolation, such as kriging variance (Deutsch and Journel, 1992) or the Minimization of the Mean of Shortest Distances (MMSD) criterion (van Groenigen and Stein, 1998). We also showed how this idea could be used to capture the spatial variability of crop yield from a limited set of points. We elaborated further in Chapter 4, showing how, in order to best predict the spatiotemporal variability of a spatially autocorrelated random field from a limited subset of sample locations, the best
spatial sampling scheme included locations with temporally stable data, which captured the spatial nonstationarity of the field.

These concepts and outcomes from Chapters 3 and 4 were used in the present study to select the simulation domain for the IM process. Instead of using a grid, we chose to simulate locations showing characteristic behaviors throughout the field. We limited ourselves to one location per original soil survey mapping unit, plus one location per new (updated) soil map area. Very small areas were discarded in order to keep the IM problem relatively small. We obtained 13 locations from this process; they will be described in detail in the Results and Discussion section.

The IM framework

The parameter estimation problem is a nonlinear multiobjective optimization problem that seeks the parameter set that best satisfies a set of criteria. The multiple criteria are aggregated into a single objective function, and the optimization process is structured like a simulated annealing algorithm (Kirkpatrick et al., 1983), having a generating mechanism, an acceptance criterion, and a cooling schedule. In each iteration the generating mechanism randomly selects one of the aforementioned 13 landscape cells and proposes a perturbation of its parameters. The criteria are evaluated for the new parameter set, an aggregation operator is used to generate a single measure of fitness from the multiple criteria, and the difference between this fitness value and the fitness of the previous parameter combination is fed to an acceptance criterion following Aarts and Korst (1990). The cooling schedule makes it progressively more difficult for the algorithm to accept new parameter combinations that do not produce an improvement in the fitness value.
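
The loop described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the Metropolis-type acceptance rule and the geometric cooling schedule are assumed forms, and the names anneal, fitness, and perturb, together with all default values, are hypothetical.

```python
import math
import random


def anneal(cells, fitness, perturb, t_start=1.0, t_end=1e-3, alpha=0.95,
           iters_per_temp=50, seed=0):
    """Minimize fitness(cells) with a simulated-annealing-style loop.

    cells: list of per-cell parameter sets (one per landscape cell).
    fitness: maps the full list of cells to a value to be minimized.
    perturb: returns a NEW perturbed parameter set for one cell.
    """
    rng = random.Random(seed)
    current = list(cells)
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    temperature = t_start
    while temperature > t_end:
        for _ in range(iters_per_temp):
            # Generating mechanism: pick one cell and perturb its parameters.
            idx = rng.randrange(len(current))
            candidate = list(current)
            candidate[idx] = perturb(current[idx], rng)
            candidate_fit = fitness(candidate)
            delta = candidate_fit - current_fit
            # Acceptance criterion (assumed Metropolis form): improvements
            # are always accepted, deteriorations only with some probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_fit = candidate, candidate_fit
                if current_fit < best_fit:
                    best, best_fit = list(current), current_fit
        # Cooling schedule (assumed geometric): deteriorations become
        # progressively harder to accept.
        temperature *= alpha
    return best, best_fit
```

In the setting of this chapter, cells would hold the parameter sets of the 13 simulated locations and fitness would be the aggregated objective function; here both are left abstract so that any callable can be supplied.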
Figure 7-4 shows the different components of the IM framework. The three boxes along the top of the diagram represent the available sources of knowledge:
• Data: history of crop yield, weather, elevation, soil survey, etc.,
• Expert opinions about the processes occurring in the field and their spatial variability, as can be elicited from farmers, crop consultants, and soil scientists, and
• Knowledge about the effects of environmental conditions on crop growth and development, obtained through a crop model such as CERES or CROPGRO.

The four boxes below the knowledge sources in Figure 7-4 represent four different criteria that we propose for evaluating the appropriateness of a given parameter set:
• Yield history. This is the classic criterion heretofore used in SCM-related inverse modeling: the RMSE between simulated and observed yield throughout the field.
• Parameter sensitivity. An important agricultural concept is the limiting factor, based on von Liebig's Law of the Minimum (van der Ploeg et al., 1999): crop yield is determined by the amount of the essential input (nutrients, water, CO2, light, etc.) in shortest supply. If the deficient input is supplied, yields can improve to the point where another input becomes limiting, etc. Although crops do not, sensu stricto, behave according to this law (Sinclair and Park, 1993), it is nonetheless a valuable approximation for our purposes because domain experts can relate to it. Domain experts can frequently identify the most important factor (for example, soil depth) that limits crop yield in different parts of a field. The limiting factor frequently corresponds to a soil parameter used by the SCM. When this happens, small changes in this parameter should affect yield to a greater extent (i.e., the parameter should be more sensitive) than small changes in other parameters.
• Geostatistical. The parameter set obtained through the estimation process should have a spatial covariance structure equivalent to that of the corresponding "real" soil parameters. The latter cannot be known exactly because the real parameter values (and hence, their spatial covariance structure) are unknown, but it can be approximated using an observable proxy variable, or a covariance model may be taken from the literature.
• Soil map unit neighborhood. When an agricultural field is spatially variable, parts of it tend to behave consistently with respect to others; for example, soil water content will tend to be higher in depressions than on ridgetops. As discussed in Chapter 4, this spatial nonstationarity implies the temporal stability defined by Vachaud et al. (1985). According to the neighborhood criterion, given two sets of parameters, whichever one best reproduces an expected pattern of temporal stability throughout the field is preferable.
Figure 7-4: IM framework for the proposed SCM parameter estimation process.

The four parameter-evaluation criteria

The yield history criterion for a crop year i, YHC_i, was defined as follows:

    YHC_i = 1 - e^{-k \cdot SMSYE_i}    (7-1)

where k is a constant and SMSYE_i is the scaled mean squared yield error for year i, equal to the mean squared error between simulated and observed yields for year i across the locations of interest, scaled by that year's observed data variance. In equation form,

    SMSYE_i = \frac{1}{N \, \sigma_i^2(Y_{ij})} \sum_{j=1}^{N} \left( \hat{Y}_{ij} - Y_{ij} \right)^2    (7-2)
where \hat{Y}_{ij} is simulated yield for year i, location j, Y_{ij} is the corresponding observed yield, \sigma_i^2(Y_{ij}) is the observed year i yield variance, and N is the number of locations of interest. The values of Equation 7-1 vary greatly depending on the value of parameter k (Figure 7-5). A higher k penalizes yield error more harshly than lower values. The value of k could conceivably be trained from data, or selected based on the user's confidence in a particular year's data quality (or the lack thereof, due to extraneous yield-limiting factors not considered by the crop model, etc.). However, we arbitrarily chose a base value of k = 1 to provide sensitivity at high error levels. Exploring the sensitivity of the IM results to different values of k will be the object of future study.

Figure 7-5: Dependence of the yield history criterion for year i (YHC_i) on the value of parameter k. The SMSYE_i variable (x-axis, plotted from 0 to 2) is the mean squared error, across all locations of interest, between simulated and observed yield values for year i, scaled by the yield variance during that year.

Separate instances of the yield history criterion are created for each year of available yield data, and the results are aggregated, together with the results of all the other criteria (soil map unit neighborhood, etc.), at the objective function level. This is somewhat different from the approach taken by other authors such as Irmak et al. (2001)
and Paz et al. (2001), who calculated the RMSE across both years and locations. We believe this latter methodology is prone to biases because the variance of a spatial yield dataset varies greatly from year to year, depending primarily on weather conditions: good years with adequate rainfall tend to have less variable yield than very dry or very wet years. Thus, the variability of dry years would tend to dominate a criterion based on the RMSE of several yield years.

The soil map neighborhood criteria can be based on the spatial relationships of model parameter values (such as soil depth or the SCS curve number) or model outputs (e.g., soil water). The criteria are expressed in terms of compliance with a series of constraints, which are inequalities having two attributes: type (greater than, less than, equal) and strength (strong, medium, weak). The constraints are defined using the constraint functions shown in Figure 7-6. The difference between the values (of soil depth, for example) at two locations of interest is plotted on the x-axis, and the level of satisfaction of the constraint is the corresponding y-axis value of the constraint function.

Figure 7-6: Functions used for evaluating the neighborhood constraints (panels A, B, and C). The x-axis is the difference between the values of the model parameter (or output) of interest at two locations (a and b); the y-axis provides the corresponding error function value. The three cases correspond to the different constraint types: A should be used when the constraint is that (v_a - v_b) should be greater than 0; B should be used when the constraint is that (v_a - v_b) < 0; C should be used when the constraint is that (v_a - v_b) = 0.
The type of constraint determines which one of the curves to use (Figure 7-6A for greater than, Figure 7-6B for less than, Figure 7-6C for equality), and the strength of the relationship determines how steep the slopes are (a "strongly greater than" constraint would have a steeper slope than a "weakly greater than" one). Compliance of the parameters with these constraints is scaled from zero, corresponding to full compliance, to one, corresponding to total noncompliance. This scaling results from the definition of the IM problem as a minimization of the objective function. (The objective function will be described in detail below.)

Within the neighborhood criterion, all of the individual constraint function results (which, as seen in Figure 7-6, vary in the range [0,1]) are aggregated to obtain a single value to represent the criterion. This value is also in the interval [0,1]. We used the arithmetic mean as an aggregation operator in this case, reflecting the collective perception that a linear combination of the criteria was appropriate. The parameters of these functions could conceivably be obtained by training with a large data set, but the practical impossibility of obtaining such a dataset prompted us to use nominal, expert-opinion-derived thresholds. Also, to keep the constraint networks as sparse as possible, we only expressed relationships between adjacent soil units.

Objective function (aggregation of criterion results)

The fitness function is derived from the previously discussed criteria, some of which may be instantiated more than once (soil map unit neighborhood criteria for different parameters and model results, yield history criteria for multiple years of yield maps, etc.). All of the criteria values are scaled to the range 0 (full compliance) to 1 (absolute noncompliance), and are aggregated using an ordered weighted average operator or OWA (Yager, 1988).
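
Before the aggregation operator is defined, it may help to see how two of the criteria feeding it could produce values in [0,1]. The Python sketch below rests on several assumptions: the yield history criterion is taken to have a saturating exponential form, 1 - exp(-k · SMSYE); the observed variance is computed as a population variance; the constraint functions of Figure 7-6 are approximated by clipped linear ramps; and the ramp widths in RAMP_WIDTH are hypothetical placeholders, not the expert-derived thresholds used in this study.

```python
import math

# Hypothetical ramp widths: a stronger constraint has a steeper (narrower) ramp.
RAMP_WIDTH = {"strong": 0.5, "medium": 1.0, "weak": 2.0}


def smsye(simulated, observed):
    """Mean squared yield error scaled by the observed yield variance."""
    n = len(observed)
    mean_obs = sum(observed) / n
    var_obs = sum((y - mean_obs) ** 2 for y in observed) / n  # population variance (assumed)
    mse = sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n
    return mse / var_obs


def yield_history_criterion(simulated, observed, k=1.0):
    """0 for a perfect fit, approaching 1 as the scaled error grows (assumed form)."""
    return 1.0 - math.exp(-k * smsye(simulated, observed))


def constraint_penalty(diff, kind, strength):
    """Noncompliance in [0, 1] for diff = v_a - v_b (assumed linear ramps)."""
    width = RAMP_WIDTH[strength]
    if kind == "greater":
        raw = 1.0 - diff / width
    elif kind == "less":
        raw = 1.0 + diff / width
    elif kind == "equal":
        raw = abs(diff) / width
    else:
        raise ValueError(f"unknown constraint type: {kind}")
    return min(1.0, max(0.0, raw))


def neighborhood_criterion(values, constraints):
    """Arithmetic mean of the penalties of all constraints.

    values: dict mapping location name to parameter (or output) value.
    constraints: iterable of (a, b, kind, strength) tuples.
    """
    penalties = [constraint_penalty(values[a] - values[b], kind, strength)
                 for a, b, kind, strength in constraints]
    return sum(penalties) / len(penalties)
```

For example, a constraint stating that soil depth on a sideslope should be "strongly greater than" on the adjacent ridgetop would be written as ("sideslope", "ridgetop", "greater", "strong"), with both names being illustrative keys of the values dictionary.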
Given two n-dimensional vectors A = [a_1, a_2, ..., a_n] and B = [b_1, b_2, ..., b_n] such that B is obtained by sorting A, making b_j equal to the j-th largest element of the a_i, an OWA operator (of dimension n) is a mapping f: \mathbb{R}^n \to \mathbb{R} with an associated n-dimensional vector W = [w_1, w_2, ..., w_n], such that three conditions are met:

    1) w_i \in [0, 1]
    2) \sum_{i=1}^{n} w_i = 1
    3) f(a_1, a_2, ..., a_n) = \sum_{j=1}^{n} w_j b_j

Using the B vector in the definition of f (the OWA proper) above is what confers nonlinearity to the OWA operator: a weight w_i is not associated with a specific argument a_i, but with the i-th position within B (sorted values of A) instead. The OWA operator is a very flexible aggregation tool that can implement, through the appropriate selection of its weights, operators including the median, maximum, minimum, and a large class of means and other summarizing statistics (Yager, 2003; Yager, 1993).

For this study we set the OWA weights according to a form of Olympic aggregator, analogous to the scoring methods used in several judged events in the Olympics, which operate on ranked scores (Stefani, 1999). Assuming that we had N criteria as the a_i inputs, we gave the N-2 central weights of the W vector values of 1/(N-1), and gave the two extreme weights w_1 and w_N a smaller weight, 1/(2N-2), so that the weights sum to one.

The purpose of adopting an Olympic aggregator relates to the arcs shown linking the knowledge sources and the criteria in Figure 7-4. These arcs represent the level of participation of the knowledge sources in each criterion. The four combinations are different, so different combinations of confidence in the available knowledge would
affect our confidence in the four criteria differently. The Olympic aggregator deemphasizes extreme criteria results, allowing for varying levels of confidence in the quality of the knowledge from the different sources.

There are alternatives to a fixed set of OWA weights determined a priori; for example, obtaining the weights from data (Beliakov, 2003; Filev and Yager, 1988; Torra, 2000). However, implementing such a system would require a large set of input-output combinations on which to train the system; this would be impractical for the current state of precision agriculture, in which the number of available years of yield maps is still low.

Simulations

The (uncoupled) spatial crop model

We used CERES-Maize (Ritchie et al., 1998) as our uncoupled SCM in the IM process, as in the second case study of Chapter 5. Also as in that case, we assumed one environment (equivalent to a common parameter space for all the locations of interest), and made the parameter ranges sufficiently broad to accommodate the parameter value combinations of all the locations we chose to simulate. The major difference with respect to the Chapter 5 exercise was that the parameters to fit in this chapter resulted from the discussion process: we built a qualitative model of parameter sensitivity based on the limiting-factor idea, and used it to choose sensitive parameters.

We ran the IM process for different combinations (scenarios) of the available yield data years (1997, 1999, and 2001). The planting date reported by the farmer was April 15 (day of the year, DOY, 105) for all three years. Row spacing was 76 cm; the farmer's targeted planting density was 6.25 plants/m^2. These simulation scenarios, including the parameters that were estimated and any additional criteria or sources of knowledge involved, will be described in greater detail later.
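
Returning to the objective function, the Olympic form of the OWA operator described earlier can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes at least three criteria so that both extreme positions and at least one central position exist.

```python
def olympic_weights(n):
    """OWA weights that down-weight the two extreme ranked positions.

    The n - 2 central positions receive 1 / (n - 1); the first and last
    positions receive 1 / (2n - 2). The weights sum to one.
    """
    if n < 3:
        raise ValueError("this sketch assumes at least three criteria")
    edge = 1.0 / (2 * n - 2)
    central = 1.0 / (n - 1)
    return [edge] + [central] * (n - 2) + [edge]


def owa(values, weights):
    """Ordered weighted average: weights apply to ranked positions,
    not to specific criteria."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)  # b_j = j-th largest value
    return sum(w * b for w, b in zip(weights, ordered))


# Four hypothetical criterion results, each already scaled to [0, 1].
criteria = [0.9, 0.1, 0.4, 0.2]
objective = owa(criteria, olympic_weights(len(criteria)))
```

With four criteria the weights are 1/6, 1/3, 1/3, and 1/6, so the largest and smallest criterion results each count half as much as the two central ones, regardless of which criterion produced them.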
The spatially-coupled crop model

We modified the inverse-modeling scheme used with the spatially-coupled model of Chapter 2 to work with CERES-Maize and to fit the parameters selected through the discussion process. We used it to estimate parameters for different combinations of the available years of yield data (1997, 1999, and 2001) in the Suggs 4 field, under the same management conditions shown above.

Weather data needed for crop simulations

Mean annual rainfall in Murray between 1970 and 1999 was 1407 mm, distributed fairly evenly throughout the year but decreasing somewhat in the late summer and early fall (Figure 7-7). There is significant interannual variability of monthly precipitation (Figure 7-7), which frequently results in late-season droughts for maize and soybean crops.

Figure 7-7: Monthly rainfall in Murray during 1970-99. The central point of each box plot shows the corresponding month's median monthly precipitation over the 30-year period. Boxes represent interquartile ranges, and the whiskers are the nonoutlier maximum and minimum, defined as the mean ± 2 standard deviations. (Medians are linked by lines to emphasize seasonal trends, and not to imply interpolation.)
We used 12 years (1990 to 2001) of daily weather data for crop simulations. Some of these weather years (1997, 1999, 2001) were used for the IM process and its evaluation, and the rest were used to obtain genetic coefficients as described further below. The year 2001 was also used for evaluation with the spatially coupled model. We built a dataset comprising the four weather variables from the DSSAT minimum data set (Hunt and Boote, 1998): maximum and minimum temperature, total solar radiation, and precipitation. The temperature and rainfall data from mid-1999 to the end of 2001 were collected on-site with a Davis Instruments Wizard III automatic weather station installed near the center of the field. The remaining temperature and rainfall data were obtained from the Midwest Regional Climate Center (MRCC), and correspond to the Murray station (36° 35'N, 88° 18'W). All the solar radiation data were also obtained via the MRCC and correspond to the nearest airport with a solar radiation record: Paducah, KY (37° 04'N, 88° 46'W). We used the WeatherMan program (Pickering et al., 1994) to convert the weather data into the DSSAT format and to estimate values to fill data gaps. We also checked radiation data quality using the envelope approach proposed by Allen (1996).

Initial conditions

In order to simplify implementation of both the uncoupled and spatially coupled SCMs, we assumed that simulations began in all cells with the same initial conditions: drained upper limit (DUL) for the first 30 cm of the soil profile, and saturation below that. This is consistent with an "average" behavior of the region as described by the soil survey (USDA, 1973).
Genetic coefficients

For the years 1999 and 2001 we used the Pioneer 33A14 hybrid. For 1997 we used the Pioneer 3281W hybrid. We obtained DSSAT genetic coefficients to characterize growth and development for these hybrids by adapting an existing medium-season hybrid parameter set in the DSSAT 3.5 MZCER980.CUL data file. We used 12 years of weather data (1990-2001) and ran the CERES model in potential yield mode. We first adjusted 33A14, and then parameterized 3281W by modifying the 33A14 parameters.

We calibrated the parameters sequentially. For 33A14, we first adjusted the P1 parameter from 200 to 260 °C day⁻¹ and P5 from 800 to 960 °C day⁻¹ so that the 12-year mean time to flowering and maturity occurred 65 days after crop emergence and 55 days after flowering, respectively, as specified by Bitzer et al. (2003). Second, we changed the fruit growth rate (G2) from 8.5 to 9.2 mg day⁻¹. Andrade et al. (1996) reported a maximum fruit growth rate of 9.5 mg day⁻¹ for a similar hybrid grown under optimal temperature. Modifications introduced to G2 increased the simulated seed weight from 180 to 320 mg/seed, which adequately reproduced observed values (350 mg/seed) for a hybrid similar to Pioneer 33A14 (Andrade et al., 1996). We also set G3 to a maximum of 700 kernels/plant. These modifications allowed us to reproduce the yield range and maximum yields attained in Murray, KY (Pearce and Poneleit, 1997) with the 33A14 hybrid.

The parameter values corresponding to 3281W were P1 = 280 °C day⁻¹, P5 = 980 °C day⁻¹, G2 = 6.5 mg day⁻¹, and G3 = 600. We set the photoperiod sensitivity parameter at 0 for both hybrids, since we used a fixed planting date.
Analyses

We defined scenarios as different runs of the IM process, performed using different combinations of criteria. We defined two yield history criteria scenarios using the IM framework and the uncoupled model. We also defined two yield-based scenarios for evaluating the coupled SCM (developed in Chapter 2), and for comparing its results with those of the uncoupled model. We compared the results using RMSE and parameter values. (Note that the RMSE was used for comparisons; in both cases we used an OWA and the error function shown in equation 7-1.) We also discussed differences in simulated and observed soil water content for 2001 for the 3-year scenarios. Finally, we evaluated the IM framework with one soil map unit neighborhood criterion. The five scenarios are shown in Table 7-1. We did not include any instances of the sensitivity or geostatistical criteria shown in Figure 7-4.

Table 7-1. Different IM scenarios, showing number of instances of each criterion.

Name  Model      Yield history criterion instances  Neighborhood criterion instances  Description
1A    Uncoupled  2 (1997, 1999)                     0                                 Standard, yield-only IM
1B    Uncoupled  3 (1997, 1999, 2001)               0                                 Standard, yield-only IM
1C    Uncoupled  0                                  1                                 Expert-only IM framework
2A    Coupled    2 (1997, 1999)                     0                                 Standard, yield-only IM
2B    Coupled    3 (1997, 1999, 2001)               0                                 Standard, yield-only IM

The selection of parameters to estimate was a result of the discussion process. We selected a subset among soil depth, fraction of available water (see Chapter 2), SCS curve number, saturated hydraulic conductivity (KSAT), and plant density. After estimating the selected parameters, we plotted their values for the different simulation locations, and noted their similarities and differences with respect to soil-probe field observations, expert opinion, and values taken from the soil survey and published lab tests.
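Because RMSE is the measure used to compare scenarios, a short sketch of the computation may be useful. It assumes the standard definition (square root of the mean squared difference between simulated and observed values); the yield values below are hypothetical and only illustrate the arithmetic.

```python
import math


def rmse(simulated, observed):
    """Root mean squared error between paired simulated and observed values."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical yields (kg/ha) at three simulation locations.
simulated_yield = [9000.0, 10000.0, 11000.0]
observed_yield = [9500.0, 10000.0, 10000.0]
scenario_rmse = rmse(simulated_yield, observed_yield)
print(round(scenario_rmse, 1))
```

RMSE has the same units as the compared variable, which makes scenario results directly comparable when they are evaluated against the same observations.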
Results and Discussion

Elevation, Wetness Index

As mentioned previously, elevation was sampled 7030 times over the Suggs 4 field, which has an area of about 17.6 ha. The resulting high sampling density, approx. 400 samples/ha, made it possible to use narrow (10-meter) lag classes to estimate the elevation (isotropic) semivariogram, while keeping a large number of pairs of points per lag class. Indeed, the least-populated lag class had 9337 pairs of points. The best fit was obtained with a Gaussian model (Figure 7-8) with parameter values of C0 = 0.03 m² (nugget); C0 + C1 = 4.069 m² (sill); a = 428.16 m (effective range). The model fit the data well (R² = 0.995).

Figure 7-8. Semivariogram estimated from elevation data. The dotted line shows the data variance.

The elevation semivariogram model had a very low nugget effect (C1/(C0 + C1) = 0.993), but it did not have a clear sill. The low nugget effect suggests a smooth, noise-free RTK GPS-derived elevation surface, and translates into high confidence in the subsequent resampling (via ordinary kriging) of the elevation data into a regular grid. The lack of a sill is due to nonstationarity, which was expected given that
the field slopes downward toward the southeast. The nonstationarity is not a problem, however, given the high spatial density of the dataset. Kriging algorithms used for spatial interpolation usually restrict the number of points that contribute to a point estimate. We used the default ordinary kriging algorithm implemented in Surfer 7.0 (Golden Software, 1999), which uses the 24 nearest, albeit homogeneously distributed, points to the location of interest. For most of the elevation map, this condition was met with points only a few meters away from the candidate location. At such short distances, the field's nonstationarity (at a scale of hundreds of meters) is irrelevant.

The resampled elevation data are shown, in wireframe form, in Figure 7-9. Note the different landscape positions in the field, and their corresponding soil types (Figure 7-2): the ridge, containing Loring soils; the slope, with Grenada soils; and the lower landscape positions (surrounding the waterways), with Calloway soils.

Figure 7-9: Wireframe elevation map of the Suggs 4 field, clipped to the field boundary. The x-axis and y-axis show UTM coordinates, and the z-axis shows elevation above an arbitrary reference (all expressed in meters).
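The elevation semivariogram parameters reported above can be checked with a short sketch. It assumes one common parameterization of the Gaussian model, in which the effective range a is the distance at which the model reaches about 95% of the partial sill, i.e. gamma(h) = C0 + C1 (1 - exp(-3 h² / a²)); the fitting software may define the range differently, so this form is an assumption. The function and variable names are illustrative.

```python
import math


def gaussian_semivariogram(h, nugget, sill, effective_range):
    """Gaussian model, assuming gamma(h) = C0 + C1 * (1 - exp(-3 h^2 / a^2))."""
    if h == 0:
        return 0.0  # the semivariance is zero at zero separation by definition
    partial_sill = sill - nugget  # C1
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / effective_range) ** 2))


# Parameter values reported for the elevation data.
NUGGET = 0.03            # C0, m^2
SILL = 4.069             # C0 + C1, m^2
EFFECTIVE_RANGE = 428.16  # a, m

# Fraction of the sill that is spatially structured, C1 / (C0 + C1).
structured_fraction = (SILL - NUGGET) / SILL
print(round(structured_fraction, 3))

# Modeled semivariance at short and long separation distances.
for distance_m in (10.0, 100.0, EFFECTIVE_RANGE):
    gamma = gaussian_semivariogram(distance_m, NUGGET, SILL, EFFECTIVE_RANGE)
    print(distance_m, round(gamma, 3))
```

At separations of a few meters the modeled semivariance stays close to the nugget, which is consistent with the argument that field-scale nonstationarity matters little when kriging uses only nearby points.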
The wetness index map (Figure 7-10) shows additional information about the field, including artificial drainage ways, shown as depressions in Figure 7-9 and as high-WI lines running north-south in Figure 7-10. When this map was discussed, the farmer recalled that his predecessor had made them by moving some earth with a bulldozer.

Figure 7-10. Wetness index (Beven and Kirkby, 1979) calculated for the Suggs 4 field. The 13 IM framework simulation domain locations are shown as crosses.

Electroconductivity (EC)

The electroconductivity dataset also had a high spatial sampling density; we used 5-meter lag classes to estimate its isotropic semivariogram. The least-populated lag class was again the first, with 10,612 pairs of points for surface EC and 9,876 for deep EC. The best semivariogram model fit for both EC layers was obtained with an exponential model (Figure 7-11, A and B) and parameter values of C0 = 1.5 mS²/m² (nugget); C0 + C1 = 13.53 mS²/m² (sill); a = 134.7 m (effective range) for the surface EC data, and C0 = 2.37 mS²/m²;
C0 + C1 = 17.98 mS²/m²; a = 84.3 m for the deep EC data. In both cases, the model fit the data well (R² = 0.995 and R² = 0.968 for surface and deep EC data, respectively).

Figure 7-11. Semivariogram estimated from surface (A) and deep (B) electroconductivity data. The dotted line shows the data variance.

The surface electroconductivity maps (Figure 7-12 A) show spatial variability of EC in the Suggs 4 field. Electroconductivity has been shown to correlate with several soil properties, including clay content, organic matter content, water content, and salinity level (Johnson et al., 2001). In the case of Suggs 4, the opinion of our domain experts was that the variability of EC would indicate differences in clay content (due to erosion or soil type), or in soil moisture (primarily due to landscape-position-mediated spatial water
movement). We studied the EC maps together with other data (elevation, WI, soil type, normalized yield) to allow us to distinguish between clay- and moisture-specific effects.

Figure 7-12. Veris electroconductivity maps of the field (A: surface EC; B: deep EC). Surface EC refers to a set of coulters that represents near-surface conditions, whereas Deep EC integrates conditions down to approximately 1.5 m. The 13 IM framework simulation domain locations are shown as crosses.

The EC maps (Figure 7-12) show erosion on the ridge and some high-EC areas further east that could be either wet or high-clay. In the case of the areas around the CaA(N) point, high EC is probably a combined effect of a higher clay content near the surface, associated with the making of the waterway, and a higher water content in the waterway proper. Northeast of the GrB point, high EC is probably due to erosion, given the presence of a small (low-WI) hilltop there. In the region located southeast of the Hn point, high EC may be associated with wetness due to extremely poor drainage, combined with the presence of some clay particles brought down by erosion from higher
landscape positions. Finally, the elevated EC of the region north of the GrA point is apparently associated with mechanical soil disturbances during the making of an old road that traversed the field in an east-west direction.

Soils

The ridge on the western edge of the Suggs 4 field (Figure 7-9) is covered with a Loring B soil, which is characterized by erosion and a fragipan. As discussed above, the erosion is consistent with the high EC shown in Figure 7-12. The restrictive fragipan does not translate into flooding because, due to its elevated landscape position and relatively high gradient, the area mostly has a low wetness index (Figure 7-10); it does not receive water from other landscape positions, and it should be able to drain laterally if necessary. The fragipan does impose another problem, however, which is aggravated by erosion effects: lack of water.

Next below the Loring soil is the Grenada soil, which also has a fragipan, although perhaps not as well developed in some areas as the one underlying the Loring soils (J. McIntosh, pers. comm.). This region is less eroded than the Loring soils, and has received silt from them; thus, EC is lower (Figure 7-12) and the distance to the fragipan should be greater. The WI is higher than that of the Loring soils (Figure 7-10) due to lower slopes and greater contributing areas. This is expected to result in more availability of water, and thus in greater yields.

Below the Grenada soils lie the Calloway soils. These also have a fragipan, have more clay than the Grenada soil, and are wetter. Maize crops typically have germination and emergence problems in this region.
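The role of slope and contributing area in the wetness index discussed above can be illustrated with a minimal sketch. It assumes the usual form of the topographic index of Beven and Kirkby (1979), WI = ln(a / tan β), where a is the upslope contributing area per unit contour length and β is the local slope angle. The two landscape positions and their values are hypothetical and were not measured in the Suggs 4 field.

```python
import math


def wetness_index(specific_contributing_area_m, slope_degrees):
    """Topographic wetness index, assuming WI = ln(a / tan(beta))."""
    if specific_contributing_area_m <= 0 or slope_degrees <= 0:
        raise ValueError("contributing area and slope must be positive")
    slope_radians = math.radians(slope_degrees)
    return math.log(specific_contributing_area_m / math.tan(slope_radians))


# Hypothetical landscape positions (illustrative values only).
ridge_wi = wetness_index(specific_contributing_area_m=20.0, slope_degrees=4.0)
lower_slope_wi = wetness_index(specific_contributing_area_m=500.0, slope_degrees=0.5)

print(round(ridge_wi, 2), round(lower_slope_wi, 2))
```

A steep position with little upslope area (the hypothetical ridge) gets a lower index than a flat position with a large contributing area, which matches the qualitative contrast drawn between the Loring and Grenada positions.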
Yield Maps

The spatial sampling density of the yield data was higher than that of elevation and EC. This allowed us to use narrower (3- or 5-meter) lag classes to estimate the isotropic yield semivariograms. In all cases, the best fit was obtained with exponential models. Figure 7-13 shows the semivariograms for all the years considered. Table 7-2 shows the parameters, a measure of goodness of fit, and the lag class width used in each case. Nugget values were generally quite high (C1/(C0 + C1) < 0.873), suggesting the presence of high-frequency noise due to lags and the complex dynamics of crop redistribution within the harvester and its temporal interaction with the grain flow sensor, as discussed by Birrell et al. (1996) and Pierce et al. (1997). This high-frequency noise motivated us to increase the number of points contributing to the interpolated yield maps to 32 from the default (24) used in Surfer.

Table 7-2. Semivariogram parameters for the yield map data.

Year  Crop      C0 (kg/ha)²  C0 + C1 (kg/ha)²  a (m)  r²     Lag class width (m)
1997  Maize     710,000      4,840,000         66.6   0.999  3
1998  Soybeans  61,400       202,000           37.2   0.974  5
1999  Maize     613,000      2,592,000         78.0   0.996  5
2001  Maize     383,000      3,022,000         58.2   0.993  5

Figures 7-14, 7-15, and 7-16 show the resampled yield maps used for building the normalized yield (NY) map. Despite the interannual spatial yield variability shown, some behaviors were conserved between years, such as the crescent-shaped high-yield zone stretching southward from the western end of the northernmost waterway almost to the southern field boundary, the high-yielding Loring and Grenada soils in the northwest sector of the field, and the low-yield spots around the central waterway. These features are shown more clearly in the normalized yield map (Figure 7-17).
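The structured-variance ratios implied by Table 7-2 can be recomputed directly from the tabulated nugget and sill values, as in the sketch below. The dictionary layout and names are illustrative. The exponential model function assumes the form gamma(h) = C0 + C1 (1 - exp(-3 h / a)), in which a is the effective range; that convention is an assumption about the fitting software.

```python
import math

# (nugget C0, sill C0 + C1), both in (kg/ha)^2, and effective range a in m (Table 7-2).
YIELD_SEMIVARIOGRAMS = {
    "1997 maize": (710_000.0, 4_840_000.0, 66.6),
    "1998 soybeans": (61_400.0, 202_000.0, 37.2),
    "1999 maize": (613_000.0, 2_592_000.0, 78.0),
    "2001 maize": (383_000.0, 3_022_000.0, 58.2),
}


def structured_fraction(nugget, sill):
    """Share of the sill that is spatially structured: C1 / (C0 + C1)."""
    return (sill - nugget) / sill


def exponential_semivariogram(h, nugget, sill, effective_range):
    """Exponential model, assuming gamma(h) = C0 + C1 * (1 - exp(-3 h / a))."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / effective_range))


fractions = {
    label: structured_fraction(nugget, sill)
    for label, (nugget, sill, _range) in YIELD_SEMIVARIOGRAMS.items()
}
for label, fraction in fractions.items():
    print(label, round(fraction, 3))
```

The largest ratio corresponds to the 2001 maize data, and all four ratios are well below the 0.993 obtained for elevation, which reflects the noisier yield monitor data.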
Figure 7-13. Semivariograms fitted to the observed yield data. Panels: Maize 1997, Soybean 1998, Maize 1999, Maize 2001 (x-axis: separation distance, m).
Figure 7-14. Resampled 1997 maize yield data.

Figure 7-15. Resampled 1998 soybean yield data.
Figure 7-16. Resampled 1999 maize data.

Figure 7-17. Normalized three-year (1997, 1998, 1999) yield (NY) map. The marked points show the 13 locations selected for simulation.
Updating the Soil Map, Selecting the Simulation Locations

Several elements of the normalized yield map (Figure 7-17) can be predicted by inspecting the soil map (Figure 7-2) and elevation map (Figure 7-9): the lower yields of the eroded LoC2 soils along the field boundary in the northwest corner of the field; relatively high yields in the relatively deep, relatively well-drained Loring (LoB) and (especially) Grenada (GrB) soils; lower yields in the shallower, drainage-impaired Calloway soils, etc.

Figure 7-18. Summary of anomalies and candidate zones for additional soil map units identified during discussion sessions with the domain experts. Annotations legible in the figure:
1) This follows changes in EC and, to a lesser extent, yield. Maybe this is a soil with a pan closer to the surface, such as PuB (Purchase B).
2) LoA? GrA? Check slope (should be lower).
3) Why does this have higher EC? If it's a hump, perhaps erosion.
6) Ca unless hump; otherwise eroded Gr?
7) LoA? GrA? Check slope (should be lower).
9) Mysteriously low-yielding area.
Other temporally stable behaviors were unexpected, however: for example, the low-yielding zones in the north center and in the southwest corner of the field, and the high-yielding spots in the center of the field. We collected and mapped the set of unexpected behaviors (Figure 7-18), and generated the following hypotheses (numbered as in Figure 7-18):

1) Zone 1 of Figure 7-18 (erosion-prone, low-WI landscape position; high EC; low NY) corresponds to a very eroded Loring-like soil (Purchase series) with a fragipan very close to the surface. Purchase soils (coarse-silty, mixed, thermic Ochreptic Fragiudalfs) have been previously reported in a neighboring field studied with a first-order soil survey (Mueller et al., 2003).
2) The characteristics of Zone 2, the previously mentioned crescent-shaped region (medium WI; low EC; high NY with respect to other regions in similar landscape positions), suggest that water does not limit growth here as much as elsewhere. We hypothesized this is a Grenada soil, either exceptionally deep, or having a poorly developed fragipan.
3) Zone 3 (low WI, high EC, high NY) is an eroded Grenada soil on an elevated landscape position.
4) Zone 4 (very high WI, high EC, low NY) is a very wet Henry soil. Henry soils (coarse-silty, mixed, active, thermic Typic Fragiaqualfs) are common in low landscape positions throughout the region (SCS, 1973), and have been previously reported in a neighboring field (Mueller et al., 2003).
5) Zone 5 (high WI, low surface EC indicating low erosion, high deep EC indicating water or clay, low NY) is a Calloway soil with drainage problems.
6) Zone 6 (low WI, high EC, medium NY) is an eroded Grenada soil.
7) Zone 7 (low WI, low EC, high NY) is a deep Loring or Grenada soil, similar to that of Zone 2.
8) Zone 8 (low WI, low EC, high NY) is a deep Grenada soil, as in Zone 2.
9) Zone 9 (low/medium WI, low surface EC, high deep EC, very low NY) was taken out of consideration because the farmer remarked that it was very wet (and usually had very poor stand quality): in the past a road-building crew had moved earth in a way that prevented water from running off the field in that sector, and water ponds there as a consequence.
Table 7-3. Soil probe observations corresponding to the anomalies in Figure 7-18.

Zone  Obs.  Soil unit  Observations
1     All   PuB2       Purchase soil. There are 7.5 cm of topsoil, 7.5 cm of transition. Fragic properties begin at 30 cm. Roots should not go much further than 45 cm. Very dry!
2     402   GrB(Ba)    23 cm of topsoil, the first 10 cm very clean; 10-23 cm, very mottled; 23-56 cm, dominantly brown, highly mottled gray subsoil; 56-81 cm, mottled gray and brown (wetter); 81-122 cm, dominantly grey. No restrictions to 122 cm. This is a Kurk soil, i.e. a Calloway without restrictions.
2     433   GrB(Ba)    Has mottling. Brown subsoil below 23 cm with redox features. At 61 cm the (predominantly brown) subsoil becomes very mottled, still good soil, with no restrictions. At 97 cm, hit gray & clay. No fragipan until the bottom (122+ cm), but clay may be restrictive. Behaves like a Grenada.
3     403   GrB        Very thick topsoil, 23 cm. Subsoil gets darker. Virtually no root restrictions to 76 cm. Some redox at 66 cm. Minor restriction at 107 cm. Grey begins at 89 cm. Water perches there. Very mild fragic properties at 122 cm. The best soil we've seen. It looks depositional. It's more like a Loring than a Grenada, but 6 m SE of here it's more like a Grenada.
4     407   Hn         Wet! First few cm has a lot of OM. Total topsoil about 10-12 cm. Below that, grey subsoil. Deeper grey (more clay) at 56 cm. Saturated at 81 cm, still a grey, heavy clay mess. Approximately 30% clay, 65-68% silt. Fragic properties at 112 cm. Henry-like, but the pan is a little deeper. Behaves like Routon. Call it Henry.
5     405   CaA(W)     Topsoil 18 cm. Some mottling, wetness. Below it there's grey (water). Grey but not too clayey down to 51 cm. Drying out from 66 cm. At 91 cm we start getting more fragic properties. It's not a strong pan but it's got a fragic character. It's a Calloway.
6     A08   GrA        This looks like a disturbed Grenada soil. It's saturated from 38 cm down. The material feels alluvial. No mechanical restrictions until 91 cm, when it gets harder. There seems to be a pan at 107 cm.
7     401   N/A        20 cm of beautiful topsoil; hit pan at 51 cm; 12-38 cm, beautiful subsoil; 38-51 cm, mottled subsoil. Pan at 51 cm. This soil is an eroded Loring, based more on depth to the pan than on topsoil thickness.
8     404   GrB(N)     Compacted zone at 13-20 cm. Grenada-like. There are lots of weeds around this point. Clay (grey) starts at 81 cm. At 91 cm the probe goes no further.
9     419   N/A        Mottling at 13 cm, grey at 25. "Going through butter" for the next 38, but very grey, very wet. No restrictions. Significant pick up of clay at 76 cm. Depth 122+ cm. The "well-drained" pan soils will be better for row crops than these worse-drained soils with no mechanical restrictions. Calloway.
Soil Probe Observations

We made a series of soil probe observations (Table 7-3) to address our previously mentioned soil-map-anomaly hypotheses. A soil scientist (Jerry McIntosh, NRCS), a crop consultant (John Potts, AgConnections, Inc.), and I traversed the field, making and discussing soil probe observations (Figure 7-19).

Figure 7-19. Field observations with a soil probe. John Potts (left) and Jerry McIntosh (right) examining a soil core. (Photo courtesy of Rick Murdock.)

Most of our observations (Purchase soil of Zone 1, Henry soil of Zone 4, Calloway soil of Zone 5, Grenada soil of Zone 8, Calloway soil of Zone 9) supported their corresponding hypotheses. The apparently disturbed soil at Zone 6 was initially surprising, but the farmer recalled that in the past a road crossed the field east-west, running near the Zone 6 observation point. He suggested that material removed from the roadbed could have been deposited in Zone 6.
Our two observations in Zone 2 revealed somewhat different soils, but both shared the lack of a strong limiting horizon. For simplicity we decided to consider them as part of the same soil type, which we called GrB(Ba). The observation in Zone 3 revealed a soil more appropriately described as Loring than Grenada. However, a quick probe a few meters away from the sampling location showed a Grenada soil. We decided to label the zone as Grenada B (GrB).

The results from Zone 7 were somewhat contradictory. The topsoil and subsoil did not seem eroded, which is consistent with a Grenada soil. However, the depth to the pan was low (51 cm), which is inconsistent with the zone's high yield. Other observations conducted nearby showed fragipan depths of approximately a meter. This discrepancy suggests that this zone may have spatially variable fragipan development.

The Simulation Domain

As explained before, we wished to simulate a reduced but representative set of locations in the field, attempting to represent a distinct soil type with each location. We simplified the candidate regions, consisting of the union of the original soil types of Figure 7-2 and the additional zones shown in Figure 7-18, to the 13 soil mapping units shown in Figure 7-20. For this we discarded Zone 7 (as probably very similar to Zone 2) and Zone 9 (considering it impractical to simulate due to its runoff problems), and chose one point per resulting soil type, located in or near a temporally stable NY zone (Figure 7-17). This latter criterion follows from the conclusions of Chapter 4 regarding the convenience of sampling at temporally stable locations as a way to maximize the predictive power of a spatiotemporal simulation model.
Figure 7-20. Set of 13 soil types used as the IM framework simulation domain. Free ovals label the soil map units containing them. Ovals contained within larger ovals denote new soil units arising from our anomaly-driven hypotheses.

Knowledge Elicitation: Populating the Neighborhood Criteria

Figure 7-21 shows the soil depth criterion data, elicited from interaction with the experts. In the discussion sessions related to this criterion we made inferences about the soil depth in successive pairs of adjacent soil mapping units based on all the available information, reached a consensus regarding our level of confidence in the inferred spatial relationship, and represented it as an arc on the background bitmap of the field using Microsoft Visio Professional 2002 (Microsoft Corp., 2001). We revisited and successively refined the network over several meetings.
An example of the inference process follows: EC, landscape position (LP), and our soil probe data suggested that the LoB soil is deeper than the PuB soil, because the latter has higher EC, a more erosion-prone (high-gradient) LP, a very shallow depth to the fragipan, and a very eroded topsoil (Table 7-3). Similarly, EC, NY, and LP suggested that the LoB is deeper than the LoC2 soil; in LoC2, EC is higher, the slope is greater, and the NY is lower than in LoB. However, our confidence in this relationship was lower than in the previous case.

The soil depth neighborhood criterion (Figure 7-21) describes soil depth, a crop model parameter. The same kind of criterion can be used with a model output such as wetness. Figure 7-22 shows the wetness neighborhood criterion, a network of expected inequalities of plant-extractable soil water in the first 45 cm of soil. This criterion was populated using the inferential methods described above. Finally, Figure 7-23 shows a network of relationships of an input variable, plant density, that is environmentally dependent (i.e., can vary from year to year) but cannot be predicted by CERES.

We chose to use two of the criteria shown above, soil depth and wetness, in the IM framework implementation for this study. We set the constraint function (Figure 7-6) parameters of both criteria to the arbitrary values shown in Table 7-4.

Table 7-4. Values adopted for the neighborhood criterion's constraint thresholds; ε is a very small constant used for avoiding numerical division-by-zero errors.

Constraint              Thresholds (in increasing order)
Strongly greater than   -100, 0, ε, 100
Greater than            -100, 0.05, 0.05+ε, 100
Weakly greater than     -100, 0.1, 0.1+ε, 100
Strongly less than      -100, -ε, 0, 100
Less than               -100, -0.05-ε, -0.05, 100
Weakly less than        -100, -0.1-ε, -0.1, 100
Strongly similar to     -100, -0.025-ε, -0.0083, 0.0083, 0.025+ε, 100
Similar to              -100, -0.05-ε, -0.017, 0.017, 0.05+ε, 100
Weakly similar to       -100, -0.1-ε, -0.033, 0.033, 0.1+ε, 100
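A constraint defined by thresholds such as those in Table 7-4 can be sketched as a piecewise-linear function. The sketch below makes several assumptions that are not confirmed by this section: the argument is the relative difference between the values at two neighboring units, the degree of satisfaction is 0 outside the outer thresholds and 1 between the inner ones, and the negative signs of the lower thresholds are inferred from the requirement that the thresholds increase. The actual constraint function is the one defined in Figure 7-6; all names here are hypothetical.

```python
EPSILON = 1e-6  # stands in for the "very small constant" of Table 7-4 (assumed value)


def piecewise_linear(x, breakpoints, values):
    """Linear interpolation through (breakpoint, value) pairs, clamped at both ends."""
    if x <= breakpoints[0]:
        return values[0]
    if x >= breakpoints[-1]:
        return values[-1]
    segments = zip(breakpoints, breakpoints[1:], values, values[1:])
    for x0, x1, y0, y1 in segments:
        if x0 <= x <= x1:
            if x1 == x0:
                return y1
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# "Similar to" thresholds, with assumed satisfaction values at each threshold.
SIMILAR_BREAKPOINTS = [-100.0, -0.05 - EPSILON, -0.017, 0.017, 0.05 + EPSILON, 100.0]
SIMILAR_VALUES = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]


def similar_to(value_a, value_b):
    """Degree (0 to 1) to which value_a is similar to value_b (assumed relative difference)."""
    relative_difference = (value_a - value_b) / value_b
    return piecewise_linear(relative_difference, SIMILAR_BREAKPOINTS, SIMILAR_VALUES)


# Hypothetical soil depths (cm) at two neighboring units.
print(similar_to(100.0, 100.0))  # identical values fully satisfy the constraint
print(similar_to(103.0, 100.0))  # a 3% difference satisfies it only partially
print(similar_to(110.0, 100.0))  # a 10% difference does not satisfy it
```

Under these assumptions the "strongly", plain, and "weakly" variants differ only in how wide the fully satisfied and partially satisfied bands are.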
Figure 7-21: Soil depth neighborhood criterion. The figure shows the Suggs 4 field, the soil mapping units it is divided into, and the soil depth relationships elicited from the experts. Free ovals label the soil map units that contain them. Ovals nested within larger ovals denote soil units not shown in the soil survey but identified from anomalies in EL, WI, EC, and NY maps and fieldwork with a soil probe. The ovals' labels correspond to the NRCS nomenclature for soil series. The part of the labels in parentheses differentiates between different map units containing the same soil type. With respect to the arc labels, GT is "greater than" (for example, the Loring B (LoB) soil is expected to be deeper than the highly eroded Purchase B2 (PuB2) soil), and "Sim" means "similar to". The dotted lines represent a weaker relationship than the solid lines.
Figure 7-22: Wetness neighborhood criterion. The figure shows the Suggs 4 field, the soil mapping units it is divided into, and the wetness relationships elicited from the experts. Free ovals label the soil map units that contain them. Ovals nested within larger ovals denote soil units not shown in the soil survey but identified from anomalies in EL, WI, EC, and NY maps and fieldwork with a soil probe. The ovals' labels correspond to the NRCS nomenclature for soil series. The part of the labels in parentheses differentiates between different map units containing the same soil type. With respect to the arc labels, WT is "wetter than" (for example, the Henry (Hn) soil is expected to be wetter than the Grenada B (GrB) soil), and "Sim" means "similar to". The dotted lines represent a weaker relationship than the solid lines.
Figure 7-23: Plant density neighborhood criterion. The figure shows the Suggs 4 field, the soil mapping units it is divided into, and the plant density relationships elicited from the experts. Free ovals label the soil map units that contain them. Ovals contained within larger ovals denote soil units not shown in the soil survey but identified from anomalies in EL, WI, EC, and NY maps and fieldwork with a soil probe. The ovals' labels correspond to the NRCS nomenclature for soil series. The part of the labels in parentheses differentiates between different map units containing the same soil type. With respect to the arc labels, GT is "greater than" (for example, the Grenada B (GrB) soil is expected to have a greater plant density than the Henry (Hn) soil), and "Sim" means "similar to". The dotted lines represent a weaker relationship than the solid lines.
Knowledge Elicitation: Parameter Sensitivity and Parameter Selection

Causal diagrams (Eden et al., 1992; Howard and Matheson, 1984) are directed graphs that model perceived cause-effect relationships among different variables. Variables are represented as nodes, and the relationships between them as arcs. Typically relationships are either positive (labeled with a "+") or negative (labeled with a "-"). A positive relationship is such that a change in the predecessor node (for example, an increase) causes a change in the same direction (i.e., an increase) in the successor node. In a negative relationship, the changes in predecessor and successor have opposite directions.

As a first step in selecting the parameters to estimate in the IM framework, we made a causal diagram model of water balance in the field, which I initially populated based on the CERES water balance model (Ritchie, 1985) and which was successively modified during group discussions. We adopted the following ideas from Carley and Palmquist (1992) regarding knowledge representation in model form:

• Mental models are internal representations.
• Language is the key to understanding mental models (i.e., mental models can be represented linguistically).
• Mental models can be represented as networks of concepts.
• The meaning of a concept for an individual is embedded in its relations to other concepts in the individual's mental model.
• The social meaning of a concept is not universally defined. Instead, it is defined through the intersection of individuals' mental models.

Thus, we used language as our primary information exchange tool, through a series of (open and directed) questions I posed to the domain experts. Aware of the lack of universality mentioned above, we supported our use of language with drawings and diagrams such as the ones shown in Figure 7-24.
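The signed directed graphs described at the start of this section lend themselves to a simple data structure. The sketch below stores each arc with a sign of +1 or -1 and obtains the net effect along a causal path as the product of the arc signs. The nodes and arcs are hypothetical and chosen only for illustration; they are not the contents of the causal diagram built in this study.

```python
# Hypothetical signed arcs: (predecessor, successor) -> +1 (positive) or -1 (negative).
CAUSAL_ARCS = {
    ("slope", "runoff"): +1,
    ("runoff", "infiltration"): -1,
    ("infiltration", "soil water"): +1,
    ("soil water", "yield"): +1,
}


def path_sign(path, arcs):
    """Net sign of a causal path: the product of the signs of its arcs."""
    sign = 1
    for predecessor, successor in zip(path, path[1:]):
        if (predecessor, successor) not in arcs:
            raise KeyError(f"no arc from {predecessor!r} to {successor!r}")
        sign *= arcs[(predecessor, successor)]
    return sign


causal_path = ["slope", "runoff", "infiltration", "soil water", "yield"]
net_effect = path_sign(causal_path, CAUSAL_ARCS)
print("+" if net_effect > 0 else "-")
```

A path with an odd number of negative arcs has a negative net effect, so in this illustrative graph an increase in slope would be expected to decrease yield.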
Figure 7-24. Using diagrams to support the discussion and knowledge elicitation process. Jerry McIntosh (NRCS) explains spatial water movement in the region.

The resulting causal diagram (Figure 7-25) provides a simplified representation of how the water-related aspects of our spatial crop model work, including the relationships between readily available data (bottom row of nodes), the relevant physical processes in the system (central region of the diagram), and model parameters (top row of nodes). Some of these parameters (e.g., roughness) and the processes they control (e.g., subsurface flow) do not exist in the CERES water balance, however.

Making the causal map helped the domain experts and me attain, in terms of the previously mentioned ideas of Carley and Palmquist (1992), common meanings for the relevant water balance concepts. We could then proceed to selecting parameters to estimate with the IM framework. (With these data we could also have populated parameter sensitivity criteria; see Figure 7-4.)
Figure 7-25. Conceptual water balance model expressed as a causal map. The "LPD" label refers to a landscape-position-dependent relationship. The "SB" label refers to a relationship dependent on nonscalar texture data.

Figure 7-26 shows the record of a discussion session. In this session we revisited a previous discussion about the limiting factors of each of the domain units shown in Figure 7-20. I also presented some additional questions. In all cases, the participating experts were asked to reach a consensus about their level of confidence in their answer, expressed as a probability or a verbal quantifier ("Certain", "Probable", etc.). To obtain comparable results across sessions, and to provide the experts with a common conversion from verbal quantifiers to probabilities, we used a modified version of the scale proposed by Renooij and Witteman (1999), shown in Figure 6-1.

In a subsequent discussion session, the data shown in Figure 7-26 were combined into a common representation (Figure 7-27) that includes the group's perceptions of how the limiting factors are related to CERES water balance parameters. We then chose the parameters to estimate (Table 7-5).
[Figure 7-26: scanned record of a discussion session. Legible text follows; the confidence marked on each entry's verbal scale (Certain, Probable, Expected, Not expected, Improbable, Impossible) is not recoverable.]

Discussion Session #7
Participants: Jerry Mcintosh (NRCS); Rick Murdock (Ponderosa Farms, AgConnections, Inc.); Andres Ferreyra (University of Florida)
Agenda: Soil limiting factors. We're going to review the yield limiting factors Andres discussed with John & Rick on 7/5/02.

PuB: What limits yield is primarily the shallow depth to the fragipan. Also, very shallow topsoil depth. Relevance: harder to plant, worse seed-soil contact, less available nutrients, N & P will run off, denitrify, etc.

LoC2: Limiting factor: high water output, low water input. Secondary factor: fragipan depth. Greater runoff due to slope. Does not get extra water from anywhere else on the landscape.

LoB: Limiting factor: [illegible] at 36" (John's choice). Divergent landscape position, but not as strong as LoC2.

GrB: Yield-limiting factor: limiting layer. When clay at bottom dries, it will get hard even if it does not have a strong pan (John's choice; Jerry demotes it to secondary). Jerry thinks depth to restrictive features (fragipan or clay %) is the way to express this. Note that GrB gets more water from upslope than Lo. The area would be expected to yield better than the rest.

GrB (Ba): Dynamic balance between early spring wetness problems (refers to mottling depth) and hardness of clay when the soil gets dry. This is a weather-year-dependent behavior, but the wetness problem is NOT typical, i.e., it is [illegible] toward the clay-hardness-induced limitations. This region showed no significant mechanical impedance down to at least 4'. Jerry: physicochemical processes.

CaA(N): Dynamic balance between early spring wetness problems (refer to mottling depth & grayness) and hardness of clay when the soil gets dry. This is a weather-year-dependent [illegible]. "Late-season soybeans are the one case where a perched water table can be a good [text cut off]"

CaA (W): Limiting factor: accumulation of water (high WI) affecting stand quality in the early season. This area could benefit for late season soybeans.

Hn: Limiting factor: poor stand quality due to excessive water in the early season. Jerry adds the hardness of the clay when it dries out. See points 407 and 428.

CaA (S): Limiting factor: poor stand quality due to excessive water in the early season. When the grey stuff dries out, it will limit root system depth. This item not as certain because "the brown stuff in the subsoil gives you more room to wiggle". [illegible] forgiving. Why? This is the wettest of the three Calloway A regions. Somewhat similar to Henry soil behavior, although subsoil is brown, not grey. May have surface effects (ponding). Note how in 408 and [illegible] we noted that the topsoil was not good, yet the subsoil was great. Consider infiltration [illegible] convergent LP as limiting.

CaB2 [Jerry: CaA (E)]: Limiting factor: start of the clay at 18" in point 415 will play a part if it gets dry in the summer; pan at 32" also. This region probably gets rid of its extra water into the waterways, both in surface & subsurface flows.

GrA: Limiting factor: start of the clay at 21" in point 408 will play a part if it gets dry in the summer; pan at 46" also. We infer this from point 408, which is transitional between Grenada and Calloway.

CaB: Limiting factor: start of the clay will play a part if it gets dry in the summer; pan also. Points 416, 418, A04. It does not have the excess-water-related problems of other Calloway regions. A04 is probably not very trustworthy because there has been a lot of traffic out there.

GrB(N): Limiting factor: pan at 36"; start of the clay at 24" in point 449 will play a part if it gets dry in the summer. Points 420, 404, 449, transition to 418. Jerry: This is probably a continuation of the GrB (Ba) "banana" that was interrupted by the waterway.

Topsoil thickness
Andres: Why is topsoil thickness important?
Jerry: The critical property of topsoil thickness is its association with organic matter. This carries with it a greater CEC & potential availability of nutrients.
Jerry: The greater the OM, the greater the infiltration.
Rick: Greater OM%, greater water holding cap.
Rick: Maybe we can add an arrow in the process network from OM to Infiltration.
Andres: OM & residue as different entities?
Rick: "OM is residue + time".

The 48" limit
Andres: What would you propose as an upper limit to the soil depth allowable in the IM process for places where we didn't find restrictions up to 48"?
Jerry (started recording here): The soil gets good again below the pan.
Jerry: I don't really care what's going on below 48". In my mind, if you've got 48" of decent soil, that's enough. Landscape position will modify the validity of this statement, however.

Figure 7-26. Record of a discussion session with domain experts.
[Figure 7-27: node-and-arrow diagram. Parameter nodes: SLDEP, KSAT, DUL, CN2, PPOP. Limiting-factor nodes: divergent landscape position, depth to pan, clay hardening, surface ponding, wetness (including convergent landscape position). Arrow legend: Very High, High, Medium, Low.]

Figure 7-27. Compact representation of the limiting factor data. The upper set of nodes represents CERES model parameters. The central set of nodes represents processes occurring in the field. The lower set of nodes corresponds to the 13 simulation domain locations. The upper set of arrows shows how the crop model parameters are related to the limiting factors. The lower set of arrows shows to what extent each factor influences yield in each soil type.

Table 7-5. Crop model parameters and ranges used in IM framework.

Parameter  Definition                                           Units        Minimum  Maximum  N° Points
KSAT       Saturated hydraulic conductivity, bottom soil layer  cm d^-1      0.0001   0.1      16
CN2        SCS runoff curve number                              -            72       92       17
SDEP       Soil depth                                           cm           45       165      17
PPOP       Plant density                                        plants m^-2  3        8        17

A noteworthy result of the parameter selection process is that the soil water holding limits (DUL, LL, SAT) were not chosen as parameters to estimate using the IM framework. The soil scientist indicated that the total available water (DUL - LL) would remain practically constant throughout the field, although the lower limit (LL) could be expected to vary somewhat. These results are similar to those of Ritchie et al. (1999).
However, variation of the LL can conceivably influence the model results independently of its relationship with DUL. The LL is used by the CERES water balance routine to estimate unsaturated hydraulic conductivity, which in turn is used to compute potential daily root water uptake (Ritchie, 1998). Variability in the LL could thus modify the crop's behavior under supply-limited conditions.

To assess the expected level of variation of the LL across the field, we obtained textural fractions from laboratory measurements made by the National Soil Survey Center (NSSC) on samples of three of the soils found in the field: Calloway (NSSC, 1991a), Grenada (NSSC, 1991b), and Loring (NSSC, 1991c). We also obtained textural fractions for the Henry soil from the Soil Survey of Calloway County (USDA SCS, 1973). We used the Saxton pedotransfer functions (Saxton et al., 1986) to estimate the corresponding LL, DUL, and SAT values. The results, interpolated to fit the soil layer structure used by CERES, are shown in Table 7-6. Note the great similarity of the soil water characteristics among soil types. We consequently adopted a single set of characteristics for the whole field, using values averaged over the different soil types.

Table 7-6. Soil water holding characteristics obtained by applying the Saxton pedotransfer functions to textural fractions taken from the literature.

        Loring               Calloway             Grenada              Henry
Depth   LL     DUL    SAT    LL     DUL    SAT    LL     DUL    SAT    LL     DUL    SAT
5       0.140  0.330  0.510  0.140  0.320  0.510  0.140  0.330  0.510  0.140  0.320  0.510
15      0.140  0.330  0.510  0.140  0.320  0.510  0.140  0.320  0.510  0.140  0.320  0.510
30      0.140  0.330  0.510  0.120  0.320  0.510  0.120  0.320  0.510  0.120  0.320  0.510
45      0.140  0.330  0.510  0.120  0.320  0.500  0.155  0.330  0.510  0.120  0.320  0.500
60      0.140  0.330  0.510  0.150  0.310  0.510  0.150  0.320  0.510  0.150  0.310  0.510
75      0.130  0.320  0.500  0.140  0.320  0.500  0.140  0.320  0.500  0.140  0.320  0.500
90      0.120  0.310  0.500  0.130  0.320  0.500  0.140  0.310  0.500  0.130  0.320  0.459
120     0.120  0.310  0.500  0.130  0.320  0.500  0.140  0.310  0.500  0.130  0.320  0.500
150     0.120  0.310  0.500  0.130  0.320  0.500  0.140  0.310  0.500  0.130  0.320  0.500
180     0.120  0.310  0.500  0.130  0.320  0.500  0.140  0.310  0.500  0.130  0.320  0.500
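The field-wide averaging described above can be reproduced directly from the tabulated values. The following sketch averages LL, DUL, and SAT over the four soils for each layer and reports the largest between-soil range of each variable. The function names are hypothetical, and rounding the averages to three decimals is an assumption made here for display; the text does not state how the averaged values were rounded.

```python
# Layer-by-layer averages of the tabulated soil water characteristics.
# Function names and the 3-decimal rounding are assumptions of this sketch.
SOILS = ("Loring", "Calloway", "Grenada", "Henry")

# depth: one (LL, DUL, SAT) tuple per soil, in SOILS order
TABLE_7_6 = {
    5:   ((0.140, 0.330, 0.510), (0.140, 0.320, 0.510), (0.140, 0.330, 0.510), (0.140, 0.320, 0.510)),
    15:  ((0.140, 0.330, 0.510), (0.140, 0.320, 0.510), (0.140, 0.320, 0.510), (0.140, 0.320, 0.510)),
    30:  ((0.140, 0.330, 0.510), (0.120, 0.320, 0.510), (0.120, 0.320, 0.510), (0.120, 0.320, 0.510)),
    45:  ((0.140, 0.330, 0.510), (0.120, 0.320, 0.500), (0.155, 0.330, 0.510), (0.120, 0.320, 0.500)),
    60:  ((0.140, 0.330, 0.510), (0.150, 0.310, 0.510), (0.150, 0.320, 0.510), (0.150, 0.310, 0.510)),
    75:  ((0.130, 0.320, 0.500), (0.140, 0.320, 0.500), (0.140, 0.320, 0.500), (0.140, 0.320, 0.500)),
    90:  ((0.120, 0.310, 0.500), (0.130, 0.320, 0.500), (0.140, 0.310, 0.500), (0.130, 0.320, 0.459)),
    120: ((0.120, 0.310, 0.500), (0.130, 0.320, 0.500), (0.140, 0.310, 0.500), (0.130, 0.320, 0.500)),
    150: ((0.120, 0.310, 0.500), (0.130, 0.320, 0.500), (0.140, 0.310, 0.500), (0.130, 0.320, 0.500)),
    180: ((0.120, 0.310, 0.500), (0.130, 0.320, 0.500), (0.140, 0.310, 0.500), (0.130, 0.320, 0.500)),
}


def field_average(table):
    """Return {depth: (mean LL, mean DUL, mean SAT)} across the soil types."""
    averaged = {}
    for depth, rows in table.items():
        n = len(rows)
        averaged[depth] = tuple(round(sum(r[i] for r in rows) / n, 3) for i in range(3))
    return averaged


def max_spread(table, column):
    """Largest between-soil range (max - min) of one column over all layers."""
    return max(
        max(r[column] for r in rows) - min(r[column] for r in rows)
        for rows in table.values()
    )


averages = field_average(TABLE_7_6)
for name, column in (("LL", 0), ("DUL", 1), ("SAT", 2)):
    print(name, "largest between-soil range:", round(max_spread(TABLE_7_6, column), 3))
```

The small ranges returned by `max_spread` are one way to quantify the similarity among soils that motivated the use of a single averaged profile.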
Observed Yields in the IM Framework Domain

Figure 7-28 shows the observed crop yield in the 13 locations of interest for the three years under study. The locations are ranked by average yield over the three years. The lines linking the points on the graph do not imply spatial interpolation (some of the adjacent locations in the graph are not contiguous in space, e.g., LoB and CaA(S)); they are used to highlight yield trends. Note how the highly eroded PuB2 soil had the lowest average yield, followed mostly by shallow, poorly drained Calloway soils with relatively high clay content. At the other end, the GrB(Ba) soil had the highest average yield, followed by the GrB and GrB(N) soils. These soils, especially the GrB(Ba), are deeper than the rest, and are considered the best soils in the field (J. Potts, pers. comm.).

Figure 7-28. Observed crop yield in the 13 locations of interest.

Despite the ranking made for visualization purposes, the yields in the three years under study are not directly comparable. The Suggs 4 field was planted with the same hybrid (Pioneer 33A14, yellow) in 1999 and 2001, but was planted with a somewhat lower-yielding white maize (Pioneer 3281W) in 1997. To compare the behavior of the three years better, we used relative yields. Figure 7-29 shows the same yields as Figure 7-28, expressed as fractional deviations with respect to each year's mean yield across the 13 locations. Again, the lines are meant to simplify visualization, and do not imply any form of interpolation.

Figure 7-29. Observed relative crop yield in the 13 locations of interest for the two calibration years (1999 and 2001) and validation year (1997).

Figure 7-28 shows that crop yield was consistently higher in 2001 than in 1999 at the 13 locations. Figure 7-29 shows that, in relative terms, the yield in 2001 varied little around the mean. The variation in 1999 was higher, and in 1997 was higher still, especially for the two highest-yielding soil types. Below we analyze the amount and timing of rainfall during the three seasons to help clarify the reasons for this observed interannual variability.
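The relative yields used above are fractional deviations from each year's mean across locations, i.e., (yield - annual mean) / annual mean. A minimal sketch of that calculation follows; the function name and the yield values are hypothetical and are not the observed data.

```python
# Relative yield as a fractional deviation from each year's mean yield.
# The yields below are hypothetical placeholders, not the observed data.
from typing import Dict


def relative_yields(yields_by_year: Dict[int, Dict[str, float]]) -> Dict[int, Dict[str, float]]:
    """Return (y - mean) / mean for every location, computed year by year."""
    relative = {}
    for year, by_location in yields_by_year.items():
        mean = sum(by_location.values()) / len(by_location)
        relative[year] = {loc: (y - mean) / mean for loc, y in by_location.items()}
    return relative


example = {
    1999: {"LoB": 9000.0, "Hn": 8000.0, "GrB(Ba)": 10000.0},
    2001: {"LoB": 11000.0, "Hn": 10500.0, "GrB(Ba)": 11500.0},
}
rel = relative_yields(example)
print(rel[1999])
```

Because each year is normalized by its own mean, the deviations within a year sum to zero, which removes year-to-year differences such as a change of hybrid from the comparison.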
Figure 7-30. Cumulative rainfall from Jan. 1 to Aug. 23 during 1997, 1999, and 2001.

Figure 7-30 shows the cumulative rainfall in the three years of interest. Note how, in all three years, there was a similar amount of rainfall in the weeks preceding the planting date of DOY 105. None of the crops lacked water in early growth, but the situation in the critical window around flowering was different. Figure 7-31 shows this in greater detail. There was abundant rainfall before flowering in 2001, which served to build a good soil water supply. After flowering, there was hardly any more rain during the critical window, so there was ample available solar radiation, and the ears set a large number of seeds. The situation in 1999 was somewhat different: a few days without rain before flowering, and ample rain (and consequently, less solar radiation) after flowering. This is consistent with the relatively constant yield difference between 1999 and 2001 across the field, shown in Figure 7-28.

The 1997 season was different; it had the least rainfall during (or immediately before) the critical window. Thus, differences across the landscape in soil depth, CN2, and KSAT probably played an important role in determining the relative differences in grain number and, consequently, yield, shown in Figure 7-29.

This situation is important in the context of the differences, discussed in Chapter 2, in sensitivity of the runoff and soil water holding parameters in the DSSAT models. Since the crop set its grain number in 1999 and 2001 under landscape-invariant conditions of good radiation and good soil water (in 2001), or good water but lower radiation (in 1999), the soil water holding parameters would not have been sensitive in an IM parameter estimate using only yield data from 1999 and 2001, and would have been estimated poorly. This lends additional support to the idea of incorporating additional information to constrain the parameter estimation process.

Figure 7-31. Rainfall during the crop season. Simulated flowering and physiological maturity dates are shown with arrows.
Evaluation

We examined in detail the parameter values, yield, and soil water content at four locations on the field (Figure 7-32) located on the slope at approximately regular distance and elevation intervals (Figure 7-33).

Figure 7-32. Evaluation locations, shown on the normalized yield map.

Figure 7-33. Relative position on the landscape of the four locations used for evaluation. Elevations are expressed with respect to an arbitrary reference. Distances are measured along the polygonal line linking the locations.
We chose these locations because of their contrasting characteristics: a well-drained, relatively high-yielding Loring soil; the Calloway A (W) soil which, although relatively high in the landscape, has drainage problems; the high-yielding Grenada B (Ba) soil on the slope; and the low-lying, poorly-drained Henry soil.

Simulations with a neighborhood criterion

In scenario 1C (Table 7-1) we populated the OWA objective function using only one criterion, soil depth neighborhood. Figure 7-34 shows the results of five realizations of its parameterization process. Bars of the same color denote the same realization across the different panels.

When the IM framework is run with only one objective function input, such as the soil depth neighborhood criterion (Figure 7-21), the parameters are not constrained to a unique solution: many combinations of parameters can produce equally good solutions. In fact, the five solutions of Figure 7-34 all produced optimal (i.e., zero-valued) objective function results. Note the great variability across realizations of CN2 (Figure 7-34A), KSAT (Figure 7-34C), and PPOP (Figure 7-34D), which, as shown in Table 7-5, was only allowed to vary between 3 and 8 plants m^-2. This was expected, because those parameters are not constrained in any way.

The behavior of soil depth (SLDEP; Figure 7-34B), which controls the criterion, is different. Soil depth parameter values are more stable across realizations and soil types. Moreover, the stability depends on how constrained the corresponding soil mapping unit is. For example, the GrB(Ba) unit SLDEP is less variable than the LoB SLDEP. This happens because the constraints to which the LoB SLDEP parameter is subjected are less restrictive than those constraining GrB(Ba): soil depth at GrB(Ba) is expected to be greater than that of its three neighbors, each of which is in turn subjected to multiple constraints. In contrast, some of the LoB soil's neighbors are unconstrained beyond their relationship with the LoB soil.

Figure 7-34. Five realizations of parameterization using the IM framework with only one objective function input, the soil depth neighborhood criterion. The five different realizations correspond to the five colors shown (e.g., the white bars in all the panels correspond to the same realization).

IM with coupled and uncoupled models

Figure 7-35 shows the parameter estimates from scenarios 1A and 2A, the 2-year IM processes. In general, the parameter values are not remarkably different between the coupled and uncoupled models. The parameter values of the LoB soil, uppermost in the toposequence, are necessarily equal because the top cell of the coupled model does not receive runon from above. However, some differences exist downslope, caused by the need of the uncoupled model to compensate for the lack of runon by means of a deeper soil or a lower runoff curve number, as previously shown in Chapter 2 (Figure 2-6).
Figure 7-35. Parameter estimates of 2-year coupled and uncoupled model IM scenarios.

Figure 7-36. Errors and comparison of yields for 2-year coupled and uncoupled model IM scenarios relative to observed values.

Figure 7-37. Parameter estimates of 3-year coupled and uncoupled model IM scenarios.

Figure 7-38. Errors and comparison of yields for 3-year coupled and uncoupled model IM scenarios relative to observed values.
The relative values of CN2 (Figure 7-35A) for different landscape positions are realistic: the CaA(W) soil is situated in a high-WI zone (due to a low gradient rather than a high contributing area), where water can sometimes collect. This is consistent with low CN2 values. The GrB(Ba) and LoB soils are on a ridge and slope, respectively. This is consistent with higher CN2 values, although LoB would be expected to have a higher CN2 than GrB(Ba). Finally, the Hn soil is at a high-WI position situated low in the landscape. The coupled model sends all of the runoff from positions higher in the landscape to it, and it is assumed that it flows down the slope in sheet form, and that all of it is available for infiltration downslope. This sheet runoff assumption is unrealistic, and creates a large CN2 difference between the uncoupled and coupled models: the CN2 value of the Hn soil grew to allow the coupled model to rid itself of excess runon, whereas the CN2 value of the spatially-uncoupled model decreased to compensate for the absence thereof.

Soil depth (Figure 7-35B) values in the LoB soil were somewhat higher than expected for an eroded soil with a fragipan. However, Loring soils may have poorly developed fragipans (J. Mcintosh, pers. comm.), which may allow roots to penetrate beyond the top of the fragipan. Soil depth for the CaA(W), GrB(Ba), and Hn soils is slightly lower than expected (Table 7-3) in the case of the coupled model. This may reflect both the experts' perception that the effective soil depth of the CaA(W) soils may be reduced because of clay at the bottom of the profile (Figure 7-27), and the aforementioned lack of realism of the sheet-runon assumption. In the uncoupled model case, soil depth was increased to compensate for the lack of runon.
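The interaction between CN2 and runon discussed above can be illustrated with a toy cascade. The sketch below uses the unmodified textbook SCS curve number equation in millimetres, with initial abstraction 0.2 S, and passes all runoff from one cell to the next cell downslope as runon, mimicking the sheet-runon assumption. It is an assumption-laden illustration and not the CERES routine, which uses a modified form of the method; the function names, the storm depth, and the curve numbers are all hypothetical.

```python
# Toy runoff cascade along a toposequence using the textbook SCS curve number
# equation (mm). Not the CERES routine; all inputs below are hypothetical.
from typing import List, Tuple


def scs_runoff(water_in_mm: float, cn: float) -> float:
    """Runoff (mm) for a given water input, with initial abstraction 0.2 * S."""
    s = 25400.0 / cn - 254.0  # potential maximum retention, mm
    ia = 0.2 * s              # initial abstraction, mm
    if water_in_mm <= ia:
        return 0.0
    return (water_in_mm - ia) ** 2 / (water_in_mm + 0.8 * s)


def cascade(rain_mm: float, cn_by_cell: List[float], coupled: bool) -> List[Tuple[float, float]]:
    """Return (infiltration, runoff) per cell, ordered from top to bottom."""
    results = []
    runon = 0.0
    for cn in cn_by_cell:
        # In the coupled case, runoff from the cell above arrives as runon.
        water_in = rain_mm + (runon if coupled else 0.0)
        runoff = scs_runoff(water_in, cn)
        results.append((water_in - runoff, runoff))
        runon = runoff
    return results


curve_numbers = [85.0, 80.0, 85.0, 80.0]  # hypothetical, top to bottom
coupled = cascade(50.0, curve_numbers, coupled=True)
uncoupled = cascade(50.0, curve_numbers, coupled=False)
for i, (c, u) in enumerate(zip(coupled, uncoupled)):
    print(f"cell {i}: infiltration coupled={c[0]:.1f} mm, uncoupled={u[0]:.1f} mm")
```

In this sketch the top cell behaves identically in both cases, while cells downslope infiltrate more water when coupled. An uncoupled model fitted to the same yields would therefore need a lower curve number or a deeper soil to make up the difference, which is the compensation pattern described in the text.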
Yields were simulated accurately by both the coupled and uncoupled models (Figure 7-36A), although the error increased markedly for the CaA(W) soil, in which yields were overestimated by the model in 1997. Expert opinion (Figure 7-26) suggested that yield is limited in this soil by excessive wetness in the early season, which adversely impacts stand quality. This effect is captured by the decreased PPOP (Figure 7-35D). Moreover, the primary contribution to the RMSE in the CaA(W) soil came from the 1997 season, which had higher rainfall in the weeks before planting (Figure 7-30). The model captured the effects of excess water by reducing plant stand density, but it could not capture the effects of standing water on stand uniformity, which is what probably caused the extremely low observed yield in 1997. Pommel and Bonhomme (1998) showed that an uneven stand has a greater impact on yield than low plant density in a uniform stand. The latter is the situation assumed by CERES.

A noteworthy result is that the PPOP value for the LoB soil is lower than expected; the domain experts predicted that PPOP should be highest in the LoB soil, rather than in GrB(Ba). This is consistent with the unexpectedly high soil depth and unexpectedly low CN2 values in LoB: the IM process spuriously compensated for an excessive availability of water with a decreased plant population.

Results for the 3-year scenarios, 1B and 2B (Figures 7-37 and 7-38), were similar to those of the 2-year scenarios, with some noteworthy exceptions. The LoB case is now more consistent with expert opinion, although there may have been some tradeoff between a large CN2 value and a large SLDEP. Another noteworthy effect is the "trade" that the uncoupled model made in its compensation for the lack of runon in lower landscape positions: instead of primarily reducing its runoff curve number (CN2) as in the 2-year scenario, it greatly increased its soil depth. This behavior may be caused by the discrete nature of the parameter space, or by effects specific to the 2001 season (the biased-weather concept presented in Chapter 2). Both possible causes support adding spatial-context-dependent constraints to the parameterization process to reduce the influence of unaccounted-for yield-reducing effects on parameter estimation.

Soil water observations (Figures 7-39 through 7-54) reveal another source of error identified in Chapter 2: lack of knowledge of initial conditions. As previously described, we assumed DUL as the initial soil water content for all simulations. This implies the assumption that excess water would eventually drain. In some cases (Figures 7-46, 7-49, 7-52), the assumption was valid, because initially high (above-DUL) observed soil water content eventually converged to the simulated water content. In many other cases our initial conditions assumption seems unreasonable in light of the results, which highlight the drainage limitations of some of these soils. Most notable among these cases were the lower layers of the Grenada and Henry soils (Figures 7-50, 7-53, 7-64).

The LoB soil stands out as the one with the most poorly simulated water balance. Observed data correspond to a soil with little or no crop water demand. This was probably the result of the tube being located off the row and near the center of the furrow, and thus exposed to lower demand from the maize crop, as observed by Prinsloo et al. (2003). In our case, the farmer initially avoided planting over the tubes, creating this artifact. In contrast, simulation in the CaA(W) soil was relatively successful, as was the simulation for the first 30 cm of the Hn soil. The marked increase in observed soil water in the lower layers of the Hn soil may be due to unaccounted-for subsurface flow.
When the water contents simulated by the spatially-uncoupled and coupled models are compared (black and grey lines, respectively, in Figures 7-43 to 7-54), the differences generally increase lower in the toposequence and later in the season. Especially noteworthy is that the soil water content in the Hn and, to a lesser extent, GrB(Ba) soils is actually lower in the coupled model than in the spatially-uncoupled model. This is not due to a greater demand by the crop simulated by the coupled model. On the contrary, in the Hn soil the spatially-uncoupled model simulated more transpiration than the coupled model (Figure 7-55). The answer to this apparent contradiction lies in the soil depth estimated by the IM process (Figure 7-40D), which is over twice as deep in the spatially-uncoupled model. As a result, water demand is split between more layers in the spatially-uncoupled model, reducing the demand from any given layer.

Our first future research step is to explore new parameterization scenarios of this same system, using a full integration of neighborhood criteria and yield history criteria. We expect this to produce more realistic results than the scenarios shown in Table 7-1, avoiding the inconsistencies and spurious parameter tradeoffs shown above.

Recommendations for Building Neighborhood Criteria

For each inconsistency shown in the previous paragraphs, interaction with experts had a priori provided the information necessary to avoid spurious behavior. This knowledge can be codified as neighborhood criteria and evaluated by means of simple algebraic equations (Figure 7-6). Practical construction of neighborhood criteria can proceed as follows:

• Build a common understanding of the field, the capabilities of crop models, and the role of different crop model parameters. This can be done using causal diagrams or Bayesian networks.
• Select the simulation locations; their spatial distribution will depend on the temporal stability, and spatial covariance structure, of yield in the field (Chapters 3 and 4). The selection process should be hypothesis-driven, and done in the context of discussion of the available data such as elevation, electroconductivity, wetness index, and yield maps.
• Select the parameters to estimate at each location. This can follow a discussion on the factors that limit yield in each of the chosen simulation locations, and on which crop model parameters correspond to those factors.
• Elicit spatial constraints for the selected criteria for all pairs of locations corresponding to adjacent soil mapping units. We coded this knowledge in map form (Figures 7-21 to 7-23), but it can also be coded directly in the form of a triangular matrix, each row and column of which correspond to one of the simulation locations. Only a subset of this matrix contains valid information: the cells corresponding to adjacent mapping units. Appendix B shows our matrix for the soil depth neighborhood criterion, as well as the code (in Borland Pascal 7.0) for evaluating the neighborhood and yield criteria.

Neighborhood criteria provide a measure of compliance with a priori notions of a temporally stable pattern of the model input or output under consideration. We implemented the neighborhood criteria by aggregating the piecewise linear functions shown in Figure 7-6. This may seem arbitrary when compared with the Spearman rank correlation method used to evaluate temporal stability in Chapter 4 following the work of Vachaud et al. (1985).
However, applying the rank correlation method to our case would require having an expected a priori ranking of the model input (or output) of interest (e.g., soil depth, wetness) throughout the field, and calculating the Spearman rank correlation between the a priori ranks and the ranked data corresponding to the parameter set under consideration in each IM iteration. The fundamental problem with using Spearman rank correlation in this context is its requirement for a global set of ranks, the elicitation of which may not be simple. Our experts seemed comfortable with, and had high confidence when answering questions about, paired adjacent mapping units (e.g., CaA(W) vs. LoB in Figure 7-20). However, they felt markedly less certain when asked to compare globally (e.g., to compare the soil depth of CaA(W) with that of CaB). This can be accommodated by dividing the problem into smaller portions, and aggregating the results of tests performed on subsets of the total soil map. In the limit, when the values are evaluated two at a time, the rank-correlation-based method is equivalent to our neighborhood criterion, scaled to a [-1,1] interval instead of [0,1], and assuming a high-confidence "greater than" or "less than" comparison in which X3 ≈ X4 (Figure 7-6), which thus produces extreme (-1 or 1) results only. Our neighborhood criterion provides the additional capability of allowing the expert to reduce the impact of a comparison in which he or she does not have much confidence.

An alternative method for expressing and aggregating the beliefs of experts is the Dempster-Shafer theory of evidence (Dempster, 1968; Shafer, 1976). This theory has a strong theoretical basis, but it requires the separate handling of two probabilities: a degree of belief, and a degree of plausibility. Dempster-Shafer theory-based methods have been used in such successful expert systems as MYCIN (Shortliffe, 1976), but were not practical for our case, in which we sought to develop a process that could enable users (crop consultants, for example) to develop their own expert systems for spatial crop model parameterization. Moreover, the Dempster-Shafer method can generate counterintuitive results that are difficult to grasp (Zadeh, 1984). In contrast, particular sets of constraints that can cause problems with the evaluation of neighborhood criteria, such as closed loops, can be easily detected and corrected by the user on a map.
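The ideas in this section can be sketched in a few lines. The block below is an illustration under stated assumptions and is not the Borland Pascal code of Appendix B. The shape of the piecewise linear function (a symmetric ramp whose half-width expresses the expert's confidence), the equal OWA weights, the soil depths, and the pairwise constraints are all hypothetical. It scores each "a is deeper than b" comparison on [0,1], aggregates the scores with an ordered weighted average, and checks a set of constraints for closed loops.

```python
# Sketch of pairwise neighborhood criteria. The ramp shape, weights, depths,
# and constraints are assumptions for illustration, not the author's values.
from typing import Dict, List, Tuple


def greater_than_score(a: float, b: float, half_width: float) -> float:
    """Piecewise-linear degree (0..1) to which 'a > b' holds (assumed shape)."""
    d = a - b
    if half_width <= 0.0:  # no ramp: crisp, high-confidence comparison
        return 1.0 if d > 0.0 else 0.0
    return min(1.0, max(0.0, (d + half_width) / (2.0 * half_width)))


def owa(scores: List[float], weights: List[float]) -> float:
    """Ordered weighted average: weights apply to scores sorted descending."""
    if len(scores) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match scores in length and sum to 1")
    return sum(w * s for w, s in zip(weights, sorted(scores, reverse=True)))


def has_closed_loop(constraints: List[Tuple[str, str]]) -> bool:
    """True if the 'a > b' constraints contain a directed cycle."""
    graph: Dict[str, List[str]] = {}
    for a, b in constraints:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {node: 0 for node in graph}  # 0 = unseen, 1 = in progress, 2 = done

    def visit(node: str) -> bool:
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)


# Hypothetical soil depths (cm) and constraints: (deeper, shallower, half-width).
depth = {"GrB(Ba)": 150.0, "CaA(W)": 120.0, "LoB": 125.0}
pairs = [("GrB(Ba)", "CaA(W)", 5.0), ("GrB(Ba)", "LoB", 5.0), ("CaA(W)", "LoB", 20.0)]

scores = [greater_than_score(depth[a], depth[b], w) for a, b, w in pairs]
overall = owa(scores, [1 / 3, 1 / 3, 1 / 3])
print(scores, round(overall, 3))
print(has_closed_loop([(a, b) for a, b, _ in pairs]))
```

A wide ramp softens the penalty for a small violation of a low-confidence comparison, whereas a half-width of zero yields only extreme scores, which corresponds to the two-at-a-time limiting case discussed above. A closed loop such as A > B > C > A cannot be satisfied by any parameter set, so checking for cycles before running the estimation is a cheap safeguard.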
Figure 7-39. Simulated and observed soil water data for the 2001 crop season, 0-15 cm layer of the LoB soil location, using IM with 3 years of yield data and no neighborhood criteria. These results are the same for both the spatially-uncoupled and coupled models (scenarios 1B and 2B).

Figure 7-40. Simulated and observed soil water data for the 2001 crop season, 15-30 cm layer of the LoB soil location, using IM with 3 years of yield data and no neighborhood criteria. These results are the same for both the spatially-uncoupled and coupled models (scenarios 1B and 2B).

Figure 7-41. Simulated and observed soil water data for the 2001 crop season, 30-45 cm layer of the LoB soil location, using IM with 3 years of yield data and no neighborhood criteria. These results are the same for both the spatially-uncoupled and coupled models (scenarios 1B and 2B).

Figure 7-42. Simulated and observed soil water data for the 2001 crop season, 45-60 cm layer of the LoB soil location, using IM with 3 years of yield data and no neighborhood criteria. These results are the same for both the spatially-uncoupled and coupled models (scenarios 1B and 2B).

Figure 7-43. Simulated and observed soil water data for the 2001 crop season, 0-15 cm layer of the CaA(W) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatially-uncoupled model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-44. Simulated and observed soil water data for the 2001 crop season, 15-30 cm layer of the CaA(W) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatially-uncoupled model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-45. Simulated and observed soil water data for the 2001 crop season, 30-45 cm layer of the CaA(W) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatially-uncoupled model (scenario 1B); the grey line relates to the coupled model (scenario 2B).

Figure 7-46. Simulated and observed soil water data for the 2001 crop season, 45-60 cm layer of the CaA(W) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatially-uncoupled model (scenario 1B); the grey line relates to the coupled model (scenario 2B).
PAGE 215
193 GrB(Ba), 015 cm, 2001 . IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 0 500 n U /inn E .o 0 300 (cnr 0 200 > Â® 0 100 0 000 100 130 160 190 Day of the year 220 Figure 747. Simulated and observed soil water data for the 2001 crop season, 015 cm layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B). GrB(Ba), 1530 cm, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 0.500 CO 0.400 E ,o 0.300 CO E 0.200 > 0.100 0.000 100 130 160 190 Day of the year 220 Figure 748. Simulated and observed soil water data for the 2001 crop season, 1530 cm layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B).
PAGE 216
194 GrB(Ba), 3CM5 cm, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 0.500 0.400 CO E 0.300 CO E 0.200 > 0.100 0.000 100 130 160 190 Day of the year 220 Figure 749. Simulated and observed soil water data for the 2001 crop season, 3045 cm layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B). E CO E > Â® GrB(Ba), 4560 cm, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 100 130 160 190 Day of the year 220 Figure 750. Simulated and observed soil water data for the 2001 crop season, 4560 cm layer of the GrB(Ba) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B).
PAGE 217
195 Hn, 015 cm, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 0.500 U.HUU CO E o CO 0.300 E 0.200 > CD 0.100 0.000 At 1 1 1 1 100 130 160 190 Day of the year 220 Figure 751. Simulated and observed soil water data for the 2001 crop season, 015 cm layer of the Hn soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B). Hn, 1530 cm, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 0.500 0.400 CO E o 0.300 CO E 0.200 > CD 0.100 0.000 100 130 160 190 Day of the year 220 Figure 752. Simulated and observed soil water data for the 2001 crop season, 1530 cm layer of the Hn soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B).
PAGE 218
196 Hn, 3045 cm, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 0.500 U.4UU CO E o 0.300 co E 0.200 > Â® 0.100 0.000 100 130 160 190 Day of the year 220 Figure 753. Simulated and observed soil water data for the 2001 crop season, 3045 cm layer of the Hn soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B). Hn, 4560 cm, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 0.500 CO 0.400 E o CO 0.300 E > 0.200 0.100 0.000 100 130 160 190 Day of the year 220 Figure 754. Simulated and observed soil water data for the 2001 crop season, 4560 cm layer of the Hn soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B).
PAGE 219
197 Hn, 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 400 300 E Â£ 200 o_ LU 100 0 100 130 160 190 220 Day of the year Figure 755. Simulated and observed cumulative transpiration for the 2001 crop season, Hn soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario 1 B); the grey line relates to the coupled model (scenario 2B). GrB(Ba), 2001. IM with 3 yield criteria, spatiallyuncoupled (black) and coupled (grey) models 400 300 ? Â£ 200 o_ LU 100 0 100 130 160 190 220 Day of the year Figure 756. Simulated and observed cumulative transpiration for the 2001 crop season, GrB(Ba) soil location, using IM with 3 years of yield data and no neighborhood criteria. The black line corresponds to the spatiallyuncoupled model (scenario IB); the grey line relates to the coupled model (scenario 2B)
PAGE 220
Farmers, crop consultants, and soil scientists possess valuable knowledge that can be harnessed to constrain the parameterization of spatial crop models, increasing the realism of the simulations. Crop consultants can be trained to manage the discussion process that generates the knowledge representations for formalizing, for their specific conditions, neighborhood criteria such as the ones shown in Figures 7-21 to 7-23.

A problem with the IM framework proposed herein is that, to date, it lacks analytical ways to estimate parameter uncertainty. However, the high speed at which the IM framework can evaluate neighborhood criteria (as opposed to running a crop model) suggests a development route in which the knowledge-based elements of the IM framework are separated from running the crop model. Using a Monte Carlo approach, it may be possible to estimate a joint a priori parameter distribution for use in a formal parameterization setting that combines crop models with a tool such as an ensemble Kalman filter (Bostick et al., 2003; Koo et al., 2003). We see this as an important avenue for future research.

Michalewicz and Fogel (2000) stated that multi-objective optimization problems can be solved either by simplifying the problem so traditional methods are applicable, or by keeping the structure of the problem and using a non-traditional approach for its solution. The process of aggregating the multiple criteria into a unique fitness value implies the former and, consequently, changes the nature of the original optimization problem. This could be avoided by using a dominance-based multi-objective method to produce a set of non-dominated solutions or Pareto surface (Corne et al., 2003). However, at this stage of our work, the value that a Pareto surface may have for a crop consultant is unclear to us.
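To make the notion of a non-dominated set concrete, the short Python sketch below filters a list of candidate solutions, each scored on several criteria that are all to be minimized. It is an illustration only and was not used in this study: the function names `dominates` and `non_dominated` and the example scores are assumptions.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every criterion and strictly
    better on at least one (all criteria are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical (yield error, depth criterion) scores for four candidates.
candidates = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(candidates)
```

In this example the candidate (3.0, 3.0) is dropped because (2.0, 2.0) is better on both criteria; the remaining three represent different trade-offs, and choosing among them would still require a judgment by the user.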
The OWA operator is a flexible tool for aggregating multiple sources of data. It is a special case of the Choquet integral (Yager, 2002), the application of which in the IM framework would allow additional, confidence-based weighting of the criteria. This implies that a priori knowledge regarding confidence in a knowledge source or the results of a particular criterion could be used to modify their influence on the objective function independently of the OWA weights. This is a topic for further study.

Conclusions

In this study we introduced a novel concept for parameterizing spatial biophysical models: a framework for eliciting and using expert knowledge (from domain experts such as the farmer, crop consultants, and an NRCS soil scientist), together with historical yield data, in an inverse modeling context. We qualitatively and quantitatively defined a model of spatial relationships between model inputs or outputs over space (spatial parameter model, or SPM), and applied it to a representative case study using locally available sources of knowledge. The SPM consists of a series of criteria, each of which constrains a different model input or output, that are aggregated using an ordered weighted averaging (OWA) operator.

We found that, based on existing data such as RTK elevation maps, electroconductivity maps, soil survey maps, and simple field observations made with a soil probe, it was possible to identify appropriate simulation locations, to select parameters to estimate, and to populate the different criteria that comprise the SPM.

We applied our framework (using criteria based on yield, or on soil depth) to the parameterization by inverse modeling (IM) of simple spatially-uncoupled and coupled crop models. The models' ability to reproduce observed yield was good with both two and three years of yield data used for parameter estimation. However, their ability to reproduce observed spatiotemporal soil water patterns was landscape-position-dependent. There were several spurious compensations between parameters, which resulted in parameter combinations that best simulated observed yields, but were not consistent with soil probe observations or expert opinions. The results of the two models were similar; the coupled model did not produce noteworthy improvements over the spatially-uncoupled model, although the latter clearly compensated for the lack of runon by means of spurious parameter value combinations. These artifacts can be countered using tools such as the soil map neighborhood criteria developed herein.

Numerous opportunities for further research exist, especially in developing a method that can provide the user with estimates of parameter uncertainty, which our IM framework currently lacks. Separating the knowledge-based elements of the IM framework from running the crop model, together with Monte Carlo techniques, may provide such an opportunity. Our IM framework could be used to estimate a joint a priori parameter distribution for use in a formal parameterization setting combining crop models with a tool such as an ensemble Kalman filter.
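The OWA aggregation discussed in this chapter can be illustrated with a few lines of Python. This is a minimal sketch of the standard OWA definition (weights are applied to the criterion values after sorting them in descending order), not a transcription of the Pascal implementation listed in Appendix B; the function name `owa` and the example values are assumptions.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: the j-th weight multiplies the j-th
    largest value, regardless of which criterion produced it."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical criterion values for one candidate parameter set.
criteria = [0.2, 0.9, 0.5]
worst_case = owa(criteria, [1.0, 0.0, 0.0])   # behaves like max
best_case = owa(criteria, [0.0, 0.0, 1.0])    # behaves like min
average = owa(criteria, [1 / 3, 1 / 3, 1 / 3])  # plain mean
```

Shifting weight toward the first positions makes the aggregate more sensitive to the largest criterion values, while uniform weights reduce the operator to an arithmetic mean.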
CHAPTER 8
CONCLUSIONS

The literature to date on the inverse-modeling-based parameterization of spatial crop models is dominated by approaches in which the different locations are simulated with spatially-uncoupled models. In Chapter 2 we studied three possible sources of error for such models: error from biased weather in the years of yield data used for the parameterization process, error due to lack of knowledge of initial soil water conditions, and error from lack of spatial coupling and water transport among different landscape locations. We proved analytically that the spatiotemporal infiltration behavior of a coupled water balance model cannot be reproduced through a modification of the parameters of an uncoupled model. The corresponding yield prediction limitations of the uncoupled model were confirmed, using an example, both at the parameter estimation and validation stages. In our example, however, weather biases and the knowledge of initial conditions greatly impacted the predictive capability of the coupled model, and had less effect on its uncoupled counterpart. We concluded that the use of fully coupled spatial crop models requires high-quality data. Practical precision agriculture applications are characterized by uncertain initial conditions and the possibility of biased weather. Under these circumstances, the use of a coupled model may not be justified, especially for low landscape positions.

In Chapter 3 we explored three methods for solving the crop-scouting problem of concurrently obtaining an optimal spatial sampling scheme for a phenomenon of interest (e.g. yield), and an optimal closed scouting path that links the locations of the sampling scheme. The three methods belong to two different groups: two search for an optimal sampling scheme and then link the sampling locations into a closed scouting path by solving the Traveling Salesman Problem (TSP). The remaining method solves for the sampling locations and the scouting path simultaneously, using a modified Kohonen self-organizing feature map (SOFM). The three methods provide a principled approach to the design of crop-scouting activities as a form of spatial sampling, and they are sufficiently quick and accurate to be usable in practical applications. The TSP methods (MMKV+TSP and MMSD+TSP) tended to make slightly shorter tours than the SOFM, although the three methods' tours were never longer than those based on expert opinion. The TSP methods also typically estimated yield slightly better than the SOFM. When runtime is unconstrained (and a semivariogram is available), the MMKV+TSP case seems most appropriate. Conversely, when runtime is strongly constrained, MMSD+TSP may be more dependable. In intermediate situations the three methods are practically equivalent.

In Chapter 4 we combined the scaled semivariogram technique proposed by Vieira et al. (1991) with two simulated annealing algorithms to reduce the number of locations necessary to describe water content in our 8-hectare study area from 57 down to 10 points. The scaled semivariogram allowed us to incorporate data from several dates, both to reflect time-independent behavior of water content and to compensate for the relatively small size of the individual datasets.

Of the two simulated annealing algorithms, Spatial Simulated Annealing (van Groenigen and Stein, 1998) produced more consistent results, i.e. greater repeatability, than the Sacks and Schiller method (Sacks and Schiller, 1988), although the solutions provided by both algorithms were quite similar. Running multiple instances of the optimization process is recommended, especially if using the Sacks and Schiller method.

Our proposed method predicted water content across the validation set with relatively low errors: over 70% of all the predicted water contents had an error within ±10%, which is acceptable for the application it was designed for. The method also captured the spatial variability of water better than regular grids or randomly generated patterns. However, the SMSE (scaled mean squared error) based scenarios performed better than the scenarios using an SKV (kriging variance) criterion.

We detected temporal stability in the dataset. This phenomenon implies the existence of spatial non-stationarity of the water content across the field, leading to the violation of kriging assumptions and the degradation of the quality of kriging estimates. However, the SMSE-based optimization scenarios incorporated temporally stable extreme (wet and dry) points into the optimal subset, using them to capture the non-stationary behavior. Our method may be improved by combining a spatial water movement model with a crop simulation model to provide a temporally variable trend that can be used to eliminate possible spatial non-stationarity.

In Chapter 5 we showed that the runtime of a simulated-annealing-based crop model parameterization process was greatly reduced through the reuse of simulation results across successive iterations of the algorithm and across locations within an environment.
The performance of the modified simulated annealing algorithm we used was parameter-value-dependent. However, a conservative parameter combination was found ($\alpha_c = 0.995$, $\alpha_h = 1.000$, $c_f = 0.0001$, and $m > 5$) that ran much faster than a grid search, its runtime tending asymptotically to that of the grid search as the number of locations of interest grew, while converging to objective function values (and the corresponding parameter combinations) practically identical to the global optima determined using the grid search method.

Adoption of the proposed algorithm can produce runtime reductions on the order of 25% to 75%, depending on the geometry of the simulation domain. Additionally, it can be used to parameterize coupled spatial crop models in which parameter values at one location can affect parameter values at other locations, a task not possible using a grid search. Finally, the SA algorithm can very quickly produce approximate answers useful in practical applications.

Chapter 6 dealt with Bayesian networks: probabilistic tools for making inferences with different sources of knowledge. Bayesian networks provide a powerful tool for discussing complex concepts. With some practice, crop consultants, extension professionals, and clients with minimal experience using personal computers can easily understand the probabilistic ideas behind Bayesian networks, as well as the use of interactive software tools for their construction. Effective model definition improves with experience; the crop consultants and extension professionals with whom we interacted rapidly became skilled enough to define and populate simple models in 1-2 hours.
Tools for Bayesian network modeling are readily available on the market. For example, a powerful trial version of the Netica software used for this study can be downloaded from the manufacturer's website (www.norsys.com).

The techniques briefly described in Chapter 6 are not limited to the discussion of agricultural systems management. Any extension activity that requires discussion of cause-effect relationships, such as health care, safety, and mechanics-related topics, can benefit from Bayesian-network-supported dialogue.

In Chapter 7 we introduced a novel concept for parameterizing spatial biophysical models: a framework for eliciting and using expert knowledge (from domain experts such as farmers, crop consultants, and NRCS soil scientists), together with historical yield data, in an inverse modeling context. We qualitatively and quantitatively defined a model of spatial relationships between model inputs or outputs over space (spatial parameter model, or SPM), and applied it to a representative case study using locally available sources of knowledge. The SPM consists of a series of criteria, each of which constrains a different model input or output, that are aggregated using an ordered weighted averaging (OWA) operator.

We found that, based on existing data such as RTK elevation maps, electroconductivity maps, soil survey maps, and simple field observations made with a soil probe, it was possible to identify appropriate simulation locations, to select parameters to estimate, and to populate the different criteria that comprise the SPM.

We applied our framework (using criteria based on yield, or on soil depth) to the parameterization by inverse modeling (IM) of simple spatially-uncoupled and coupled crop models. The models' ability to reproduce observed yield was good with both two and three years of yield data used for parameter estimation. However, their ability to reproduce observed spatiotemporal soil water patterns was landscape-position-dependent. There were several spurious compensations between parameters, which resulted in parameter combinations that best simulated observed yields, but were not consistent with soil probe observations or expert opinions. The results of the two models were similar; the coupled model did not produce noteworthy improvements over the spatially-uncoupled model, although the latter clearly compensated for the lack of runon by means of spurious parameter value combinations. These artifacts can be countered using tools such as the soil map neighborhood criteria developed herein.

Numerous opportunities for further research exist, especially in developing a method that can provide the user with estimates of parameter uncertainty, which our IM framework currently lacks. Separating the knowledge-based elements of the IM framework from running the crop model, together with Monte Carlo techniques, may provide such an opportunity. Our IM framework could be used to estimate a joint a priori parameter distribution for use in a formal parameterization setting combining crop models with a tool such as an ensemble Kalman filter.
APPENDIX A
THE SIMULATED ANNEALING ALGORITHMS USED IN CHAPTER 4

Sacks and Schiller Algorithm

The structure of the fitness function is $J(S)$ with $S \subset D$, where $D$ represents the spatial domain of interest, and $S$ is a subset of $D$ containing 10 locations. We used two different fitness functions, SKV and SMSE, described in the text. The Sacks and Schiller generation mechanism and cooling schedule are as follows:

1. Create the initial subset $S^0 \subset D$ from the union of 10 randomly selected locations of $D$. Set $\pi^0 = 0.7$.

2. Initiate the $(j+1)$th iteration of the process by randomly selecting a tentative entering location $t$ from $(D - S^j)$.

3. Find an optimal leaving location $s^* \in S^j$ so that $J(S^j \cup t - s^*) = \min_{s \in S^j} J(S^j \cup t - s)$.

4. Take
$$S^{j+1} = \begin{cases} S^j \cup t - s^* & \text{if } J(S^j \cup t - s^*) \le J(S^j) \\ S^j \cup t - s^* \text{ with probability } \pi^j & \text{if } J(S^j \cup t - s^*) > J(S^j) \\ S^j \text{ with probability } (1 - \pi^j) & \text{if } J(S^j \cup t - s^*) > J(S^j) \end{cases}$$

5. If $S^{j+1} = S^j$ in step 4, randomly select an alternative entering location $t' \in D - S^j - t$ and go to step 3 without incrementing $j$ and replacing $t$ with $t'$.

6. Repeat step 5 up to $L$ times, where $L$ is a constant. If after $L$ attempts there have been no changes, assign $S^{j+1} = S^j$, $\pi^{j+1} = \min(1, \pi^j / (1 - d'))$ and go to step 2.

7. Set
$$\pi^{j+1} = \begin{cases} (1 - d)\,\pi^j & \text{if } J(S^{j+1}) < (1 - a) \min_{k \le j} J(S^k) \\ \pi^j & \text{otherwise} \end{cases}$$

8. Stop if there have been $M$ iterations without changes in $\pi^j$, where $M$ is a constant. Otherwise go to step 2.
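Steps 2 to 4 of the list above can be sketched in Python as follows. This is an illustrative reading of the exchange step rather than the code used in the study: the names `best_exchange` and `exchange_step`, the use of frozensets of integer location identifiers, and the toy fitness function are all assumptions, and the acceptance rule follows the three cases of step 4 as written above.

```python
import random
from typing import Callable, FrozenSet

Subset = FrozenSet[int]


def best_exchange(current: Subset, entering: int,
                  fitness: Callable[[Subset], float]) -> Subset:
    """Step 3: add `entering`, then drop the leaving location that
    gives the lowest fitness value."""
    candidates = [(current | {entering}) - {s} for s in current]
    return min(candidates, key=fitness)


def exchange_step(current: Subset, domain: Subset,
                  fitness: Callable[[Subset], float],
                  pi: float, rng: random.Random) -> Subset:
    """Steps 2-4: pick a random entering location, find the best
    exchange, and accept it outright if it is no worse; otherwise
    accept it with probability pi."""
    entering = rng.choice(sorted(domain - current))
    trial = best_exchange(current, entering, fitness)
    if fitness(trial) <= fitness(current):
        return trial
    return trial if rng.random() < pi else current


# Toy example: 10 locations, subsets of size 3, fitness = sum of ids.
domain = frozenset(range(10))
start = frozenset({7, 8, 9})
result = exchange_step(start, domain, sum, pi=0.0, rng=random.Random(0))
```

With `pi=0.0` only non-worsening exchanges are accepted; the reheating and cooling of the acceptance probability in steps 6 and 7 would wrap around repeated calls to `exchange_step`.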
Our adopted parameter values were somewhat more conservative than the ones proposed by Sacks and Schiller (1988), as follows: $a = 0.01$, $\pi^0 = 0.7$, $d = 0.3$, $d' = 0.2$, $L = 57 - 10 = 47$, and $M = 500$. These values cool and reheat the system more slowly than the optimum values proposed by Sacks and Schiller.

Spatial Simulated Annealing

Acceptance Criterion

Let $S^j$ and $S'$ be two subsets of the domain $D$ such that $S'$ is generated by perturbing $S^j$. Let the corresponding fitness function values be $J(S^j)$ and $J(S')$ respectively. The acceptance criterion determines whether $S'$ replaces $S^j$ or not. The acceptance probability is defined as:

$$P(S^{j+1} = S') = \begin{cases} 1 & \text{if } J(S') \le J(S^j) \\ \exp\left(\dfrac{J(S^j) - J(S')}{c}\right) & \text{if } J(S') > J(S^j) \end{cases}$$

where $c$ is a positive control parameter that decreases as the algorithm progresses.

Generation Mechanism

The way a new pattern $S'$ is generated from $S^j$ in the SSA algorithm is by shifting the position of a randomly chosen point $s \in S^j$ over a vector $h$, with the direction of $h$ being chosen randomly, and its magnitude $|h|$ set by generating a random value between 0 and a function $h_{max}$, which has an initial value greater than or equal to the length of the sampling region, and decreases with time.

The modification we made to this generation mechanism was to discretize it so only locations on the 57-point isometric grid could be selected as destinations for $s$. We simplified the direction selection to the random selection of a quadrant around point $s$, then randomly picked one of the unused microwatershed sampling points located within the specified quadrant and at a distance less than or equal to $h_{max}$ from $s$.

Cooling Schedule

The cooling schedule changes the value of $c$ and $h_{max}$ as the algorithm progresses. The functions are:

$$c^{j+1} = \alpha_c\, c^j$$
$$h_{max}^{j+1} = \alpha_h\, h_{max}^j$$

with the values of $\alpha_c$ and $\alpha_h$ being only slightly less than 1.

We stopped the algorithm if $M$ iterations passed without improvement in the value of $J(S)$. The parameter values we adopted were: $c^0 = 1$, $\alpha_c = 0.995$, $\alpha_h = 0.997$, $M = 2000$.
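The acceptance criterion and cooling schedule of spatial simulated annealing can be sketched in Python as shown below. The sketch assumes the standard Metropolis form of the acceptance probability (improvements are always accepted; deteriorations are accepted with a probability that shrinks as the control parameter $c$ decreases); the function names are hypothetical, and only the default values of `alpha_c` and `alpha_h` are taken from this appendix.

```python
import math
import random
from typing import Tuple


def acceptance_probability(j_current: float, j_candidate: float,
                           c: float) -> float:
    """Metropolis-type rule: 1 for a candidate that is no worse,
    exp((J_current - J_candidate) / c) for a worse candidate."""
    if j_candidate <= j_current:
        return 1.0
    return math.exp((j_current - j_candidate) / c)


def accept(j_current: float, j_candidate: float, c: float,
           rng: random.Random) -> bool:
    """Draw a uniform random number to decide whether to accept."""
    return rng.random() < acceptance_probability(j_current, j_candidate, c)


def cool(c: float, h_max: float,
         alpha_c: float = 0.995, alpha_h: float = 0.997) -> Tuple[float, float]:
    """One cooling step: shrink both the control parameter c and the
    maximum shift distance h_max by their respective factors."""
    return alpha_c * c, alpha_h * h_max
```

Because both factors are only slightly below 1, many iterations pass before worse candidates become unlikely to be accepted, which gives the search time to escape poor local configurations early on.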
APPENDIX B
NEIGHBORHOOD CRITERIA DATA AND SOURCE CODE

Depth Criterion

The soil neighborhood criteria described in Chapter 7 are encoded as matrices. Table B-1 shows how the soil depth criterion is encoded. The information contained in the table corresponds to Figure 7-21. Only the first three columns are used; the last two (rightmost) columns are for reference purposes.

Table B-1. Matrix encoding for the soil neighborhood criterion.

Row  Column  Link type, strength  Origin soil unit  Destination soil unit
1    1       n/a                  PuB2              PuB2
1    2       n/a                  PuB2              CaA(E)
1    3       n/a                  PuB2              CaA(W)
1    4       n/a                  PuB2              GrA
1    5       n/a                  PuB2              CaB
1    6       n/a                  PuB2              Hn
1    7       n/a                  PuB2              CaA(N)
1    8       «                    PuB2              LoB
1    9       n/a                  PuB2              LoC2
1    10      n/a                  PuB2              GrB(N)
1    11      n/a                  PuB2              CaA(S)
1    12      n/a                  PuB2              GrB
1    13      n/a                  PuB2              GrB(Ba)
2    1       n/a                  CaA(E)            PuB2
2    2       n/a                  CaA(E)            CaA(E)
2    3       n/a                  CaA(E)            CaA(W)
2    4       <=                   CaA(E)            GrA
2    5       <                    CaA(E)            CaB
2    6       <                    CaA(E)            Hn
2    7       n/a                  CaA(E)            CaA(N)
[entries for row 2, columns 8-13, and row 3, columns 1-2, are missing from the source text]
3    3       n/a                  CaA(W)            CaA(W)
3    4       n/a                  CaA(W)            GrA
3    5       n/a                  CaA(W)            CaB
3    6       n/a                  CaA(W)            Hn
3    7       n/a                  CaA(W)            CaA(N)
3    8       >=                   CaA(W)            LoB
3    9       n/a                  CaA(W)            LoC2
3    10      n/a                  CaA(W)            GrB(N)
3    11      n/a                  CaA(W)            CaA(S)
3    12      >=                   CaA(W)            GrB
3    13      <                    CaA(W)            GrB(Ba)
4    1       n/a                  GrA               PuB2
4    2       >=                   GrA               CaA(E)
4    3       n/a                  GrA               CaA(W)
4    4       n/a                  GrA               GrA
4    5       n/a                  GrA               CaB
4    6       n/a                  GrA               Hn
4    7       n/a                  GrA               CaA(N)
4    8       n/a                  GrA               LoB
4    9       n/a                  GrA               LoC2
4    10      n/a                  GrA               GrB(N)
4    11      [illegible]          GrA               CaA(S)
4    12      n/a                  GrA               GrB
4    13      n/a                  GrA               GrB(Ba)
5    1       n/a                  CaB               PuB2
5    2       >                    CaB               CaA(E)
5    3       n/a                  CaB               CaA(W)
5    4       n/a                  CaB               GrA
5    5       n/a                  CaB               CaB
5    6       n/a                  CaB               Hn
5    7       [illegible]          CaB               CaA(N)
5    8       n/a                  CaB               LoB
5    9       n/a                  CaB               LoC2
5    10      >                    CaB               GrB(N)
5    11      n/a                  CaB               CaA(S)
5    12      n/a                  CaB               GrB
5    13      n/a                  CaB               GrB(Ba)
6    1       n/a                  Hn                PuB2
6    2       >                    Hn                CaA(E)
6    3       n/a                  Hn                CaA(W)
6    4       n/a                  Hn                GrA
6    5       n/a                  Hn                CaB
6    6       n/a                  Hn                Hn
6    7       n/a                  Hn                CaA(N)
6    8       n/a                  Hn                LoB
6    9       n/a                  Hn                LoC2
6    10      n/a                  Hn                GrB(N)
6    11      [illegible]          Hn                CaA(S)
6    12      >                    Hn                GrB
6    13      n/a                  Hn                GrB(Ba)
7    1       n/a                  CaA(N)            PuB2
7    2       n/a                  CaA(N)            CaA(E)
7    3       n/a                  CaA(N)            CaA(W)
7    4       n/a                  CaA(N)            GrA
7    5       [illegible]          CaA(N)            CaB
7    6       n/a                  CaA(N)            Hn
7    7       n/a                  CaA(N)            CaA(N)
7    8       n/a                  CaA(N)            LoB
7    9       n/a                  CaA(N)            LoC2
7    10      >                    CaA(N)            GrB(N)
7    11      n/a                  CaA(N)            CaA(S)
7    12      >                    CaA(N)            GrB
7    13      n/a                  CaA(N)            GrB(Ba)
8    1       »                    LoB               PuB2
8    2       n/a                  LoB               CaA(E)
[entries for row 8, columns 3-8, are missing from the source text]
8    9       >=                   LoB               LoC2
8    10      n/a                  LoB               GrB(N)
8    11      n/a                  LoB               CaA(S)
8    12      [illegible]          LoB               GrB
8    13      «                    LoB               GrB(Ba)
9    1       n/a                  LoC2              PuB2
9    2       n/a                  LoC2              CaA(E)
9    3       n/a                  LoC2              CaA(W)
9    4       n/a                  LoC2              GrA
9    5       n/a                  LoC2              CaB
9    6       n/a                  LoC2              Hn
9    7       n/a                  LoC2              CaA(N)
9    8       <=                   LoC2              LoB
9    9       n/a                  LoC2              LoC2
9    10      n/a                  LoC2              GrB(N)
9    11      n/a                  LoC2              CaA(S)
9    12      n/a                  LoC2              GrB
9    13      n/a                  LoC2              GrB(Ba)
10   1       n/a                  GrB(N)            PuB2
10   2       n/a                  GrB(N)            CaA(E)
10   3       n/a                  GrB(N)            CaA(W)
10   4       n/a                  GrB(N)            GrA
10   5       <                    GrB(N)            CaB
10   6       n/a                  GrB(N)            Hn
10   7       <                    GrB(N)            CaA(N)
10   8       n/a                  GrB(N)            LoB
10   9       n/a                  GrB(N)            LoC2
10   10      n/a                  GrB(N)            GrB(N)
10   11      n/a                  GrB(N)            CaA(S)
10   12      [illegible]          GrB(N)            GrB
10   13      n/a                  GrB(N)            GrB(Ba)
11   1       n/a                  CaA(S)            PuB2
11   2       >                    CaA(S)            CaA(E)
11   3       n/a                  CaA(S)            CaA(W)
11   4       [illegible]          CaA(S)            GrA
11   5       n/a                  CaA(S)            CaB
11   6       [illegible]          CaA(S)            Hn
11   7       n/a                  CaA(S)            CaA(N)
11   8       n/a                  CaA(S)            LoB
11   9       n/a                  CaA(S)            LoC2
11   10      n/a                  CaA(S)            GrB(N)
11   11      n/a                  CaA(S)            CaA(S)
11   12      >                    CaA(S)            GrB
11   13      n/a                  CaA(S)            GrB(Ba)
12   1       n/a                  GrB               PuB2
12   2       >=                   GrB               CaA(E)
12   3       <=                   GrB               CaA(W)
12   4       n/a                  GrB               GrA
12   5       n/a                  GrB               CaB
12   6       <                    GrB               Hn
12   7       <                    GrB               CaA(N)
12   8       [illegible]          GrB               LoB
12   9       n/a                  GrB               LoC2
12   10      [illegible]          GrB               GrB(N)
12   11      <                    GrB               CaA(S)
12   12      n/a                  GrB               GrB
12   13      «                    GrB               GrB(Ba)
13   1       n/a                  GrB(Ba)           PuB2
13   2       n/a                  GrB(Ba)           CaA(E)
13   3       >                    GrB(Ba)           CaA(W)
13   4       n/a                  GrB(Ba)           GrA
13   5       n/a                  GrB(Ba)           CaB
13   6       n/a                  GrB(Ba)           Hn
13   7       n/a                  GrB(Ba)           CaA(N)
13   8       »                    GrB(Ba)           LoB
13   9       n/a                  GrB(Ba)           LoC2
13   10      n/a                  GrB(Ba)           GrB(N)
13   11      n/a                  GrB(Ba)           CaA(S)
13   12      »                    GrB(Ba)           GrB
13   13      n/a                  GrB(Ba)           GrB(Ba)
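The row, column, and link-type encoding of Table B-1 can be read into a list of arcs with a few lines of Python. The sketch below is illustrative and is not a port of the Pascal unit listed in this appendix: the names `LINK_TYPES` and `load_arcs` are assumptions, only the six inequality symbols that are legible in the table are mapped, and the option to keep only entries with row < column mirrors the upper-triangle shortcut described in the source code comments.

```python
from typing import Iterable, List, Tuple

# Inequality symbols as they appear in Table B-1 (similarity symbols
# are not legible in the table and are deliberately left out here).
LINK_TYPES = {
    "»": "strongly greater",
    ">": "greater",
    ">=": "weakly greater",
    "«": "strongly lesser",
    "<": "lesser",
    "<=": "weakly lesser",
}


def load_arcs(rows: Iterable[str],
              upper_only: bool = True) -> List[Tuple[int, int, str]]:
    """Parse 'row column link ...' lines; skip 'n/a' entries and,
    optionally, everything on or below the matrix diagonal."""
    arcs = []
    for line in rows:
        fields = line.split()
        if len(fields) < 3:
            continue  # not a data line
        row, col, link = int(fields[0]), int(fields[1]), fields[2]
        if link.lower() == "n/a":
            continue
        if upper_only and row >= col:
            continue
        if link not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link!r}")
        arcs.append((row, col, LINK_TYPES[link]))
    return arcs


# Four rows copied from Table B-1.
sample = [
    "2 4 <= CaA(E) GrA",
    "2 7 n/a CaA(E) CaA(N)",
    "4 2 >= GrA CaA(E)",
    "3 13 < CaA(W) GrB(Ba)",
]
arcs = load_arcs(sample)
```

Only the first three fields of each line are used, as stated above; the soil unit names in the remaining fields serve as a reading aid.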
Source Code for Soil Map Neighborhood and Yield History Criteria, OWA Operator

The source code for evaluation of the IM framework criteria used in this study, as well as the OWA operator, was written as a unit in Borland Pascal 7.0 (Borland, 1992). It is listed below.

Unit Criteria;

(* Data structure: each constraint table has to have a list of its non-nil *)
(* relationships, so we evaluate only what we need. *)
(* There is a boolean constant, cUseUpperOnly, which can be used to speed
   things up by using a triangular (upper half) matrix. *)
(* Taking only those entries in which j > i will use the upper half. *)

INTERFACE

const
  cUseUpperOnly     = true;   (* Used to speed things up assuming symmetric constraints *)
  cNoData: Real     = 99.9;   (* Used to communicate calculation errors *)
  cAlmostZero       = 1e-6;   (* Used to avoid errors and implement some criteria *)
  cMaxTableEntries  = 1024;   (* used to limit size of data structures *)
  cMaxNodes         = 32;     (* used to limit size of data structures *)
  cMaxCalibCols     = 5;
  cUseMSE           = 0;      (* Tells yield criterion to use RMSE *)
  cUseExponential   = 1;      (* Tells yield criterion to use exponential *)
  eNumNodesMismatch = 1000;   (* An error condition *)
  cNoRelationship   = 0;
  cStronglyGreater  = 1;
  cGreater          = 2;
  cWeaklyGreater    = 3;
  cStronglyLesser   = 4;
  cLesser           = 5;
  cWeaklyLesser     = 6;
  cStronglySimilar  = 7;
  cSimilar          = 8;
  cWeaklySimilar    = 9;
  cError            = 10;
  cMaxYieldCols     = 5;
  cMaxCriteria      = 20;

type
  pIDVector             = ^tIDVector;
  tIDVector             = Array[1..cMaxYieldCols] of Integer;
  pDataVector           = ^tDataVector;
  tDataVector           = Array[1..cMaxNodes] of Real;
  pCalibYieldDataVector = ^tCalibYieldDataVector;
  tCalibYieldDataVector = Array[1..cMaxNodes, 1..cMaxCalibCols] of Real;
  tCriteriaVector       = Array[1..cMaxCriteria] of Real;

(*----------------------------------------------------------------*)
  pNeighborhoodCriterion = ^tNeighborhoodCriterion;
  tNeighborhoodCriterion = object
    ArcTails:         Array[1..cMaxTableEntries] of Word;
    ArcHeads:         Array[1..cMaxTableEntries] of Word;
    ArcRelationships: Array[1..cMaxTableEntries] of Integer;
    NumArcs:          Word;
    Error:            Integer;
    GT_e_Strong:      Real;
    GT_e_Medium:      Real;
    GT_e_Weak:        Real;
    LT_e_Strong:      Real;
    LT_e_Medium:      Real;
    LT_e_Weak:        Real;
    EQ_e_Strong:      Real;
    EQ_e_Medium:      Real;
    EQ_e_Weak:        Real;
    EQ_d_Strong:      Real;
    EQ_d_Medium:      Real;
    EQ_d_Weak:        Real;
    Range:            Real;
    Constructor Init(Filename: OpenString;
                     iGT_e_Strong, iGT_e_Medium, iGT_e_Weak: Real;
                     iLT_e_Strong, iLT_e_Medium, iLT_e_Weak: Real;
                     iEQ_e_Strong, iEQ_e_Medium, iEQ_e_Weak,
                     iEQ_d_Strong, iEQ_d_Medium, iEQ_d_Weak: Real;
                     iRange: Real);
    Destructor  Done;
    Function    EvaluateArc(RelDiff: Double; Code: Integer): Double;
    Function    Evaluate(TheData: pDataVector): Real;
  end;

(*----------------------------------------------------------------*)
  pYieldCriterion = ^tYieldCriterion;
  tYieldCriterion = object
    Error:     Integer;
    NumNodes:  Word;
    CalibCols: Word;   (* Number of yield years used for calibration *)
    k:         Real;
    Method:    Word;
    ObsData:   tCalibYieldDataVector;
    Variance:  Array[1..cMaxCalibCols] of Real;
    YearMean:  Array[1..cMaxCalibCols] of Real;
    Constructor Init(Filename: OpenString; NumYieldCols, iMethod: Word; TheK: Real);
    Destructor  Done;
    Function    Evaluate(SimData: pCalibYieldDataVector): Real;
    Function    CoincidenceCode(SimData: pCalibYieldDataVector;
                                Location, NumYears: Word): Word;
  end;

(*----------------------------------------------------------------*)
  tOWA = object
    OutFile:     Text;
    NumCriteria: Word;
    Error:       Integer;
    (* User is responsible for populating the vectors *)
    Weights:     tCriteriaVector;
    Values:      tCriteriaVector;
    ReportInterval, Counter: LongInt;
    Rank2Acc:    Array[1..cMaxCriteria] of Double;
    OWAAcc:      Double;
    ValueAcc:    Array[1..cMaxCriteria] of Double;
    Constructor Init(iNumCriteria, iter: Word; iReportInterval: LongInt);
    Destructor  Done;
    Function    Evaluate: Real;
  end;
(*----------------------------------------------------------------*)
IMPLEMENTATION

function AlmostEqual(x, y: Real): Boolean;
begin
  if (abs(x - y) < cAlmostZero) then
    AlmostEqual := true
  else
    AlmostEqual := false;
end;

(*----------------------------------------------------------------*)
Constructor tNeighborhoodCriterion.Init(Filename: OpenString;
                     iGT_e_Strong, iGT_e_Medium, iGT_e_Weak: Real;
                     iLT_e_Strong, iLT_e_Medium, iLT_e_Weak: Real;
                     iEQ_e_Strong, iEQ_e_Medium, iEQ_e_Weak,
                     iEQ_d_Strong, iEQ_d_Medium, iEQ_d_Weak: Real;
                     iRange: Real);
var
  f: Text;
  Tail, Head, NodeNum: Word;
  RelStr: String;
  RelStr2: String[4];
  Relationship: Integer;
  i: Integer;
begin
  GT_e_Strong := iGT_e_Strong;
  GT_e_Medium := iGT_e_Medium;
  GT_e_Weak   := iGT_e_Weak;
  LT_e_Strong := iLT_e_Strong;
  LT_e_Medium := iLT_e_Medium;
  LT_e_Weak   := iLT_e_Weak;
  EQ_e_Strong := iEQ_e_Strong;
  EQ_e_Medium := iEQ_e_Medium;
  EQ_e_Weak   := iEQ_e_Weak;
  EQ_d_Strong := iEQ_d_Strong;
  EQ_d_Medium := iEQ_d_Medium;
  EQ_d_Weak   := iEQ_d_Weak;
  Range       := iRange;
  NumArcs := 0;
  Assign(f, FileName);
  Reset(f);
  error := IOResult;
  if (error = 0) then
  begin
    Readln(f, NodeNum);
    error := IOResult;
    while (error = 0) and (not eof(f)) do
    begin
      Readln(f, Tail, Head, RelStr);
      error := IOResult;
      if (error = 0) then
      begin
        (* Clean RelStr *)
        while (Length(RelStr) > 1) and ((RelStr[1] = ' ') or (RelStr[1] = #9)) do
          RelStr := Copy(RelStr, 2, Length(RelStr) - 1);
        RelStr2 := '';
        i := 1;
        while (i < Length(RelStr)) and (i > 0) do
        begin
          if (RelStr[i] <> ' ') and (RelStr[i] <> #9) then
          begin
            RelStr2 := RelStr2 + UpCase(RelStr[i]);
            inc(i);
          end
          else
            i := -1;
        end;
        if (RelStr2 = 'N/A') then Relationship := cNoRelationship else
        if (RelStr2 = '»')   then Relationship := cStronglyGreater else
        if (RelStr2 = '>')   then Relationship := cGreater else
        if (RelStr2 = '>=')  then Relationship := cWeaklyGreater else
        if (RelStr2 = '«')   then Relationship := cStronglyLesser else
        if (RelStr2 = '<')   then Relationship := cLesser else
        if (RelStr2 = '<=')  then Relationship := cWeaklyLesser else
        if (RelStr2 = '==')  then Relationship := cStronglySimilar else
        if (RelStr2 = '=')   then Relationship := cSimilar else
        if (RelStr2 = '...') then Relationship := cWeaklySimilar else   (* comparison string not legible in the source listing *)
          Relationship := cError;
        if (Tail >= 1) and (Tail <= NodeNum) and
           (Head >= 1) and (Head <= NodeNum) and
           (Relationship <> cError) then
        begin
          if cUseUpperOnly and (Tail >= Head) or
             (Relationship = cNoRelationship) then
          begin
          end
          else
          begin
            inc(NumArcs);
            ArcTails[NumArcs] := Tail;
            ArcHeads[NumArcs] := Head;
            ArcRelationships[NumArcs] := Relationship;
          end;
        end
        else
          error := eNumNodesMismatch;
      end;
    end;
    Close(f);
  end;
end;

Destructor tNeighborhoodCriterion.Done;
begin
end;

(*----------------------------------------------------------------*)
Function RelDiff(a, b, Range: Real): Real;
begin
  if (abs(Range) < cAlmostZero) then
    RelDiff := cNoData
  else
    RelDiff := (a - b) / Range;
end;

(*----------------------------------------------------------------*)
Function tNeighborhoodCriterion.EvaluateArc(RelDiff: Double; Code: Integer): Double;
var
  x, y,
  x1, x2, x3, x4, x5, x6,
  y1, y2, y3, y4, y5, y6: Double;
begin
  case Code of
    cStronglyGreater:
      begin
        x1 := -100;                        y1 := 1;
        x2 := -50;                         y2 := 1;
        x3 := GT_e_Strong;                 y3 := 1;
        x4 := GT_e_Strong + cAlmostZero;   y4 := 0;
        x5 := 50;                          y5 := 0;
        x6 := 100;                         y6 := 0;
      end;
    cGreater:
      begin
        x1 := -100;                        y1 := 1;
        x2 := -50;                         y2 := 1;
        x3 := GT_e_Medium;                 y3 := 1;
        x4 := GT_e_Medium + cAlmostZero;   y4 := 0;
        x5 := 50;                          y5 := 0;
        x6 := 100;                         y6 := 0;
      end;
    cWeaklyGreater:
      begin
        x1 := -100;                        y1 := 1;
        x2 := -50;                         y2 := 1;
        x3 := GT_e_Weak;                   y3 := 1;
        x4 := GT_e_Weak + cAlmostZero;     y4 := 0;
        x5 := 50;                          y5 := 0;
        x6 := 100;                         y6 := 0;
      end;
    cStronglyLesser:
      begin
        x1 := -100;                        y1 := 0;
        x2 := -50;                         y2 := 0;
        x3 := LT_e_Strong - cAlmostZero;   y3 := 0;
        x4 := LT_e_Strong;                 y4 := 1;
        x5 := 50;                          y5 := 1;
        x6 := 100;                         y6 := 1;
      end;
    cLesser:
      begin
        x1 := -100;                        y1 := 0;
        x2 := -50;                         y2 := 0;
        x3 := LT_e_Medium - cAlmostZero;   y3 := 0;
        x4 := LT_e_Medium;                 y4 := 1;
        x5 := 50;                          y5 := 1;
        x6 := 100;                         y6 := 1;
      end;
    cWeaklyLesser:
      begin
        x1 := -100;                        y1 := 0;
        x2 := -50;                         y2 := 0;
        x3 := LT_e_Weak - cAlmostZero;     y3 := 0;
        x4 := LT_e_Weak;                   y4 := 1;
        x5 := 50;                          y5 := 1;
        x6 := 100;                         y6 := 1;
      end;
    cStronglySimilar:
      begin
        x1 := -100;                        y1 := 1;
        x2 := -EQ_e_Strong - cAlmostZero;  y2 := 1;
        x3 := -EQ_d_Strong;                y3 := 0;
        x4 := EQ_d_Strong;                 y4 := 0;
        x5 := EQ_e_Strong + cAlmostZero;   y5 := 1;
        x6 := 100;                         y6 := 1;
      end;
    cSimilar:
      begin
        x1 := -100;                        y1 := 1;
        x2 := -EQ_e_Medium;                y2 := 1;
        x3 := -EQ_d_Medium;                y3 := 0;
        x4 := EQ_d_Medium;                 y4 := 0;
        x5 := EQ_e_Medium;                 y5 := 1;
        x6 := 100;                         y6 := 1;
      end;
    cWeaklySimilar:
      begin
        x1 := -100;                        y1 := 1;
        x2 := -EQ_e_Weak;                  y2 := 1;
        x3 := -EQ_d_Weak;                  y3 := 0;
        x4 := EQ_d_Weak;                   y4 := 0;
        x5 := EQ_e_Weak;                   y5 := 1;
        x6 := 100;                         y6 := 1;
      end;
  end;
  x := RelDiff;
  (* interpolate *)
  if (x <= x2) then
    y := y1 + (x - x1) * (y2 - y1) / (x2 - x1)
  else if (x <= x3) then
    y := y2 + (x - x2) * (y3 - y2) / (x3 - x2)
  else if (x <= x4) then
    y := y3 + (x - x3) * (y4 - y3) / (x4 - x3)
  else if (x <= x5) then
    y := y4 + (x - x4) * (y5 - y4) / (x5 - x4)
  else
    y := y5 + (x - x5) * (y6 - y5) / (x6 - x5);
  EvaluateArc := y;
end;

(*----------------------------------------------------------------*)
Function tNeighborhoodCriterion.Evaluate(TheData: pDataVector): Real;
var
  i, j: Word;
  Code: Integer;
  ConstraintValue, Accum: Double;
  TailData, HeadData: Real;
  TheRelDiff: Real;
begin
  Accum := 0;
  if (NumArcs <> 0) then
  begin
    for i := 1 to NumArcs do
    begin
      Code     := ArcRelationships[i];
      TailData := TheData^[ArcTails[i]];
      HeadData := TheData^[ArcHeads[i]];
      if (Code <> cNoRelationship) then
      begin
        TheRelDiff := RelDiff(TailData, HeadData, Range);
        if (TheRelDiff <> cNoData) then
          ConstraintValue := EvaluateArc(TheRelDiff, Code);
        Accum := Accum + ConstraintValue;
      end;
    end;
    Accum := Accum / NumArcs
  end
  else
    Accum := cNoData;
  Evaluate := Accum;
end;

(*----------------------------------------------------------------*)
Constructor tYieldCriterion.Init(Filename: OpenString; NumYieldCols, iMethod: Word; TheK: Real);
var
  f: Text;
  i, j: Word;
  ID: Word;
  s: String;
  Counter: Array[1..cMaxCalibCols] of LongInt;

  function MinI(x, y: Integer): Integer;
  begin
    if (x <= y) then
      MinI := x
    else
      MinI := y;
  end;

begin
  NumNodes  := 0;
  CalibCols := 0;
  Method    := iMethod;
  k         := TheK;
  for i := 1 to cMaxCalibCols do
  begin
    Variance[i] := 0;
    YearMean[i] := 0;
    Counter[i]  := 0;
  end;
  (*--------------------------------------------------------------*)
  Assign(f, FileName);
  Reset(f);
  error := IOResult;
  if (error = 0) then
  begin
    Readln(f, NumNodes, CalibCols);   (* read number of nodes, calibration columns *)
    error := IOResult;
    Readln(f, s);                     (* Get column name header *)
    error := IOResult;
    if (error = 0) then               (* Get column IDs for the calibration years *)
    begin
      CalibCols := MinI(MinI(CalibCols, cMaxCalibCols), NumYieldCols);
      i := 1;
      if (CalibCols <> 0) then
        while (not eof(f)) and (error = 0) and (i <= NumNodes) do
        begin
          Read(f, ID);
          if (ID <> i) then
            writeln('Warning! Observed yield data file has incorrect ID ordering!');
          for j := 1 to CalibCols do
          begin
            if (j = CalibCols) then
              Readln(f, ObsData[i, j])
            else
              Read(f, ObsData[i, j]);
            if not AlmostEqual(ObsData[i, j], cNoData) then
            begin
              Variance[j] := Variance[j] + sqr(ObsData[i, j]);
              YearMean[j] := YearMean[j] + ObsData[i, j];
              inc(Counter[j]);
            end;
          end;
          inc(i);
        end;
    end;
    (*------------------------------------------------------------*)
    Close(f);
  end;
  if (CalibCols <> 0) and (NumNodes <> 0) then
  begin
    for j := 1 to CalibCols do
    begin
      Variance[j] := (Variance[j] - Counter[j] * sqr(YearMean[j] / Counter[j]))
                     / (Counter[j] - 1);
    end;
  end;
end;

Destructor tYieldCriterion.Done;
begin
end;

Function tYieldCriterion.Evaluate(SimData: pCalibYieldDataVector): Real;
var
  Counter: LongInt;
  i, j: Word;
  ErrorFunction: Double;
begin
  ErrorFunction := 0;
  Counter := 0;
  for i := 1 to NumNodes do
  begin
    for j := 1 to CalibCols do
    begin
      if ((not AlmostEqual(ObsData[i, j], cNoData)) and
          (not AlmostEqual(SimData^[i, j], cNoData))) then
      begin
        ErrorFunction := ErrorFunction +
                         sqr(SimData^[i, j] - ObsData[i, j]) / Variance[j];
        inc(Counter);
      end;
    end;
  end;
  ErrorFunction := ErrorFunction / Counter;
  if (Method = cUseMSE) then
  begin
    if (ErrorFunction > 1) then       (* COO! *)
      ErrorFunction := 1;
  end
  else
  begin
    ErrorFunction := 1 - exp(k * ErrorFunction);
  end;
  Evaluate := ErrorFunction;
end;

Function tYieldCriterion.CoincidenceCode(SimData: pCalibYieldDataVector;
                                         Location, NumYears: Word): Word;
var
  i: Word;
  TheMask: Word;
const
  Powers2: Array[1..8] of Word = (1, 2, 4, 8, 16, 32, 64, 128);
begin
  TheMask := 0;
  for i := 1 to NumYears do
  begin
    if AlmostEqual(SimData^[Location, i], ObsData[Location, i]) then
      TheMask := TheMask + Powers2[i];
  end;
  CoincidenceCode := TheMask;
end;

(*----------------------------------------------------------------*)
Constructor tOWA.Init(iNumCriteria, Iter: Word; iReportInterval: LongInt);
var
  i: Word;
  s: String[12];
begin
  Str(Iter, s);
  while (Length(s) < 4) do
    s := '0' + s;
  s := 'OWA_' + s + '.csv';
  Assign(OutFile, s);
  Rewrite(OutFile);
  error := IOResult;
  OWAAcc := 0;
  NumCriteria := iNumCriteria;
  for i := 1 to cMaxCriteria do
  begin
    Values[i]   := 0;
    Weights[i]  := 0;
    Rank2Acc[i] := 0;
    ValueAcc[i] := 0;
  end;
  ReportInterval := iReportInterval;
  Counter := 0;
  if (Error = 0) then
  begin
    write(OutFile, 'n');
    for i := 1 to NumCriteria do
      write(OutFile, ',C', i);
    for i := 1 to NumCriteria do
      write(OutFile, ',R2_', i);
    writeln(OutFile, ',OWA');
  end;
end;
(*----------------------------------------------------------------*)
Destructor tOWA.Done;
begin
  if (Error = 0) then
    Close(OutFile);
  if (IOResult = 0) then ;
end;

(*----------------------------------------------------------------*)
Function tOWA.Evaluate;
(* We assume the values and weights have been populated *)
var
  i, j: Word;
  AuxR, AuxR2: Real;
  AuxW: Word;
  RankVector1, RankVector2: Array[1..cMaxCriteria] of Word;
  other: tCriteriaVector;
(* RankVector1 says which of the incoming criteria is in each position of
   the sorted vector. [3 2 1 4] means that the third criterion has the
   highest value, the fourth criterion has the lowest, etc.
   RankVector2 is perhaps more useful. It shows what position each
   incoming criterion ended up in. [3 2 1 4] says that the first criterion
   was the 3rd largest, the second criterion was the second largest, the
   third criterion was the largest, etc. *)
begin
  (*--------------------------------------------------------------*)
  inc(Counter);
  for i := 1 to NumCriteria do
  begin
    RankVector1[i] := i;                       (* Init *)
    ValueAcc[i] := ValueAcc[i] + Values[i];    (* Add criteria values to Acc *)
  end;
  (* First bubblesort the data *)
  if (NumCriteria > 1) then
    for i := 1 to NumCriteria - 1 do
    begin
      for j := 1 to NumCriteria - i do
      begin
        if (Values[j] < Values[j + 1]) then
        begin
          AuxR := Values[j];
          Values[j] := Values[j + 1];
          Values[j + 1] := AuxR;
          AuxW := RankVector1[j];
          RankVector1[j] := RankVector1[j + 1];
          RankVector1[j + 1] := AuxW;
        end;
      end;
    end;
  (* Now apply OWA *)
  AuxR := 0;
  for i := 1 to NumCriteria do
    AuxR := AuxR + (Values[i] * Weights[i]);
  OWAAcc := OWAAcc + AuxR;
  (* write more data *)
  for i := 1 to NumCriteria do
  begin
    RankVector2[RankVector1[i]] := i;
    Rank2Acc[RankVector1[i]] := Rank2Acc[RankVector1[i]] + i;
  end;
  if ((Counter mod ReportInterval) = 1) then
  begin
    write(OutFile, Counter);
    for i := 1 to NumCriteria do
    begin
      if (Counter <> 1) then
        AuxR2 := ValueAcc[i] / ReportInterval
      else
        AuxR2 := ValueAcc[i];
      write(OutFile, ',', AuxR2:1:5);
      ValueAcc[i] := 0;
    end;
    for i := 1 to NumCriteria do
    begin
      if (Counter <> 1) then
        AuxR2 := Rank2Acc[i] / ReportInterval
      else
        AuxR2 := Rank2Acc[i];
      write(OutFile, ',', AuxR2:1:5);
      Rank2Acc[i] := 0;
    end;
    if (Counter <> 1) then
      AuxR2 := OWAAcc / ReportInterval
    else
      AuxR2 := OWAAcc;
    writeln(OutFile, ',', AuxR2:1:5);
    OWAAcc := 0;
  end;
  Evaluate := AuxR;
end;

(*----------------------------------------------------------------*)
end.
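
As a reading aid for the OWA operator in the listing above, the following standalone program is a minimal sketch of the same sort-then-weight idea, written so that it should also compile with a current Pascal compiler such as Free Pascal. It is not part of the Criteria unit: the program name, the type names, the routine names SortDescending and OrderedWeightedAverage, and the four sample values and weights are all hypothetical and chosen only for illustration. File output and the running accumulators of tOWA are deliberately left out.

```pascal
program OWASketch;

(* Hypothetical standalone sketch; none of these names belong to the
   Criteria unit listed above. *)

const
  cNumCriteria = 4;

type
  tValueVec = array[1..cNumCriteria] of Real;
  tRankVec  = array[1..cNumCriteria] of Integer;

(* Bubble-sorts Values into descending order. On return, Rank1[p] is the
   index of the incoming criterion that ended up in sorted position p. *)
procedure SortDescending(var Values: tValueVec; var Rank1: tRankVec);
var
  i, j, AuxI: Integer;
  AuxR: Real;
begin
  for i := 1 to cNumCriteria do
    Rank1[i] := i;
  for i := 1 to cNumCriteria - 1 do
    for j := 1 to cNumCriteria - i do
      if Values[j] < Values[j + 1] then
      begin
        AuxR := Values[j];
        Values[j] := Values[j + 1];
        Values[j + 1] := AuxR;
        AuxI := Rank1[j];
        Rank1[j] := Rank1[j + 1];
        Rank1[j + 1] := AuxI;
      end;
end;

(* Ordered weighted average: Weights[p] multiplies the p-th largest value,
   whichever criterion produced it. Values is passed by value, so the
   caller's copy is left unsorted. *)
function OrderedWeightedAverage(Values: tValueVec; const Weights: tValueVec;
                                var Rank1: tRankVec): Real;
var
  p: Integer;
  Acc: Real;
begin
  SortDescending(Values, Rank1);
  Acc := 0;
  for p := 1 to cNumCriteria do
    Acc := Acc + Values[p] * Weights[p];
  OrderedWeightedAverage := Acc;
end;

var
  V, W: tValueVec;
  Rank1, Rank2: tRankVec;
  p: Integer;
  Aggregate: Real;

begin
  (* Illustrative criterion values and position weights (weights sum to 1). *)
  V[1] := 0.2; V[2] := 0.5; V[3] := 0.9; V[4] := 0.1;
  W[1] := 0.4; W[2] := 0.3; W[3] := 0.2; W[4] := 0.1;

  Aggregate := OrderedWeightedAverage(V, W, Rank1);

  (* Invert Rank1: Rank2[c] is the sorted position of criterion c. *)
  for p := 1 to cNumCriteria do
    Rank2[Rank1[p]] := p;

  writeln('OWA   = ', Aggregate:1:5);
  write('Rank1 =');
  for p := 1 to cNumCriteria do
    write(' ', Rank1[p]);
  writeln;
  write('Rank2 =');
  for p := 1 to cNumCriteria do
    write(' ', Rank2[p]);
  writeln;
end.
```

With these sample inputs the sorted order is criteria 3, 2, 1, 4, so both rank vectors come out as 3 2 1 4, the same pattern used as the example in the comment inside tOWA.Evaluate, and the aggregate is 0.9 x 0.4 + 0.5 x 0.3 + 0.2 x 0.2 + 0.1 x 0.1 = 0.56. Because the weights attach to sorted positions rather than to particular criteria, shifting weight toward the first positions emphasizes whichever criteria currently have the largest values.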
LIST OF REFERENCES

A&L Labs, Inc., 2003. Fee Schedule 2003. Online at http://www.allabs.com/analytical services/CurrentFeeSchedule.pdf Accessed 10/24/2003.
Aarts, E., Korst, J., 1990. Simulated Annealing and Boltzmann Machines. Wiley, New York, NY.
Acock, B., Trent, A., 1991. The soybean crop simulator, GLYCIM: documentation for the modular version 91. Department of Plant, Soil and Entomological Sciences, University of Idaho, Moscow, Idaho, 242 pp.
AGRIS Corp., 1998. AgLink Reference Manual. AGRIS Corporation, Roswell, GA.
Allen, R.G., 1996. Assessing integrity of weather data for reference evapotranspiration estimation. J. Irrig. Drainage Eng.-ASCE 122 (2), 97-106.
Al-Mahasneh, M.A., Colvin, T.S., 2000. Verification of yield monitor performance for on-the-go measurement of yield with an inboard electronic scale. Trans. ASAE 43 (4), 801-807.
Andales, A.A., Batchelor, W.D., Anderson, C.E., 2000. Modification of a soybean model to improve soil temperature and emergence date prediction. Trans. ASAE 43 (1), 121-129.
Anderson, N.W., Humburg, D.S., 1997. Application equipment for site-specific management, p. 245-281. In: F.J. Pierce and E.J. Sadler (ed.), The State of Site Specific Management for Agriculture. ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI.
Andrade, F., Cirilo, A., Uhart, S., Otegui, M., 1996. Ecofisiología del cultivo de maíz [Maize Ecophysiology]. Dekalb Press, Balcarce, Argentina. 292 pp.
Apezteguía, H.P., Sereno, R., Aoki, A.M., Ateca, M.R., Romero, L.E., Mendoza, R.I., Esmoriz, G.F., Robledo, C.W., 1999. Spatial distribution of soil moisture in a small watershed of Cordoba, Argentina. Agricultura Tecnica (Chile) 59 (3), 233-241.
Arslan, S., Colvin, T.S., 1999. Laboratory performance of a yield monitor. Applied Engineering in Agriculture 15 (3), 189-195.
Atherton, R.W., Schainker, R.B., Ducot, E.R., 1975. On the statistical sensitivity analysis of models for chemical kinetics. AIChE 21, 441-448.
Batchelor, W.D., Basso, B., Paz, J.O., 2002. Examples of strategies to analyze spatial and temporal yield variability using crop models. Eur. J. Agron. 18 (1-2), 141-158.
Beliakov, G., 2003. How to Build Aggregation Operators from Data. International Journal of Intelligent Systems 18 (8), 903-923.
Beven, K.J., Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology. Hydrol. Sci. Bull. 24 (1), 43-69.
Beven, K.J., Lamb, R., Quinn, P.F., Romanowicz, R., Freer, J., 1995. TOPMODEL. In Singh, V.P. (Ed.), Models of Watershed Hydrology. Water Resource Publications, Colorado, pp. 527-668.
Birrell, S.J., Sudduth, K.A., Borgelt, S.C., 1996. Comparison of sensors and techniques for crop yield mapping. Comp. Elec. Agric. 14, 215-233.
Bitzer, M., Herbek, J., Bessin, R., Green, J.D., Ibendahl, G., Martin, J., McNeill, S., Montross, M., Murdock, L., Vincelli, P., Wells, K., 2003. A Comprehensive Guide to Corn Management in Kentucky. Online at http://www.ca.uky.edu/agc/pubs/id/idl 39/ID1 39%20htm.HTM . Accessed 10/16/2003.
Boote, K.J., Jones, J.W., Hoogenboom, G., Pickering, N.B., 1998. The CROPGRO model for grain legumes. In Tsuji, G.Y., Hoogenboom, G., Thornton, P.K. (Ed.), Understanding Options for Agricultural Production. Kluwer Academic Publishers, Dordrecht, pp. 99-128.
Boote, K.J., Jones, J.W., Pickering, N.B., 1996. Potential Uses and Limitations of Crop Models. Agron. J. 88 (5), 704-716.
Boote, K.J., Pickering, N.B., 1994. Modeling photosynthesis of row crop canopies. HortScience 29 (12), 1423-1434.
Borgelt, S.C., 1993. Sensing and measuring technologies for site specific management, p. 141-157. In Robert, P.C. et al. (ed.), Soil specific crop management. ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI.
Borland International, Inc., 1992. Borland Pascal with Objects: Programmer's reference. Borland International, Inc., Scotts Valley, CA.
Bostick, W.M., Koo, J., Jones, J.W., Gijsman, A.J., 2003. Combining measurements and models to estimate carbon sequestration. In Abstracts of the 2003 Annual Meeting of the ASA/CSSA/SSSA, Nov. 2-6, Denver, CO.
Boughton, W.C., 1989. A Review of the USDA SCS Curve Number Method. Aust. J. Soil Res. 27 (3), 511-523.
Bouma, J., 1989. Using soil survey data for quantitative land evaluation. Adv. Soil Sci. 9, 177-213.
Boydell, B.C., Green, H.M., Pocknee, S.A., Tucker, M., Kvien, C.K., Vellidis, G., 1995. Yield mapping of peanuts. Agronomy Abstracts. ASA, Madison, WI, p. 300.
Braga, R.N., 2000. Predicting the spatial pattern of grain yield under water limiting conditions. Ph.D. Dissertation, University of Florida, Gainesville, FL.
Braga, R.P., Jones, J.W., 1998. Spatial parameter estimation for a field-scale surface hydrology model. Proc. 1st Int. Conference on Geospatial Information in Agriculture and Forestry, Lake Buena Vista, Fla., vol. II. ERIM Int., Ann Arbor, MI, pp. 105-112.
Braga, R.P., Jones, J.W., Basso, B., 1998. Weather induced variability in site-specific management profitability: A case study. In Proc. 4th International Conference in Precision Agriculture, 1853, St. Paul, Minn. ASA/CSSA/SSSA, Madison, WI.
Brakensiek, D.L., Engleman, R.L., Rawls, W.J., 1981. Variation within texture classes of soil-water parameters. Trans. ASAE 24 (2), 335-339.
Bunn, C.C., Du, M., Niu, K.Y., Johnson, T.R., Poston, W.S.C., Foreyt, J.P., 1999. Predicting the Risk of Obesity Using a Bayesian Network. J. Am. Med. Inf. Assoc. Suppl. v1999, 1035.
Burgess, T.M., Webster, R., 1984. Optimal sampling strategies for mapping soil types. 2. Risk functions and sampling intervals. J. Soil Sci. 35 (4), 655-665.
Burgess, T.M., Webster, R., 1980. Optimal interpolation and isarithmic mapping of soil properties. 1. The semivariogram and punctual kriging. J. Soil Sci. 31, 315-331.
Calmon, M.A., Jones, J.W., Shinde, D., Specht, J.E., 1999. Estimating parameters for soil water balance models using adaptive simulated annealing. Appl. Eng. Agric. 15 (6), 703-713.
Cambardella, C.A., Colvin, T.S., Karlen, D.L., Logsdon, S.D., Berry, E.C., Radke, J.K., Kaspar, T.C., Parkin, T.B., Jaynes, D.B., 1996. Soil property contributions to yield variation patterns. Proc. of the Third Int. Conf. on Precision Agriculture. ASA-CSSA-SSSA, Madison, WI, pp. 189-195.
Camp, C.R., Sadler, E.J., 1998. Site specific crop management with a center pivot. Journal of Soil and Water Conservation 53 (4), 312-314.
Campbell, R.H., Rawlins, S.L., Han, S., 1994. Monitoring methods for potato yield mapping. ASAE Paper No. 941584. ASAE, St. Joseph, MI.
Carley, K., Palmquist, M., 1992. Extracting, representing, and analyzing mental models. Social Forces 70 (3), 601-636.
Cassel, D.K., Wendroth, O., Nielsen, D.R., 2000. Assessing spatial variability in an agricultural experiment station field: opportunities arising from spatial dependence. Agron. J. 92 (4), 706-714.
Chagas, C.I., Santanatoglia, O.J., Castiglioni, M.G., Marelli, H.J., 1995. Tillage and cropping effects on selected properties of an Argiudoll in Argentina. Commun. Soil Sci. Plant Anal. 26 (5-6), 643-655.
Clark, R.L., Lee, R., 1998. Development of topographic maps for precision farming with kinematic GPS. Trans. ASAE 41 (4), 909-916.
Comegna, V., Basile, A., 1994. Temporal stability of spatial patterns of soil water storage in a cultivated Vesubian soil. Geoderma 62, 299-310.
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., 2001. Introduction to algorithms, 2nd edn. MIT Press, Cambridge, USA, 1184 pp.
Corne, D.W., Deb, K., Fleming, P.J., Knowles, J.D., 2003. The good of the many outweighs the good of the one: evolutionary multiobjective optimization. Connections (Newsletter of the IEEE Neural Networks Society) 1 (1), 9-13.
Crave, A., Gascuel-Odoux, C., 1997. The influence of topography on time and space distribution of soil surface water content. Hydrological Processes 11 (2), 203-210.
Dardanelli, J.L., Bachmeier, O.A., Sereno, R., Gil, R., 1997. Rooting depth and soil water extraction patterns of different crops in a silty loam Haplustoll. Field Crop. Res. 54, 29-38.
Darwiche, A., 2000. Model based diagnosis under real-world constraints. AI Magazine 21 (2), 57-73.
Darwiche, A., Goldszmidt, M., 1994. On the relation between kappa calculus and probabilistic reasoning. In: Mantaras, R.I., Poole, D. (Ed.), Uncertainty in Artificial Intelligence: Proceedings of the 10th Conference. Morgan Kaufmann, San Francisco, CA, pp. 145-153.
Dempster, A.P., 1968. A generalization of Bayesian inference. J. Roy. Stat. Soc. B 30, 205-247.
Deutsch, C.V., Journel, A.G., 1992. GSLIB, Geostatistical Software and User's Guide. Oxford Univ. Press, Oxford.
Diaper, D. (Ed.), 1989. Knowledge elicitation: principles, techniques and applications. Ellis Horwood Limited, Chichester, 270 pp.
Dowd, M., Meyer, R., 2003. A Bayesian approach to the ecosystem inverse problem. Ecol. Modell. 168, 39-55.
Durand, P., Gascuel-Odoux, C., Cordier, M.O., 2002. Parameterisation of hydrological models: a review and lessons learned from studies of an agricultural catchment (Naizin, France). Agronomie 22 (2), 217-228.
Eden, C.F., Ackermann, F., Cropper, S., 1992. The analysis of cause maps. J. Manage. Stud. 29 (3), 309-323.
EPA, 2003. Total Maximum Daily Loads. Website at http://www.epa.gov/owow/tmdl/ (accessed 17 October 2003).
Everett, M.W., Pierce, F.J., 1996. Variability of corn yield and soil profile nitrates in relation to site-specific N management. In P.C. Robert, R.H. Rust, W.E. Larson (Ed.), Precision Agriculture. ASA-CSSA-SSSA, Madison, WI.
Fallick, J.B., Batchelor, W.D., Tylka, G.L., Niblack, T.L., Paz, J.O., 2002. Coupling soybean cyst nematode damage to CROPGRO-soybean. Trans. ASAE 45 (2), 433-441.
Favis-Mortlock, D.T., Smith, F.R., 1990. A sensitivity analysis of EPIC. In Sharpley, A.N., Williams, J.R. (Ed.), EPIC-Erosion/Productivity Impact Calculator: 1. Model Documentation. U.S. Department of Agriculture Technical Bulletin No. 1768, pp. 178-190.
Ferreyra, R.A., 1998. Desarrollo de una metodología para la identificación y caracterización cuantitativa de zonas aptas para el cultivo del maní (Arachis hypogaea L.) utilizando un modelo de simulación de cultivos (Development of a methodology for the quantitative identification and assessment of zones suitable for the peanut crop using a crop simulation model). MS Thesis. Univ. Nacional de Cordoba, Facultad de Cs. Agropecuarias, Cordoba, Argentina.
Ferreyra, R.A., Podesta, G.P., Messina, C.D., Letson, D., Dardanelli, J.L., Guevara, E., Meira, S., 2001. A linked-modeling framework to estimate maize production risk associated with ENSO-related climate variability in Argentina. Agric. For. Meteorol. 107 (3), 187-192.
Ferreyra, R.A., Apezteguia, H.P., Sereno, R., Jones, J.W., 2002. Reduction of soil water spatial sampling density using scaled semivariograms and simulated annealing. Geoderma 110 (3-4), 265-289.
Filev, D., Yager, R., 1998. On the issue of obtaining OWA operator weights. Fuzzy Sets and Systems 94 (2), 157-169.
Fleischer, S.J., Blom, P.E., Weisz, R., 1999. Sampling in precision IPM: When the objective is a map. Phytopathology 89 (11), 1112-1118.
Fulton, J.P., Shearer, S.A., Chabra, G., Higgins, S.F., 2001. Performance Assessment and Model Development of a Variable-Rate, Spinner-Disc Fertilizer Applicator. Trans. ASAE 44 (5), 1071-1081.
Gardner, R.H., O'Neill, R.V., Mankin, J.B., Carney, J.H., 1981. A comparison of sensitivity analysis and error analysis based on a stream ecosystem model. Ecol. Model. 12, 173-190.
Gath, I., Geva, G., 1989. Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 11, 773-781.
Gamma Design Software, 2001. GS+ 5.0 User Manual. Gamma Design, Plainwell, MI.
Gemela, J., 2001. Financial Analysis Using Bayesian Networks. Appl. Stoch. Models Bus. Ind. 17 (1), 57-67.
Gijsman, A.J., Jagtap, S.S., Jones, J.W., 2002. Wading through a swamp of complete confusion: how to choose a method for estimating soil water retention parameters for crop models. Eur. J. Agron. 18 (1-2), 75-105.
Gimenez, D., Perfect, E., Rawls, W.J., Pachepsky, Ya.A., 1997. Fractal models for predicting soil hydraulic properties: a review. Eng. Geol. 48 (3-4), 161-183.
Goense, D., 1997. The Accuracy of Farm Machinery for Precision Agriculture: A case for Fertilizer Application. Netherlands Journal of Agricultural Science 45 (1), 199-215.
Golden, B., Bodin, L., Doyle, T., Stewart Jr., W., 1980. Approximate traveling salesman algorithms. Operations Research 28 (3), 694-711.
Golden Software, Inc., 1999. Surfer v. 7.0 Users' Guide. Golden Software, Golden, CO.
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, Oxford.
Goovaerts, P., Chiang, C.N., 1993. Temporal persistence of spatial patterns for mineralizable nitrogen and selected soil properties. Soil Sci. Soc. Am. J. 57, 372-381.
Grayson, R.B., Western, A.W., 1998. Towards areal estimation of soil water content from point measurements: time and space stability of mean response. J. Hydrol. 207 (1-2), 68-82.
Greenland, S., Pearl, J., Robins, J.M., 1999. Causal Diagrams for Epidemiologic Research. Epidemiology 10 (1), 37-48.
Gupta, S.C., Larson, W.E., 1979. Estimating soil water characteristic from particle size distribution, organic matter percent, and bulk density. Water Resour. Res. 15, 1633-1635.
Haas, T.C., 1990. Lognormal and moving window methods of estimating acid deposition. J. Am. Stat. Assoc. 85 (412), 950-963.
Haese, K., Goodhill, G.J., 2001. Auto-SOM: recursive parameter estimation for guidance of self-organizing feature maps. Neural Comput. 13, 595-619.
Hamby, D.M., 1994. A review of techniques for parameter sensitivity analysis of environmental models. Environ. Monit. Assess. 32, 135-154.
Haykin, S., 1994. Neural networks. A Comprehensive Foundation. Macmillan, New York.
Heckerman, D., 1997. Bayesian Networks for Data Mining. Data Min. Knowl. Discov. 1 (1), 79-119.
Heinemann, A.B., Hoogenboom, G., Chojnicki, B., 2002. The impact of potential errors in rainfall observation on the simulation of crop growth, development and yield. Ecol. Modell. 157 (1), 1-21.
Helm, L., 1996. Improbable Inspiration. Los Angeles Times, October 28, 1996.
Hergert, G.W., Pan, W.L., Huggins, D.R., Grove, J.H., Peck, T.R., 1997. Adequacy of current fertilizer recommendations for site-specific management, p. 283-300. In: F.J. Pierce and E.J. Sadler (ed.), The State of Site Specific Management for Agriculture. ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI.
Hofman, V., Panigrahi, S., Gregor, B., Walter, J., 1995. In field monitoring of sugarbeets. ASAE Paper No. 952114. ASAE, St. Joseph, MI.
Hoogenboom, G., Jones, J.W., Wilkens, P.W., Batchelor, W.D., Bowen, W.T., Hunt, L.A., Pickering, N.B., Singh, U., Godwin, D.C., Baer, B., Boote, K.J., Ritchie, J.T., White, J.W., 1994. Crop models. In Tsuji, G.Y., Uehara, G., Balas, S. (eds.), DSSAT v3, Volume 2-2. University of Hawaii, Honolulu, Hawaii, pp. 95-244.
Howard, R.A., Matheson, J.E., 1984. Influence diagrams. In Howard, R.A., Matheson, J.E. (ed.), Readings on the Principles and Applications of Decision Analysis II. Strategic Decisions Group, Menlo Park, California, pp. 721-762.
Hunt, L.A., Boote, K.J., 1998. Data for model operation, calibration, and evaluation. In Tsuji, G.Y., Hoogenboom, G., Thornton, P.K. (Ed.), Understanding Options for Agricultural Production. Kluwer Academic Publishers, Dordrecht, pp. 9-39.
Ingber, L., 1993. Simulated annealing: practice versus theory. Mathematical and Computer Modeling 12 (8), 967-973.
Irmak, A., Jones, J.W., Batchelor, W.D., Paz, J.O., 2002. Linking multiple layers of information for diagnosing causes of spatial yield variability in soybean. Trans. ASAE 45 (3), 839-849.
Irmak, A., Jones, J.W., Batchelor, W.D., Paz, J.O., 2001. Estimating spatially variable soil properties for application of crop models in precision farming. Trans. ASAE 44 (5), 1343-1353.
Jaynes, D.B., Colvin, T.S., 1997. Spatiotemporal Variability of Corn and Soybean Yield. Agron. J. 89 (1), 30-37.
Jaynes, D.B., Hunsaker, D.J., 1989. Spatial and temporal variability of water content and infiltration on a flood irrigated field. Trans. ASAE 32 (4), 1229-1238.
Jensen, F., 2001. Bayesian Networks and Decision Graphs. Springer-Verlag, New York, NY.
Johnson, C.K., Doran, J.W., Duke, H.R., Wienhold, B.J., Eskridge, K.M., Shanahan, J.F., 2001. Field-scale electrical conductivity mapping for delineating soil condition. Soil Sci. Soc. Am. J. 65 (6), 1829-1837.
Jones, J.W., Luyten, J.C., 1998. Simulation of biological processes. In Peart, R.M., Curry, R.B. (ed.), Agricultural systems modeling and simulation. Marcel Dekker, New York, pp. 19-62.
Jones, J.W., Hunt, L.A., Hoogenboom, G., Godwin, D.C., Singh, U., Tsuji, G.Y., Pickering, N.B., Thornton, P.K., Bowen, W.T., Boote, K.J., Ritchie, J.T., 1994. Input and output files. In Tsuji, G.Y., Uehara, G., Balas, S. (eds.), DSSAT v3, Volume 2-2. University of Hawaii, Honolulu, Hawaii, pp. 1-94.
Kachanoski, R.G., De Jong, E., 1988. Scale dependence and the temporal persistence of spatial patterns of soil water storage. Water Resour. Res. 24 (1), 85-91.
Kamgar, A., Hopmans, J.W., Wallender, W.W., Wendroth, O., 1993. Plot size and sample number for neutron probe measurements in small field trials. Soil Science 156 (4), 213-224.
Kessler, M.C., Lowenberg-DeBoer, J., 1998. Regression analysis of yield monitor data and its use in fine-tuning crop decisions. Proc. of the Fourth Int. Conf. on Precision Agriculture. ASA-CSSA-SSSA, Madison, WI, pp. 821-828.
Khakural, B.R., Robert, P.C., Mulla, D.J., 1996. Relating corn/soybean yield to variability in soil and landscape characteristics. Proc. of the Third Int. Conf. on Precision Agriculture. ASA-CSSA-SSSA, Madison, WI, pp. 117-128.
Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220 (4598), 671-680.
Klute, A. (ed.), 1986. Methods of Soil Analysis: Part 1. Physical and Mineralogical Methods. Agr. Monogr. 9. ASA and SSSA, Madison, WI, 1183 pp.
Kohonen, T., 1982. Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 59-69.
Koo, J., Bostick, W.M., Jones, J.W., Gijsman, A.J., 2003. Estimating soil carbon in agricultural systems using ensemble Kalman filter and DSSAT-CENTURY. In Abstracts of the 2003 Annual Meeting of the ASA/CSSA/SSSA, Nov. 2-6, Denver, CO.
Kotval, Z., 2003. University Extension and Urban Planning Programs: An Efficient Partnership. Journal of Extension [Online] 41 (1). Available at: http://www.ioe.org/ioe/2003february/a3.shtml Accessed 1 June 2003.
Kristensen, K., Rasmussen, I.A., 2002. The use of a Bayesian network in the design of a decision support system for growing malting barley without use of pesticides. Comput. Electron. Agric. 33 (3), 197-217.
Kwoh, C.K., Gillies, D.F., 1996. Using Hidden Nodes in Bayesian Networks. Artif. Intell. 88 (1-2), 1-38.
Ladson, A.R., Moore, I.D., 1992. Soil-water Prediction on the Konza Prairie by microwave remote sensing and topographic attributes. J. Hydrol. 138 (3-4), 385-407.
Lark, R.M., Stafford, J.V., Bolam, H.C., 1997. Limitations on the spatial resolution of yield mapping for combinable crops. J. Agr. Eng. Res. 66, 183-193.
Lawrence, D.N., Cawley, S.T., Hayman, P.T., 2000. Developing answers and learning in extension for dryland nitrogen management. Aust. J. Exp. Agric. 40 (4), 527-539.
Leenhardt, D., Voltz, M., Bornand, M., 1994. Propagation of the error of spatial prediction of soil properties in simulating crop evapotranspiration. European Journal of Soil Science 45, 303-310.
Lin, Y.P., Chang, T.K., 2000. Simulated annealing and kriging method for identifying the spatial patterns and variability of soil heavy metal. Journal of Environmental Science and Health Part A-Toxic/Hazardous Substances & Environmental Engineering 35 (7), 1089-1115.
Lowenberg-DeBoer, J., Swinton, S.M., 1997. Economics of site-specific management in agronomic crops, p. 369-396. In: F.J. Pierce and E.J. Sadler (ed.), The State of Site Specific Management for Agriculture. ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI.
Makowski, D., Wallach, D., Tremblay, M., 2002. Using a Bayesian approach to parameter estimation; comparison of the GLUE and MCMC methods. Agronomie 22 (2), 191-203.
Mallarino, A.P., Hinz, P.N., Oyarzabal, E.S., 1996. Multivariate analysis as a tool for interpreting relationships between site variables and crop yields. Proc. of the Third Int. Conf. on Precision Agriculture. ASA-CSSA-SSSA, Madison, WI, pp. 151-158.
Marcot, B.G., Holthausen, R.S., Raphael, M.G., Rowland, M.M., Wisdom, M.J., 2001. Using Bayesian belief networks to evaluate fish and wildlife population viability under land management alternatives from an environmental impact statement. For. Ecol. Manage. 153, 29-42.
McBratney, A.B., Webster, R., 1986. Choosing functions for semivariograms of soil properties and fitting them to sampling estimates. Journal of Soil Science 37, 617-639.
McBratney, A.B., Webster, R., Burgess, T.M., 1981. The design of optimal sampling schemes for local estimation and mapping of regionalized variables: 1. Comput. Geosci. 7, 331-334.
McBratney, A.B., Webster, R., 1981. The design of optimal sampling schemes for local estimation and mapping of regionalized variables: 2. Program and examples. Comput. Geosci. 7, 335-365.
Messina, C.D., Hansen, J.W., Hall, A.J., 1999. Land allocation conditioned on El Nino-Southern Oscillation phases in the Pampas of Argentina. Agric. Syst. 60, 197-212.
Metaxiotis, K.S., Askounis, D., Psarras, J., 2002. Expert systems in production planning and scheduling: A state-of-the-art survey. J. Intell. Manuf. 13 (4), 253-260.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., 1953. Equation of state calculations by fast computing machines. J. of Chemical Physics 21 (6), 1087-1092.
Michalewicz, Z., Fogel, D.B., 1999. How to solve it: modern heuristics. Springer, New York, 482 pp.
Microsoft Corporation, 2001. Visio 2002 Professional Edition User Guide. Microsoft Corporation, Seattle, WA.
Microsoft Corporation, 1999. Microsoft PowerPoint 2000 User Guide. Microsoft Corporation, Seattle, WA.
Miller, M.S., Smith, D.B., 1992. A direct nozzle injection controlled rate spray boom. Trans. ASAE 35 (3), 787-791.
PAGE 257
235 Miller, M.P., Singer, M.J., Neilsen, D.R., 1988. Spatial variability of wheat yields and soil properties on complex hills. Soil Sci. Soc. Am. J. 52, 1 1331 141. Minasny, B., McBratney, A.B., Bristow, K.L, 1999. Comparison of different approaches to the development of pedotransfer functions for waterretention curves. Geoderma 93 (34), 225253. Mohanty, B. P., Skaggs, T.H., Famiglietti, J.S., 2000. Analysis and mapping of fieldscale soil moisture variability using highresolution, groundbased data during the Southern Great Plains 1997 (Sgp97) Hydrology Experiment. Water Resour. Res. 36 (4), 10231031. Moran, C. J., McBratney, A.B., 1997. A twodimensional fuzzy random model of soil pore structure. Math. Geol. 29 (6), 755777. Morgan, M., Ess, D., 1997. The precision farming guide for agriculturists. John Deere Publishing, Moline, Illinois. Morisawa, S., Inoue, Y., 1974. On the selection of a ground disposal site by sensitivity analysis. Health Phys. 26, 251261. Mueller, T.G., Karathanasis, A.D., Cornelius, P.L., Cetin, H., 2003. Hyperspectral imagery: variability within and between soil map phases. Robert, P.C., Rust, R.H., Larson, W.E. (ed.) Proc. 6th International Conference on Precision Agriculture. ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI. Published on CD Myers, R.H., 1990. Classical and Modern Regression with Applications. PWSKent Publishing Co., Boston. Nadkarni, S., Shenoy, P.P., 2001. A Bayesian network approach to making inferences in causal maps. Eur. J. Oper Res. 128, 479498. National Research Council (NRC), 1997. Precision Agriculture in the 21 st century: geospatial and information technologies in crop management. NRC National Academy of the Sciences, Washington, D.C. Natvik, L.J., Eknes, M., Evensen, G. 2001 . A weak constraint inverse for a zero dimensional marine ecosystem model. J. Mar. Syst. 28, 1944. Nelson, M.R., Orum, T.V., JaimeGarcia, R., 1999. 
Applications of geographic information systems and geostatistics in plant disease epidemiology and management. Plant Disease 83 (4), 308319. Nikovsky, D., 2000. Constructing Bayesian networks for medical diagnosis from incomplete and partially correct statistics. IEEE Trans. Knowl. Data Eng. 12 (4), 509516.
PAGE 258
236 Nolin, M.C. and Lamonagne, L., 1991. Reliability of a detailed soil survey on flat terrain. Can. J. Soil Sci. 71 (3), 339353. Norsys Software Development. 1998. Netica 1.05 User's Guide. Available at: www.norsys.com Accessed 1 June 2003. NSSC, 2001a. Lab Data Sheet S75LA041017. US National Soil Survey Center, Lincoln, Nebraska. NSSC, 2001b. Lab Data Sheet S80KY035001. US National Soil Survey Center, Lincoln, Nebraska. NSSC, 2001c. Lab Data Sheet S82KY055005. US National Soil Survey Center, Lincoln, Nebraska. Nyberg, L., 1996. Spatial variability of soil water content in the covered catchment at Gardsjon, Sweden. Hydrological Processes 10 (1), 89103. Omlin, K. and Reichert, P., 1999. A comparison of techniques for the estimation of model prediction uncertainty. Ecol. Modell. 115 (1), 4559. Onyango, C. M., Marchant, J. A., Zwiggelaar, R., 1997. Modelling Uncertainty in Agricultural Image Analysis. Comput. Electron. Agric. 17 (3), 295305. Orre, R., Lansner, A., Bate, A., Lindquist, M., 2000. Bayesian neural networks with confidence estimations applied to data mining. Comput. Stat. Data Anal. 34 (4), 473493. Pachepsky, Ya.A., Rawls, W.J., 2003. Soil structure and pedotransfer functions. Eur. J. Soil Sci. 54 (3), 443451. Pachepsky, Ya.A., Timlin, D.J., Rawls, W.J., 2001. Soil water retention as related to topographic variables. Soil Sci. Soc. Am. J. 65 (6), 17871795. Paice, M. E. R., Miller, P. C. H., Day, W., 1996. Control Requirements for Spatially Selective Herbicide Sprayers. Computers and Electronics in Agriculture 14 (23), 163177. Parry, M., Rosenzweig, C, Iglesias, A., Fischer, G., Livermore, M., 1999. Climate change and global food security: a new assessment. Global Environmental Change 9: S51S67. Paz, J.O., Batchelor, W.D., 2000. What causes soybean yield variability? ASAE paper No. 003035, ASAE, St. Joseph, MI. Paz, J.O, Batchelor, W.D., Colvin, T.S., Logsdon, S.D., Kaspar, T.C., Karlen, D.L., 1998. 
Analysis of water stress effects causing spatial yield variability in soybeans. Trans. ASAE 41 (5), 15271534.
PAGE 259
237 Paz, J.O., Batchelor, W.D., Colvin, T.S., Logsdon, S.D., Kaspar, T.C., Karlen, D.L., Babcock, B.A., Pautsch, G.R., 1999. Modelbased technique to determine variablerate nitrogen for corn. Agric. Syst. 61 : 6975. Paz, J.O., Batchelor, W.D., Tylka, G.L., Hartzler, R.G., 2001. A modeling approach to quantify the effects of spatial soybean yield limiting factors. Trans. ASAE 44 (5), 13291334. Pearce, W.L., Poneleit, C.G., 1997. 1997 Kentucky Hybrid Corn Performance Test. Progress Report 397, Agricultural Experiment Station, University of Kentucky, pp 32 Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA. Perrier, E., Rieu, M., Sposito, G., de Marsily, G., 1996. Models of the water retention curve for soils with a fractal pore size distribution. Water Resour. Res. 32 (10), 30253031. Perez Munoz, F., Colvin, T.S., 1996. Continuous Grain Yield Monitoring. Trans. ASAE 39 (3), 775783. Pfeiffer, D.W., Hummel, J.W., Miller, N.R., 1993. Realtime corn yield sensor. ASAE Pap. 931013. ASAE, St. Joseph, MI. Pickering, N.B., Hansen, J.W., Jones, J.W., Wells, CM., Chan, V.K., Godwin, D.C., 1994. Weatherman a utility for managing and generating daily weather data. Agron. J. 86 (2): 332337. Pierce, F.J., Anderson, N.W., Colvin, T.S., Schueller, J.K., Humburg, D.S., McLaughlin, N.B., 1997. Yield mapping. In F.J Pierce and E.J. Sadler (Ed.) The State of SiteSpecific Management for Agricultural Systems. ASA / CSSA / SSSA, Madison, WI. Pierce, F.J., Nowak, P., 1999. Aspects of precision agriculture. Advances in Agronomy 67, 185 Pommel, B., Bonhomme, R., 1998. Variations in the vegetative and reproductive systems in individual plants of an heterogeneous maize crop. Eur. J. Agron. 8 (12), 3949. Prinsloo, M.A., Du Toit, A.S., Erasmus, H., Boote, K.J., 2003. Simulation of different tillage practices effect on water extraction rate in maize produced in wide row spacing in South Africa. 
In Abstracts of the 2003 Annual Meeting of the ASA/CSSA/SSSA, Nov. 26, Denver, CO. Qian, S.S., Stow, C.A., Borsuk, M.E., 2003. On Monte Carlo methods for Bayesian inference. Ecol. Modell. 159 (23), 269277.
PAGE 260
238 Rawlins, S.L., Campbell, G.S., Campbell, R.H., Hess, J.R., 1995. Yield mapping of potato. P. 5968 in Robert, P.C., Rust, R.H., Larson, W.E. (Ed.) SiteSpecific Management for Agricultural Systems. ASA Misc. Publ., ASA / CSSA / SSSA, Madison, WI. Rawls, W.J., Brakensiek, D.L., 1995. Utilizing fractal principles for predicting soil hydraulic properties. J. Soil Water Conserv. 50 (5), 463465. Rawls, W.J., Brakensiek, D.L., Saxton, K.E., 1982. Estimation of soilwater properties. Trans. ASAE 25 (5), 13161320, 1328. Rawls, W.J., Pachepsky, Ya.A., 2002. Using field topographic descriptors to estimate soil water retention. Soil Sci. 167 (7), 423435. Reichardt, K., Bacchi, O.O.S., Villagra, M.M., Turatti, A.L., Pedrosa, Z.O., 1993. Hydraulic variability in space and time in a dark red latosol of the tropics. Geoderma 60, 159168. Renooij, S., Witteman, C, 1999. Talking probabilities: communicating probabilistic information with words and numbers. Int. J. Approx. Reason. 22, 169194. Renschler, C.S., Flanagan, D.C., Engel, B.A., Kramer, L.A., Sudduth, K.A., 2002. Sitespecific decisionmaking based on RTK GPS survey and six alternative elevation data sources: Watershed topography and delineation. Trans. ASAE 45(6): 1 8831895. Reynolds, S.G., 1970. The gravimetric method of soil moisture determination, part III: an examination of factors influencing soil moisture variability. J. Hydrol. 11, 288300. Ritchie, J.T., 1998. Soil water balance and plant water stress. In: G.Y. Tsuji, G. Hoogenboom, and P.K. Thornton (Ed.), Understanding Options for Agricultural Production, pp. 4154. Kluwer in cooperation with ICASA, Dordrecht. Ritchie, J.T. 1985. A useroriented model of the soil water balance in wheat. In: W. Day and R.K. Atkin (ed.), Wheat growth and modeling, pp. 293305. Series A: Life Sciences Vol. 86. Plenum Press, NY. Ritchie, J.T., 1981. Soil water availability. Plant Soil 58, 327338. Ritchie, J.T., Gerakis, A., Suleiman, A., 1999. 
Simple model to estimate fieldmeasured soil water limits. Trans. ASAE 42 (6), 16091614. Ritchie, J.T., Singh, U., Godwin, D.C., Bowen, W.T. 1998. Cereal growth, development and yield. In G.Y. Tsuji, G. Hoogenboom, P.K. Thornton (ed.) Understanding Options for Agricultural Production. Kluwer Academic Publishers, Dordrecht, 7998. Ritter, H.J., Martinetz, T., Schulten, K, 1992. Neural Computation and SelfOrganizing Maps: An Introduction. Addison Wesley, Reading, MA.
PAGE 261
239 Robledo, C.W., 1994. Application of geostatistical methods in the design of an agrometeorological network. M.S. Thesis. Escuela de Graduados, Convenio Facultad de Agronomia de la Universidad de Buenos Aires e INTA, Buenos Aires. Romero, L.H., Apezteguia, H.P., Esmoriz, G.F., Sereno, R., Aoki, A., Ateca, M.R., Mendoza, R., Robledo, W., 1995. Caracterizacion de una microcuenca sembrada con soja de la region semiarida central de la provincia de Cordoba (Argentina). Agriscientia XII, 5966. Rosenzweig, C, Iglesias, A., 1998. The use of crop models for international climate change impact assessment. In Tsuji, G.Y, G. Hoogenboom and P.K. Thornton (Ed.) Understanding Options for Agricultural Production. Kluwer Academic Publishers, Dordrecht. Pp. 99128. Royce F.S., Jones, J.W., Hansen, J.W., 2001. Modelbased optimization of crop management for climate forecast applications. Trans. ASAE 44 (5), 13191327. Sacks, J., Schiller, S., 1988. Spatial designs. In: Gupta, S.S., Berger, J.O. (Ed.) Statistical Decision Theory and Related Topics IV vol. 2 SpringerVerlag, New York, pp 385399. Sadler, E.J., Gerwig, B.K., Evans, D.E., Busscher, W.J., Bauer, P.J., 2000. Sitespecific modeling of corn yield in the SE coastal plain. Agric. Systems 64, 189207. Sadler, E.J., and Russell G. 1997. Modeling Crop Yield for Site Specific Management, p. 6980. In: F.J. Pierce and E.J. Sadler, (ed.), The State of Site Specific Management for Agriculture. ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI. Saxton, K.E., Rawls, W.J., Romberger, J.S., Papendick, R.I., 1986. Estimating generalized soilwater characteristics from texture. Soil Sci. Soc. Am. J. 50 (4), 10311036. Schaap, M.G., Leij, F.J., van Genuchten, M.T., 1998. Neural network analysis for hierarchical prediction of soil hydraulic properties. Soil Sci. Soc. Am. J. 62 (4), 847855. Schaap, M.G., Leij, F.J., 1998. Using neural networks to predict soil water retention and soil hydraulic conductivity. Soil Till. Res. 47 (12), 3742. 
Schmidt, D., Rockwell, S.K., Bitney, L., Sarno, E.A., 1994. Farmers Adopt Microcomputers in the 1980s: Educational Needs Surface for the 1990s. Journal of Extension [Online] 32(1). Available at: http://www.ioe.org/ioe/1994iune/a9.html Accessed 1 June 2003. Schrock, M.D., Kuhlman, D.K., Hinnen, R.T., Oard, D.L., Pringle, J.L, Howard, K.D., 1995. Sensing grain yield with a triangular elevator. P. 635650. In Robert, P.C. et al. (ed.) Sitespecific management for agricultural systems. ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI.
PAGE 262
240 Searcy, S.W., Schueller, J.K., Bae, Y.H, Borgelt, S.C., Stout, B.A., 1989. Mapping of spatially variable yield during grain combining. Trans. ASAE 32(3), 826829. Shafer, G., 1976. A mathematical theory of evidence. Princeton University Press, Princeton, NJ. Shapiro, S.S., Wilk, M.B., 1965. An analysis of variance test for normality (complete samples). Biometrika 52, 59161 1. Shen, J., Batchelor, W.D., Kanwar, R., Mize, C.W., 1998. Prediction of spatial soil water balance in a soybean field. ASAE Paper No. 981 109. ASAE, St. Joseph, MI. Shortliffe, E.H., 1976. Computerbased medical consultations: MYCIN. American Elsevier, New York. Sierra, B., Inza, I., Larranaga, P., 2000. Medical Bayes Networks. Medical Data Analysis, Proceedings, Lect. Notes Comput. Sc. 1933, 414. Silverstein, C, Brin S., Motwani, R., Ullman, J., 2000. Scalable techniques for mining causal structures. Data mining and knowledge discovery 4 (23), 163192. Sinclair, T.R., Park, W.I., 1993. Inadequacy of the Liebig limitingfactor paradigm for explaining varying crop yields. Agron. J. 85, 742746. Sisson, J.B., 1987. Drainage from layered field soils: fixed gradient models. Water Resour. Res. 23,20712075. Stafford, J.V., Ambler, B., Lark, R.M., Catt, J., 1996. Mapping and interpreting the yield variation in cereal crops. Comp. Elec. Agric. 14, 101109. Stassopoulou, A., Petrou, M., Kittler, J., 1998. Application of a Bayesian Network in a GIS Based Decisionmaking System. Int. J. Geogr. Inf. Sci. 12 (1), 2345. Stefani, R.T., 1999. A taxonomy of sports rating systems. IEEE Trans. Syst. Man Cybern. Part ASyst. Hum. 29(1), 116120. Sudduth, K.A., Drummond, S.T., Birrell, S.J., Kitchen, N.R., 1996. Analysis of spatial factors influencing crop yield. Proc. of the Third Int. Conf. on Precision Agriculture. ASACSSASSSA, Madison, WI, pp. 129140. Syslo, M.M., Deo, N., Kowalik, J.S., 1983. Discrete Optimization Algorithms with Pascal Programs. PrenticeHall, Englewood Cliffs, NJ. Tari, F. 1996. 
A Bayesian network for predicting yield response of winter wheat to fungicide programmes. Comput. Electron. Agric. 15 (2), 111121.
PAGE 263
241 Thiemann, M., Trosset, M, Gupta, H., Sorooshian, S., 2001. Bayesian recursive parameter estimation for hydrologic models. Water Resour. Res. 37 (10), 25212535. Tomer, M.D, Anderson, J.L., Lamb, J. A., 1995. Landscape analysis of soil and crop data using regression. P. 273284 in Robert, P.C., Rust, R.H., Larson, W.E. (Ed.) SiteSpecific Management for Agricultural Systems. ASA Misc. Publ., ASA / CSSA / SSSA, Madison, WI. Torra, V., 2000. Learning weights for Weighted OWA operators. In Proceedings of the IEEE International Conference on Industrial Electronics, Control and Instrumentation (CDROM) (IECON 2000), Nagoya, Japan, pp. 25302535. Trangmar, B.B, Yost, R.S., G. Uehara, 1985. Application of geostatistics to spatial studies of soil properties. Advances in Agronomy 38, 4591 . USDA NASS, 1994. Prices Rec'd by Farmers: Historic Prices & Indexes 19081992. Available at: http://ian.mannlib.cornell.edu/datasets/crops/92 1 52/ Accessed 1 June 2003. USDA Soil Conservation Service, 1973. Soil Survey of Calloway and Marshall Counties, Kentucky. USDA, Washington DC. USDA Soil Conservation Service, 1972. National Engineering Handbook, Hydrology, Section 4, Chapters 410. USDA, Washington DC. Vachaud, G., Passerat de Silans, A., Balabanis, P., Vauclin, M., 1985. Temporal stability of spatially measured soil water probability density function. Soil Sci. Soc. Am. J. 49 (4), 822828. Vallino, J. J., 2000. Improving marine ecosystem models: use of data assimilation and mesocosm experiments. J. Mar. Res. 58, 1 17164. Van der Ploeg, R.R., Bohm, W., Kirkham, M.B., 1999. On the origin of the theory of mineral nutrition of plants and the Law of the Minimum. Soil Sci. Soc. Am. J. 63, 10551062. Van Groenigen, J.W., 2000. The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma 97, 223236. Van Groenigen, J. W., Gandah, M., Bouma, J., 2000. Soil sampling strategies for precision agriculture research under Sahelian conditions. Soil Sci. Soc. Am. J. 
64 (5), 167480. Van Groenigen, J.W., Siderius, W., Stein A., 1999. Constrained optimisation of soil sampling for minimisation of the kriging variance. Geoderma 87, 239259.
PAGE 264
242 Van Groenigen, J. W., Stein, A., 1998. Constrained optimization of spatial sampling using continuous simulated annealing. J. Environ. Qual. 27 (5), 107886. Van Laar, H.H., Goudriaan, J., van Keulen, H. (Ed.), 1992. Simulation of crop growth for potential and waterlimited production situations, as applied to spring wheat. Simulation Reports 27, CABOTT, 72 pp. Van Wesenbeeck, I. J., Kachanoski, R.G., Rolston, D.E., 1988. Temporal persistence of spatial patterns of soil water content in the tilled layer under a corn crop. Soil Sci. Soc. Am. J. 52, 934941. Vieira, S.R., Hatfield, J.L., Nielsen, D.R., Biggar, J.W., 1983. Geostatistical theory and application to variability of some agronomical properties. Hilgardia 51, 175. Vieira, S.R., Lombardi Neto, F., Burrows, I.T., 1991. Mapeamento da chuva diaria maxima provavel para o estado de Sao Paulo, Revista Brasileira de Ciencia do Solo, Campinas 15, 9398. Vieira, S.R., Tillotson, P.M., Biggar, J.W., Nielsen, D.R., 1997. Scaling of semivariograms and the kriging estimation of fieldmeasured properties, Revista Brasileira de Ciencia do Solo, Vicosa 21, 525533. Wagner, L.E., Schrock, M.D., 1989. Yield determination using a pivoted auger flow sensor. Trans. ASAE 32 (2), 409413. Way, T.R., Von Bargen, K., Grisso, R.D., Bashford, L.L., 1992. Simulation of chemical application accuracy for injection sprayers. Trans. ASAE 35 (4), 1 1411 149. Weber, R.W., Grisso, R.D., Shapiro, C.A., Kranz, W.L., Schinstock, J.L., 1993. Accuracy of anhydrous ammonia application. ASAE Paper No. 931548. ASAE, St. Joseph, MI. Webster, R., Oliver, M.A., 1990. Statistical methods in soil and land resource survey. Oxford University Press, Oxford. Welch, S.M., Zhang, J., Sun, J., Mak, T.Y., 1999a. Efficient estimation of genetic coefficients for crop models. Proc. of the 3 rd Int. Symp. on Systems Approaches for Agric. Development, 810 November, 1999, Lima, Peru. Welch, S.M., Mak, T.Y., Wang, D., Zhang, J., 1999b. 
Coefficient estimation with CROPGROSoybean: sumofsquared residuals response surface topology. In Abstracts of the 29 th Annual Meeting of the Biological Systems Simulation Group, 2224 March 1999, Manhattan, KS. Whelan, B.M., McBratney, A.B., 1997. Sorghum grain flow convolution within a conventional combine harvester. P. 759766 in Stafford, J.V. (Ed.) Precision Agriculture '97: Proceedings of the lsr European Conference on Precision Agriculture, BIOS Scientific Publishers, Oxford, UK.
PAGE 265
243 Wikle, C.K., Cressie, N., 1999. A dimensionreduced approach to spacetime Kalman filtering, Biometrika 86, 815829. Wilson, J. P., Spangrud, D. J., Nielsen, G. A., Jacobsen, J. S., Tyler, D.A., 1998. Global positioning system sampling intensity and pattern effects on computing topographic attributes. Soil Sci. Soc. Am. J. 62: 14101417. Winston, W., 1994. Operations Research, Applications and Algorithms. Duxbury Press, Belmont CA. Wosten, J.H.M., Pachepsky, Ya.A., Rawls, W.J., 2001. Pedotransfer functions: bridging the gap between available basic soil data and missing soil hydraulic characteristics. J. Hydrol. 251 (34), 123150. Yager, R.R., 2003. Toward a language for specifying summarizing statistics. IEEE Trans. Syst. Man Cybern. Part BCybern. 33(2), 177187. Yager, R.R., 2002. Uncertainty representation using fuzzy measures. IEEE Trans. Syst. Man Cybern. Part BCybern. 32(1), 1320. Yager, R.R., 1993. Families of OWA operators. Fuzzy Sets and Systems 59, 125148. Yager, R.R., 1988. On ordered weighting averaging operators in multicriteria decisionmaking. IEEE Transactions on Systems, Man and Cybernetics 18, 183190. Zadeh, L.A., 1984. Review of A Mathematical Theory of Evidence. AI Magazine 5 (3), 8183. Zhang, T., Berndtsson, R, 1988. Temporal patterns and spatial scale of soil water variability in a small humid catchment. J. Hydrol. 104, 1 1 1128.
BIOGRAPHICAL SKETCH

Rafael Andres Ferreyra was born in Cordoba, Argentina. He spent most of his K-12 years abroad, living with his parents and sister in the United States, Colombia, and Chile. Andres graduated as an Electric and Electronic Engineer from the National University of Cordoba in 1994. He began working before graduation, designing small computers for data collection and process control applications, weather satellite image collection and processing equipment, and software. He continued this type of work from 1993 to 1999 as a managing partner in Dexar, a private company. During his time in Dexar, he also designed and managed a weather information delivery system for a newspaper, and led a computer peripheral design team for a PC equipment manufacturer.

Starting in 1992, Andres was also involved in developing remote sensing capabilities for the province of Cordoba. He was part of a group charged with the design, construction, and operation of an HRPT (High Resolution Picture Transmission) environmental satellite reception and processing station. The project began with a grant from the Secretariat of Science and Technology of Cordoba, and later spawned the Remote Sensing Group of CEPROCOR (Center of Excellence in Products and Processes). Andres was a co-designer of the station and, as a CEPROCOR Research Fellow from 1994 to 1999, was in charge of the scientific and information technology aspects of developing environmental applications for the station.

Andres's interest in biological systems analysis and simulation grew while he was developing a satellite image-based fire-detection system for CEPROCOR's HRPT station for use by the local Civil Defense agency. He went on to earn an MS degree in Agrometeorology from the Faculty of Agronomic Sciences of the National University of Cordoba, graduating in 1998. His thesis proposed improvements in a peanut model's simulation of the biophysical processes needed to identify zones suitable for the peanut crop. Andres was also a faculty member in the Catholic University of Cordoba from 1997 to 1999, teaching computer programming to electronic engineering students.

Andres came to the University of Florida in the fall of 1999, funded primarily by an Alumni Graduate Fellowship from the College of Agricultural and Life Sciences. His Ph.D. program is in Agricultural and Biological Engineering, with a minor in Computer Sciences. He received several honors while at UF: he was listed in Who's Who in American Universities and Colleges, received a UF Presidential Recognition Award, and was inducted into several honor societies. He obtained scholarships and grants from the Scientific Society of Sigma Xi; the Honor Society of Phi Kappa Phi; and the Neural Networks Society of IEEE, the Institute of Electrical and Electronics Engineers. He was also involved in student organizations: he was president and secretary of the Mayors' Council; alternate ABE representative to the Graduate Student Council; and president of the UF chapter of Alpha Epsilon, the honor society for agricultural and biological engineering. He also co-developed the ABE Town Hall, a forum in the Agricultural and Biological Engineering Dept. for the discussion and exchange of ideas among students and faculty.

Andres is married to Liliana Ferrer, a music educator from Cordoba. They have two sons: Nicolas, born in Cordoba in 1998, and Tomas, born in Gainesville in 2001.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

James W. Jones, Chair
Distinguished Professor of Agricultural and Biological Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Kenneth J. Boote
Professor of Agronomy

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Douglas D. Dankel, III
Assistant Professor of Computer and Information Sciences and Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Wendy D. Graham
Professor of Agricultural and Biological Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Gerrit Hoogenboom
Professor of Biological and Agricultural Engineering, University of Georgia, Griffin, Georgia
This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December 2003

Pramod P. Khargonekar
Dean, College of Engineering

Winfred M. Phillips
Dean, Graduate School
