GEOSPATIAL MODELING OF SOIL ORGANIC CARBON AND ITS UNCERTAINTY

By

XIONG XIONG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013
© 2013 Xiong Xiong
To my parents, grandparents and sister
ACKNOWLEDGMENTS

I thank my major advisor, Sabine Grunwald, without whose advice, guidance and support I would never have finished this dissertation. I also thank my committee members Willie Harris, Ronald Corstanje, Wendell Cropper and Stefan Gerber for their advice and suggestions to improve this study. My thanks also go to Brent Myers, Jongsung Kim, Wade Ross, Pasicha Chaikaew, Baijing Cao, Brandon Hoover, Kafui Awuma, Congrong Yu and other friends for their help and friendship in the past four years. I give my special thanks to my parents and Ce Yang, who have always been supportive of me in finishing the program.

Funding for this doctoral research was provided by the project "Rapid Assessment and Trajectory Modeling of Changes in Soil Carbon across a Southeastern Landscape" (National Institute of Food and Agriculture (NIFA), U.S. Department of Agriculture) and matching funds from the Institute of Food and Agricultural Sciences at the University of Florida.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    Digital Soil Mapping
    Digital Soil Mapping Approaches
        Pedotransfer Functions
        Geostatistics and Related Methods
        State Factor (CLORPT and SCORPAN) Methods
        STEP-AWBH Model for Spatially and Temporally Explicit Modeling
        Soil Organic Carbon

2 OPTIMAL SELECTION OF PREDICTING VARIABLES FOR SOIL ORGANIC CARBON MODELING
    Overview
    Materials and Methods
        Study Area
        Soil Organic Carbon Data
        Environmental Variables
        Modeling and Assessment Methods
        Variable Selection Techniques
        Categorical Variable Reduction
    Results and Discussion
        Characteristics of Soil Organic Carbon Measurements
        Variable Importance and Controlling Factors of Soil Organic Carbon Variation
        Soil Organic Carbon Maps and the Curse of Categorical Variables
    Summary and Conclusions

3 BAYESIAN GEOSTATISTICAL MODELING OF SOIL ORGANIC CARBON WITH UNCERTAINTY ANALYSIS
    Overview
    Materials and Methods
        Study Area
        Soil Organic Carbon Data
        Covariates
        Conventional Geostatistical Analysis
        Bayesian Geostatistical Models
        Bayesian Geostatistical Model Validation
    Results and Discussion
        Summary Statistics of SOC Measurements
        Spatial Autocorrelation of SOC in Florida
        Bayesian Geostatistical Models
        Scaling up of SOC in Florida
    Summary and Conclusions

4 UNRAVELING FINE SCALE SPATIAL VARIABILITY OF SOIL ORGANIC CARBON
    Overview
    Materials and Methods
        Description of Study Area and Sites
        Optimized Unbalanced Spatially Nested Sampling
        Soil Sampling and Laboratory Analysis
        Hierarchical Analysis of Variance with Restricted Maximum Likelihood (REML)
        Geostatistical Analysis
        Model with Ordinary Least Square (OLS) versus Generalized Least Square (GLS)
    Results
        Descriptive Statistics of Soil Organic Carbon
        Hierarchical Analysis of Variance of Soil Organic Carbon
        Semivariogram Analysis of Soil Organic Carbon
        Accounting for Spatial Correlation in Linear Model
    Discussion
    Summary and Conclusions

5 SOIL ORGANIC CARBON STOCK CHANGE AND ITS LINK TO LAND USE AND LAND COVER CONVERSION AND CLIMATE GRADIENT
    Overview
    Materials and Methods
        Study Area
        Historical Soil Organic Carbon Dataset
        Current Soil Organic Carbon Dataset
        Harmonization of Historic and Current Datasets
        Carbon Sequestration Rate
        Land Use and Land Cover and Climate Data
        Data Analysis
    Results
        Effects of Land Use and Land Cover on Soil Organic Carbon
        Soil Organic Carbon Change between 1965-1996 and 2008-2009
        Impact of Land Use and Land Cover and Its Change on Soil Organic Carbon
        Impact of Land Use and Land Cover and Climate on Soil Organic Carbon Sequestration Rate
    Discussion
    Summary and Conclusions

6 SUMMARY AND SYNTHESIS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table

2-1 Assembled environmental variables representing STEP-AWBH factors.
2-2 Descriptive statistics of soil organic carbon at 0-20 cm in Florida.
2-3 Descriptive statistics of continuous variables characterizing soil and environmental properties at the 1,080 sampling sites.
2-4 Cross-validation results of exhaustive, all-relevant and parsimonious models.
3-1 Descriptive statistics of soil organic carbon observations at 0-20 cm in Florida.
3-2 Descriptive statistics of covariates used to model the global spatial trend at the 1,080 sampling sites.
3-3 Semivariogram model parameters of conventional geostatistical models using log-transformed soil organic carbon of 756 calibration samples.
3-4 Model parameters of Bayesian geostatistical models using log-transformed soil organic carbon of 756 calibration samples.
3-5 Frequencies of validation samples with observed soil organic carbon values falling in the 25-75 and 2.5-97.5 percentile ranges of posterior prediction distributions.
4-1 Summary of soil properties and environmental settings of the sampling sites.
4-2 Hierarchy of the optimized unbalanced spatially nested scheme.
4-3 Descriptive statistics of soil organic carbon stock at 0-20 cm.
4-4 Estimated and derived parameters of semivariograms.
4-5 Results for analysis of variance models fitted by ordinary least square and generalized least square.
5-1 Descriptive statistics of historical and current soil organic carbon observations at 0-20 cm in Florida.
5-2 General linear models showing the effects of land use and land cover, temperature and precipitation on soil organic carbon sequestration rate.
LIST OF FIGURES

Figure

2-1 Topological representation of variable sets.
2-2 Schematic workflow of the variable selection processes.
2-3 A total number of 1,080 sampling sites and elevation of Florida, USA.
2-4 Importance of the all-relevant variables identified by the Boruta algorithm.
2-5 Model validation results.
2-6 Soil organic carbon maps at 0-20 cm depth from parsimonious models.
2-7 Validation results of exhaustive models and models developed with only continuous variables from the exhaustive variable set.
3-1 A total number of 1,080 sampling sites and elevation in Florida.
3-2 Histograms of soil organic carbon of 1,080 samples.
3-3 Semivariograms of log-transformed soil organic carbon of 756 calibration samples.
3-4 Posterior mean predictions of soil organic carbon (SOC) vs. observed SOC.
3-5 Soil organic carbon prediction at 0-20 cm depth in Florida from Model 1.
3-6 Soil organic carbon prediction at 0-20 cm depth in Florida from Model 2.
3-7 Histograms of soil organic carbon predictions from two models.
4-1 Sampling sites under five land use and land cover types and soil orders in Florida.
4-2 The optimized unbalanced spatially nested scheme.
4-3 Components and accumulated components of variance values for soil organic carbon in five land use and land cover types.
4-4 Cressie-Hawkins robust estimators of semivariogram and fitted Matérn semivariogram for soil organic carbon in five land use and land covers.
5-1 The spatial distribution of sites sampled between 2008 and 2009 on top of the general land use and land cover map between 2008 and 2011.
5-2 Iteratively reweighted least square regression to convert Walkley-Black soil organic matter to soil organic carbon.
5-3 A segmented PTF model for the conversion of loss-on-ignition measurements to soil organic carbon.
5-4 Violin plot of soil organic carbon of the 1,080 current samples grouped by land use and land cover upon sampling (2008-2009).
5-5 Histogram of soil organic carbon change between 1965-1996 and 2008-2009 at the 194 collocated sites.
5-6 Soil organic carbon change between 1965-1996 and 2008-2009 at the 194 collocated sites.
5-7 The effect of maximum annual temperature on soil organic carbon sequestration rate in four land use and land cover and the change types.
5-8 The effect of mean annual temperature on soil organic carbon sequestration rate in two land use and land cover and the change types.
5-9 General land use and land cover change between 1970s and 2008-2011 in Florida.
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

GEOSPATIAL MODELING OF SOIL ORGANIC CARBON AND ITS UNCERTAINTY

By

Xiong Xiong

August 2013

Chair: Sabine Grunwald
Cochair: Willie G. Harris
Major: Soil and Water Science

Florida stores the largest amount of soil organic carbon (SOC) among the conterminous U.S. states (2.26 Pg) due to its unique edaphic, climatic, topographic and hydrological conditions. To better understand the role this large SOC pool plays in the global carbon cycle, we need to know the SOC stock and its change. The objective of this dissertation was to enhance our knowledge of the spatial and temporal variation of SOC in Florida.

Firstly, an objective way was explored to strategically select predictors from a comprehensive pool of environmental variables to develop geospatial SOC prediction models. Results confirmed that the key factors controlling SOC variation in Florida were vegetation and soil moisture, and parsimonious SOC models showed model performance comparable to that of the exhaustive model. Secondly, geospatial models of SOC with uncertainty were developed using the Bayesian geostatistical method, showing that parameter uncertainty in geospatial models can result in considerable uncertainty in spatial prediction and that including covariates may reduce this uncertainty. Thirdly, the fine-scale (< 500 m) variation of SOC was investigated under five prevalent land use and land cover (LULC) types, i.e., Sandhill, Hardwood Hammock and Forest, Pineland, Improved Pasture, and Dry Prairie. Results showed that the five sites had different spatial structures. Hardwood Hammock and Forest and Improved Pasture demonstrated large variation at both coarse scales (67 and > 200 m) and a very fine scale (2 m). Sandhill, Pineland and Dry Prairie were dominated by variation at very fine scales (2 and 7 m). All five sites showed large variability at very fine scales, indicating the close coupling of SOC variation to the structure and composition of vegetation. Lastly, the SOC change coupled with LULC and climatic factors over the past four decades was studied. Significant SOC accumulation was observed between 1965-1996 and 2008-2009; concomitant LULC and LULC change significantly affected SOC change, climatic factors less so.

The study improved the knowledge of the spatial and temporal variation of SOC in the complex soil-landscape continuum of Florida, with implications for carbon cycling and sequestration, land resource management and ecosystem service assessment.
CHAPTER 1
INTRODUCTION

Digital Soil Mapping

According to the definition of the international Working Group on Digital Soil Mapping (WG-DSM), Digital Soil Mapping (DSM) is "the creation and the population of geographically referenced soil databases generated at a given resolution by using field and laboratory observation methods coupled with environmental data through quantitative relationships."

Conventional soil survey relies heavily on collecting field soil data and is thus a slow and expensive process; there is a worldwide crisis in collecting new field data that leads to a lack of spatial soil data (Grunwald et al., 2011). Conventional soil survey represents soil surveyors' knowledge about soil coupled with interpretation of soil-landscape relationships, in which soil surveyors' expertise and subjective judgment have a major impact on the precision of soil maps (McBratney et al., 2003). Furthermore, conventional soil survey delineates soil entities by polygons based on qualitative criteria, such that the maps do not adequately express the complexity and variation of soils across a landscape in an easily understandable way (Sanchez et al., 2009). These shortcomings prevent extensive and credible use of soil information in policy making, land resource assessment, and research within the spatial data platform.

With the advent of the digital age, DSM has the potential to overcome some of the shortcomings of conventional soil mapping paradigms. DSM makes extensive use of technological advances, including Global Positioning System (GPS) receivers, proximal
and remote sensors and computational power, which make the production and distribution of soil information much cheaper and more precise (McBratney et al., 2003).

Digital Soil Mapping Approaches

So far, various working frames and quantitative techniques have been initiated and applied in DSM (McBratney et al., 2003; Grunwald et al., 2006; Grunwald, 2009).

Pedotransfer Functions

Pedotransfer functions (PTFs), a term coined by Bouma (1989) as "translating data we have into what we need", can be interpreted as predictive functions of certain soil properties from other more available, easily, routinely, or cheaply measured properties. Pedotransfer functions originated from the estimation of water retention curves and hydraulic conductivity, and the most comprehensive work has been conducted in predicting hydraulic characteristics (Wösten et al., 2001), while PTFs for soil carbon prediction are much less common.

Geostatistics and Related Methods

The theoretical foundation for geostatistics was provided by Matheron (1963), and it was introduced into soil science around 30 years ago by Burgess and Webster (1980a; b) and Webster and Burgess (1980) in their application of kriging in soil survey. Since then, geostatistics, particularly various forms of kriging, has been an increasingly prevalent method in spatial modeling of soil properties. In general, kriging predicts the value $\hat{Z}(s_0)$ at a location $s_0$ by a linear combination of the values of the random field $Z$ at $n$ nearby locations $s_i$ ($i = 1, 2, \ldots, n$), as in Eq. (1-1) and Eq. (1-2).

$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i) \tag{1-1}$$

$$\sum_{i=1}^{n} \lambda_i = 1 \tag{1-2}$$

where $\hat{Z}(s_0)$ is the predicted value, a Best Linear Unbiased Estimator (BLUE) of the random variable $Z(s_0)$, and $\lambda_i$ is the kriging weight.

State Factor (CLORPT and SCORPAN) Methods

Jenny's (1941) well-known conceptual model for soil formation and alteration has paved the way for quantitative research in soil science, in which five soil formation factors were included in implicit form as in Eq. (1-3).

$$S = f(c, o, r, p, t, \ldots) \tag{1-3}$$

where $S$ is soil, $c$ is the climatic factor, $o$ is organisms (vegetation and fauna), $r$ is relief (the topographic factor), $p$ is parent material, and $t$ is the time factor. The dots provide the flexibility of introducing additional factors if necessary or advisable. This canonical model had been the theoretical background of early quantitative work (Yaalon, 1975), which was aimed at understanding the formation of soils, not yet at quantitatively predicting soil properties from environmental factors.

McBratney et al. (2003) further developed the CLORPT model (Eq. (1-3)) into a general SCORPAN model (Eq. (1-4)), based on decades of advances in pedometrics.

$$S[x, y, \sim t] = f\left(s[x, y, \sim t],\; c[x, y, \sim t],\; o[x, y, \sim t],\; r[x, y, \sim t],\; p[x, y, \sim t],\; a[x, y]\right) \tag{1-4}$$

where $S$ is a soil attribute or soil class, $s$ is soils (other attributes of the soil at a point), $c$ is the climatic factor, $o$ is organisms, $r$ is relief (the topographic factor), $p$ is parent material, $a$ is age (the time factor), $n$ (space) is represented by the vector of spatial coordinates $[x, y]$, and $t$ is time (where $\sim t$ is defined as an approximate time).
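As a worked illustration of the kriging predictor and the sum-to-one constraint on the kriging weights described earlier in this section, the following minimal sketch solves the ordinary kriging system for a handful of points. It is an illustration under stated assumptions rather than the procedure used in this dissertation: the exponential semivariogram, its parameters (nugget, sill, rng), the function names and the SOC values are all hypothetical, and only the Python standard library is used.

```python
import math


def exp_semivariogram(h, nugget=0.0, sill=1.0, rng=10.0):
    """Assumed exponential semivariogram; gamma(0) is 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, gamma=exp_semivariogram):
    """Return (prediction, weights) at `target` from observations at `points`."""
    n = len(points)
    # Kriging system: semivariances between observations, bordered by a row
    # and column of ones for the Lagrange multiplier that enforces sum(weights) = 1.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    prediction = sum(w * v for w, v in zip(weights, values))
    return prediction, weights


# Hypothetical example: four sites on a square, prediction at the center.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
soc = [2.1, 3.4, 1.8, 4.0]  # made-up SOC values, for illustration only
pred, lam = ordinary_kriging(sites, soc, (5.0, 5.0))
```

Because the target sits at the center of a symmetric configuration, the four weights come out equal and the prediction reduces to the arithmetic mean of the observations; with a zero nugget, predicting at a sampled site returns the observed value.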
STEP-AWBH Model for Spatially and Temporally Explicit Modeling

More recently, Grunwald et al. (2011) developed the STEP-AWBH model, which extends the SCORPAN model to account for the accumulative effect of more dynamic environmental factors on soil formation (Eq. (1-5)).

$$SA(z, p_x, t_c) = f\left\{ \sum_{j=1}^{n} \left[ S_j(z, p_x, t_c),\; T_j(p_x, t_c),\; E_j(p_x, t_c),\; P_j(p_x, t_c) \right];\; \int_{i=0}^{m} \sum_{j=1}^{n} \left[ A_j(p_x, t_i),\; W_j(p_x, t_i),\; B_j(p_x, t_i),\; H_j(p_x, t_i) \right] \right\} \tag{1-5}$$

where $SA$ is the target soil property, $S$ represents ancillary soil properties, $T$ represents topographic properties (e.g., elevation, slope, compound topographic index), $E$ represents ecological properties (e.g., physiographic region, ecoregion), $P$ represents the parent material and geologic properties (e.g., geologic formation), $A$ represents atmospheric properties (e.g., precipitation, temperature, solar radiation), $W$ represents water properties (e.g., available water holding capacity, surface runoff), $B$ represents biotic properties (e.g., vegetation or land cover, spectral indices derived from remote sensing, organisms), and $H$ is human-induced forcings (e.g., land use and land use change, contamination). $j$ is the number of predictors, $j = 1, 2, \ldots, n$; $p_x$ is a pixel of size $x$ (width = length = $x$) at a specific location on Earth; $t_c$ is the current time; $t_i$ is the time to $t_c$ with time steps $i = 0, 1, 2, \ldots, m$; and $z$ is soil depth.

Soil Organic Carbon

It has been estimated that soil organic carbon (SOC) constitutes about two-thirds of the Earth's terrestrial carbon pool (Post et al., 1990). Its active interactions with the vegetation and atmosphere carbon pools make it a critical component in the global carbon cycle (Kutsch et al., 2010). Substantial scientific attention has been drawn to the
SOC pool because of the huge potential to deposit carbon belowground with a relatively slow turnover rate (Post et al., 1982).

Florida soils hold approximately 2.26 Pg of SOC, which is the highest value among the conterminous U.S. states according to the U.S. General Soil Map (STATSGO2, 2006). This is primarily due to the extensive coverage of Histosols in the 46,951 km² of wetlands in the state, especially in south Florida (U.S. Fish and Wildlife Service, 2009). In addition, Florida features the SOC-rich Spodosols, which cover about 32% of the area. The formation of these high-SOC soils is attributed to the unique climatic (high temperature and precipitation), topographic (flat landscape) and hydrologic (high water table) conditions (Stone et al., 1993; Vasques et al., 2012b).

Florida has been experiencing significant land use and land cover (LULC) shifts, which include rapid urban growth and losses of agricultural and forest land, for the past decades (Kautz et al., 2007). This change may have caused significant SOC change in Florida because SOC has been shown in numerous studies to be closely interlinked with LULC (Houghton et al., 1999; Post and Kwon, 2000; Guo and Gifford, 2002; Vesterdal et al., 2011; Minasny et al., 2013). Richter et al. (2011) pointed out that in the Anthropocene soil change has accelerated at the global scale in response to anthropogenically induced stressors; however, the magnitude and rate vary geographically. Unfortunately, there is still a substantial lack of knowledge about how SOC has changed over the past decades due to LULC change in Florida, which has great significance for whether soils act as a sink or source of carbon.

Climate change has raised concerns about its impact on global SOC stocks. However, there has been no consensus on soil's role as a sink or source in response to
global warming (Davidson and Janssens, 2006). The controversy results from the complex interaction among soil, LULC and climate systems. Some studies have shown that warming temperature can accelerate SOC decomposition and cause a net loss of C to the atmosphere (Davidson et al., 2000; Bellamy et al., 2005; Dorrepaal et al., 2009). However, other studies argued that increasing temperature can lead to a net gain of SOC by promoting plant-derived C input that exceeds the increase in decomposition (Nemani et al., 2003; Bond-Lamberty and Thomson, 2010). The debate reflects the complexity of interaction effects between temperature and SOC decomposition. Soil organic matter (SOM) has a wide range of intrinsic temperature sensitivity of decomposition because it consists of thousands of organic compounds, each of which has its own inherent decomposition property (Kutsch et al., 2010). Furthermore, the temperature sensitivity can be confounded by edaphic and environmental conditions that interfere with the decomposition process. For example, SOM can be physically protected from decomposition when it forms inside soil aggregates, or chemically protected if it is adsorbed onto soil mineral surfaces (Davidson and Janssens, 2006).

Land use and land cover plays a critical role in the response of SOC to climate change. On the one hand, LULC controls the quantity and quality of organic compounds that enter soils, which determine the intrinsic temperature sensitivity of SOM decomposition. On the other hand, LULC also changes the soil and environmental conditions that can further affect the apparent temperature sensitivity (Post and Kwon, 2000; Jones et al., 2005; Davidson and Janssens, 2006). There is a need to study the interacting effect of LULC and climate on SOC changes in order to better understand how SOC responds to climate change. Florida's boundary
extends widely in terms of longitude and latitude, covering a large range of climatic conditions. It has both subtropical and tropical climates, which makes Florida an ideal area to study the spatial and temporal variation of SOC in response to various environmental factors and stressors.

Therefore, the objectives of this dissertation were to 1) investigate the spatial and temporal variations of SOC in a southeastern U.S. landscape; 2) identify the effects of environmental and anthropogenic factors on the SOC spatial variation; 3) determine the spatial scale at which these drivers and forcings operate; and 4) study the effects of environmental and anthropogenic forcings on the SOC temporal change, emphasizing the impact of LULC and climate change.

The overall hypothesis of this dissertation is that the spatial and temporal variation of SOC is not random and that much of the variation can be explained by environmental factors and forcings, with some degree of uncertainty that stems from various sources such as our imperfect or incomplete knowledge of and data about SOC dynamics.
CHAPTER 2
OPTIMAL SELECTION OF PREDICTING VARIABLES FOR SOIL ORGANIC CARBON MODELING

Overview

Soil-landscape modeling has been widely used to predict soil properties and classes and to identify relationships between soil forming factors and soils. Rooted in previous work by Jenny (1941) and Dokuchaev (Glinka, 1927) conceptualizing the CLORPT soil formation model, the SCORPAN approach formulated by McBratney et al. (2003) has been adopted to empirically model the relationships between soil and spatially referenced soil and environmental factors (such as soils, climate, organisms/biota, relief, and parent material). For the past three decades, human activities have been generating profound shifts in the Earth system, exerting critical impact on the pedosphere in terms of soil formation, change, and degradation (Richter and Markewitz, 2001). In response, the STEP-AWBH model (S: soil, T: topography, E: ecology, P: parent material, A: atmosphere, W: water, B: biota, H: human) was devised to explicitly account for the effects on the soil system induced by humans (Grunwald et al., 2011). These factorial models (such as SCORPAN and STEP-AWBH) have in common that they relate environmental variables and soil properties or classes within a spatially and temporally explicit framework.

Implementation of the factorial soil models has primarily focused on one or a few of the soil and environmental factors, typically pre-selected by users based on domain expertise (McBratney et al., 2003; Grunwald, 2009). Selection of factors in these studies was generally guided by the researchers' subjective knowledge of underlying soil forming processes. In some cases, a limited set of predictor variables could lead to biased and suboptimal model performance (Grunwald, 2009). A more unbiased
approach is to select a broad set of environmental variables that represent the spectrum of possible soil forming processes operating in a given landscape. The more exhaustive such a set of environmental variables is, the higher the potential to unravel complex soil-environmental interactions and identify an unbiased, optimal model to predict a soil property of interest. Poggio et al. (2013) successfully used stepwise methods to select the most predictive variables from a large pool of satellite-derived variables in a regional application of mapping properties; however, their covariate scope was confined to digital elevation model (DEM) and vegetation indices derived from Moderate Resolution Imaging Spectroradiometer (MODIS) products, while other important factors (e.g., soil, atmosphere, water) were not considered.

With the advance and application of Geographic Information System (GIS), Global Positioning System (GPS) and remote and proximal sensing technologies, more and more spatially exhaustive data products are available to characterize soil and environmental properties and populate factorial soil models. These spatially explicit environmental datasets are available in much greater abundance and at finer spatial resolutions than the more sparsely sampled pedon data. In that sense, digital environmental covariates serve as critical predictors to infer soil properties, though it is usually not known which combination of the environmental predictors has the highest predictive power in a given geographic region due to their scale-dependent behavior (Vasques et al., 2012b).

Collecting a large set of predictive variables for models can potentially be problematic as well. Some key issues are redundancy and collinearity between the variables, and the deleterious effects of noisy or noninformative variables. Modeling
procedures that objectively reduce the input dataset and/or remove non-informative variables are needed to guide the process of optimizing sets of soil-environmental variables for predicting soil properties. Strategic variable selection is a commonly adopted approach to meet these goals and can be used to build parsimonious models. In addition, variable selection can reduce model development and application time, increase model interpretability, and reduce overfitting (Kohavi and John, 1997). Variable selection can also help to more objectively identify the important soil processes (Guyon and Elisseeff, 2003).

Machine learning and data mining have made remarkable improvements over the past decades; however, they have only recently received attention from soil scientists (Grunwald et al., 2009). Extensive variable selection techniques have been dedicated to solving the so-called minimal-optimal problem, which aims to find the smallest possible set of predictor variables yielding the best prediction accuracy (Guyon and Elisseeff, 2003; Nilsson et al., 2007). Another concept, the all-relevant problem, has also received increasing interest (Nilsson et al., 2007; Kursa and Rudnicki, 2010). Instead of finding the least redundant predictor variables, as the minimal-optimal problem does, it searches for all variables relevant to the target property. Therefore, finding all-relevant variables has more value in understanding the mechanisms underlying the soil-environment relationship. Nilsson et al. (2007) gave an in-depth discussion of the relationships between the two problems and showed that the minimal-optimal set is a subset of the all-relevant set when the data conform to strictly positive distributions, which is the case for most data encountered in practical applications (Figure 2-1).
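To make the all-relevant idea concrete, the following minimal sketch flags every predictor whose importance exceeds that of shuffled "probe" copies, in the spirit of Kursa and Rudnicki (2010). It is an illustration only: the importance score is a plain absolute Pearson correlation rather than a random forest Z score, the hit-rate threshold replaces a formal statistical test, and the function names, the toy variables x1, x2, x3, and all parameter values are hypothetical.

```python
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def all_relevant(features, target, n_runs=50, min_hit_rate=0.9, seed=0):
    """Return predictors that beat the best shuffled probe in most runs."""
    rng = random.Random(seed)
    hits = dict.fromkeys(features, 0)
    for _ in range(n_runs):
        # Probes: shuffled copies of each predictor, so any link to the target is broken.
        best_probe = 0.0
        for values in features.values():
            probe = list(values)
            rng.shuffle(probe)
            best_probe = max(best_probe, abs_corr(probe, target))
        # A predictor scores a hit when it outperforms the best probe of this run.
        for name, values in features.items():
            if abs_corr(values, target) > best_probe:
                hits[name] += 1
    return [name for name in features if hits[name] / n_runs >= min_hit_rate]


# Hypothetical toy data: x2 is a redundant near-copy of x1, x3 is pure noise.
data_rng = random.Random(42)
x1 = [data_rng.gauss(0, 1) for _ in range(300)]
x2 = [v + data_rng.gauss(0, 0.1) for v in x1]
x3 = [data_rng.gauss(0, 1) for _ in range(300)]
y = [2 * v + data_rng.gauss(0, 1) for v in x1]

selected = all_relevant({"x1": x1, "x2": x2, "x3": x3}, y)
print(selected)
```

In this toy setting both x1 and x2 are relevant and are retained by the all-relevant screen, whereas a minimal-optimal search would need only one of them because they carry nearly the same information.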
Environmental variables that represent soil-landscape processes may come as different data types, generally continuous and categorical (including ordinal and nominal). Nominal categorical variables (such as land use and geology type) discretize a population of measurements into unbalanced groups and may pose problems for soil predictions based on factorial digital models. This problem is magnified when a calibration and validation approach is implemented that may randomly split nominal classes. Classes with small membership may leave insufficient populations in either the calibration or validation dataset. Furthermore, when applying a soil model to the whole mapping area, the possibility remains that not all mapped classes of a nominal predictor are in either the calibration or validation dataset, and therefore are unknown to the model. Hence, one would not be able to make a prediction for the unknown category, in essence creating blank areas on a soil map. The probability that one location falls into at least one unknown category of any categorical variable increases as more categorical variables are included in the model and as the number of categories increases. Therefore, it would be reasonable to build models that strike a balance between model performance to predict soils and the number of categorical variables used.

Soil organic carbon (SOC) is a key property that not only indicates soil quality but also has profound significance to the global climate system (Lal, 2003). Therefore, it has been one of the most widely investigated soil properties in DSM (Grunwald, 2009), and thus the focus of this study is the SOC assessment in Florida, USA. The aim of the study was threefold: 1) from a comprehensive set of environmental variables, identify an all-relevant set of variables to develop models to predict SOC in the topsoil; 2) from the
all-relevant set, identify the minimal-optimal set that both optimizes the model performance and simplifies the model; 3) explore the possibility of reducing the use of categorical variables in predictive soil models.

Materials and Methods

Study Area

The study area is the state of Florida, located in the southeastern region of the United States (United States Census Bureau, 2000). The climate is humid and subtropical in northern and central Florida and humid and tropical in southern Florida. The mean annual precipitation of Florida is 1,373 mm and the mean annual temperature is 22.3 °C (National Climatic Data Center, 2008). Overall, soils in Florida are sandy in texture. Dominant soil orders of Florida are Spodosols (32%), Entisols (22%), Ultisols (19%), Alfisols (13%), and Histosols (11%). The most frequent soil subgroups are Aeric Alaquods, Ultic Alaquods, Lamellic Quartzipsamments, Typic Quartzipsamments, and Arenic Glossaqualfs (Natural Resources Conservation Service, 2009). Land use and land cover (LULC) consists mainly of wetlands (28%), pinelands (18%), urban and barren lands (15%), agriculture (9%), rangelands (9%), and improved pasture (8%) (Florida Fish and Wildlife Conservation Commission, 2003). Florida's topography is muted, with gentle slopes varying from 0 to 5% in almost the whole state (Figure 2-3) (United States Geological Survey, 1984).

Soil Organic Carbon Data

A total of 1,080 soil samples in the topsoil (0–20 cm) across Florida (Figure 2-3) were collected between 2008 and 2010 based on a random sampling design stratified by the combination of soil suborder and reclassified LULC (Table 2-1). The number of
samples designated to each stratum is proportional to the area of the stratum. The reclassification of LULC is based on the data produced by the Florida Fish and Wildlife Conservation Commission (2003). Essentially, the original LULC classes with similar soil moisture regimes were generalized into broader classes. The purpose of reclassification was to reduce the number of LULC classes and to obtain an affordable number of soil suborder-LULC strata (89). Total carbon (TC) was analyzed by zed by 5000A). Soil organic carbon (SOC) was derived by subtraction (TC − IC). The laboratory SOC measurements in mass units (mg kg⁻¹) were converted to stock units (kg m⁻²) using the measured bulk density and soil depth (20 cm).

Environmental Variables

A comprehensive set of predicting variables (210 variables) representing the STEP-AWBH factors was compiled from various data sources with ArcGIS 9.3 (Environmental Systems Research Institute, ESRI Inc., Redlands, CA) (Table 2-1). Seventeen percent of these variables (37) are categorical, including soil taxonomic properties, soil drainage and hydrological classes, LULC, vegetation type, etc., while 83% (175) are continuous, including soil water holding capacities, organic matter content, and primary and secondary topographic, climatic, and biotic variables.

Modeling and Assessment Methods

The theory for the correlation of SOC and the environmental variables resides in the STEP-AWBH model as shown in Eq. (2-1) (Grunwald et al., 2011). In essence, SOC is modeled as dependent on numerous environmental factors that have an effect on SOC stock formation.
$$
SA(z, p_x, t_c) = f\left\{ \sum_{j=1}^{n} \left[ S_j(z, p_x, t_c),\ T_j(p_x, t_c),\ E_j(p_x, t_c),\ P_j(p_x, t_c) \right];\ \int_{i=0}^{m} \sum_{j=1}^{n} \left[ A_j(p_x, t_i),\ W_j(p_x, t_i),\ B_j(p_x, t_i),\ H_j(p_x, t_i) \right] \right\} \qquad (2\text{-}1)
$$

where SA is the target soil property, S represents ancillary soil properties, T represents topographic properties (e.g., elevation, slope, compound topographic index), E represents ecological properties (e.g., physiographic region, ecoregion), P represents the parent material and geologic properties (e.g., geologic formation), A represents atmospheric properties (e.g., precipitation, temperature, solar radiation), W represents water properties (e.g., available water holding capacity, surface runoff), B represents biotic properties (e.g., vegetation or land cover, spectral indices derived from remote sensing, organisms), and H is human-induced forcings (e.g., land use and land use change, contamination). The index j runs over the predictors, j = 1, 2, ..., n; $p_x$ is a pixel of size x (width = length = x) at a specific location on Earth; $t_c$ is the current time; $t_i$ is the time to $t_c$ with time steps i = 0, 1, 2, ..., m; and z is soil depth.

Four tree-based modeling techniques were used to develop soil models: Bagged Regression Tree (BaRT), Boosted Regression Tree (BoRT), Random Forest (RF), and Cubist (Quinlan, 1993; Breiman, 1996, 2001; Friedman, 2002). The first three are ensemble methods that combine the predictions of several regression tree models in order to improve accuracy and robustness over a single tree. BaRT uses bootstrap samples to develop a set of different regression tree models, and the output is the averaged result. Unlike BaRT, BoRT gives greater weights to the stronger models. Instead of bootstrapping samples, RF grows different trees by randomly and repeatedly selecting predictor variables and training cases to develop a random population of trees, which, as in BaRT, is averaged for prediction. Cubist is a
rule-based regression technique that builds multivariate linear regression models at the terminal leaves of a tree. The four methods were implemented with the ipred, gbm, randomForest, and Cubist packages, respectively, in R 2.14.2 (R Development Core Team, 2011).

The models were assessed in both cross-validation and validation modes. The whole dataset was split 70/30 into calibration and validation datasets. The 70% calibration samples were randomly sampled from a two-stage stratification including the soil suborder and the reclassified LULC. The number of samples in each stratum was proportional to its area. The Kolmogorov-Smirnov test was applied to the SOC distributions of the two sets (i.e., calibration and validation sets) to ensure they have the same distribution. All models were developed on the calibration set, and 10-fold cross-validation was used to optimize model parameters. In addition, the models were validated with the validation set. The error metrics used to compare models were the coefficient of determination (R²), root mean squared deviation (RMSD, Eq. (2-2)),

$$
\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2} \qquad (2\text{-}2)
$$

and residual prediction deviation (RPD, Eq. (2-3)) (Williams, 1987),

$$
\mathrm{RPD} = \frac{SD}{\mathrm{RMSD}} \qquad (2\text{-}3)
$$

where $\hat{y}$ are the model-predicted values, y are the observed values, n is the number of predicted or observed values in the held-out dataset (in cross-validation) or validation set (in validation) with i = 1, 2, ..., n, and SD is the standard deviation of the validation set (in validation).
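The three error metrics above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study (which was run in R): R² is taken here as 1 − SS_res/SS_tot, SD is taken as the sample standard deviation of the observed values, and the function names and the four observed and predicted values are hypothetical.

```python
import math
import statistics


def rmsd(observed, predicted):
    """Root mean squared deviation between observed and predicted values."""
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rpd(observed, predicted):
    """Residual prediction deviation: SD of the observations divided by RMSD.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    return statistics.stdev(observed) / rmsd(observed, predicted)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical SOC stocks (kg m^-2) for a held-out set of four samples.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

print(rmsd(observed, predicted), rpd(observed, predicted), r_squared(observed, predicted))
```

Because RPD scales the error by the spread of the observations, a model with the same RMSD scores a higher RPD on a more variable validation set, which is worth keeping in mind when comparing values across datasets.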
Variable Selection Techniques

The Boruta all-relevant variable selection method was applied to identify the environmental variables that are strongly or weakly relevant to SOC variation. The Boruta method is based on the RF algorithm and therefore can identify nonlinear relationships as well as linear ones. In brief, the Boruta algorithm generates five random probes whose values are obtained by shuffling values of the original predictors to remove their correlations with SOC. Then RF regression is performed on the original predictors and random probes combined, and the importance of each variable (Z score) is obtained. The maximum Z score among the random probes (MZRP) is identified and used as a reference to determine whether a variable is relevant to SOC with a two-sided test of equality: only variables with a Z score significantly higher than the MZRP were flagged as relevant (Kursa and Rudnicki, 2010).

Ideally, the minimal-optimal set is found by searching exhaustively all possible subsets of the all-relevant set. However, this is computationally intensive and even prohibitive when the number of all-relevant variables (n) is very large (the number of all subsets is 2^n). Thus, strategic subset selection techniques are needed. Four optimization approaches, greedy forward, greedy backward, hill climbing, and simulated annealing, were used in this study (Cormen et al., 1990; Russell et al., 2010). The greedy algorithms search for local optima heuristically. Greedy forward first evaluates all subsets that have only one predicting variable with leave-one-out cross-validation and finds the best one. Then, it finds the best subset consisting of two variables, with the second variable drawn from the remaining n − 1 variables. Afterwards, it finds the best subsets with three, four, ..., and n variables, and finally the best subset is determined by comparing all the best subsets. The greedy backward works similarly, except that it
starts with n variables and progresses toward only one variable. Hill climbing is also a local optimization algorithm; it starts with an arbitrary selection of variables and then attempts to find a better subset by incrementally including or eliminating one variable. An incremental change is made only if the change produces a better solution (a subset of variables with higher predictive power). Simulated annealing is a global optimization method that resembles annealing in metallurgy. By analogy, it considers a decreasing progression of temperatures, and for each temperature parameter it proposes and evaluates a random update to a subset, usually not too far away from the current solution. It accepts not only updates that improve predictions, but also ones that do not, with a probability that depends on the temperature parameter. This defining feature gives it the ability to escape local optima to reach the global optimum (Wehrens, 2011).

Categorical Variable Reduction

In order to explore the possibility of reducing the reliance on categorical variables in DSM, the performance of models developed with only the continuous variables from the exhaustive set (210 variables) was compared with that of the exhaustive models. The overall workflow is shown in Figure 2-2.

Results and Discussion

Characteristics of Soil Organic Carbon Measurements

The SOC in the top 20 cm of soil showed considerable variation, with a range of 33.7 kg m⁻², mean of 5.0 kg m⁻², and median of 3.4 kg m⁻² (Table 2-2). The data were strongly positively skewed, with most values lying at the low end. A high kurtosis value indicated that infrequent, extreme high SOC values accounted for a large share of the SOC variation, as opposed to frequent modest deviations. Both skewness
and kurtosis indicate that the distribution of SOC values was highly non-normal. These SOC observations represent the major soil types with different tendencies to accumulate carbon across the State of Florida. According to Vasques et al. (2012b), soils with high carbon content, including organic peats in isolated wetlands and freshwater marshes (Histosols), are embedded within low-carbon, quartz-rich upland soils, resulting in highly diverse SOC content ranging from 0.03 to 54.6%, sometimes at very close range. Table 2-2 shows the distribution characteristics of SOC stock observations in the calibration and validation sets. Statistics of these sets resembled those of the whole set, indicating that both calibration and validation sets were representative. The Kolmogorov-Smirnov test indicated no significant difference between the distributions of the two sets (p = 0.64).

Variable Importance and Controlling Factors of Soil Organic Carbon Variation

The Boruta variable selection technique identified 43 variables that are relevant to SOC variation (Figure 2-4). The Florida LULC classification map (LULC, Table 2-1) stood out as the most relevant for explaining the SOC variation. A similar dataset, i.e., the national land cover classification map (LandCovCls, Table 2-1), showed strong relevance to SOC variation as well. Also relevant to SOC variation were soil moisture-related properties, including plant available water holding capacity in the upper soil profile (0 to 50 cm and 0 to 25 cm), soil hydric rating, and drainage class. Soil taxonomic variables, i.e., soil great group, suborder, and order, and soil organic matter also showed strong relationships with SOC. In addition, vegetation-related variables demonstrated strong connectivity to SOC variation, as shown by a number of vegetation indices (such as vegetation type, cover and height, and biophysical
settings), cropland data layer, and the phenological property (SmallNdviPkInt, representing the seasonally active vegetation; see Table 2-1) derived from NDVI (from MODIS). Topographic properties, i.e., slope from SSURGO and elevation from the National Elevation Dataset, and climatic properties, showed moderate correlation with SOC. Atmospheric properties, i.e., temperature and precipitation from PRISM and solar radiation from the National Climatic Data Center, had relatively weak relevance to SOC in Florida.

These results suggest that the processes forming SOC in the top 20 cm of soil in Florida were mainly driven by vegetation and the soil water gradient in a positive feedback loop. Net primary production provides the input of carbon into the soil, while soil water controls decomposition rates in addition to promoting net primary production (Schimel et al., 1994; Jones et al., 2005; Smith et al., 2005; Davidson and Janssens, 2006). The relationship between SOC and soil water is further revealed by SOC's significant positive correlation with available water capacity (AWC25 and AWC50) and its significant negative correlation with the distance to stream (Table 2-3). Although the AWC is not a direct measure of soil water content, given that Florida is a region of high rainfall, the AWC is likely correlated with soil water content. This process is a probable cause of the positive correlation (r = 0.40, p < 0.001) between SOC and AWC50. This result also confirms the finding by Vasques et al. (2012b) in Florida.

There have been extensive studies that reported the effects of climatic factors (precipitation, temperature, etc.) on SOC dynamics. Most studies suggest that SOC stocks increase with increasing precipitation. Precipitation is regarded as promoting gross primary productivity (GPP) by supplying water available to plants, while higher
temperature expedites the microbial activity in SOC decomposition (Post et al., 1982; Wynn et al., 2006; Saiz et al., 2012). However, in this study climatic properties had weak relationships with SOC based on the Boruta all-relevant variable selection. Additionally, no significant correlation was found between SOC and precipitation or temperature variables (Table 2-3), similar to the findings by Vasques et al. (2010) in the Santa Fe Watershed, Florida. The study by Percival et al. (2000) also found that precipitation explained little of the soil C variation of grasslands in New Zealand. This indicates that complex interplays among climatic, biotic, and pedogenic factors may exert controls on SOC in Florida. The results confirmed that GPP was indeed positively correlated with annual mean precipitation (r = 0.14, p < 0.001), which suggests higher SOC input at the locations with higher precipitation. One possible explanation for the weak SOC-precipitation relationship in this study is that the top-layer SOC was leached down into the lower profile as a result of the high annual precipitation and sandy soil texture across Florida (Table 2-3), forming the prevalent Spodosols (32% coverage within Florida). Another explanation may be that many Florida soils are very well drained. In these cases, while precipitation and NPP are high, SOC tends not to accumulate because of higher SOC degradation rates. Furthermore, the relatively homogeneous spatial pattern of precipitation in Florida, as indicated by the low coefficient of variation (CV) values (Table 2-3), may also have a more muted effect on the spatial variation of SOC compared to other regions. Similarly, the variations of temperature variables, such as the annual means of maximum and minimum daily temperature, were very small, with CVs of 0.05 and 0.12, respectively, and showed weak correlations with SOC. In fact, there has been much debate about the effect of global warming on soil C because it influences both net
primary production and C decomposition (Trumbore, 1997; Cox et al., 2000; Thornley and Cannell, 2001; Bellamy et al., 2005; Fierer et al., 2005; Davidson and Janssens, 2006). As Davidson and Janssens (2006) pointed out, if C stored belowground is transferred to the atmosphere by a warming-induced acceleration of its decomposition, a positive feedback to climate change would occur. Conversely, if increases of plant-derived C inputs to soils exceed increases in decomposition, the feedback would be negative. Moreover, several environmental constraints (e.g., management, which affects aggregate formation) obscure the intrinsic temperature sensitivity of decomposition, causing lower apparent temperature sensitivity, and these constraints themselves may be sensitive to climate. These confounding effects of temperature on SOC-forming processes may explain the muted correlation between temperature and SOC.

It is interesting that slope showed a significant negative correlation with SOC even though Florida has relatively flat terrain with gentle slopes varying from only 0 to 5% across most of the state. Guo et al. (2006) also found that level topography had twice the SOC stock of the other slope classes in the conterminous USA. This was also the case in our study. Almost flat topography can be found in southern Florida, featuring highly organic soils in the Everglades wetland area, while greater slopes can be found in central and northern Florida, particularly in the Panhandle area, where the soils had lower SOC (compare Figure 2-3 and Figure 2-6). However, this does not exclude another possible explanation, namely that accelerated soil erosion in high-gradient areas may also be a factor inducing SOC losses (Olson, 2010).
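The two summary statistics that carry much of this discussion, the Pearson correlation coefficient (r) and the coefficient of variation (CV), can be computed as in the sketch below. The function names and all input values are hypothetical and serve only to show the arithmetic; CV is assumed here to be the sample standard deviation divided by the mean.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_variation(values):
    """CV, assumed to be the sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


# Hypothetical values: slope (%) against SOC stock (kg m^-2) at five sites,
# and mean annual precipitation (mm) at four sites.
slope_pct = [0.0, 1.0, 2.0, 3.0, 4.0]
soc_stock = [9.0, 8.0, 5.0, 4.0, 2.0]
precip_mm = [1300.0, 1350.0, 1400.0, 1450.0]

r_slope_soc = pearson_r(slope_pct, soc_stock)
cv_precip = coefficient_of_variation(precip_mm)
print(r_slope_soc, cv_precip)
```

A predictor with a small CV, such as the hypothetical precipitation series here, varies little across sites and therefore has limited scope to explain spatial variation in SOC, regardless of how strongly the two are linked mechanistically.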
It is also interesting that surficial geology showed high correlation with SOC, which is less documented in the literature. Surficial geology was documented to be a good predictor of vegetation patterns and soil water content (Eberhardt and Latham, 2000). Therefore, a possible explanation could be that surficial geology affected SOC indirectly through biotic and soil water processes.

The paramount importance of LULC among all the environmental variables indicates the dominant control of LULC on SOC stocks in Florida. Land use and land cover has been well documented to be a key factor controlling SOC variation and a relevant forcing of SOC change (Post and Kwon, 2000; Guo and Gifford, 2002; Murty et al., 2002; Laganière et al., 2010; Poeplau et al., 2011). With a comprehensive meta-analysis of 74 publications, Guo and Gifford (2002) found that changes of land use from native ecosystems to agriculture in various climate zones (tropical, subtropical, and temperate) reduced SOC stocks, while the reverse processes usually increased SOC. For instance, SOC stocks dropped by 42% under conversion from native forest to crop and 59% from pasture to crop. In contrast, restoration of cropland to pasture, forest plantation, and secondary forest increased SOC stocks by 19, 18, and 53%, respectively (Guo and Gifford, 2002).

Surprisingly, soil organic matter from SSURGO was outranked by some environmental and edaphic variables in terms of relevance for predicting current SOC (Figure 2-4). This may emphasize the need to produce an updated SOC map for Florida. Because the SSURGO database was created based on historical soil survey data, it might not accurately reflect the current SOC stock, which is a relatively dynamic soil property altered by various factors such as LULC change (Guo and Gifford,
2002), climate change (Schimel et al., 1994), and wildfires (Hernández et al., 1997).

Search for Parsimonious Models

Table 2-4 and Figure 2-5 compare four minimal-optimal variable search methods across four modeling techniques in terms of both prediction accuracy and model complexity. The cross-validation and validation results of exhaustive models that used all 210 variables were also included in the table to compare them with parsimonious models. Results show that all-relevant models had fairly comparable performance to exhaustive models across all four prediction methods. This indicates that the 43 all-relevant variables contained almost equivalent predictive power to that of all 210 variables. Including the 167 irrelevant variables in the exhaustive models resulted in essentially similar model accuracy, but dramatically increased model complexity.

Results show that, in general, the four search methods could further reduce model complexity at some or no cost to model performance compared with all-relevant models across all four prediction methods. The greedy forward method selected only 4 of the 43 all-relevant variables: LULC, a vegetation type classification (VegTpSysGrp2), AWC25, and soil hydration (SoilHydration) (Table 2-4). This resulted in the most parsimonious models, with some sacrifice of model performance (especially in validation mode) compared with the all-relevant models. The greedy forward method only captured the major SOC factors in Florida: vegetation and soil water condition. In contrast, the greedy backward technique tended to select a larger set of predictors (29 variables), which improved the model performance of all prediction methods compared with greedy forward. This is because the 29 variables (Table 2-4) captured the biotic and soil water factors as well as soil texture, topographic, climatic, and parent material
factors, which explains the gain in model performance compared with the greedy forward method. The hill climbing method struck a balance between model complexity and performance for all four modeling techniques: it greatly reduced the number of predictors to 14 while preserving most of the predictive power to infer SOC. Compared with greedy forward, models built on hill climbing selection were slightly more complex and had superior performance; compared with greedy backward, they were less complex and had slightly inferior performance. The hill climbing algorithm selected a moderate set of 14 predictors that cover biotic, topographic, and pedogenic factors. However, it hardly involved soil water factors, except the variable distance to streams (DistStream), which was only an indirect indicator of soil water condition. This may explain the loss of model performance compared with greedy backward models. Simulated annealing selected a set of variables similar to that of the greedy backward method and yielded the best models in terms of performance, which were also slightly more parsimonious than the greedy backward models.

It is not surprising that different minimal-optimal variable sets were found by the four methods. The search result of local optimization methods depended on the initial set with which the algorithm started searching. For instance, the greedy forward algorithm started with an empty set and iteratively added one variable only if the added variable improved the model prediction. It stopped at the fourth variable because no single variable added from the remaining all-relevant variable set could improve the model performance. In essence, the algorithm became trapped at a local optimum and missed the opportunity to search for the global optimum. However, the simulated annealing algorithm escaped the local optima by accepting some models that had been degraded due to
including or excluding additional sets of variables. This result also indicates that one variable alone might not improve model prediction, but it might do so collectively with other variables (Guyon and Elisseeff, 2003). This complementary effect between variables should be given more attention in future research, because some important SOC processes occur due to the interaction among environmental properties (i.e., soil-forming factors).

In general, each data reduction technique had its advantages. The greedy forward models were the simplest and had the least variable redundancy; however, they fell short in model performance and could only reflect the major SOC processes (biota and soil water) in Florida. The simulated annealing and greedy backward models had the best performance and could reflect more SOC processes (topography, climate) in addition to the major ones, but they were still relatively complex (with 27 and 29 variables, respectively). The hill climbing models compromised between performance and complexity but failed to account for the effect of the soil water gradient, which is a major driver of SOC processes in Florida.

The problem of multicollinearity or redundancy among predicting variables has been a major concern for classic multivariate regression models, because it biases the estimation of regression parameters and can even make estimation impossible (Næs and Mevik, 2001). In this study, multicollinearity was expected in the exhaustive models because of multiple renderings of an environmental property by more than one variable, e.g., multicollinearity among monthly averages of precipitation, among soil clay, silt, and sand contents, among AWC variables, and between LULC and reclassified LULC (LULCRecls). The all-relevant variable searching method (Boruta) greatly reduced the
multicollinearity, as most of the climatic and topographic variables were filtered out. However, there were still obviously redundant variables in the all-relevant variable set, such as land cover and vegetation properties from different sources, and AWC25 and AWC50. The minimal-optimal variable search methods further eliminated the somewhat redundant variables. For instance, in greedy backward, LULCRecls was excluded; in simulated annealing, soil silt content (SoilSilt) was excluded; and in greedy forward, only four variables with the least redundancy were left (see the footnote of Table 2-4). This demonstrates that all the data reduction methods were able to reduce the multicollinearity among variables. However, no obvious model performance gain was observed as the variable redundancy decreased, which does not support the claim made by Guyon and Elisseeff (2003). Rather, this result provides evidence that tree-based regression models are not sensitive to the curse of multicollinearity and dimensionality (De'ath and Fabricius, 2000).

Soil Organic Carbon Maps and the Curse of Categorical Variables

In addition to providing a basis for inferences regarding SOC processes, SOC models can also be used to produce continuous maps by making spatial predictions. Figure 2-6 shows three Florida SOC maps (30 × 30 m pixel resolution) at 0–20 cm obtained by parsimonious RF models, namely, greedy forward, simulated annealing, and continuous all-relevant. All the maps clearly show the spatial patterns of SOC corresponding to the respective predictors. Generally, the three maps shared some similar patterns, such as high SOC stock values in the Everglades agricultural area to the south of Lake Okeechobee and in wetlands interspersed in pine forests in northern Florida, and low SOC stock values in north-central Florida. Missing values can be found in open water areas and more noticeably in the Everglades in southern Florida in
all three maps. The missing predictions in the Everglades are due to the lack of soil data from the SSURGO database for this region. In essence, all three maps generally reflect the soil moisture pattern in Florida, as AWC variables were included in all three models. On the other hand, the map produced by the simulated annealing RF model (Figure 2-6B) shows more detail, especially in central Florida, than the other two maps, probably because more variables (27 versus 4 and 19) disclosed more of the SOC variation. The map produced by the continuous RF model (Figure 2-6C) appears smoother than the other two maps. This might be because the continuous variables varied gradually across space, in contrast to the abrupt changes across categories of categorical variables.

A subtle difference between the three maps was the missing predictions in Figure 2-6 (A and B) due to the inclusion of categorical variables in the models (as discussed in the introduction). No missing values were found in Figure 2-6C other than in the open water areas and the Everglades, where the continuous AWC variables from SSURGO were not available. As categorical variables were introduced to the model, missing values were produced, as shown in Figure 2-6A and Figure 2-6B. In addition, there were more missing values in Figure 2-6B than in Figure 2-6A, suggesting that more categorical variables resulted in more missing predictions.

A common remedy for the failed prediction problem is generalizing the classes of a categorical variable into broader classes to ensure all the possible classes exist in the calibration samples. Obviously, this method requires extra expert knowledge to guide legitimate reclassification. One could envision that the workload would increase dramatically with the number of categorical variables used in a model. Another
disadvantage of this remedy is that the generalization of a categorical variable could result in a loss of predictive power, as indicated by the variable importance of LULCRecls compared with that of LULC in Figure 2-4. To avoid this disadvantage, another possible option for dealing with the failed prediction problem would be to restrict the number of categorical variables used in a model.

Categorical variables showed profound relevance for describing SOC variation, as demonstrated in Figure 2-4; 23 of the 28 most important all-relevant variables were categorical. Nevertheless, categorical variables were not indispensable for prediction models in terms of model performance. Figure 2-7 shows how the models developed with only continuous variables compared, in validation mode, with the exhaustive models, which used a mix of categorical and continuous predictors. The continuous models compared reasonably well to the exhaustive models in terms of model performance. The continuous variables represented all STEP-AWBH factors and captured the major processes relevant to the SOC cycle in Florida (see Table 2-1), i.e., soil moisture and vegetation, which may explain their good predictive power. These findings suggest that continuous variables may act as good surrogates for their categorical counterparts and should be considered as an option to reduce the curse of categorical variables in DSM. The significance of continuous variables for DSM is that they are mainly represented by remote sensing products such as MODIS and Landsat, which can be easily acquired at high spatial and temporal resolution, as opposed to categorical variables such as LULC, which require profound domain knowledge to produce. Furthermore, remote sensing products are generally globally available, which can greatly help extend DSM to a global scale.
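The failed-prediction problem discussed above can be illustrated with a small check for categorical classes that occur in the prediction grid but not in the calibration samples. This is only a minimal sketch under stated assumptions: the function names, the dictionary-based data layout and the class labels are hypothetical and are not taken from the study, and it assumes that a pixel receives no prediction whenever any of its categorical classes was unseen during calibration.

```python
def unseen_levels(calibration_rows, prediction_rows, categorical_names):
    """Return, per categorical predictor, the classes that occur in the
    prediction grid but never in the calibration samples."""
    missing = {}
    for name in categorical_names:
        seen = {row[name] for row in calibration_rows}
        unseen = {row[name] for row in prediction_rows} - seen
        if unseen:
            missing[name] = unseen
    return missing


def predictable_mask(calibration_rows, prediction_rows, categorical_names):
    """True for grid cells whose categorical classes were all seen in calibration."""
    seen = {n: {row[n] for row in calibration_rows} for n in categorical_names}
    return [all(row[n] in seen[n] for n in categorical_names)
            for row in prediction_rows]


# Hypothetical toy data: the class labels are invented for illustration only.
calibration = [{"LULC": "pineland", "DrainCls": "poor"},
               {"LULC": "wetland", "DrainCls": "very poor"}]
grid = [{"LULC": "pineland", "DrainCls": "poor"},
        {"LULC": "urban", "DrainCls": "poor"},
        {"LULC": "wetland", "DrainCls": "well"}]

print(unseen_levels(calibration, grid, ["LULC", "DrainCls"]))
print(predictable_mask(calibration, grid, ["LULC", "DrainCls"]))
```

Each additional categorical predictor adds another condition that a pixel must satisfy, which is consistent with the observation that the model with more categorical variables produced more missing predictions.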
Summary and Conclusions

In this study, a new strategy of developing factorial soil models based on the STEP-AWBH concept was demonstrated. An exhaustive set of 210 potential environmental variables was compiled to characterize Florida's soil landscape based on state-of-the-art pedological knowledge and technical and computational capabilities. Models were developed to predict SOC (0-20 cm) using the comprehensive predictor set rooted in factorial soil-landscape conceptual modeling paradigms, under constraints to select the best-performing models that also strive towards parsimony. Our approach is the inverse of current DSM practice documented in the literature, which commonly selects a few environmental variables based on domain expertise first and then develops models using those variables to predict a soil property. Such DSM studies may be biased by subjective expert opinion, whereas our approach is more objective in the sense that it uses strategic machine learning data reduction techniques to identify soil models that optimize predictive capability (fitting), accuracy, and parsimony. It has the obvious advantage of impartiality in selecting predictor variables compared with presupposing the variables to be used in modeling purely on the basis of domain knowledge.

Our approach also allowed objective identification of the relevant variables to explain SOC variation and, hence, inferences about major SOC processes at the regional scale in Florida, U.S. Results confirmed that vegetation and the soil water gradient were the driving factors that control SOC variation. The paramount importance of LULC in explaining SOC variation raises concern about the potential impacts of human-induced LULC changes on future SOC storage.

Categorical variables showed great relevance to SOC, but they can be difficult to use as predictors when sampling of nominal classes is sparse, or if there are missing
end-member cases. These issues can limit modeling and the application of the model to make a prediction map. However, it was possible to reduce the use of categorical variables at limited or no cost to model performance, because the continuous variables contained similar information that could capture the factors of SOC processes. These findings open a new view on selecting and utilizing variables for predicting SOC and other soil properties. This research has relevance for large-region DSM and modeling, specifically soil property predictions at continental and global scales, where a balance between pedological rooting of soil prediction models, computational effort, model performance, and parsimony is needed.
Table 2-1. Assembled environmental variables representing STEP-AWBH factors (S: soil, T: topography, E: ecology, P: parent material, A: atmosphere, W: water, B: biota, H: human).
Variable^a | Relevant variable abbreviation | N^a | Factor | Data type^a | Source^a | Original scale or resolution (m) | Date
Soil taxonomic order | SoilOrder | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil taxonomic suborder | SoilSuborder | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil taxonomic great group | SoilGreatGrp | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil particle size class | - | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil family particle size class | - | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil family CEC activity class | - | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil family reaction class | SoilReaction | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil family temperature class | - | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil family moisture subclass | - | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil muck | - | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil hydration expansion | SoilHydration | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil leaching potential | - | 1 | S | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil runoff potential | SoilRunoff | 1 | S/W | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil albedo | SoilAlbedo | 1 | S | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil sand content (0-20 cm) | SoilSand | 1 | S | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil silt content (0-20 cm) | SoilSilt | 1 | S | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil clay content (0-20 cm) | SoilClay | 1 | S | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil organic matter (0-20 cm) | SOM | 1 | S | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil map unit key | - | 1 | S | Cat. | USDA/NRCS/STATSGO2 | 1:250,000 | 2009
Elevation (30 m, 90 m, 1 km) | - | 3 | T | Con. | USGS | 30/90/1000 | 1999
Slope (30 m, 90 m, 1 km) | - | 3 | T | Con. | USGS | 30/90/1000 | 1999
Flow accumulation (30 m, 90 m, 1 km) | - | 3 | T | Con. | USGS | 30/90/1000 | 1999
CTI (30 m, 90 m, 1 km) | - | 3 | T | Con. | USGS | 30/90/1000 | 1999
Soil slope | SoilSlope | 1 | T | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Distance from coast | - | 1 | T | Con. | FMRI | 1:40,000 | 1993
Distance from sinkhole | - | 1 | T | Con. | FGS | N/A | 2009
Distance from stream | DistStream | 1 | T/W | Con. | USGS | 1:100,000 | 2002
Distance from open water | - | 1 | T/W | Con. | USGS | 1:2,000,000 | 2006
Easting, northing^b | - | 2 | T | Con. | Field sampling | N/A | 2009
Ecological regions | EcoRegion | 1 | E | Cat. | USGS | 1:250,000 | 1995
Physiographic province | - | 1 | E | Cat. | USGS | 1:2,000,000 | 2000
Environmental geology | - | 1 | P | Cat. | USGS | 1:250,000 | 2001
Surficial geology | SurGeology | 1 | P | Cat. | USGS | 1:100,000 | 1998
Surficial geology epoch and period | - | 2 | P | Cat. | USGS | 1:100,000 | 1998
Table 2-1. Continued.
Variable^a | Relevant variable abbreviation | N^a | Factor | Data type^a | Source^a | Original scale or resolution (m) | Date
Precipitation^c | PrecipFebruary, | 13 | A | Con. | PRISM climate group | 4,000 | 1971-2000
Temperature^c | MaxTemp, | 26 | A | Con. | PRISM climate group | 4,000 | 1971-2000
Solar radiation^c | SolarRadAugust, | 13 | A | Con. | NOAA/NCDC | 32,000 | 1979-2009
Soil annual minimum water table | - | 1 | W | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Soil available water capacity (0-25 cm, 0-50 cm, 0-100 cm and 0-150 cm) | AWC25, AWC50, AWC100, AWC150 | 4 | W | Con. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Flooding frequency class | - | 1 | W | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Ponding frequency class | PondFreq | 1 | W | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Drainage class | DrainCls | 1 | W | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Hydrologic group | HydroGrp | 1 | W | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Runoff class | Runoff | 1 | W | Cat. | USDA/NRCS/SSURGO | 1:24,000 | 2009
Vegetation type | VegType | 1 | B | Cat. | LANDFIRE | 30 | 2009
Vegetation type system group 1 | VegTpSysGrp1 | 1 | B | Cat. | LANDFIRE | 30 | 2009
Vegetation type system group 2 | VegTpSysGrp2 | 1 | B | Cat. | LANDFIRE | 30 | 2009
Vegetation type order | - | 1 | B | Cat. | LANDFIRE | 30 | 2009
Vegetation type class | - | 1 | B | Cat. | LANDFIRE | 30 | 2009
Vegetation type subclass | VegTpSubcls | 1 | B | Cat. | LANDFIRE | 30 | 2009
Biophysical settings | BiophySet | 1 | B | Cat. | LANDFIRE | 30 | 2009
Environmental site potential | EnvSitePot | 1 | B | Cat. | LANDFIRE | 30 | 2009
Vegetation height | VegHeight | 1 | B | Cat. | LANDFIRE | 30 | 2009
Vegetation cover | VegCov | 1 | B | Cat. | LANDFIRE | 30 | 2009
Forest canopy properties | - | 4 | B | Con. | LANDFIRE | 30 | 2009
Landsat ETM+ band 1, 2, 3, 4, 5 and 7 | LsatB1, LsatB7 | 6 | B | Con. | USGS | 30 | 2004
Landsat ETM+ tasseled cap indices | LsatTC1, LsatTC6 | 6 | B | Con. | USGS | 30 | 2004
Landsat ETM+ principal components | LsatPC1, LsatPC6 | 6 | B | Con. | USGS | 30 | 2004
Monthly MODIS NDVI | - | 12 | B | Con. | MODIS4NACP | 500 | 2005
Monthly MODIS EVI | EviApril, | 12 | B | Con. | MODIS4NACP | 500 | 2005
Monthly MODIS LAI | - | 12 | B | Con. | MODIS4NACP | 500 | 2005
Monthly MODIS FPAR | - | 12 | B | Con. | MODIS4NACP | 500 | 2005
Table 2-1. Continued.
Variable^a | Relevant variable abbreviation | N^a | Factor | Data type^a | Source^a | Original scale or resolution (m) | Date
Annual min, max and mean NDVI | - | 3 | B | Con. | MODIS4NACP | 1,000 | 2005
NDVI greenup, peak and browndown day of year | - | 3 | B | Con. | MODIS4NACP | 1,000 | 2005
NDVI greenup and browndown rate | - | 2 | B | Con. | MODIS4NACP | 1,000 | 2005
NDVI season length | - | 1 | B | Con. | MODIS4NACP | 1,000 | 2005
NDVI amplitude and base NDVI level | - | 2 | B | Con. | MODIS4NACP | 1,000 | 2005
Large NDVI peak integral^d | - | 1 | B | Con. | MODIS4NACP | 1,000 | 2005
Small NDVI peak integral^d | SmallNdviPkInt | 1 | B | Con. | MODIS4NACP | 1,000 | 2005
Canopy coverage and imperviousness | - | 2 | B | Con. | MRLC/NLCD | 30 | 2001
Basal area weighted canopy height | - | 1 | B | Con. | WHRC/NBCD | 30 | 2000
Aboveground live dry biomass | DryBiomass | 1 | B | Con. | WHRC/NBCD | 30 | 2000
Gross and net primary production | GPP, NPP | 2 | B | Con. | MODIS4NACP | 1,000 | 2005
Land cover class | LandCovCls | 1 | B/H | Cat. | MRLC/NLCD | 30 | 2001
Cropland data layer | Cropland | 1 | B/H | Cat. | USDA/NASS | 30 | 2004
Land use and land cover | LULC | 1 | B/H | Cat. | FFWCC | 30 | 2003
Land use and land cover^e | LULCRecls | 1 | B/H | Cat. | FFWCC | 30 | 2003
^a Abbreviations: CEC, cation exchange capacity; CTI, compound topographic index; Landsat ETM+, Landsat Enhanced Thematic Mapper Plus; MODIS, Moderate Resolution Imaging Spectroradiometer; NDVI, normalized difference vegetation index; EVI, enhanced vegetation index; LAI, leaf area index; FPAR, fraction of photosynthetically active radiation; USDA/NRCS/SSURGO, United States Department of Agriculture/Natural Resources Conservation Service/Soil Survey Geographic Database; STATSGO2, State Soil Geographic Database; USGS, United States Geological Survey; FMRI, Florida Marine Research Institute; FGS, Florida Geological Survey; PRISM, Parameter-elevation Regressions on Independent Slopes Model; NOAA/NCDC, National Oceanic and Atmospheric Administration/National Climatic Data Center; LANDFIRE, LANDscape FIRE and resource management tools project; MODIS4NACP, MODIS for North American Carbon Project; MRLC/NLCD, Multi-Resolution Land Characteristics Consortium/National Land Cover Data; WHRC/NBCD, Woods Hole Research Center/National Biomass and Carbon Dataset; FFWCC, Florida Fish and Wildlife Conservation Commission; N, number of variables; Cat., categorical; Con., continuous.
^b Easting and northing are the projected coordinates where soil samples were collected.
^c The 13 precipitation variables are 12 monthly averages over 1971-2000 and one overall average. The 26 temperature variables are 24 monthly averages of daily maximum and minimum temperatures plus 2 long-term averages (1971-2000). The 13 solar radiation variables are 12 monthly averages over 1979-2009 and one long-term average.
^d Small peak integral, given by the area of the region between the fitted function and the average of the greenup NDVI and browndown NDVI values, represents the seasonally active vegetation, which may be large for herbaceous vegetation cover and small for evergreen vegetation cover. Large peak integral, given by the area between the fitted function and the zero NDVI value bounded by the greenup time and browndown time, represents the total vegetation stand and is a proxy for vegetation production.
^e The reclassified land use and land cover layer was created by combining relatively small and similar groups.
Table 2-2. Descriptive statistics of soil organic carbon at 0-20 cm in Florida.
Dataset | Number of samples | Min (kg m⁻²) | Median (kg m⁻²) | Mean (kg m⁻²) | Max (kg m⁻²) | MAD (kg m⁻²) | SD (kg m⁻²) | CV | Skewness | Kurtosis
Whole set | 1080 | 0.45 | 3.40 | 4.98 | 34.15 | 2.05 | 4.38 | 0.88 | 2.52 | 8.48
Calibration | 756 | 0.45 | 3.40 | 5.00 | 29.26 | 2.16 | 4.42 | 0.88 | 2.41 | 7.48
Validation | 324 | 1.01 | 3.48 | 4.92 | 34.15 | 2.00 | 4.30 | 0.87 | 2.80 | 11.02
Abbreviations: MAD, median absolute deviation; SD, standard deviation; CV, coefficient of variation.
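The summary statistics in Table 2-2 can be computed along the following lines. This is a minimal sketch, not the procedure used in the study: the function name `describe`, the unscaled MAD (R's `mad()` multiplies by 1.4826 by default), and the moment-based skewness and excess kurtosis are all assumptions, and the sample values are hypothetical. The last line only checks that the reported CV of the whole set is consistent with SD divided by mean.

```python
import statistics


def describe(values, mad_scale=1.0):
    """Summary statistics for a sample; mad_scale=1.4826 mimics R's mad() default."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    mad = mad_scale * statistics.median(abs(v - median) for v in values)
    # Central moments for moment-based skewness and excess kurtosis (assumed definitions).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n, "min": min(values), "median": median, "mean": mean,
        "max": max(values), "mad": mad, "sd": sd, "cv": sd / mean,
        "skewness": m3 / m2 ** 1.5, "kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical right-skewed sample standing in for SOC stocks (kg m^-2).
stats = describe([1.0, 2.0, 3.0, 4.0, 10.0])
print(stats)

# Consistency check with Table 2-2 (whole set): CV = SD / mean.
print(round(4.38 / 4.98, 2))
```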
Table 2-3. Descriptive statistics of continuous variables characterizing soil and environmental properties at the 1,080 sampling sites and their correlations with soil organic carbon (SOC) at 0-20 cm in Florida.
Abbreviated variable (see Table 2-1) | Mean | SD | Min | Max | Skewness | Kurtosis | CV | Correlation with SOC^a
AWC25 (cm cm⁻¹) | 2.52 | 1.80 | 0.00 | 10.00 | 1.90 | 3.52 | 0.71 | 0.39*
AWC50 (cm cm⁻¹) | 4.93 | 3.55 | 0.00 | 20.00 | 1.93 | 3.78 | 0.72 | 0.40*
SOM (% weight) | 10.68 | 22.70 | 0.00 | 82.50 | 2.29 | 3.46 | 2.13 | 0.43*
SoilAlbedo | 0.24 | 0.07 | 0.00 | 0.50 | 0.40 | 0.17 | 0.31 | 0.39*
SoilSand (% weight)^b | 90.96 | 11.63 | 7.20 | 99.46 | 3.70 | 16.77 | 0.13 | 0.29*
SoilClay (% weight)^b | 4.34 | 6.01 | 0.14 | 58.00 | 5.00 | 33.49 | 1.38 | 0.04
SoilSilt (% weight)^b | 4.70 | 6.69 | 0.10 | 54.10 | 3.89 | 19.81 | 1.42 | 0.08*
SoilSlope (%) | 1.75 | 1.81 | 0.00 | 28.00 | 5.86 | 64.06 | 1.04 | 0.41*
DistStream (m) | 1557 | 2289 | 0 | 15608 | 2.89 | 9.91 | 1.47 | 0.10*
LsatB5 (DN) | 56 | 23 | 10 | 148 | 1.05 | 0.59 | 0.41 | 0.23*
LsatB7 (DN) | 35 | 17 | 11 | 120 | 1.45 | 2.22 | 0.49 | 0.22*
LsatPC1 (DN) | 104 | 33 | 35 | 276 | 1.29 | 2.13 | 0.31 | 0.23*
LsatTC1 (DN) | 123 | 26 | 49 | 319 | 1.43 | 4.85 | 0.21 | 0.23*
DryBiomass (kg m⁻²) | 7.49 | 2.85 | 2.00 | 24.90 | 2.14 | 6.95 | 0.38 | 0.13*
EviApril | 0.42 | 0.10 | 0.10 | 0.86 | 0.27 | 0.39 | 0.23 | 0.07*
SmallNdviPkInt | 4.49 | 2.19 | 0.00 | 35.68 | 3.36 | 38.39 | 0.49 | 0.10*
GPP (kg C m⁻²) | 20.30 | 13.44 | 5.08 | 65.53 | 2.57 | 5.99 | 0.66 | 0.18*
NPP (kg C m⁻²) | 10.14 | 6.79 | 3.15 | 32.77 | 2.54 | 5.67 | 0.67 | 0.13*
PrecipFebruary (mm) | 86 | 24 | 44 | 144 | 0.67 | 0.78 | 0.28 | 0.06
PrecipSeptember (mm) | 160 | 22 | 103 | 215 | 0.29 | 0.44 | 0.14 | 0.04
PrecipAvg (mm) | 1409 | 122 | 1212 | 1760 | 0.97 | 0.20 | 0.09 | 0.03
MaxTemp (°C) | 27.4 | 1.3 | 24.4 | 30.0 | 0.07 | 1.15 | 0.05 | 0.02
MinTemp (°C) | 15.2 | 1.8 | 11.2 | 19.6 | 0.09 | 1.04 | 0.12 | 0.06
SolarRadAugust (W m⁻²) | 234.8 | 6.4 | 217.6 | 254.5 | 0.09 | 0.02 | 0.03 | 0.02
^a Spearman correlation coefficient, with * indicating significant correlation at the 0.05 significance level.
^b Organic soil samples were excluded due to the lack of procedures to determine the texture of organic soils.
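The Spearman coefficients reported in Table 2-3 are rank correlations. The following minimal sketch shows one common way to compute them, as the Pearson correlation of average ranks; the function names and the toy data are hypothetical, and the significance test marked by * in the table is not reproduced here.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical covariate and SOC values with a monotonic but nonlinear relation.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0]
soc = [1.0, 4.0, 9.0, 16.0, 25.0]
print(spearman(covariate, soc))
```

Because only ranks enter the calculation, the coefficient is insensitive to the strong skewness of SOC and of several covariates listed in the table.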
Table 2-4. Cross-validation (on the 70% calibration set) results of exhaustive models, all-relevant models and parsimonious models with four minimal-optimal optimization techniques.
Variable set | Bagged regression R² | Bagged regression RMSD (kg m⁻²) | Boosted regression R² | Boosted regression RMSD (kg m⁻²) | Random forest R² | Random forest RMSD (kg m⁻²) | Cubist R² | Cubist RMSD (kg m⁻²)
Exhaustive (all 210 variables) | 0.61 | 2.70 | 0.58 | 2.82 | 0.63 | 2.64 | 0.59 | 2.79
All relevant (43)^a | 0.61 | 2.71 | 0.57 | 2.85 | 0.63 | 2.64 | 0.59 | 2.82
Greedy forward (4)^b | 0.60 | 2.75 | 0.58 | 2.81 | 0.57 | 2.85 | 0.58 | 2.87
Greedy backward (29)^c | 0.60 | 2.74 | 0.57 | 2.86 | 0.62 | 2.66 | 0.57 | 2.86
Hill climbing (14)^d | 0.60 | 2.75 | 0.55 | 2.96 | 0.61 | 2.68 | 0.56 | 2.89
Simulated annealing (27)^e | 0.60 | 2.72 | 0.57 | 2.85 | 0.62 | 2.65 | 0.57 | 2.84
Abbreviations: RMSD, root mean squared deviation; RPD, residual prediction deviation.
^a See Figure 2-4 for the 43 all-relevant variables.
^b The 4 variables are LULC, VegTpSysGrp2, AWC25 and SoilHydration (see Table 2-1 for abbreviations); of these, LULC, VegTpSysGrp2 and SoilHydration are categorical.
^c The 29 variables are LULC, LandCovCls, VegType, VegCov, VegHeight, VegTpSysGrp1, VegTpSysGrp2, DryBiomass, Cropland, EviApril, SmallNdviPkInt, LsatPC1, LsatTC1, LsatB5, LsatB7, SoilSand, SoilClay, SoilSilt, Runoff, SoilRunoff, PondFreq, HydroGrp, SoilHydration, SoilReaction, DistStream, SoilSlope, PrecipSeptember, SolarRadAugust and SurGeology.
^d The 14 variables are LULC, LandCovCls, VegType, VegCov, VegHeight, DryBiomass, Cropland, LsatTC1, SOM, SoilSlope, SoilSand, SoilClay, SoilAlbedo and DistStream.
^e The 27 variables are LULC, LandCovCls, VegType, VegCov, VegHeight, VegTpSysGrp1, DryBiomass, Cropland, EviApril, SmallNdviPkInt, LsatPC1, LsatTC1, LsatB5, LsatB7, SoilSand, SoilClay, AWC50, SoilRunoff, PondFreq, HydroGrp, SoilHydration, SoilReaction, DistStream, SoilSlope, PrecipSeptember, SolarRadAugust and SurGeology.
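The error metrics named in Table 2-4 can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: R² is taken here as the squared Pearson correlation between observed and predicted values (the source does not say which definition was used), RPD is taken as the standard deviation of the observations divided by RMSD, and the observed and predicted values are hypothetical.

```python
import math
import statistics


def rmsd(observed, predicted):
    """Root mean squared deviation between predictions and observations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of R2)."""
    mo, mp = statistics.fmean(observed), statistics.fmean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


def rpd(observed, predicted):
    """Residual prediction deviation: SD of the observations divided by RMSD."""
    return statistics.stdev(observed) / rmsd(observed, predicted)


# Hypothetical observed and predicted SOC stocks (kg m^-2).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(rmsd(observed, predicted), r_squared(observed, predicted), rpd(observed, predicted))
```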
Figure 2-1. Topological representation of variable sets. The circle denotes all variables, the dashed line the all-relevant set and the dotted line the minimal-optimal set. The figure was redrawn based on Nilsson (2007).
Figure 2-2. Schematic workflow of the variable selection processes.
Figure 2-3. The 1,080 sampling sites and elevation of Florida, USA.
Figure 2-4. Importance (Z score) of the all-relevant variables identified by the Boruta algorithm. See Table 2-1 for variable abbreviations. Capital letters in parentheses indicate the represented STEP-AWBH factors (S: soil, T: topography, E: ecology, P: parent material, A: atmosphere, W: water, B: biota, H: human). Categorical variables are marked in the figure. randMax, randMean and randMin are the three random probes with maximum, mean and minimum importance, respectively.
Figure 2-5. Model validation results. A) All-relevant bagged regression tree (BaRT). B) All-relevant boosted regression tree (BoRT). C) All-relevant random forest (RF). D) All-relevant cubist. E) Irrelevant bagged regression tree (BaRT). F) Irrelevant boosted regression tree (BoRT). G) Irrelevant random forest (RF). H) Irrelevant cubist.
Figure 2-6. Soil organic carbon (SOC) maps at 0-20 cm depth produced by parsimonious models. A) Greedy forward random forest model. B) Simulated annealing random forest model. C) Random forest model with all of the 19 continuous all-relevant variables identified by the Boruta algorithm. The inset maps in Panels A and B flag the pixels with no predictions due to the inclusion of categorical predictors in the respective prediction models. The inset map of Panel A, for instance, was derived by spatially overlaying the map in Panel A on top of the map in Panel C and flagging the pixels with no predictions in Panel A but with predictions in Panel C. All the GIS layers of predictors had the same extent and were free of missing values.
Figure 2-7. Validation results of exhaustive models and models developed with only continuous variables from the exhaustive variable set. A) Exhaustive bagged regression tree (BaRT). B) Exhaustive boosted regression tree (BoRT). C) Exhaustive random forest (RF). D) Exhaustive cubist. E) Continuous bagged regression tree (BaRT). F) Continuous boosted regression tree (BoRT). G) Continuous random forest (RF). H) Continuous cubist.
CHAPTER 3
BAYESIAN GEOSTATISTICAL MODELING OF SOIL ORGANIC CARBON WITH UNCERTAINTY ANALYSIS

Overview

The soil organic carbon (SOC) pool plays a key role in global carbon cycling due to its enormous size in the global carbon reservoir, and it has great potential to regulate the global climate system by influencing the atmospheric carbon dioxide level (Batjes and Sombroek, 1997; Trumbore, 1997; Melillo et al., 2002; Schuur et al., 2009). As a result, numerous studies have aimed to quantify the SOC pool, characterize its spatial variability and predict the spatial distribution of SOC regionally, nationally, and globally (Grunwald, 2009). In a recent review of digital mapping of soil carbon, Minasny et al. (2013) found that most studies presented only predictions and did not show any uncertainty of the predictions. The lack of uncertainty analysis does not allow the quality of predictions to be assessed, and thus limits the level of confidence in soil carbon predictions (Minasny et al., 2011).

In digital soil mapping (DSM), uncertainty in prediction is inevitable because it can stem from various sources. Kennedy and O'Hagan (2001) decomposed the sources of uncertainty into six classes: parameter uncertainty, model inadequacy, residual variability, parametric variability, observation error and algorithm uncertainty. This classification of uncertainty sources has major theoretical significance. In practice, however, it is intractable to separate these uncertainty sources due to lack of data and imperfect knowledge of the true process of interest. Thus, most studies that quantify uncertainty essentially confound the various uncertainty sources into one, e.g., parameter uncertainty (Phillips and Marks, 1996; Minasny et al., 2011; Sun et al., 2013). This provides a practical way to quantify uncertainty.

The past decade has witnessed the
advance of theoretical frameworks for DSM (McBratney et al., 2003; Grunwald et al., 2011). Meanwhile, geostatistics, a widely used spatial prediction technique in DSM, has also been developing from moment-based to likelihood-based methods of model parameter estimation (Matheron, 1963; Cressie, 1993). However, these methods, which are rooted in the frequentist approach and are called conventional geostatistical methods by Diggle et al. (1998), can only provide a single realization of the model parameters without revealing the uncertainty in parameter estimation. To address this issue, the Bayesian approach allows one to assess uncertainty with multiple realizations of the model parameters by treating the parameters as random variables. Further, the predictive distribution of the target variable can be readily derived by integrating over the parameter space (Diggle et al., 1998). Even in a moderately complex Bayesian model, the analytical forms of the posterior distributions are intractable; therefore, Markov chain Monte Carlo (MCMC) simulation is commonly used to sample from posterior distributions and summarize inference (Bernardo and Smith, 2000).

In the spatial statistics context, recent advances in MCMC algorithms for implementing Bayesian inference have given rise to quite a few successful applications of Bayesian geostatistical models in a wide range of disciplines (e.g., Handcock and Stein, 1993; Zhang, 2002; Reid et al., 2010). In the soil science realm, there have also been a few studies that applied the Bayesian approach to cope with the uncertainty associated with the prediction of soil properties. Aelion et al. (2009) adopted the Bayesian kriging method to predict heavy metal concentrations in surface soil. Minasny et al. (2011) used a newly developed MCMC simulation algorithm (DiffeRential Evolution Adaptive Metropolis) to summarize the posterior distributions of
model parameters and predicted the thickness of the A horizon, pH, and SOC, and compared model outcomes to conventional geostatistical methods. They found that the prediction accuracies were similar, but the Bayesian method was more efficient than conventional methods in terms of optimizing the model parameter estimation. Most recently, Sun et al. (2013) mapped the uncertainty in evaluating the effect of urbanization on soil pH and particle fractions in Hong Kong based on Bayesian inference. However, Bayesian models and uncertainty assessment have not been applied to model SOC in hydrologically complex soil landscapes where carbon-rich and carbon-poor soils coincide. These applications are generally limited to small areas because Bayesian computation with MCMC is computationally intensive (Minasny et al., 2011). In fact, some computational steps in implementing Bayesian models can be readily alleviated by parallel computing, which can exploit modern computers with multiple cores and multiple computers connected in a network (Wilkinson, 2005; Schmidberger et al., 2009). For example, the computation of independent parallel MCMC chains and the spatial predictions at new locations can be easily parallelized.

Soil organic carbon stocks are a composite derived from SOC concentrations and bulk density measurements. This investigation focused on SOC stocks as a surrogate metric and did not consider separately the uncertainty derived from bulk density and SOC concentration measurements. It was also beyond the scope of this research to investigate the other uncertainty sources suggested by Kennedy and O'Hagan (2001). Therefore, the objectives of this study were to: 1) apply the Bayesian geostatistical approach to characterize the spatial variability of SOC at 0-20 cm depth and compare it to conventional geostatistics; 2) utilize the Bayesian geostatistical
models for spatial prediction of SOC with uncertainty assessment; 3) validate the Bayesian-derived predictions of SOC and credible intervals using an independent dataset. Modeling focused on a large region in the southeastern U.S. characterized by a mosaic of carbon-rich and carbon-poor soils covering a wide spectrum of soil carbon values, where expected uncertainties are especially large.

Materials and Methods

Study Area

The study area is the state of Florida, located in the southeastern United States. Florida covers approximately 150,000 km² (United States Census Bureau, 2000). The climate is humid and subtropical in northern and central Florida and humid and tropical in southern Florida. The mean annual precipitation of Florida is 1,373 mm and the mean annual temperature is 22.3 °C (National Climatic Data Center, 2008). Overall, soils in Florida are sandy in texture. The dominant soil orders of Florida are Spodosols (32%), Entisols (22%), Ultisols (19%), Alfisols (13%), and Histosols (11%). The most frequent soil subgroups are Aeric Alaquods, Ultic Alaquods, Lamellic Quartzipsamments, Typic Quartzipsamments, and Arenic Glossaqualfs (Natural Resources Conservation Service, 2009). Land use and land cover consist mainly of wetlands (28%), pinelands (18%), urban and barren lands (15%), agriculture (9%), rangelands (9%), and improved pasture (8%) (Florida Fish and Wildlife Conservation Commission, 2003). Florida's topography is muted, with gentle slopes varying from 0 to 5% in almost the whole state (Figure 3-1) (United States Geological Survey, 1999).
Soil Organic Carbon Data

A total of 1,080 soil samples in the topsoil (0-20 cm) across Florida (Figure 4-1) were collected between 2008 and 2010 based on a random sampling design stratified by the combination of soil suborder and reclassified LULC. The reclassification of LULC was based on the data produced by the Florida Fish and Wildlife Conservation Commission (2003). Essentially, the original LULC classes with similar soil moisture regimes were generalized into a broader class. The purpose of the reclassification was to reduce the number of LULC classes and to obtain an affordable number of soil suborder-LULC strata (89). Total carbon (TC) was (Shimadzu SSM 5000A). Soil organic carbon (SOC) was derived by subtraction (TC - IC). The laboratory SOC measurements in mass units (mg kg⁻¹) were converted to stock units (kg m⁻²) using the measured bulk density and soil depth (20 cm).

Covariates

Chapter 2 identified the relevant SOC covariates from a comprehensive set of 210 variables for predicting SOC. The following covariates, which have the highest linear correlation coefficients with SOC (Table 3-2), were selected: soil water holding capacity at 0-25 cm depth (AWC), soil organic matter at 0-20 cm depth (OM), dry surface soil albedo (Albedo), Landsat Principal Component 1 (Landsat PC1), compound topographic index (CTI), and leaf area index (LAI). These covariates represent soil moisture, vegetation, and topographic properties, which are the key factors that control SOC variation in Florida.
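The conversion from SOC concentration to SOC stock described in the Soil Organic Carbon Data section can be sketched as below. This is a minimal illustration with assumptions that the source does not state: bulk density is taken to be in g cm⁻³, no correction for coarse fragments is applied, and the function name and example values are hypothetical.

```python
def soc_stock_kg_m2(soc_mg_per_kg, bulk_density_g_cm3, depth_m=0.20):
    """Convert SOC concentration (mg C per kg soil) to SOC stock (kg C per m^2).

    stock = carbon mass fraction * soil mass per unit area
          = (mg/kg * 1e-6) * (g/cm^3 * 1000 kg/m^3) * depth (m)
    """
    carbon_fraction = soc_mg_per_kg * 1e-6                # kg C per kg soil
    bulk_density_kg_m3 = bulk_density_g_cm3 * 1000.0      # kg soil per m^3
    return carbon_fraction * bulk_density_kg_m3 * depth_m


# Hypothetical sample: 2% carbon (20,000 mg/kg), bulk density 1.3 g/cm^3, 0-20 cm layer.
print(soc_stock_kg_m2(20000.0, 1.3))
```

Because the stock is a product of concentration and bulk density, errors in either measurement propagate into the stock; as stated in the Overview, these two sources are not separated in this chapter.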
Conventional Geostatistical Analysis

Two conventional non-Bayesian methods were used to estimate the parameters of the semivariogram model. One method was restricted maximum likelihood (REML) (Lark, 2000). The other was weighted least squares (WLS) fitting of a model to the Cressie-Hawkins robust estimator shown in Eq. (3-1) (Cressie, 1993):

\[ \hat{\gamma}(h) = \frac{1}{2} \, \frac{\left\{ \frac{1}{N(h)} \sum_{N(h)} \left| z(s_i) - z(s_j) \right|^{1/2} \right\}^{4}}{0.457 + 0.494 / N(h)} \tag{3-1} \]

where $N(h)$ is the number of data pairs separated by a distance of $h$, and $z(s_i)$ and $z(s_j)$ are the SOC observations at locations $s_i$ and $s_j$, respectively. Then the empirical semivariogram is fitted to the exponential model in Eq. (3-2):

\[ \gamma(h) = \tau^2 + \sigma^2 \left[ 1 - \exp(-h/\phi) \right] \tag{3-2} \]

where $\tau^2$ is the nugget variance, $\sigma^2$ is the partial sill and $\phi$ is the range parameter that depicts the rate of decay of spatial correlation. The two methods were applied to both the log-transformed SOC data and the residuals derived from a multiple linear regression model of SOC on the six covariates.

Bayesian Geostatistical Models

For the Bayesian geostatistical analysis, the Gaussian spatial linear mixed model (Eq. (3-3)) was used (Diggle et al., 1998):

\[ Y(s) = x(s)^{\mathsf{T}} \beta + w(s) + \varepsilon(s) \tag{3-3} \]

where the random variable $Y(s)$ is the variable of interest (i.e., SOC in this case) at location $s$; $x(s)$ is a $p \times 1$ vector of covariates representing the fixed effects; $\beta$ is a $p \times 1$ vector of model coefficients; and $w(s)$ represents the spatial random effect, which is a Gaussian process with mean of 0, variance of $\sigma^2$ (partial sill) and correlation function
$\rho(h; \phi)$. In this study, the exponential correlation function $\rho(h; \phi) = \exp(-h/\phi)$ was used based on an exploratory analysis; $\varepsilon(s)$ is the residual with mean of 0 and variance of $\tau^2$ (nugget variance). For the Bayesian computation in this study, MCMC simulation proceeded via a Gibbs sampler. For the prior distributions, $\beta$ was assumed to follow a multivariate Gaussian distribution, $\sigma^2$ and $\tau^2$ inverse-gamma distributions, and $\phi$ a uniform distribution (Finley et al., 2007). For each model, three MCMC chains were run for 15,000 iterations. The iterations before the chains mixed (indicating convergence) were discarded as burn-in to mitigate the effect of the initial values on the posterior inference, and the remaining samples were retained to derive the posterior distributions.

Bayesian Geostatistical Model Validation

The models were assessed using independent validation. The whole dataset was split 70/30 into calibration and validation sets. The 70% calibration samples were randomly selected from a two-stage stratification including the soil suborder and the reclassified LULC. The Kolmogorov-Smirnov test was applied to the SOC distributions of the two sets (i.e., calibration and validation sets) to ensure that they had similar distributions. The error metrics used to compare models were the coefficient of determination (R²), root mean squared deviation (RMSD, Eq. (3-4)),

\[ \mathrm{RMSD} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 } \tag{3-4} \]

and residual prediction deviation (RPD, Eq. (3-5)) (Williams, 1987),

\[ \mathrm{RPD} = \mathrm{SD} / \mathrm{RMSD} \tag{3-5} \]
where $\hat{y}_i$ is the mean of the $i$th posterior prediction distribution, $y_i$ is the $i$th observation, $n$ is the number of predicted or observed values in the validation set with $i = 1, 2, \ldots, n$, and SD is the standard deviation of the validation set. In addition, the Deviance Information Criterion (DIC) was used to assist in measuring model performance. This index is calculated as the sum of the posterior prediction deviance and a penalty for model complexity (effective number of parameters). Therefore, a lower DIC indicates a better model (Spiegelhalter et al., 2002).

In this study, all analyses were conducted with R 2.15.2 (R Development Core Team, 2012). The packages geoR and spBayes were used for conventional and Bayesian geostatistical modeling, respectively (Ribeiro and Diggle, 2001; Finley et al., 2007). Two workstations, each with a 4-core CPU and 8 GB RAM, were clustered over a local network with the snow package for Bayesian predictions to produce SOC maps at 1 km resolution (Rossini et al., 2007). It required around 3.5 hours to make predictions with the Bayesian geostatistical model with a spatial random effect but no fixed effects, and 8 hours for the model with both fixed and spatial random effects.

Results and Discussion

Summary Statistics of SOC Measurements

The SOC stock at 0-20 cm depth showed considerable variation, with a range of 33.7 kg m⁻², mean of 5.0 kg m⁻², and median of 3.4 kg m⁻² in the whole dataset (Table 3-1). The data were strongly positively skewed, with most values lying at the low end (Figure 3-2A). A high kurtosis value indicated that infrequent high SOC values accounted for a large share of the SOC variation. Both skewness and kurtosis indicate that the distribution of SOC values was highly non-Gaussian. After log-transformation, the data approximated a Gaussian distribution, as shown in Figure 3-2B, where the kernel density curve matched the Gaussian distribution curve much better. Table 3-1 shows the distribution characteristics of the SOC observations in the calibration and validation sets. Statistics of these sets resembled those of the whole set, indicating that both calibration and validation sets were representative. The Kolmogorov-Smirnov test confirmed that the two sets shared a common distribution (p = 0.64). These SOC observations represent the major soil types with different tendencies to accumulate carbon across the State of Florida. According to Vasques et al. (2012a), soils with high carbon content, including organic peats in isolated wetlands and freshwater marshes (Histosols), are embedded within low-carbon, quartz-rich upland soils, resulting in highly diverse SOC content ranging from 0.03 to 54.6%, sometimes in very close proximity.

Spatial Autocorrelation of SOC in Florida

Figure 3-3A displays a semivariogram and Monte Carlo (MC) envelope obtained from 99 independent random permutations of log-transformed SOC (logSOC) values from the calibration set. The MC envelope represents the spatial correlation that could occur by chance. The empirical semivariogram fell outside the envelope at distances from 0 up to approximately 50 km, which indicates that the increasing trend in the empirical semivariogram was statistically significant and confirms the presence of positive spatial autocorrelation among calibration samples. Figure 3-3B shows the same analysis applied to the residuals from a linear regression model (R² = 0.42) that accounts for the fixed effects (Eq. (3-6)).
$$\mathrm{logSOC} = 1.4690 + 0.1073\,\mathrm{AWC} + 0.0076\,\mathrm{OM} - 2.0640\,\mathrm{Albedo} - 0.0016\,\mathrm{LandsatPC1} + 0.0091\,\mathrm{CTI} + 0.0096\,\mathrm{LAI} \tag{3-6}$$

where AWC is available water capacity at 0-25 cm depth (cm cm⁻¹); OM is soil organic matter content at 0-20 cm depth (% weight); Albedo is dry surface soil albedo; LandsatPC1 is Landsat Principal Component 1 (DN); CTI is Compound Topographic Index; LAI is Leaf Area Index (% area). The residuals still showed significant spatial autocorrelation, as indicated by the MC envelope, even though the spatial autocorrelation was weaker than that of the raw SOC values. Accounting for the fixed effects reduced the partial sills and range parameters, implying that the six covariates explained part of the logSOC variation (Table 3-3). Similarly, Vasques et al. (2010) and Rivero et al. (2007) found that the partial sills were much lower for semivariogram models of residuals from regression models that had accounted for a marked part of soil variability. The nugget effect was not clearly reduced in the residuals compared to that in the raw logSOC data, meaning that some short-range variability remained that could not be explained by the covariates.

The two non-Bayesian methods yielded quite different estimates of the variance parameters (partial sill and nugget) (Table 3-3). The WLS fitting gave greater estimates for partial sills and smaller estimates for nuggets than REML. The two methods agreed on the estimate of the range parameter (29.1 km for raw logSOC values in log kg m⁻²). Vasques et al. (2012b) investigated the spatial variability of soil total carbon at 0-100 cm depth in Florida and obtained an estimate of ~40 km for the range parameter, which is slightly higher than in this study, possibly due to the greater soil depth considered in their study. In a large watershed nested within Florida, the range parameter for total soil carbon (TC)
in the topsoil (0-30 cm) was found to be 3.9 km and for the soil profile (0-100 cm) about 2.2 km (Vasques et al., 2010c). In another regional-scale study of SOC variability in the Panhandle area of northwestern Florida, SOC exhibited no spatial autocorrelation (pure nugget effect), suggesting prevalence of short-range (< 10 km) variability in SOC (Ovalles and Collins, 1988). These varying range parameters for SOC and TC suggest that variability of carbon in soils extends across different scales in the landscape of the southeastern USA. This heterogeneity in soil carbon formation/loss is confounded by variability induced by vegetation (biomass, residue), topography, and climate/hydrology (Vasques et al., 2012a).

Bayesian Geostatistical Models

The summary statistics of the marginal posterior parameter distributions of the two models are shown in Table 3-4. One is the Bayesian geostatistical model with spatial random effect but no fixed effects (Model 1) and the other is the model with both fixed and spatial random effects (Model 2). Generally, the spatial random effect parameters from both models did not exhibit much uncertainty, especially the range parameter and the intercept, as indicated by the ratio of the 2.5-97.5 percentile interval to the posterior mean, while most of the fixed effect parameters from Model 2, such as LAI, Landsat PC1 and OM, showed considerable uncertainty compared to the spatial random effect parameters. Bernardo and Smith (2000) suggested that collinearity among variables may contribute to the relatively high uncertainty in the fixed effect model. The posterior means of the fixed effect parameters were quite comparable to the estimates from OLS (Eq. (3-6)) and all of the OLS estimates were located within the 25-75 percentile intervals. The Bayesian method agreed with REML rather than WLS in the estimates of the spatial random effect parameters. This is probably because the Bayesian method and
REML are likelihood-based, while WLS is a moment-based method. Several studies have compared likelihood-based and moment-based methods for estimating semivariograms (Zimmerman and Zimmerman, 1991; Stein, 1999; Lark, 2000; Minasny and McBratney, 2005). Their conclusions can be summarized as follows: in kriged prediction the method of moments can perform as well as the likelihood method, but the latter is better at estimating models for smooth processes (such as a Gaussian semivariogram model) and at reducing the prediction uncertainty. Further, Minasny et al. (2011) compared Bayesian kriging with conventional regression kriging and found that the prediction accuracy of pH and SOC from the Bayesian method was substantially better. As a result, Bayesian models are preferred for predictions like the one presented in this study.

Model 2 had a much lower DIC than Model 1, suggesting that the six covariates markedly improved the performance of Model 2 over Model 1. This finding is substantiated by the validation results shown in Figure 3-4. The posterior mean predictions of SOC at the locations of independent validation samples from Model 2 matched the observed SOC values better than those from Model 1 in terms of R2, RMSD, and RPD. The regression line of posterior mean SOC vs. observed SOC shifted off the reference 1:1 line in Model 1 and had a slope less than 1, indicating that Model 1 tended to over-predict low SOC samples and under-predict high SOC ones. In contrast, in Model 2 these biases were mitigated by incorporating covariates. In fact, Model 2 had comparable performance to the best model developed with tree-based methods based on the same SOC dataset in Chapter 2. As opposed to the non-linear tree-based models, Model 2 in the current study was linear. Therefore, it was simpler and more
transparent in model structure than the tree-based models, which enhances model interpretability. Although tree-based regression models can handle nonlinearity, they did not add substantial benefits compared with the Bayesian geostatistical model used in this study.

Scaling up of SOC in Florida

The posterior mean SOC prediction maps of 0-20 cm depth soil in Florida from the two models are shown in Figure 3-5A and Figure 3-6A. Generally, the spatial patterns of SOC captured by the two models resembled each other, though the model including the covariates showed much more heterogeneity in SOC. Most Florida soils at 0-20 cm contained less than 5 kg m⁻² organic carbon, which corresponds to the sample distribution in Figure 3-2. Strikingly high SOC was identified by both models in the Everglades Agriculture Area (EAA) to the south of Lake Okeechobee (the big blue area located in central southern Florida). Moderate SOC (5-15 kg m⁻²) was found in southern Florida as well as in some interspersed areas in central and northern Florida. It should be noted that the interpolation map predicted by Model 1 (Figure 3-5A) drastically misrepresents reality in the area below the EAA, since SOC would be very high all the way to Florida Bay (swamps and wetlands). The predicted SOC is moderate only because no samples were collected south of the EAA (Figure 3-1). In Figure 3-6A, in contrast, the unsampled area was excluded by Model 2 because the SSURGO data were missing in this area; although this was an artifact, it did not misrepresent reality. The Panhandle in northern Florida generally stored low levels of carbon in 0-20 cm soils, except in the southern lowlands of the Suwannee River Basin where the Suwannee River drains into the Gulf of Mexico. The low SOC content of most surface soil layers in Florida can be attributed to the unique soil and climatic conditions of Florida. The parent
material of Florida soils consists mainly of sandy marine deposits with rapid to moderate permeability. With the relatively high precipitation (more than 1,300 mm mean annual precipitation), decomposed litter in surface layers can migrate vertically as a result of leaching (Jobbágy and Jackson, 2000). Vasques et al. (2010) modeled and upscaled soil TC at multiple depths in the Santa Fe River Watershed in Florida and found that significant amounts of soil carbon were stored in the deeper layers (30-60 cm, 60-120 cm, and 120-200 cm) as well as in the surface layer (0-30 cm).

Maps produced with the two models also display differences. Compared to Model 1, Model 2 reflected the spatial patterns of the covariates. On the other hand, it also propagated the artifacts that existed in the covariate data. The large patch without predictions in southern Florida was due to missing values in the SSURGO database in the Everglades area. This wetland area stores large amounts of soil carbon, but SOC could not be predicted by Model 2.

A major advantage of Bayesian spatial models is that they allow one to assess the prediction uncertainty that results from the uncertainty in model parameter estimates. Figure 3-5B and Figure 3-6B show the posterior standard deviation of SOC predictions. The standard deviation was clearly large where the SOC mean prediction was large. The 2.5 and 97.5 percentiles of predictions are shown in Figure 3-5C, D and Figure 3-6C, D. The percentile maps constitute a credible interval for the posterior prediction distribution at each pixel, which means that at a given pixel there is a 95% probability that the true value falls between the 2.5 and 97.5 percentiles. In other words, given a specific SOC value, one may assess the probability of each pixel to be
higher or lower than that value from the posterior distributions. This information can be very useful in soil quality or risk assessment (Carr et al., 2007; Aelion et al., 2009). To verify how credible the intervals are, the frequency with which the intervals encompassed the observed SOC in the validation set was assessed (Table 3-5). The results show that 58.7% (Model 1) and 55.2% (Model 2) of samples had observed SOC values falling in the 50% probability prediction intervals, while for the 95% probability intervals the corresponding figures were 95.7% (Model 1) and 94.1% (Model 2). This indicates that the credible intervals predicted SOC very well at a given significance level. Aelion et al. (2009) validated the Bayesian kriging of heavy metal concentrations in surface soils using both leave-one-out cross-validation (LOOCV) and independent validation with additional field samples. Their results also suggested that Bayesian kriging gave good predictions of arsenic, chromium, lead, and mercury concentrations in terms of LOOCV, and in independent validation more than half of the additional samples had measured values within the 5-95 percentile range of the prediction distribution. The relatively low accuracy in their validation compared to that in our study was attributed to poor predictions of low values that were below the analytical minimum detection limit. The large inherent variability usually involved in trace metal measurement, however, did not plague our predictions of SOC.

To further illustrate the posterior prediction distributions, five samples were selected from the validation set representing the minimum, the 25th, 50th, and 75th percentiles, and the maximum SOC values (Figure 3-7). The posterior prediction distributions of SOC were positively skewed, resembling the sample distributions of SOC. Furthermore, Model 2 generally had less scattered distributions than Model 1 for each selected sample. This
indicates that Model 2, which incorporated covariates, had narrower prediction intervals at a given significance level, and hence yielded more precise predictions than Model 1.

Summary and Conclusions

In this study, the Bayesian geostatistical method was successfully applied to model and predict SOC in a large region, Florida, USA. The Bayesian geostatistical model incorporating six covariates representing soil moisture, vegetation, and topography had better prediction accuracy than the Bayesian model with only a spatial random effect in terms of the independent validation. In addition, the former had narrower prediction intervals, indicating better prediction precision. Generally, the Bayesian prediction intervals were narrow where the posterior mean SOC predictions were low (e.g., the Panhandle area in northern Florida) and relatively wide where the SOC predictions were high (e.g., the Everglades Agriculture Area in southern Florida). The validation of the prediction intervals with independent validation data confirmed the effectiveness of the uncertainty assessment from Bayesian inference.

Uncertainty in the model parameter estimates varied among parameters. Generally, the spatial random effect parameters from both models did not exhibit much uncertainty, while most of the fixed effect parameters from the Bayesian model with covariates, such as LAI, Landsat PC1 and OM, showed considerable uncertainty. One possible explanation is the correlation among these covariates (collinearity). Our results also confirmed the usefulness of conventional geostatistical methods in model parameter estimation, especially REML.

Our findings are important for quantifying SOC in the southeastern USA, where the soils are enriched in carbon, especially in the southern coastal plain. The uncertainty assessments of SOC add value when compared to crisp SOC prediction models
because they are based on viewing the soil continuum as realizations, rather than one discrete entity. Assessing the adaptation and mitigation potential of soils to external threats, such as global climate or land use change, can be much better accomplished using an assessment framework that incorporates uncertainty in the prediction process. This study focused on model uncertainty, yet other types of uncertainties, such as parameter and positional uncertainties, may also play a role. However, their assessment is beyond the scope of this study and could be addressed in future investigations.
Table 3-1. Descriptive statistics of soil organic carbon observations at 0-20 cm in Florida. Min, Median, Mean, Max, MAD, and SD are in kg m⁻².

Set           N      Min    Median   Mean   Max     MAD    SD     CV     Skewness   Kurtosis
Whole set     1080   0.45   3.40     4.98   34.15   2.05   4.38   0.88   2.52       8.48
Calibration   756    0.45   3.40     5.00   29.26   2.16   4.42   0.88   2.41       7.48
Validation    324    1.01   3.48     4.92   34.15   2.00   4.30   0.87   2.80       11.02

Abbreviations: N, number of samples; MAD, median absolute deviation; SD, standard deviation; CV, coefficient of variation.
Table 3-2. Descriptive statistics of covariates used to model the global spatial trend at the 1,080 sampling sites and their correlations to soil organic carbon (SOC) at 0-20 cm in Florida.

Variable           Mean    SD      Min    Max     Correlation with SOC (a)   Source
AWC (cm cm⁻¹)      2.52    1.80    0.00   10.00   0.39***                    USDA/NRCS/SSURGO
SOM (% weight)     10.68   22.70   0.00   82.50   0.43***                    USDA/NRCS/SSURGO
Albedo             0.24    0.07    0.00   0.50    0.39***                    USDA/NRCS/SSURGO
LandsatPC1 (DN)    104     33      35     276     0.23***                    USGS/Landsat ETM+
CTI                14.75   5.74    5.40   30.20   0.28***                    USGS/NED
LAI (% area)       407     520     40     2550    0.10*                      MODIS4NACP

(a) Spearman correlation coefficient, with * and *** indicating significant correlation at the 0.05 and 0.001 significance levels.

Abbreviations: AWC, available water capacity at 0-25 cm depth; OM, soil organic matter content at 0-20 cm depth; Albedo, dry surface soil albedo; LandsatPC1, Landsat Principal Component 1 (date: 2003); CTI, Compound Topographic Index; LAI, Leaf Area Index (date: 1999); USDA/NRCS/SSURGO, United States Department of Agriculture/Natural Resources Conservation Service/Soil Survey Geographic Database (date: 2009); USGS/Landsat ETM+, United States Geological Survey/Landsat Enhanced Thematic Mapper; USGS/NED, United States Geological Survey/National Elevation Dataset; MODIS4NACP, Moderate-Resolution Imaging Spectroradiometer for North American Carbon Project (date: 2005).
Table 3-3. Semivariogram model parameters of conventional geostatistical models using log-transformed soil organic carbon (logSOC) of 756 calibration samples. The exponential model was used to fit empirical semivariograms.

Data            Method   β0 (log(kg m⁻²))   σ² (log(kg m⁻²))²   τ² (log(kg m⁻²))²   φ (km)
Raw logSOC      WLS      1.33               0.443               0.103               29.1
Raw logSOC      REML     1.30               0.231               0.285               29.1
Residuals (a)   WLS      0                  0.190               0.120               20.0
Residuals (a)   REML     0.002              0.0835              0.227               20.1

Abbreviations: WLS, weighted least squares; REML, restricted maximum likelihood.

(a) Ordinary least squares residuals of calibration logSOC after fitting the linear global trend model (R2 = 0.42):

$$\mathrm{logSOC} = 1.4690 + 0.1073\,\mathrm{AWC} + 0.0076\,\mathrm{OM} - 2.0640\,\mathrm{Albedo} - 0.0016\,\mathrm{LandsatPC1} + 0.0091\,\mathrm{CTI} + 0.0096\,\mathrm{LAI}$$

where AWC is available water capacity at 0-25 cm depth (cm cm⁻¹); OM is soil organic matter content at 0-20 cm depth (% weight); Albedo is dry surface soil albedo; LandsatPC1 is Landsat Principal Component 1 (DN); CTI is Compound Topographic Index; LAI is Leaf Area Index (% area).
Table 3-4. Model parameters of Bayesian geostatistical models using log-transformed soil organic carbon (logSOC) of 756 calibration samples. The exponential model was used to fit empirical semivariograms.

Parameter              Posterior mean   Posterior percentiles                          DIC
                                        2.5      25       50       75       97.5

Model 1: Bayesian geostatistical model with spatial random effect but no fixed effects   0.25
β0                     1.30             1.15     1.25     1.30     1.36     1.47
σ²                     0.228            0.154    0.198    0.225    0.253    0.318
τ²                     0.281            0.244    0.271    0.286    0.302    0.333
φ                      30.0             29.8     29.9     30.0     30.1     30.4

Model 2: Bayesian geostatistical model with both fixed and spatial random effects        -222.58
β0                     1.43             1.14     1.33     1.43     1.53     1.70
AWC (10⁻¹)             1.27             0.823    1.11     1.26     1.43     1.75
OM (10⁻³)              3.76             0.510    2.21     3.82     5.29     7.79
Albedo                 -2.12            -2.74    -2.33    -2.14    -1.91    -1.49
Landsat PC1 (10⁻³)     -1.50            -2.71    -1.93    -1.52    -1.07    -0.268
CTI (10⁻²)             1.16             0.318    0.867    1.15     1.46     2.06
LAI (10⁻³)             8.34             0.656    5.89     8.63     10.8     15.4
σ²                     0.0904           0.0528   0.0749   0.0889   0.105    0.136
τ²                     0.223            0.190    0.211    0.222    0.234    0.260
φ                      20.5             20.2     20.4     20.5     20.6     20.8

β0, intercept; σ², partial sill; τ², nugget; φ, range parameter, which depicts the rate of decay of spatial correlation.

Abbreviations: AWC, available water capacity at 0-25 cm depth (cm cm⁻¹); OM, soil organic matter content at 0-20 cm depth (% weight); Albedo, dry surface soil albedo; LandsatPC1, Landsat Principal Component 1 (DN); CTI, Compound Topographic Index; LAI, Leaf Area Index (% area); DIC, Deviance Information Criterion.
Table 3-5. Frequencies of validation samples with observed soil organic carbon (SOC) values falling in the 25-75 and 2.5-97.5 percentile ranges of the posterior prediction distributions. Entries are the number (percentage, %) of validation SOC observations falling in each range.

Model (a)   25-75 prediction percentile   2.5-97.5 prediction percentile
Model 1     182 (58.7)                    310 (95.7)
Model 2     179 (55.2)                    305 (94.1)

(a) Model 1: Bayesian geostatistical model with spatial random effect but no fixed effects (covariates); Model 2: Bayesian geostatistical model with both fixed and spatial random effects.
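The interval check summarized in Table 3-5 can be sketched in a few lines. The following is a minimal illustration in Python (the analyses in this study were run in R), using synthetic posterior draws; the function names `percentile` and `interval_coverage`, the number of draws, and the simulated values are all hypothetical and are not the study's data or code.

```python
import random

def percentile(sorted_vals, pct):
    """Linear-interpolation percentile of an already sorted list (0 <= pct <= 100)."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def interval_coverage(draws_per_site, observed, lower_pct, upper_pct):
    """Count observations that fall inside the posterior percentile interval."""
    inside = 0
    for draws, obs in zip(draws_per_site, observed):
        s = sorted(draws)
        if percentile(s, lower_pct) <= obs <= percentile(s, upper_pct):
            inside += 1
    return inside, 100.0 * inside / len(observed)

# Synthetic stand-in for posterior predictive draws at 324 validation sites
# (assumed Gaussian on the log scale purely for illustration).
random.seed(1)
n_sites, n_draws = 324, 500
site_means = [random.gauss(1.3, 0.5) for _ in range(n_sites)]
draws_per_site = [[random.gauss(m, 0.5) for _ in range(n_draws)] for m in site_means]
observed = [random.gauss(m, 0.5) for m in site_means]

count50, pct50 = interval_coverage(draws_per_site, observed, 25, 75)
count95, pct95 = interval_coverage(draws_per_site, observed, 2.5, 97.5)
print(f"25-75: {count50} ({pct50:.1f}%)   2.5-97.5: {count95} ({pct95:.1f}%)")
```

If the posterior prediction distributions are well calibrated, the two percentages should lie close to the nominal 50% and 95%.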
Figure 3-1. The 1080 sampling sites (70% calibration samples in green and 30% validation samples in red) and elevation in Florida.
Figure 3-2. Histograms of soil organic carbon (SOC) of 1080 samples. A) Original values. B) Log-transformed values. The red solid curve is the estimated kernel density and the dark blue dashed curve denotes the normal distribution with mean and variance set to those of the samples.
Figure 3-3. Semivariograms of log-transformed soil organic carbon (logSOC) of 756 calibration samples. A) Raw logSOC. B) Ordinary least squares residuals of logSOC after fitting the linear global trend model:

$$\mathrm{logSOC} = 1.4690 + 0.1073\,\mathrm{AWC} + 0.0076\,\mathrm{OM} - 2.0640\,\mathrm{Albedo} - 0.0016\,\mathrm{LandsatPC1} + 0.0091\,\mathrm{CTI} + 0.0096\,\mathrm{LAI}$$

where AWC, available water capacity at 0-25 cm depth; OM, soil organic matter content at 0-20 cm depth; Albedo, dry surface soil albedo; LandsatPC1, Landsat Principal Component 1; CTI, Compound Topographic Index; LAI, Leaf Area Index. The dotted curves are the Monte Carlo envelopes for the semivariograms. The dark blue dashed and black solid curves are estimated exponential semivariogram models with weighted least squares and restricted maximum likelihood, respectively.
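The Monte Carlo envelope in Figure 3-3 rests on a simple permutation idea: shuffling the values over the fixed sampling locations destroys any spatial structure, so the spread of the shuffled semivariograms shows what would arise by chance. A minimal sketch in Python follows, using synthetic coordinates and values. The helper names, the bin edges, and the choice of the minimum and maximum across permutations as envelope bounds are assumptions made for illustration and are not taken from the study's R code.

```python
import math
import random

def empirical_semivariogram(coords, values, bin_edges):
    """Method-of-moments semivariance for each distance bin [edge_b, edge_b+1)."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = math.dist(coords[i], coords[j])
            for b in range(n_bins):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    sums[b] += 0.5 * (values[i] - values[j]) ** 2
                    counts[b] += 1
                    break
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

def permutation_envelope(coords, values, bin_edges, n_perm=99, seed=0):
    """Lower/upper envelope from semivariograms of randomly permuted values."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # break the link between value and location
        sims.append(empirical_semivariogram(coords, shuffled, bin_edges))
    lower = [min(col) for col in zip(*sims)]
    upper = [max(col) for col in zip(*sims)]
    return lower, upper

# Synthetic, spatially structured data: a west-east trend plus noise.
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(80)]
values = [0.05 * x + rng.gauss(0, 0.1) for x, _ in coords]
bin_edges = [0, 20, 40, 60]

gamma = empirical_semivariogram(coords, values, bin_edges)
lower, upper = permutation_envelope(coords, values, bin_edges)
# True where the observed semivariance falls below the chance envelope.
below_envelope = [g < lo for g, lo in zip(gamma, lower)]
print(below_envelope)
```

Bins where the observed semivariance falls below the lower bound correspond to lags at which nearby samples are more alike than chance alone would produce.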
Figure 3-4. Posterior mean predictions of soil organic carbon (SOC) vs. observed SOC of the validation set. A) Prediction from Model 1: Bayesian geostatistical model with spatial random effect but no fixed effects (covariates). B) Prediction from Model 2: Bayesian geostatistical model with both fixed and spatial random effects.
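The validation metrics used for comparisons such as Figure 3-4 (R2, RMSD, and RPD) are straightforward to compute from paired observed and predicted values. The sketch below, in Python with made-up numbers, assumes that R2 is the squared Pearson correlation between observed and predicted values and that the standard deviation in RPD uses the n - 1 denominator; both are assumptions for illustration, since the text does not spell out these details.

```python
import math

def validation_metrics(observed, predicted):
    """Return R2 (squared Pearson correlation), RMSD, and RPD = SD / RMSD."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    rmsd = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    sd_obs = math.sqrt(ss_o / (n - 1))  # sample SD of the validation set
    return {"R2": cov ** 2 / (ss_o * ss_p), "RMSD": rmsd, "RPD": sd_obs / rmsd}

# Hypothetical observed and predicted SOC values (kg m^-2), not study data.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 2.5, 2.5, 4.5, 4.5]
metrics = validation_metrics(obs, pred)
print(metrics)
```

A larger RPD means the prediction error is small relative to the spread of the validation data, which is why it is reported alongside RMSD.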
Figure 3-5. Soil organic carbon (SOC) prediction at 0-20 cm depth in Florida from Model 1: Bayesian geostatistical model with spatial random effect but no fixed effects (covariates). A) Posterior mean. B) Standard deviation. C) 2.5 percentile. D) 97.5 percentile.
Figure 3-6. Soil organic carbon (SOC) prediction at 0-20 cm depth in Florida from Model 2: Bayesian geostatistical model with both fixed and spatial random effects. A) Posterior mean. B) Standard deviation. C) 2.5 percentile. D) 97.5 percentile.
Figure 3-7. Histograms of soil organic carbon (SOC) predictions from two models at five selected validation sample locations representing the minimum, 25, 50, and 75 percentiles, and maximum SOC values of the validation SOC distribution. A) Min., Model 1. B) 25 percentile, Model 1. C) 50 percentile, Model 1. D) 75 percentile, Model 1. E) Max., Model 1. F) Min., Model 2. G) 25 percentile, Model 2. H) 50 percentile, Model 2. I) 75 percentile, Model 2. J) Max., Model 2. Note: Model 1 is the Bayesian geostatistical model with spatial random effect but no fixed effects (covariates); Model 2 is the Bayesian geostatistical model with both fixed and spatial random effects. The red solid vertical line is the observed SOC and the dark blue dashed line is the posterior mean of SOC predictions.
CHAPTER 4
UNRAVELING FINE SCALE SPATIAL VARIABILITY OF SOIL ORGANIC CARBON

Overview

In the context of sustainable agriculture and global climate change, interest in soil organic carbon (SOC) has been profound (Batjes and Sombroek, 1997; Bellamy et al., 2005). Globally, abundant efforts have been dedicated to quantifying the spatial distribution of SOC and achieving accurate assessments of SOC at finer resolution (Grunwald, 2009; Minasny et al., 2013). Despite these efforts, the available soil spatial data around the globe are at rather coarse scale (Grunwald et al., 2011). For example, in the USA the two major soil databases providing readily available soil data are the State Soil Geographic Database (STATSGO2) and the Soil Survey Geographic Database (SSURGO via Soil Data Mart), at scales of 1:250,000 and 1:24,000, respectively. These soil databases cannot reflect SOC variation at fine scale (i.e., within a few hundred meters) and provide little spatial information for precision agriculture and land management.

There have been fine-scale studies of SOC variability around the world; however, they are difficult to compare because of differences in sampling design, protocols, density of observations, sample support, carbon measurement techniques, soil depth, and more. Even though these studies were conducted under different climatic zones, land use and land cover (LULC), soil types, topography, and intensities of human interference, the maximum spatial autocorrelation ranges rarely exceeded 500 m (Trangmar et al., 1987; Fromm et al., 1993; Cambardella et al., 1994; McBratney and Pringle, 1999; Terra et al., 2004; Cerri et al., 2004; Rossi et al., 2009). These studies provide knowledge about the fine-scale variability of SOC under specific environmental and geographic settings. However, they lack the ability to generalize because
differences in the fine-scale variation of SOC may be due to methodology, SOC variation, or scale-dependent relationships between SOC and other prominent ecosystem characteristics such as topography, climate, or net ecosystem productivity.

Florida soils hold a large amount of carbon and have great potential to accumulate more carbon. According to the estimates of SOC based on the State Soil Geographic database (STATSGO) by Vasques et al. (2010), Florida has the highest SOC stock per unit area among the conterminous U.S. states, with minimum, median, and maximum values of 12.4, 35.3, and 64.0 kg m⁻², respectively. In addition, Florida has a wide variety of LULC types unique to the southeastern USA, and SOC stocks can vary dramatically among ecosystems (Vasques et al., 2012; Xiong et al., 2012). However, little effort has been dedicated to understanding the spatial variability of SOC across different LULC types in this region.

Various sampling designs that minimize sampling effort have been suggested for investigating scale-dependent variation of SOC. Youden and Mehlich (1937) introduced a spatially nested sampling design to study soil spatial variation over multiple orders of magnitude of scale cost-effectively compared with systematic or random sampling techniques. Webster et al. (2006) formulated the theory of this technique. They successfully applied both balanced and unbalanced spatially nested sampling schemes to study various soil properties and reported that the unbalanced design had comparable performance to the balanced one with far fewer samples and a better distribution of degrees of freedom across scales. Lark (2011) then explored the scope for optimization of the unbalanced spatially nested design using simulated annealing and claimed that
the optimization can theoretically yield better estimation of variance components (i.e., smaller variance of estimation) given a fixed number of samples and scales.

To fill the research gap in fine-scale spatial variability of SOC across different LULC types in the southeastern USA, the major objective of this study was to investigate the fine-scale variability of SOC under five common LULC types in Florida with a uniform spatially nested sampling scheme. The specific objectives were to: (i) investigate and compare the spatial structures of SOC in five LULC types at fine scale (< 500 m); (ii) identify the scale-dependent variability of SOC (2, 7, 22, 67, 200 m); and (iii) assess the effect of LULC on SOC with and without considering the spatial autocorrelation of SOC.

Materials and Methods

Description of Study Area and Sites

The study was conducted in Florida, USA, which is located in the southeastern USA. The climate of northern and central Florida is humid subtropical, and southern Florida has a tropical climate. The mean annual precipitation of Florida is 1,373 mm and the mean annual temperature is 22.3 °C (National Climatic Data Center, 2008). Overall, soils in Florida are sandy in texture. The dominant soil orders of Florida are Spodosols (29%), Entisols (20%), Ultisols (17%), Alfisols (12%) and Histosols (10%), as shown in Figure 4-1. The most frequent soil classes are Aeric Alaquods, Ultic Alaquods, Lamellic Quartzipsamments, Typic Quartzipsamments, and Arenic Glossaqualfs (Natural Resources Conservation Service, 2009). Florida's topography is muted, with gentle slopes varying from 0 to 5% in almost the whole state (United States Geological Survey, 1999).
To investigate the fine-scale variation of SOC under different LULC, five of the most prevalent and contrasting LULC types in Florida were selected for sampling, namely Pineland (accounting for 15.5% of the total area of Florida), Improved Pasture (7.0%), Dry Prairie¹ (2.9%), Hardwood Hammock and Forest (2.3%) and Sandhill (1.8%) (Florida Fish and Wildlife Conservation Commission, 2003). In total, the selected LULC types account for approximately 36% of the Florida land area (excluding open water areas). Spatially, the five sites are spread out across Florida as shown in Figure 4-1. A summary of soil properties and environmental characteristics of the five sites can be found in Table 4-1.

Optimized Unbalanced Spatially Nested Sampling

At each of the five LULC sites, exactly the same sampling scheme and protocol were used. The sampling followed an optimized unbalanced spatially nested design. The theoretical foundation of the spatially nested design can be found in Webster et al. (2006) and technical details on how to optimize it in Lark (2011). Five levels of hierarchy were studied (Table 4-2); a detailed account of the design follows. Nine main stations gridded at 200 m intervals constitute the highest level of the hierarchy (Figure 4-2A). At each main station, two additional sampling points (subnodes) are chosen 67 m away so that they form an equilateral triangle with the main station sampling point. The equilateral triangle is placed in a random direction. Then, in similar fashion, the 3rd, 4th and 5th hierarchical samples are located 22 m, 7 m and 2 m away from their parent nodes, respectively.
¹ The definition of Dry Prairie adopted in this study is a large native grass and shrubland occurring on very flat terrain interspersed with scattered cypress domes and strands, bayheads, isolated freshwater marshes, and hardwood hammocks, according to Florida Fish and Wildlife Conservation Commission (2003).

The approximately 3-fold hierarchy has proven effective in capturing as much variation as possible and avoiding overlaps among different branches (Webster et al., 2006). Figure 4-2B shows the sample deployment of one main station. A total of 108 (9 × 12) samples were collected for each LULC type, resulting in a total of 540 samples for the five sites.

Soil Sampling and Laboratory Analysis

Predetermined sample locations were identified with a submeter-accuracy global positioning system (Trimble GeoXT 2005 series). Soil samples were collected with a 5.7 cm diameter steel core at 0-20 cm depth. Soil samples were put into sample bags and transported to the laboratory in coolers. Then samples were dried in a drying from milled. Total 5000A). Considering that the topsoil in Florida contains little inorganic carbon, a pedotransfer function developed for Florida soils was used to derive SOC (Vasques et al., 2009). The pedotransfer function is: SOC (% mass) = 0.9977 × TC (% mass) - 0.0389 (n = 1,080, R2 = 0.999). Laboratory SOC measurements in mass units (mg kg⁻¹) were converted to stock units (kg m⁻²) using the measured bulk density and soil depth (20 cm).

Hierarchical Analysis of Variance with Restricted Maximum Likelihood (REML)

The statistical model for the optimized unbalanced spatially nested design in this study is expressed in Eq. (4-1):

$$y_{ijklmn} = \mu + A_i + B_{ij} + C_{ijk} + D_{ijkl} + E_{ijklm} + \varepsilon_{ijklmn} \tag{4-1}$$

where $y_{ijklmn}$ is the SOC value of the $n$th sampling point in the $m$th class at Level 5, in the $l$th class at Level 4, ..., in the $i$th class at Level 1; $\mu$ is the overall mean; $A_i$ is the
difference between the mean of the $i$th main station and the overall mean, and is an identically and independently distributed random variable with mean zero and variance $\sigma_A^2$; $B_{ij}$ is the difference between the mean of the $j$th class within the $i$th main station and the mean of the $i$th main station, and is an identically and independently distributed random variable with mean zero and variance $\sigma_B^2$; and so on. $\varepsilon_{ijklmn}$ is the error term. To estimate the variance components, $\sigma_A^2, \sigma_B^2, \ldots, \sigma_\varepsilon^2$, from the data with an unbalanced design, restricted maximum likelihood (REML) was used (Webster et al., 2006).

Geostatistical Analysis

The Matérn theoretical semivariogram model (Eq. (4-2)) was used because it generalizes a variety of semivariogram models and has a smoothness parameter that can be used to infer the short-range variation (Minasny and McBratney, 2005):

$$\gamma(h) = \tau^2 + \sigma^2\left[1 - \frac{1}{2^{\nu-1}\Gamma(\nu)}\left(\frac{h}{\phi}\right)^{\nu} K_{\nu}\!\left(\frac{h}{\phi}\right)\right] \tag{4-2}$$

where $\tau^2$ is the nugget, $\sigma^2$ is the partial sill, $h$ is the lag distance, $K_{\nu}$ is a modified Bessel function of the second kind of order $\nu$, $\Gamma$ is the gamma function, $\phi$ is the range parameter, and $\nu$ is the smoothness parameter. Two methods were used to estimate the parameters. The first method used REML and the second used weighted least squares to model the Cressie-Hawkins robust estimator, which is shown in Eq. (4-3) (Cressie, 1993):

$$2\hat{\gamma}(h) = \frac{\left\{\dfrac{1}{N(h)}\displaystyle\sum_{N(h)}\left|z(x_i) - z(x_j)\right|^{1/2}\right\}^{4}}{0.457 + \dfrac{0.494}{N(h)}} \tag{4-3}$$

where $N(h)$ is the number of data pairs separated by a distance of $h$, and $z(x_i)$ and $z(x_j)$ are the SOC observations at locations $x_i$ and $x_j$, respectively.
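To illustrate why a robust estimator is attractive for skewed SOC data, the following minimal Python sketch computes both the classical method-of-moments semivariance and the Cressie-Hawkins estimate for a single lag bin. The function name, the transect coordinates, and the values (which include one deliberately extreme observation) are hypothetical; the constants 0.457 and 0.494 follow the standard form of the estimator given in Cressie (1993).

```python
import math

def semivariance_estimates(coords, values, h_min, h_max):
    """Classical and Cressie-Hawkins semivariance for pairs with h_min <= d < h_max."""
    sq_sum = 0.0    # sum of squared differences (classical estimator)
    root_sum = 0.0  # sum of square-rooted absolute differences (robust estimator)
    n_pairs = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = math.dist(coords[i], coords[j])
            if h_min <= d < h_max:
                diff = abs(values[i] - values[j])
                sq_sum += diff ** 2
                root_sum += math.sqrt(diff)
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no pairs in this lag bin")
    classical = sq_sum / (2 * n_pairs)
    robust = 0.5 * (root_sum / n_pairs) ** 4 / (0.457 + 0.494 / n_pairs)
    return classical, robust, n_pairs

# Hypothetical 1-D transect with 1 m spacing and one extreme value at x = 5.
coords = [(float(x), 0.0) for x in range(10)]
values = [1.0, 1.2, 0.9, 1.1, 1.0, 8.0, 1.1, 0.9, 1.0, 1.2]

classical, robust, n_pairs = semivariance_estimates(coords, values, 0.5, 1.5)
print(n_pairs, classical, robust)
```

In this toy example the single extreme value inflates the classical estimate far more than the robust one, because the fourth power is applied to the mean of the square-rooted differences rather than to each difference individually.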
Model with Ordinary Least Squares (OLS) versus Generalized Least Squares (GLS)

A general linear model (Eq. (4-4)) was used to investigate the effect of LULC on SOC.

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{4-4} $$

where $\mathbf{y}$ is an $n \times 1$ vector of observed SOC values, $\mathbf{X}$ is the $n \times p$ design matrix of LULC (with $p$ the number of LULC classes), $\boldsymbol{\beta}$ is the vector of model coefficients, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of errors. In OLS, the errors are assumed to be independently and identically distributed random variables with mean zero and variance $\sigma^2$, i.e., $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$, where $\mathbf{I}$ is the identity matrix. In GLS, the correlation among errors is taken into account, i.e., $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{V})$, where $\mathbf{V}$ is the variance-covariance matrix of the errors. REML was used to estimate $\mathbf{V}$ (Lark and Cullis, 2004).

Results

Descriptive Statistics of Soil Organic Carbon

The mean SOC varied substantially across LULC types. Dry Prairie, Hardwood Hammock and Forest, and Improved Pasture had relatively high SOC stocks, while Sandhill and Pineland soils had relatively low SOC (Table 4-3). The standard deviation of SOC also differed across the five LULC types. Hardwood Hammock and Forest had the largest overall variation of SOC, followed by Dry Prairie and Improved Pasture, while Sandhill and Pineland showed modest variation. The coefficient of variation showed that the overall variation generally increased with the SOC level, which results in a similar level of relative variation of SOC across the five sites. The SOC data in the five LULC classes were positively skewed, especially in Pineland; hence they were logarithmically transformed to approximate a symmetric, normal distribution for the statistical and geostatistical analysis.
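The difference between the OLS and GLS estimators described in the Methods can be illustrated for the simplest case, a single mean estimated from spatially clustered samples. The sketch below is illustrative only: the transect positions, log SOC values and covariance parameters are hypothetical, the covariance parameters are fixed rather than estimated by REML, and the helper names (`solve`, `exp_covariance`, `gls_mean`) are not from any particular library.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def exp_covariance(xs, nugget, partial_sill, range_m):
    """Exponential covariance between positions, with the nugget on the diagonal."""
    return [[partial_sill * math.exp(-abs(xi - xj) / range_m)
             + (nugget if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]

def gls_mean(y, v):
    """GLS estimate of one mean, (1'V^-1 y) / (1'V^-1 1), and its variance."""
    w = solve(v, [1.0] * len(y))  # w = V^-1 1
    total = sum(w)
    return sum(wi * yi for wi, yi in zip(w, y)) / total, 1.0 / total

xs = [0.0, 2.0, 4.0, 300.0]   # hypothetical positions (m): a cluster plus one distant point
y = [1.2, 1.3, 1.25, 0.6]     # hypothetical log SOC stocks
v = exp_covariance(xs, nugget=0.03, partial_sill=0.05, range_m=100.0)

ols = sum(y) / len(y)              # arithmetic mean, treats samples as independent
ols_var = (0.03 + 0.05) / len(y)   # variance of the mean under independence
gls, gls_var = gls_mean(y, v)      # clustered samples are down-weighted
```

In this toy case the three clustered samples carry partly redundant information, so GLS gives the distant sample more weight and reports a larger variance for the mean than the independence assumption does.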
Hierarchical Analysis of Variance of Soil Organic Carbon

Variation of SOC at the five sites was scale-dependent in all five LULC types (Figure 4-3). In Sandhill, the largest variance components were found at relatively fine scales (7 and 2 m), whereas variation at 22, 67 and > 200 m was negligible. Hardwood Hammock and Forest had the most variation at the coarsest scale (> 200 m), and the finest scale (2 m) also accounted for a large portion of the overall SOC variation. The variation at 7 m was also noticeable compared with the variation at 22 m and 67 m, which was almost muted. In Pineland, the finest-scale variation was markedly higher than that at the other scales, indicating that most variation in Pineland occurred within 2 m. In Improved Pasture, the change of variation across scales was less prominent than in the other LULC types; the relatively large variation originated from the coarsest scales (67 and > 200 m), followed by the finest-scale variation. Similar to Sandhill and Pineland, Dry Prairie had the largest variance components at 2 and 7 m, indicating that fine-scale variation dominated under this LULC type.

Semivariogram Analysis of Soil Organic Carbon

To complement the hierarchical analysis of variance, semivariogram analysis was conducted; the results are shown in Table 4-4 and Figure 4-4. Generally, the consistency of the parameters estimated by WLS and REML varied across parameters and LULC types. The two methods agreed more on nugget and partial sill than on range and smoothness. Nugget and partial sill estimates in Sandhill and Dry Prairie given by WLS and REML were nearly identical. However, the smoothness and range parameters in Hardwood Hammock and Forest estimated by the two methods diverged markedly. These inconsistencies resulted from the fact that WLS fits only to the calculated semivariogram estimators (the dots in Figure 4-4), and the choice of binning and cutoff
distance can make an extraordinary difference to the parameter estimates. Minasny and McBratney (2005) suggested that REML be used preferentially over WLS in estimating semivariogram parameters (especially the smoothness ν) because REML fits the local spatial process satisfactorily. Therefore, the interpretation of results in this chapter is based primarily on the REML estimates unless otherwise stated.

The sampled SOC of the five sites showed distinct spatial structures. Improved Pasture showed strong spatial dependence (0.85), followed by Sandhill (0.37), Hardwood Hammock and Forest (0.37), and Dry Prairie (0.35) with moderate spatial dependence, whereas Pineland showed no clear spatial structure (Table 4-4). The effective spatial autocorrelation ranges varied dramatically from site to site. Large ranges were observed in Hardwood Hammock and Forest (effective range: 391.2 m) and Improved Pasture (348.0 m) compared with small ranges in Sandhill (11.5 m) and Dry Prairie (23.4 m). These ranges were smaller than the sampling extent (~500 m), indicating stationarity in SOC within each of the fields.

The smoothness parameter ν is an important factor that not only measures the smoothness of the SOC processes but is also helpful in inferring short-range variation. The nugget variance is the combination of measurement errors and short-range spatial variation that occurs within the shortest sampling distance, i.e., 2 m in this study. The smoother the processes that accrete or deplete soil carbon, the less likely short-range variation is to exist. The processes of gaining and losing SOC under Hardwood Hammock and Forest were quite smooth (ν = 99.0), approximating a Gaussian model (Figure 4-4). This implies that short-range variation was less likely. In contrast, Dry Prairie and Improved Pasture had relatively
rough processes of aggregating SOC stocks, suggesting that short-range variation prevailed under these two LULC types.

Accounting for Spatial Correlation in Linear Model

Because the samples are spatially correlated, it is necessary to account for the spatial correlation among samples when analyzing the fixed effect of LULC on SOC stock and estimating the mean SOC stock in each LULC type. Table 4-5 compares the results of two ANOVA models: one using OLS, which treats samples as independent of each other, and the other using GLS, which accounts for the autocorrelation. Accounting for the autocorrelation significantly improved the model (p-value < 0.0001), which justifies the two degrees of freedom used to estimate the two autocorrelation parameters (range and nugget). The OLS estimates of the SOC mean for each LULC type are the arithmetic means of the corresponding samples, while GLS gave slightly larger estimates except for Improved Pasture. More importantly, the results of the pairwise comparisons from the two models differ. OLS tended to reject the null hypothesis based on an underestimated variance of SOC in each LULC type, and therefore ordered the mean SOC of the five LULC types as: Dry Prairie ~ Hardwood Hammock and Forest > Improved Pasture > Pineland > Sandhill. In contrast, GLS detected no difference between Pineland and Sandhill, Pineland and Improved Pasture, or Dry Prairie and Improved Pasture. These results demonstrate that accounting for the spatial correlation among samples can yield more reliable models.

Discussion

To the best of our knowledge, this is the first study to compare the fine-scale variability of SOC of prevalent LULC types in the southeastern landscape of the USA under a unified framework (standardized protocol, sampling design and models). This
study demonstrates that the spatial heterogeneity of SOC in different LULC types showed both similarities and differences, which may be attributed to both natural processes and anthropogenic interference. The heterogeneity in SOC is known to be primarily controlled by soil moisture, vegetation, climate and topography (Guo et al., 2006; Liu et al., 2006; Vasques et al., 2010; Xiong et al., 2012). Human activities, such as land use, have been recognized as an important soil-forming factor that exerts great influence on SOC variation (Guo and Gifford, 2002; Murty et al., 2002; Maia et al., 2010). The controls of these factors on SOC variability are known to operate at various scales corresponding to the scales of variability of the factors per se (Grunwald et al., 2011; Vasques et al., 2012b). For example, annual precipitation and temperature vary at regional or coarser scales rather than at the fine scale of interest in this study, so it is argued that climate is a trivial factor affecting the fine-scale variability of SOC here. The topography in Florida is generally flat. However, microtopography can dictate soil moisture conditions (e.g., toposequence) and result in dramatically different SOC accumulation (Kamara et al., 2007; Myers et al., 2011). Therefore, the discussion of the differences in SOC spatial variability among LULC types in this chapter is centered on vegetation, the soil moisture regime (which is reflected by vegetation and controlled by microtopography) and land use management.

In Sandhill, the vegetation is dominated by an overstory of scattered longleaf pines (Pinus palustris) (~10 m spacing), along with an understory of oak trees (Quercus). The space between trees was filled by various herbs and grasses. This vegetation pattern coinciding at about the 10 m scale explains the largest variance
components of SOC, which were found at the 7 and 2 m scales (Figure 4-3), and the short range of autocorrelation (Figure 4-4). Few land management practices had occurred at the Sandhill site, suggesting that the native spatial structure of SOC formed by natural ecosystem processes was not disturbed. This indicates that vegetation was the only major controller of the spatial heterogeneity of SOC.

In Pineland, the vegetation was exclusively longleaf pine trees (~2 m spacing) that were planted in the form of commercial stands converted from nonnative slash pines (Pinus elliottii) approximately 5 years ago. The commercial pine plantation operation had substantially altered the native spatial structure of SOC under pine forest, which is evidenced by the lack of spatial correlation at the Pineland site (Figure 4-4). The largest variance component at 2 m indicates that the restored longleaf pine exerted influence on the spatial heterogeneity of SOC within 2 m.

In Dry Prairie, the native vegetation was dominated by saw palmettos (Serenoa repens) (~1 m spacing) sparsely intermixed with various grasses, sedges, herbs and shrubs (e.g., Leucothoe fontanesiana, Aristida beyrichiana, Axonopus fissifolius, Vaccinium formosum). This site had remained in natural condition and was subject to lightning-induced fires and seasonal flooding that controlled the ever-invading woody plants. Therefore, the short range of spatial correlation and the largest variance components found at the finest scales (2 and 7 m) were primarily in accordance with the spatial distribution pattern of saw palmettos.

In Hardwood Hammock and Forest, the vegetation was more diverse in species than in the other LULC types, and the spatial distribution of species coincided with the soil moisture regimes: hardwoods were predominantly found in wetter areas and pine trees in well-drained areas. This spatial pattern of vegetation species accounted for the large
variation at the coarsest scale (> 200 m), and the influence of individual trees explained the large variation at 2 m.

In Improved Pasture, the soil was heavily managed. The site had been cleared, tilled, reseeded with forage grass and fertilized periodically. In addition, some areas were more heavily frequented by cattle than others, which resulted in spatial variation of manure input and grass consumption. These practices may have determined the strong spatial structure of SOC observed in this study, similar to the findings of Zhang et al. (2012).

The five spatial structures also shared some common features. Variability at very fine scales (2 and 7 m) was marked at all sites, with only Hardwood Hammock and Forest and Improved Pasture showing more pronounced variability at the coarse scale (> 200 m). Interestingly, all sites had small variance components at the 22 m scale. A possible explanation is that 22 m is beyond the scale at which individual plants and human activities influence soil carbon depletion and formation, and yet not large enough to capture the variation due to soil moisture regime shifts (indicated by vegetation composition change) at a coarse scale (> 200 m in Hardwood Hammock and Forest). This finding has specific implications for future sampling design and fine-scale SOC mapping at similar sites.

The spatial structures of the topsoil SOC under five common LULC types in Florida from this study show similarities to those in other studies. Rossi et al. (2009) applied a similar nested sampling design to investigate the spatial variation of topsoil SOC in tropical forests in southeastern Tanzania. They also found that under pine plantation there was no spatial correlation due to human management activities, while the spatial structures of SOC in other natural forests were pronounced. Their results also suggest
that in forests where the vegetation composition and structure were more complex, the ranges of spatial correlation were larger, which is confirmed by this study (Hardwood Hammock and Forest). Cambardella et al. (1994) studied the fine-scale variability of SOC in two farms in central Iowa and observed a spatial structure similar to that of Improved Pasture in this study, with observed ranges between 104 and 129 m and strong spatial dependence. This may be because the agricultural practices, such as tilling, weed control and fertilizing, are similar between the two studies and resulted in a similar impact on the spatial structure of SOC. Cerri et al. (2004) used a systematic sampling design with a 25 m grid to study the spatial variation of soil total carbon in an Amazon pasture and also observed relatively large nugget effects, accounting for 85 and 73% of the overall variances at 0-10 cm and 10-20 cm, respectively. Their result suggested that a considerable amount of variability existed either within 25 m or in the measurement error. These similarities of SOC spatial structures from distinct studies suggest that the spatial structure models obtained at the five sites may apply to other sites with the same or similar land use and management practices at fine scale (~500 m).

The average semivariogram (Figure 4-4F), derived by pooling the data of the five sites, provides a general picture of the fine-scale spatial structure of SOC. The model is comparable to the average SOC semivariogram reported by McBratney and Pringle (1999), who gathered a set of fine-scale semivariograms from nine studies and obtained an average semivariogram for soil carbon. Despite the difference in the models used to fit the semivariogram, the spatial dependence and range are fairly similar between the two studies. This similarity suggests some generality in the average
semivariogram model of SOC at the scale of ~500 m regardless of the different edaphic and environmental conditions between the two studies. The average semivariogram has potential for inferring the fine-scale spatial structure of SOC at new sites, as it can be used as a guide to develop optimal sampling schemes.

The unique landscape and geomorphology of Florida may also facilitate the extrapolation of soil spatial information to other fields. Florida's relatively flat topography imparts less control on soil formation compared with mountainous areas (Vasques et al., 2012b; Cao et al., 2012). Xiong et al. (2012) identified that the SOC processes in Florida are mainly vegetation and soil moisture driven. Therefore, at sites with similar vegetation composition, soil moisture regime and human interference, similar spatial structures in SOC are expected. The spatial structures found in the five common LULC types in this study can inform future research on soil spatial variability at other similar sites in Florida.

The smoothness parameters found in this study indicate that the processes under most LULC types were rough, except in Hardwood Hammock and Forest, which generally agrees with the results from the hierarchical analysis of variance. However, in Hardwood Hammock and Forest there was marked variability at 2 m that is not reflected in the smoothness parameter (a smoothness of 99 suggests that short-range variability was unlikely). This is because the variability at the > 200 m scale was greater and thus the signal of the 2 m variability was suppressed in the estimation. In this sense, the smoothness can be used as a rough index of short-range variability, but a more sensitive method with a strategic sampling design, such as hierarchical analysis of variance with a nested sampling design, is needed to detect fine-scale variability.
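The link between the smoothness parameter and short-range variation can be made concrete with the Matérn model. The sketch below assumes the common parameterization in which the correlation is (h/φ)^ν K_ν(h/φ) / (2^(ν−1) Γ(ν)), for which closed forms exist at ν = 0.5, 1.5 and 2.5; these are not the ν values estimated in this study, which would require a Bessel function routine. The function name and all parameter values are hypothetical.

```python
import math

def matern_semivariance(h, nugget, partial_sill, range_param, smoothness):
    """Matern semivariance for smoothness 0.5, 1.5 or 2.5 (closed forms only)."""
    if h == 0:
        return 0.0
    x = h / range_param
    if smoothness == 0.5:      # identical to the exponential model
        rho = math.exp(-x)
    elif smoothness == 1.5:
        rho = (1.0 + x) * math.exp(-x)
    elif smoothness == 2.5:
        rho = (1.0 + x + x * x / 3.0) * math.exp(-x)
    else:
        raise ValueError("only smoothness 0.5, 1.5 and 2.5 are handled here")
    return nugget + partial_sill * (1.0 - rho)

# Semivariance at a 2 m lag for a unit partial sill, zero nugget and a
# hypothetical range parameter of 20 m, for increasing smoothness.
short_lag = {nu: matern_semivariance(2.0, 0.0, 1.0, 20.0, nu)
             for nu in (0.5, 1.5, 2.5)}

# Far beyond the range, the semivariance approaches nugget + partial sill.
far_lag = matern_semivariance(1.0e4, 0.04, 0.02, 20.0, 0.5)
```

With the other parameters held fixed, the semivariance at the 2 m lag shrinks as ν increases, which is the sense in which a smoother process leaves less room for short-range variation.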
Accounting for the spatial correlation among samples can yield more accurate estimates of the mean and variance of SOC. Because of the nature of the nested sampling design, samples are not independent of each other. Treating them as independent would therefore yield biased estimates of the mean and underestimate the variance of the population of interest. However, some other studies (e.g., Rossi et al., 2009) did not take the spatial correlation among samples from a nested sampling design into account, which could lead to misleading results, as demonstrated in Table 4-5.

Linking the SOC means of the five sites with the edaphic and environmental conditions (Table 4-1) reveals the underlying controls of SOC levels across LULC types. Improved Pasture, although very high in AWC (both AWC25 and AWC50) and thus predisposed to accumulate SOC, ranked only intermediate among the LULC types in terms of SOC accumulation because of the limited NPP of the Improved Pasture system. This suggests that there is an inverse effect between moisture and NPP that determines whether a system accumulates or loses C in soils. Also, Sandhill had low AWC and low NPP and thus had very limited ability to sequester soil C. On the other hand, Dry Prairie and Hardwood Hammock and Forest were high in both AWC and NPP and were therefore destined to accumulate SOC.

Summary and Conclusions

This study investigated the fine-scale variability (< 500 m) of SOC at 0-20 cm in five prevalent LULC types in Florida, USA, under a uniform framework (the same sampling design, size, support and data analysis methods), allowing intra- and inter-comparison among fields. The spatial variability of SOC under the five LULC types showed different behavior but also shared some similarities. Hardwood Hammock and Forest had the highest overall
variance, to which the variation at > 200 m contributed the most, and the variation at 2 m was also marked. Similarly, Improved Pasture demonstrated large variation at both coarse scales (67 and > 200 m) and the very fine scale (2 m). Sandhill, Pineland, and Dry Prairie were dominated by variation at very fine scales (2 and 7 m). All five sites showed large variability at very fine scales, indicating the substantial impact of individual plants (biomass, residue, and roots) and land use management activities on the spatial distribution of topsoil SOC. Semivariogram analysis identified short ranges of spatial correlation of topsoil SOC in Sandhill and Dry Prairie and relatively long ranges in Hardwood Hammock and Forest and Improved Pasture. In contrast, Pineland showed almost no spatial autocorrelation, which suggests large variability in SOC in Pineland. Accounting for these spatial correlations, a general linear model with GLS gave better estimates of the LULC effect on SOC than OLS, which treats samples as independent. This study shows that scale-dependent behavior of SOC is prevalent and tightly coupled to LULC type and its vegetation composition and structure.

In conclusion, SOC stocks in different ecosystems varied across a spectrum of fine scales. Lumping and aggregating this fine-scale variability and behavior of SOC can lead to severely biased soil carbon assessments, particularly if upscaled to larger regions. These findings can provide guidance for future research or surveys in similar LULC types, environmental settings and land use management.
Table 4-1. Summary of soil properties and environmental settings of the five sampling sites.

Site  Soil suborder  Precip.  Max. T.  Min. T.  Elevation  Slope  NPP         AWC25      AWC50
                     (mm)     (°C)     (°C)     (m)        (%)    (kg C m⁻²)  (cm cm⁻¹)  (cm cm⁻¹)
SH    Psamments      1325     27.5     14.0     42.8       1.15    7.91       1.22       2.25
HHF   Aquults        1344     27.1     13.8     43.5       1.20   13.58       2.12       4.01
PL    Psamments      1634     26.3     12.9     23.9       2.85    9.07       1.54       3.03
IP    Udults         1360     27.2     13.6     28.8       2.43    7.45       2.18       4.50
DP    Aquods         1464     29.1     16.3      8.7       0.15    8.13       1.66       3.13

Abbreviations: SH = Sandhill, HHF = Hardwood Hammock and Forest, PL = Pineland, IP = Improved Pasture, DP = Dry Prairie, Precip. = long-term annual average precipitation (1971-2000), Max. T. = annual average maximum temperature, Min. T. = annual average minimum temperature, NPP = net primary production, AWC25 = soil available water holding capacity at 0-25 cm depth, AWC50 = soil available water holding capacity at 0-50 cm depth. Precip., Max. T. and Min. T. are from PRISM (Parameter-elevation Regressions on Independent Slopes Model), NPP from MODIS (Moderate-Resolution Imaging Spectroradiometer) for the North American Carbon Project (date: 2005), AWC25 and AWC50 from SSURGO (Soil Survey Geographic Database, date: 2009).
Table 4-2. Hierarchy of the optimized unbalanced spatially nested scheme.

Hierarchical level  Scale    Degrees of freedom
1                   > 200 m   8
2                   67 m     18
3                   22 m     36
4                   7 m      27
5                   2 m      18
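The degrees of freedom in Table 4-2 can be cross-checked against the sample size. In a nested design the degrees of freedom at each level equal the number of units at that level minus the number at the level above, so a running sum recovers the number of units per level. The short sketch below does this arithmetic; the intermediate unit counts it derives (27, 63 and 90) are inferred here and are not stated in the text, whereas the nine main stations and 108 samples per site are.

```python
# Degrees of freedom per hierarchical level, as listed in Table 4-2.
df_by_scale = [(">200 m", 8), ("67 m", 18), ("22 m", 36), ("7 m", 27), ("2 m", 18)]

units = 1              # the whole site counts as a single unit above Level 1
units_by_scale = {}
for scale, df in df_by_scale:
    units += df        # units at this level = units at level above + df
    units_by_scale[scale] = units

total_df = sum(df for _, df in df_by_scale)
```

The total of 107 degrees of freedom equals the 108 samples per site minus one, and the first level recovers the nine main stations.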
Table 4-3. Descriptive statistics of soil organic carbon stock at 0-20 cm.

             Original (kg m⁻²; CV in %)                          Log-transformed (log(kg m⁻²); CV in %)
Site    N    Min.   Median  Mean   Max.   SD     CV     Skew.   Mean   SD     CV     Skew.
SH      108  0.86   1.49    1.58   2.87   0.42   26.6   0.92    0.42   0.26   61.9   0.33
HHF     108  2.16   3.90    4.13   7.02   1.18   28.6   0.60    1.38   0.28   20.3   0.08
PL      108  0.84   1.65    1.78   5.64   0.63   35.4   3.05    0.53   0.28   52.8   1.18
IP      108  1.79   3.13    3.28   6.97   0.85   25.9   1.56    1.16   0.24   20.7   0.57
DP      108  2.31   4.06    4.17   8.54   0.98   23.5   1.13    1.40   0.23   16.4   0.10
Whole   540  0.84   2.88    2.99   8.54   1.41   47.2   0.65    0.98   0.49   50.0   0.16

Abbreviations: SH = Sandhill, HHF = Hardwood Hammock and Forest, PL = Pineland, IP = Improved Pasture, DP = Dry Prairie, SD = standard deviation, CV = coefficient of variation, Skew. = skewness.
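The coefficients of variation in Table 4-3 follow directly from the tabulated means and standard deviations (CV = 100 × SD / mean). The sketch below recomputes them for the original-scale columns as a consistency check; the dictionary layout and variable names are illustrative only.

```python
# site: (mean, SD, reported CV in %) on the original scale, from Table 4-3
stats = {
    "SH": (1.58, 0.42, 26.6),
    "HHF": (4.13, 1.18, 28.6),
    "PL": (1.78, 0.63, 35.4),
    "IP": (3.28, 0.85, 25.9),
    "DP": (4.17, 0.98, 23.5),
    "Whole": (2.99, 1.41, 47.2),
}

# Coefficient of variation as a percentage of the mean.
cv = {site: 100.0 * sd / mean for site, (mean, sd, _) in stats.items()}

# Largest gap between recomputed and reported CV (rounding of mean and SD only).
max_gap = max(abs(cv[site] - reported) for site, (_, _, reported) in stats.items())
```

The per-site CVs cluster between roughly 23 and 35%, whereas pooling all sites raises the CV to about 47% because differences among site means are added to the within-site spread.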
Table 4-4. Estimated and derived parameters of semivariograms. The Matérn model was used to fit all empirical semivariograms.

Site  Method  Nugget τ²        Partial sill σ²   Range φ  Smoothness ν  Spatial dependence
              ((log(kg m⁻²))²)  ((log(kg m⁻²))²)  (m)
SH    WLS     0.043            0.024               9.5     1.0           0.36
      REML    0.043            0.025               1.5     4.3           0.37
HHF   WLS     0.040            0.045             132.4     2.1           0.53
      REML    0.039            0.023              11.3    99.0           0.37
PL    WLS     0.065            0
      REML    0.071            0
IP    WLS     0.020            0.067             116.7     1.0           0.77
      REML    0.010            0.057             134.7     0.35          0.85
DP    WLS     0.034            0.014               9.5     1.0           0.29
      REML    0.034            0.018               9.4     0.32          0.35

Abbreviations: SH = Sandhill, HHF = Hardwood Hammock and Forest, PL = Pineland, IP = Improved Pasture, DP = Dry Prairie, WLS = weighted least squares with Cressie weights, REML = restricted maximum likelihood. Spatial dependence is defined as partial sill / sill.
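The spatial dependence column of Table 4-4 can be reproduced from the nugget and partial sill, taking the sill as their sum. The sketch below recomputes it for each site and method as a consistency check; the function name and dictionary layout are illustrative, and Pineland is handled separately because its partial sill is zero and no spatial dependence is reported for it.

```python
def spatial_dependence(nugget, partial_sill):
    """Partial sill divided by the sill (nugget + partial sill)."""
    sill = nugget + partial_sill
    return partial_sill / sill if sill > 0 else 0.0

# (site, method): (nugget, partial sill, reported spatial dependence)
params = {
    ("SH", "WLS"): (0.043, 0.024, 0.36),
    ("SH", "REML"): (0.043, 0.025, 0.37),
    ("HHF", "WLS"): (0.040, 0.045, 0.53),
    ("HHF", "REML"): (0.039, 0.023, 0.37),
    ("IP", "WLS"): (0.020, 0.067, 0.77),
    ("IP", "REML"): (0.010, 0.057, 0.85),
    ("DP", "WLS"): (0.034, 0.014, 0.29),
    ("DP", "REML"): (0.034, 0.018, 0.35),
}

computed = {key: spatial_dependence(n, ps) for key, (n, ps, _) in params.items()}

# Pineland: a pure-nugget semivariogram has no spatially structured variance.
pineland_reml = spatial_dependence(0.071, 0.0)
```

A value near 1 means that most of the variance is spatially structured, and a value near 0 means that the nugget dominates.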
Table 4-5. Results for analysis of variance (ANOVA) models fitted by ordinary least squares (OLS) and generalized least squares (GLS).

Model                                              OLS       GLS²
β parameters (class means, log(kg m⁻²))
  Sandhill                                         0.42* d   0.45* c
  Hardwood Hammock and Forest                      1.38* a   1.43* a
  Pineland                                         0.53* c   0.58* bc
  Improved Pasture                                 1.16* b   1.13* ab
  Dry Prairie                                      1.40* a   1.44* a
Autocorrelation parameters
  Range (m)                                                  336.3
  Nugget ((log(kg m⁻²))²)                                    0.032
Model summary
  Degrees of freedom                               6         8
  AIC³                                             101.8     33.4
  Log likelihood¹                                  −44.9     −8.7

¹ Model comparison test (likelihood ratio) shows that the log-likelihood value from the GLS model is significantly higher than that from the OLS model (p-value < 0.0001).
² In GLS, an exponential semivariogram model was used to account for the spatial correlation of samples.
³ AIC = Akaike information criterion.
* denotes that the t-test of the parameter is significant (p-value < 0.001). Mean values that share a letter are not significantly different at α = 0.05.
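The model summary in Table 4-5 can be checked with the standard definitions AIC = 2k − 2 ln L and likelihood ratio statistic = 2 (ln L_GLS − ln L_OLS). The sketch below takes the log-likelihoods as −44.9 (OLS) and −8.7 (GLS); the negative signs are an assumption, chosen because they reproduce the reported AIC values and the statement that the GLS log-likelihood is the higher of the two. For two extra parameters the chi-square tail probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

ols = {"loglik": -44.9, "k": 6}   # five class means plus the error variance
gls = {"loglik": -8.7, "k": 8}    # OLS parameters plus range and nugget

aic_ols = aic(ols["loglik"], ols["k"])
aic_gls = aic(gls["loglik"], gls["k"])

# Likelihood ratio test of GLS against the nested OLS model.
lr_stat = 2 * (gls["loglik"] - ols["loglik"])
extra_df = gls["k"] - ols["k"]

# Chi-square survival function for exactly 2 degrees of freedom.
p_value = math.exp(-lr_stat / 2)
```

The drop in AIC of about 68 units and the very small p-value agree with the statement that the two additional autocorrelation parameters are justified.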
Figure 4-1. Sampling sites under five land use and land cover types and soil orders in Florida.
Figure 4-2. The optimized unbalanced spatially nested scheme. A) The layout of nine main stations. B) The sample deployment within one main station.
Figure 4-3. Components and accumulated components of variance values for soil organic carbon at 0-20 cm depth in five land use and land cover types estimated by restricted maximum likelihood. A) Sandhill. B) Hardwood Hammock and Forest. C) Pineland. D) Improved Pasture. E) Dry Prairie.
Figure 4-4. Cressie-Hawkins robust estimators of the semivariogram (dots) and fitted Matérn semivariograms (dashed lines: Cressie weighted least squares, solid lines: restricted maximum likelihood) for soil organic carbon at 0-20 cm depth in five land use and land cover types. A) Sandhill. B) Hardwood Hammock and Forest. C) Pineland. D) Improved Pasture. E) Dry Prairie. F) Whole.
CHAPTER 5
SOIL ORGANIC CARBON STOCK CHANGE AND ITS LINK TO LAND USE AND LAND COVER CONVERSION AND CLIMATE GRADIENT

Overview

It has been estimated that soil organic carbon (SOC) constitutes about two-thirds of the Earth's terrestrial carbon pool (Post et al., 1990). Its active interactions with the vegetation and atmospheric carbon pools make it a critical component of the global carbon cycle (Kutsch et al., 2010). Great scientific attention has been drawn to the SOC pool because of its huge potential to store carbon belowground with a relatively slow turnover rate (Post et al., 1982). However, the SOC pool is susceptible to human interference, primarily in the form of land use and land cover (LULC) change. Worldwide, conversions from primary forest to agricultural land are thought to be depleting SOC, while afforestation is considered a means to restore SOC stocks (Houghton et al., 1999; Guo and Gifford, 2002; Paul et al., 2002; Wu et al., 2003; DeGryze et al., 2004; Grünzweig et al., 2004; Maia et al., 2010).

Climate change has raised concerns about its impact on global SOC stocks; however, there has been no consensus on soils' role as a sink or source in response to global warming (Davidson and Janssens, 2006). The controversy results from the complex interactions among the soil, LULC and climate systems. Some studies have shown that warming can accelerate SOC decomposition and cause a net loss of C to the atmosphere (Davidson et al., 2000; Bellamy et al., 2005; Dorrepaal et al., 2009). However, other studies have argued that increasing temperature can lead to a net gain of SOC by promoting plant-derived C input that exceeds the increase in decomposition (Nemani et al., 2003; Bond-Lamberty and Thomson, 2010). The debate reflects the complexity of the temperature sensitivity of SOC decomposition. Soil organic
matter (SOM) has a wide range of intrinsic temperature sensitivity of decomposition because it consists of thousands of organic compounds, each of which has its own inherent decomposition properties (Kutsch et al., 2010). Furthermore, the temperature sensitivity can be confounded by edaphic and environmental conditions that interfere with the decomposition process. For example, SOM can be physically protected from decomposition when organic matter forms inside soil aggregates, or chemically protected if it is adsorbed onto soil mineral surfaces (Davidson and Janssens, 2006).

Land use and land cover plays a critical role in the response of SOC to climate change. On the one hand, LULC controls the quantity and quality of organic compounds that enter soils, and subsequently determines the intrinsic temperature sensitivity of SOM decomposition. On the other hand, LULC also changes the soil and environmental conditions that can further affect the apparent temperature sensitivity (Post and Kwon, 2000; Jones et al., 2005; Davidson and Janssens, 2006). There is a need to study the interacting effects of LULC and climate on SOC changes in order to better understand how SOC responds to climate change.

Florida soils store approximately 2.26 Pg of SOC, more than any other state in the conterminous U.S. according to the U.S. General Soil Map (STATSGO2, 2006). This is primarily due to the extensive occurrence of Histosols in the 46,951 km² of wetlands in the state, especially in south Florida (U.S. Fish and Wildlife Service, 2009). In addition, Florida features SOC-rich Spodosols, which cover about 32% of the area. The formation of these high-SOC soils is attributed to the unique climatic (high temperature and precipitation), topographic (flat landscape) and hydrologic (high water table) conditions (Stone et al., 1993; Vasques et al., 2012b). Meanwhile, Florida has
been experiencing significant LULC shifts, which include rapid urban growth and losses of agricultural and forest land, over the past decades (Kautz et al., 2007). This change may have caused significant SOC change in Florida because SOC has been shown in numerous studies to be closely linked to LULC (Houghton et al., 1999; Post and Kwon, 2000; Guo and Gifford, 2002; Vesterdal et al., 2011; Minasny et al., 2013). Richter et al. (2011) pointed out that in the Anthropocene soil change has accelerated at the global scale in response to anthropogenically induced stressors; however, the magnitude and rate vary geographically. Unfortunately, there is still a substantial lack of knowledge about how SOC has changed over the past decades due to LULC change in Florida, which bears directly on whether soils act as a sink or source for carbon. Moreover, Florida extends widely in longitude and latitude, covering a large range of climatic conditions. It has both subtropical and tropical climates, which makes Florida an ideal area to study the impact of climate factors on SOC change.

The objectives of the study were to 1) investigate the relationships between LULC and SOC stock in Florida based on a current SOC assessment (2008-2009) and 2) track SOC change by comparing the current and historical (1965-1996) SOC data and reveal how LULC interacting with climate factors influences SOC change in this area.

Materials and Methods

Study Area

The study area is the state of Florida, located in the southeastern United States. Florida covers approximately 150,000 km² (United States Census Bureau, 2000). The climate is humid and subtropical in northern and central Florida and humid and tropical in southern Florida. The mean annual precipitation of Florida is 1,373 mm and
the mean annual temperature is 22.3 °C (National Climatic Data Center, 2008). Overall, soils in Florida are sandy in texture. The dominant soil orders of Florida are Spodosols (32%), Entisols (22%), Ultisols (19%), Alfisols (13%), and Histosols (11%). The most frequent soil subgroups are Aeric Alaquods, Ultic Alaquods, Lamellic Quartzipsamments, Typic Quartzipsamments, and Arenic Glossaqualfs (Natural Resources Conservation Service, 2009). Land use and land cover consists mainly of wetlands (28%), pinelands (18%), urban and barren lands (15%), agriculture (9%), rangelands (9%), and improved pasture (8%) (Florida Fish and Wildlife Conservation Commission, 2003). Florida's topography is muted, with gentle slopes varying from 0 to 5% in almost the whole state (United States Geological Survey, 1999).

Historical Soil Organic Carbon Dataset

The historical SOC dataset is from the Florida Soil Characterization Dataset (FSCD), which includes 1,251 site-specific soil profiles. These soil profiles were collected and described across Florida between 1965 and 1996. Soil sampling locations were determined based on the tacit knowledge of soil surveyors to represent the soil and historical landscape types. The FSCD includes measured SOC and bulk density (BD) values for each genetic horizon up to 2 m depth. Soil organic carbon was measured in mineral soils (A, B, C and E horizons) by the Walkley-Black modified acid dichromate (WB) method (Nelson et al., 1996), and in organic soils soil organic matter (SOM) was measured by loss on ignition (LOI).

Current Soil Organic Carbon Dataset

A new soil sampling campaign was designed and carried out between 2008 and 2009 to quantify the current topsoil SOC stocks and track SOC change across Florida. A total of 1,080 sites were sampled for the topsoil (0-20 cm) across Florida (Figure 5-1). A
random sampling design stratified by the combination of soil suborder and LULC was used to preselect locations. There were 89 strata, and the number of samples in each stratum was proportional to the stratum area. Among the 1,080 sites, 194 were randomly selected to overlap with the historical sites from the FSCD (collocated sites within a 30 m radius). This design allows site-specific SOC change over the past decades to be tracked. Inorganic carbon SSM 5000A). SOC was derived by subtraction (TC - IC). The laboratory SOC measurements in mass units (%) were converted to stock units (kg m⁻²) using the measured bulk density and soil depth (20 cm).

Harmonization of Historic and Current Datasets

To derive SOC change, historical and current data had to be harmonized to standardize them to a fixed depth (0-20 cm) and to account for the difference in SOC values due to different measurement methods. Two pedotransfer functions (PTFs) were developed using linear regression on a representative soil set (n = 144) that spans the soil types found in Florida to harmonize SOC values. A robust model was fitted using iteratively reweighted least squares (R² = 0.92; residual standard error = 0.11; see Figure 5-2) between SOM (WB) and SOC (%), resulting in Eq. (5-1):

\[ \mathrm{SOC} = 0.08 + 0.85\,\mathrm{SOM_{WB}} \tag{5-1} \]

where SOC is soil organic carbon in % representing the combustion method and SOM_WB is soil organic matter in % derived from Walkley-Black dichromate extraction.

Linear regression between LOI and SOC from the mineral soils of DS1 indicated a possible segmented relationship. The Davies test (Davies, 1987) for nonconstant
regression slope showed a significant change in slope around 75% LOI (p < 0.001) in the regression of SOC (representing the combustion method) on LOI. A broken-line regression procedure was used to develop two equations for converting LOI to SOC (%), one for samples with LOI less than 75.8% (Eq. (5-2)) and one for samples with LOI greater than 75.8% (Eq. (5-3)), with R² of 0.998 and residual standard error of 0.68 (Figure 5-3):

\[ \mathrm{SOC} = 0.5\,\mathrm{LOI} \tag{5-2} \]

\[ \mathrm{SOC} = -26.26 + 0.85\,\mathrm{LOI} \tag{5-3} \]

where LOI is soil organic matter derived from loss on ignition in %. The SOC values in % were then converted to stock units (kg m⁻²) using the measured bulk density. In this way, historical and current (i.e., reconnaissance) SOC data were harmonized in terms of their analytical measurements and standardized to 0-20 cm depth.

Carbon Sequestration Rate

SOC sequestration rates were calculated according to Eq. (5-4) and (5-5) for all collocated sites:

\[ \Delta \mathrm{SOC}(s_c) = \mathrm{SOC}_{DS2}(s_c) - \mathrm{SOC}_{DS1}(s_c) \tag{5-4} \]

\[ \mathrm{SOC}_{seq}(s_c) = \frac{\Delta \mathrm{SOC}(s_c)}{\Delta t} \tag{5-5} \]

where DS1 is the historical (1965-1996) soil organic carbon dataset, DS2 is the current (2008-2009) soil organic carbon dataset, t_1 is the year of historical measurement (DS1), t_2 is the year of current measurement (DS2), Δt is the number of years between historical and current observations (Δt = t_2 - t_1), SOC is soil organic carbon stocks in g m⁻², SOC_seq is the soil organic carbon sequestration rate (g m⁻² yr⁻¹) constrained to collocated historical and current sites (s_c), and s is the geographic coordinate (x and y coordinates) of collocated sites. Positive SOC_seq values represent soil carbon gains (sequestration) and negative SOC_seq values represent soil carbon losses over the considered time period.

Land Use and Land Cover and Climate Data

Land use and land cover data for the State of Florida for 1995, 2004 and 2008-2011 were acquired from the Florida Department of Environmental Protection, and the LULC data for the 1970s from the United States Geological Survey (USGS). The LULC changes at the collocated sites were tracked by comparing the historical LULC data and the LULC type determined by the sampling crew in the current sampling campaign. To verify the LULC changes, the observed changes were further scrutinized and confirmed with historical imagery from Google Earth™. Climate conditions, represented by the maximum annual temperature and mean annual precipitation averaged over 1981-2010, were obtained from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) Climate Group (2013).

Data Analysis

The relationship between LULC and SOC was investigated using the Kruskal-Wallis test. Post hoc multiple comparisons were then conducted to compare the SOC of LULC types with each other. To compare the current SOC stock with the historical one, the Mann-Whitney test was used on the current (n = 1,080) and historical (n = 1,251) sites. Furthermore, the paired Mann-Whitney test was applied to the collocated sites (n = 194). A general linear model was used to test the effects of LULC and climate factors (temperature and precipitation) on SOC sequestration rate. The interaction effect of
LULC and climate was accounted for by the cross-effect term in the models. Statistically significant interactions were then plotted to show the relationships between climate and SOC sequestration rates.

Results

Effects of Land Use and Land Cover on Soil Organic Carbon

Figure 5-4 shows the distribution density functions along with the contemporary SOC observations grouped by the LULC at the time when samples were taken (2008-2009). Kruskal-Wallis test results show that LULC had a significant effect on SOC at a significance level of 0.0001. Generally, wetland soils had significantly higher SOC than almost any other LULC class except sugarcane. The sugarcane land is located in the Everglades Agricultural Area, which covers approximately 1,900 km² south of Lake Okeechobee. This area used to be part of the Everglades prior to agricultural development and is characterized by organic soils containing high levels of SOC. No significant difference in SOC was found among the five types of wetlands, although the SOC median of the wetland types varied from 6.7 kg m⁻² (mixed wetland forest) to 9.1 kg m⁻² (cypress swamp). Natural upland forests, including mesic upland forest, xeric upland forest and pine forest, had SOC contents comparable to those of urban and agricultural land (except sugarcane). In fact, cropland and improved pasture showed even higher SOC than xeric upland forest, which grows primarily on well-drained soils such as Udults and Psamments. Within natural upland forests, pine forest and mesic upland forest showed quite similar SOC content, and both were higher than xeric upland forest. Within agricultural LULC types, improved pasture had significantly higher SOC than cropland. It is worth noting that no significant difference in SOC was detected between pine plantation and natural pine forest. Similarly, improved pasture
and rangeland did not differ significantly in SOC. Urban soils, primarily characterized by lawn and grassland soils, contained relatively high SOC, comparable to improved pasture, mesic upland forest, pineland and rangeland.

Soil Organic Carbon Change between 1965-1996 and 2008-2009

Comparing the current SOC observations with the historical observations from the FSCD database shows that a clear SOC change occurred between 1965-1996 and 2008-2009. The Mann-Whitney test indicated that the current SOC median (1,080 sites) was significantly higher than the historical one (1,251 sites) (p < 0.0001), as shown in Table 5-1. In the more straightforward one-to-one comparison at the collocated sites, the paired Mann-Whitney test also detected that the current SOC dataset had a significantly higher median than the historical one (p = 0.00027). In addition, the means of the current SOC datasets were numerically higher than their historical counterparts, even though the historical SOC datasets had much higher maximum SOC observations (Table 5-1). Figure 5-5 displays the percent change of SOC at collocated sites between the two soil sampling campaigns. Approximately 63% of the collocated sites experienced a net SOC gain, compared with ~37% of sites with SOC loss. These results depict an overall trend of SOC gain in the topsoils of Florida between the two soil sampling campaigns (1965-1996 to 2008-2009).

Impact of Land Use and Land Cover and Its Change on Soil Organic Carbon

Figure 5-6 shows the SOC change at the collocated sites grouped by LULC types and LULC changes between the two sampling campaigns. At the sites without LULC conversion, improved pasture, mesic upland forest, xeric upland forest, pineland, rangeland, urban, and wetland showed overall gains of SOC with positive SOC
change medians and means, while overall losses were observed in citrus and sugarcane. Sugarcane sites experienced a fairly marked SOC loss from the highly organic soil, probably due to intensive agricultural use in the Everglades Agricultural Area. Citrus, another extensive agricultural LULC, also showed an overall SOC loss. In cropland, however, only about half of the sites had SOC losses, and overall no obvious loss was found in this study. Improved pasture was the only agricultural land use that gained SOC, and it gained even more than rangeland. All of the urban sites accumulated SOC by a remarkable amount. It is worth noting that most urban samples were taken from lawns that are irrigated and fertilized on a regular basis. These management practices can promote soil organic carbon input, resulting in increased SOC levels.

At the sites that had undergone LULC changes, conversion of wetland to other LULCs resulted in dramatic SOC losses. For instance, the conversion from wetland to urban led to an average loss of ~16 kg m⁻² SOC, and from wetland to rangeland a loss of ~5 kg m⁻² was observed. On the contrary, conversion from other LULC to wetland promoted SOC accumulation; e.g., from both pineland and rangeland to wetland, ~3 kg m⁻² SOC was gained on average. From barren land to urban, a slight SOC increase was observed, and similar gains were found when pineland and cropland were converted to urban. Only one sampled site had been converted from rangeland to improved pasture, and it showed no obvious change in SOC.

Impact of Land Use and Land Cover and Climate on Soil Organic Carbon Sequestration Rate

Because time plays an important role in the amount of SOC change, the SOC sequestration rate was calculated, which allowed SOC change to be compared on
the same time scale. Table 5-2 presents six general linear models that show the effect of LULC and climate factors (MAT, maximum annual temperature, and MAP, mean annual precipitation) on the SOC sequestration rate. Again, LULC was significant in differentiating SOC sequestration rate and explained 27% of its variance, while MAT and MAP did not show significant effects on their own. However, both climate factors had significant interaction effects with LULC. A three-way interaction among LULC, MAT and MAP was also significant (Model 6). Figure 5-7 and Figure 5-8 demonstrate the interaction effects of LULC change types and climate factors on the SOC sequestration rate. In cropland, mesic upland forest and pineland that remained unchanged between the 1970s and 2008-2011, the SOC sequestration rate showed an increasing trend as MAT increased from about 25 to 29 °C. The same effect was also observed at the sites that had been converted from pineland to urban. In contrast, elevated MAP slowed down SOC accumulation in cropland and pineland, as seen in Figure 5-8.

Discussion

Quite a few meta-analyses have shown that land use and its change are a major human-induced driver of SOC change in a wide range of climate zones (Post and Kwon, 2000; Guo and Gifford, 2002; Don et al., 2011; Poeplau et al., 2011). In this study, any LULC conversion involving wetlands resulted in a large SOC change. Wetland soil generally had significantly higher SOC than any other LULC due to its high soil moisture, which creates anaerobic conditions that are favorable to SOC accumulation. The conversion of wetland to urban or rangeland greatly altered the hydric soil condition and resulted in significant SOC losses. Conversely, a change of LULC from drier soil conditions (i.e., pineland) to wetland led to marked SOC accumulation. However, the
SOC gain in recreated wetland was much lower than the SOC decline in the lost wetland (SOC sequestration rate: +101.2 g m⁻² yr⁻¹ for rangeland converted to wetland vs. -422 g m⁻² yr⁻¹ for wetland lost to urban and rangeland), showing a "slow in, fast out" effect in SOC change due to land use and land cover conversion (Poeplau et al., 2011).

Agricultural practices such as tillage have been reported worldwide to decrease the SOC level (Wu et al., 2003; Maia et al., 2010). However, in this study no significant change in SOC was observed between the two sampling campaigns for cropland sites that did not change to another LULC. This is probably because of the extensive conservation and best management practices implemented in Florida, including reduced tillage or no-tillage, conservation management of residues, and efficient water management, which are practices that retain SOC in the tilled soil (Simonne et al., 2010). Another explanation is that these cropland sites had been under agricultural use for a long time and the SOC balance had already reached equilibrium. As reported by Poeplau et al. (2011), approximately 17 and 23 years were required for croplands converted from grassland and forest, respectively, to reach equilibrium in their SOC change rates, while the cropland sites in this study had been under agricultural use for more than 19 to 39 years.

The sugarcane collocated sites were all located in the Everglades Agricultural Area, which is one of the prime farming areas developed on Histosols (Armentano, 1980). The dramatic SOC loss observed in this study provides some evidence of how much the extensive and continuing drainage for agriculture, which began in 1912, can reduce the SOC stock in the organic soil. For the past three decades, the SOC stock
declined by 2.78 kg m⁻² at a rate of 91.6 g m⁻² yr⁻¹ on average. However, these data may greatly underestimate the amount of SOC actually lost to the atmosphere due to oxidation, because an enormous amount of SOC could have been decomposed and released from soils along with the subsidence of the organic soils (Deverel and Rojstaczer, 1996). Studies have shown that the organic soil depths declined 1.8 m by 1950 and 3.3 m by 1970 (Stephens, 1956; Stephens and Speir, 1970). As marked by the concrete post that was driven to the underlying bedrock in 1924 at the Everglades Research and Education Center of the University of Florida, the soil depth dropped 1.8 m at an annual subsidence rate of 1.4 cm yr⁻¹ from 1967 to 2009 (Wright and Snyder, 2009). As a result of subsidence, the Everglades peat was estimated to release more than 135 t of CO2 per ha annually (Knipling et al., 1970).

In citrus soils, the SOC was relatively low, with a mean and median of 2.38 kg m⁻², which was the second lowest among all the LULC (Figure 5-4). In addition, SOC slowly declined at a rate of 22.9 g m⁻² yr⁻¹ on average. A similar finding was reported in coarseg clay kg 1) in citrus and other low-intensity agricultural uses such as perennial crops in Brazil, where a 20% SOC loss at 0-20 cm depth was detected (Zinn et al., 2005).

In contrast, the improved pasture soils tended to accumulate SOC at an average rate of 42 g m⁻² yr⁻¹. This can be attributed to management for high grass productivity, such as fertilization, which enhances the SOC input (Post and Kwon, 2000). Moreover, improved pasture maintained a relatively high SOC stock, which was significantly higher than that in crop and citrus soils (Figure 5-4). Other studies also found that the conversion of row crops to managed pasture in the subtropical moist climate zone resulted in a gain of SOC at 33.2 g m⁻² yr⁻¹ (Lugo et al.,
1986). These results show the encouraging potential of improved pasture soils for sequestering carbon under low-intensity agricultural use.

In urban soils, a relatively high SOC stock was observed compared with other human-managed LULC systems such as citrus and cropland (Figure 5-4). In addition, urban soils also showed a relatively large SOC accumulation (Figure 5-6). These results suggest that urban soils, especially in residential areas, can serve as an ideal soil carbon sink because of the high level of management, such as irrigation and fertilization, and the lack of soil disturbances that may occur in other systems, such as tillage and harvesting in agriculture and prescribed fire in forest management (West and Post, 2002; Certini, 2005). Pouyat et al. (2006) investigated carbon storage by urban soils in the conterminous U.S. and found that residential lawns consistently contained high SOC stock in all urban landscapes, even higher than many forest soils. In addition, cities located in warmer climates (e.g., the southeast) tended to accumulate SOC after urban establishment on non-wetland areas. This trend was also reflected in our results: the conversions of crop to urban and pineland to urban soils resulted in overall gains of SOC between the historical and current sampling campaigns.

Another interesting finding is that the SOC stock of pine plantation was not significantly different from that of natural pine forest, suggesting that human management of forests did not have a significant impact on SOC stock. Overall, pineland (including both natural pine forest and pine plantation) showed an average SOC accumulation rate of 18.2 g m⁻² yr⁻¹. However, the SOC change in pineland was extremely variable (Figure 5-6). This can be attributed to disturbances such as natural and prescribed fire that are common
to pineland systems in Florida and have been shown to remove large amounts of SOC from surface soils (Certini, 2005).

In this study, climate factors had a significant impact on SOC sequestration rate under various LULC and LULC change types. Elevated temperature tended to accelerate SOC accumulation under cropland, mesic upland forest, pineland and urban converted from pineland, while precipitation had the opposite effect in cropland and pineland. Various studies have reported that elevated temperature can have opposing effects on SOC change at the same time, enhancing SOC input by promoting primary production while also accelerating the SOC decomposition rate (Bond-Lamberty and Thomson, 2010). In this study, the enhanced SOC sequestration rate in warm climate under both changed and unchanged LULC conditions indicates that the SOC balance was dictated by SOC input rather than SOC decomposition (Post and Kwon, 2000). A similar result was reported by Poeplau et al. (2011) from a meta-analysis of 95 studies conducted in the temperate zone, in which they found that mean annual temperature had a positive effect on SOC accumulation in five land use change conditions (cropland to grassland, grassland to cropland, forest to cropland, cropland to forest and grassland to forest). Altogether, these results suggest that soils might be able to sequester carbon at a faster rate to reduce greenhouse gas emissions under the current global warming trend.

Precipitation is known to promote primary production and increase the residues entering soils as SOC input. However, a negative relationship was observed between SOC sequestration rate in the topsoils and mean annual precipitation. This may be due to the sandy texture of Florida soils with high
permeability that allows more organic material to migrate vertically to lower layers under higher precipitation (Jobbágy and Jackson, 2000).

For the past four decades, Florida has experienced profound LULC change primarily due to urbanization (Mulkey, 2007). From the 1970s to 2008-2011, the urban area in Florida increased by more than 1.5 to about 24,897 km². Thanks to a variety of wetland restoration projects in Florida, the wetland area kept increasing over the past four decades, from 33,458 km² in the 1970s to 43,752 km² in 2008-2011 (Figure 5-9). The wetland area reported by the National Wetland Inventory as of 2009 was 46,951 km² (U.S. Fish and Wildlife Service, 2009). Other LULC databases such as the National Land Cover Dataset confirmed that the Florida wetland area increased by 21.5% between 1992 and 2006 (Multi-Resolution Land Characteristics Consortium, 1992 and 2006). Meanwhile, the agriculture area declined by about 20%. These LULC conversions, which are favorable to SOC accumulation, could further explain the overall SOC increase between 1965-1996 and 2008-2009.

Summary and Conclusions

This study shows that LULC is strongly related to SOC variation in Florida. Generally, sugarcane and wetland contained the highest SOC stock, followed by mesic upland forest, pineland, rangeland, improved pasture and urban, while crop, citrus and xeric upland forest had the lowest. Our comparative analyses of current and historical SOC datasets showed a significant SOC accumulation between 1965-1996 and 2008-2009. The amount of SOC change depended on LULC and LULC change types. In most LULC types an overall sequestration of SOC was observed, except in sugarcane and citrus. Remarkable SOC losses were involved in the conversions of wetland to other LULC types and, vice versa, recreation of wetlands restored the SOC
stock. Urban soils contained moderately high SOC stock, and conversions of crop, pineland and barren land to urban resulted in SOC accumulation, which suggests that urban soils can serve as a promising SOC sink if managed properly. In general, the LULC change in Florida over the last four decades followed a trend that favored SOC accumulation: the wetland and urban areas increased and the agriculture area decreased, as shown by different LULC databases. It should be noted that our data may have underestimated the SOC loss in organic soils in the Everglades Agricultural Area due to subsidence, confounded by the management of organic soils and by fluctuating drainage and subsurface irrigation based on the depth of organic soils, which differs substantially in this region. The SOC sequestration rate was also LULC dependent and controlled by climate factors interacting with LULC. Warm conditions tended to accelerate SOC accumulation, while high precipitation reduced the SOC sequestration rate. These major findings provide insights into how LULC and LULC change have affected SOC change over the past four decades, and they are significant for SOC change in the context of projected global climate change.
Table 5-1. Descriptive statistics of historical and current soil organic carbon observations at 0-20 cm in Florida.

                      All sites              Collocated sites¹
                      Historic    Current    Historic    Current
N                     1251        1080       194         194
Mean (kg m⁻²)         4.67        4.98       3.81        3.99
Median² (kg m⁻²)      2.69 A      3.40 B     2.44 a      2.93 b
SD (kg m⁻²)           6.65        4.38       5.28        3.75
MAD (kg m⁻²)          1.69        2.05       1.49        1.31
Skewness              4.33        2.52       4.98        3.54
Kurtosis              23.58       8.48       28.91       14.82
Min. (kg m⁻²)         0.34        0.45       0.36        0.45
Max. (kg m⁻²)         68.09       34.15      46.06       25.90

¹ 194 collocated sites at which current sites were within 30 m distance of the observations from the historical Florida Soil Characterization Database (1965-1996).
² The uppercase letters attached to the medians indicate a significant difference in medians between current and historical observations of all sites at the 0.001 significance level using the Mann-Whitney test. The lowercase letters indicate a significant difference in medians between current and historical observations of collocated sites at the 0.001 significance level using the paired Mann-Whitney test.
Abbreviations: N = number of observations, SD = standard deviation, MAD = median absolute deviation.
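The two comparisons summarized in Table 5-1 can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function names (`midranks`, `mann_whitney_u`, `wilcoxon_signed_rank`) are hypothetical, both p-values rely on a normal approximation, and the paired test is assumed here to be the Wilcoxon signed-rank test. Statistical packages may apply continuity or tie corrections that this sketch omits.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test for two independent samples.

    Returns (U statistic of x, p-value from a tie-corrected normal approximation).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (sum of ranks of positive differences, p-value). Zero differences
    are dropped; no tie correction is applied to the variance in this sketch.
    """
    d = [a - b for b, a in zip(before, after) if a != b]
    n = len(d)
    ranks = midranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, 2 * (1 - NormalDist().cdf(abs(z)))
```

In this setting `mann_whitney_u` would take the 1,251 historical and 1,080 current stocks as two independent samples, while `wilcoxon_signed_rank` would take the 194 collocated pairs in matching order.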
Table 5-2. General linear models showing the effects of land use and land cover, temperature and precipitation on soil organic carbon sequestration rate.

Model ID   Df   F       P value      Variance explained (%)   AIC    Model
1          14   6.19    < 0.001***   27                       2404   SOC seq. rate ~ LULC***
2          15   6.03    < 0.001***   28                       2402   SOC seq. rate ~ LULC*** + MAT
3          27   4.96    < 0.001***   38                       2386   SOC seq. rate ~ LULC*** + MAT + LULC × MAT**
4          15   5.79    < 0.001***   28                       2401   SOC seq. rate ~ LULC*** + MAP
5          27   5.217   < 0.001***   40                       2381   SOC seq. rate ~ LULC*** + MAP + LULC × MAP***
6          51   3.67    < 0.001***   46                       2382   SOC seq. rate ~ LULC*** + MAP + MAT + LULC × MAT*** + LULC × MAP + MAT × MAP + LULC × MAT × MAP**

Abbreviations: SOC seq. rate (g m⁻² yr⁻¹) = soil organic carbon sequestration rate, LULC = land use and land cover type, MAT (°C) = maximum annual temperature averaged over 1971-2000, MAP (mm) = annual precipitation averaged over 1981-2010.
Significance code: *** < 0.001, ** < 0.01, and * < 0.05.
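One way to read the AIC column of Table 5-2 is to rank the models by their difference from the lowest AIC. The short Python sketch below does only that, using the AIC values as printed in the table; the helper name `delta_aic` is hypothetical, and ranking by AIC difference is a common convention rather than a step described in this chapter.

```python
# AIC values as read from Table 5-2 (model ID -> AIC).
AIC_BY_MODEL = {1: 2404, 2: 2402, 3: 2386, 4: 2401, 5: 2381, 6: 2382}


def delta_aic(aic_by_model):
    """Return each model's AIC minus the smallest AIC in the set."""
    best = min(aic_by_model.values())
    return {model: aic - best for model, aic in aic_by_model.items()}


# Models sorted from best (smallest difference) to worst.
ranked = sorted(delta_aic(AIC_BY_MODEL).items(), key=lambda item: item[1])
for model_id, diff in ranked:
    print(f"Model {model_id}: delta AIC = {diff}")
```

Read this way, Model 5 (LULC, MAP and their interaction) has the lowest AIC, and Model 6 is within one unit of it despite using 24 more degrees of freedom.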
Figure 5-1. The spatial distribution of sites sampled between 2008 and 2009 on top of the general land use and land cover (LULC) map of Florida, USA, for 2008-2011. In red are the 194 collocated (historical) sites that were within 30 m distance of the observations from the Florida Soil Characterization Database (1965-1996), and in white are reconnaissance sites, which were not designed to collocate with any historical sites.
Figure 5-2. Iteratively reweighted least squares regression to convert Walkley-Black soil organic matter to soil organic carbon. Figure adopted from Myers et al. (2009).
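The fitting procedure named in Figure 5-2 can be sketched in a few lines of Python. This is a minimal illustration of iteratively reweighted least squares for a straight line, not the routine used to fit Eq. (5-1): the Huber weight function, the tuning constant k = 1.345, the MAD-based scale estimate and the function names are all assumptions, since the text does not state which weighting scheme was applied.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def irls_huber(x, y, k=1.345, n_iter=50, tol=1e-10):
    """Robust line fit by iteratively reweighted least squares.

    Starts from ordinary least squares, then repeatedly down-weights points
    whose residual exceeds k times a robust scale (assumed Huber weights).
    """
    weights = [1.0] * len(x)
    intercept, slope = weighted_line_fit(x, y, weights)
    for _ in range(n_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # MAD-based scale
        if scale == 0:
            break  # at least half the points are fitted exactly
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r)
                   for r in resid]
        new_intercept, new_slope = weighted_line_fit(x, y, weights)
        done = (abs(new_intercept - intercept) < tol
                and abs(new_slope - slope) < tol)
        intercept, slope = new_intercept, new_slope
        if done:
            break
    return intercept, slope
```

With Walkley-Black values as `x` and combustion-based SOC as `y`, the returned pair plays the role of the intercept and slope of the conversion line, and a single aberrant sample has far less influence than it would under ordinary least squares.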
Figure 5-3. A segmented pedotransfer function for the conversion of loss on ignition (LOI) measurements to soil organic carbon. A) LOI less than 75.8%. B) LOI greater than 75.8%. Figure adopted from Myers et al. (2009).
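The segmented conversion shown in Figure 5-3 and the subsequent conversion to stock units can be sketched in Python as follows. The coefficients are those printed for the two LOI segments in the text, with the intercept of the upper segment taken as negative (an assumption based on continuity at the breakpoint, where both segments give about 38% SOC). The stock formula is the usual concentration × bulk density × depth conversion without a coarse-fragment correction, which is also an assumption, and the function names are hypothetical.

```python
LOI_BREAKPOINT = 75.8  # % LOI at which the slope changes


def loi_to_soc(loi_percent):
    """Convert loss-on-ignition organic matter (%) to SOC (%), piecewise."""
    if loi_percent < LOI_BREAKPOINT:
        return 0.5 * loi_percent
    # Upper segment; the negative intercept is an assumed reconstruction.
    return -26.26 + 0.85 * loi_percent


def soc_stock_kg_m2(soc_percent, bulk_density_g_cm3, depth_cm=20.0):
    """Convert SOC concentration (%) to stock (kg m^-2) for one soil layer.

    g cm^-3 * cm = g cm^-2 of soil; * 10 converts g cm^-2 to kg m^-2;
    / 100 turns the percentage into a mass fraction.
    """
    return soc_percent / 100.0 * bulk_density_g_cm3 * depth_cm * 10.0


# Example: a mineral topsoil with 1.5 % SOC and bulk density 1.4 g cm^-3.
example_stock = soc_stock_kg_m2(1.5, 1.4)
```

Because the printed coefficients are rounded, the two segments differ by a few tenths of a percent SOC exactly at the breakpoint.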
Figure 5-4. Violin plot of soil organic carbon (kg m⁻²) of the 1,080 current samples grouped by land use and land cover (LULC) upon sampling (2008-2009). The Kruskal-Wallis test shows the significant effect of LULC on SOC at the significance level of 0.0001, and post hoc multiple comparison results are denoted by the letter codes above the class names (classes that share no
Figure 5-5. Histogram of soil organic carbon (SOC) change between 1965-1996 and 2008-2009 at the 194 collocated sites that were within 30 m distance of the observations from the Florida Soil Characterization Database (1965-1996).
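The quantities behind Figure 5-5 and the sequestration rates discussed in this chapter follow from simple arithmetic on each collocated pair: the stock difference, the difference divided by the years between observations, and the share of sites with gains or losses. The Python sketch below illustrates this under the definitions given in the Carbon Sequestration Rate section; the function names and the example numbers are hypothetical and not taken from the dataset.

```python
def soc_seq_rate(soc_hist, soc_curr, year_hist, year_curr):
    """Sequestration rate (g m^-2 yr^-1) at one collocated site.

    Stocks are in g m^-2; positive rates are gains, negative rates losses.
    """
    years = year_curr - year_hist
    if years <= 0:
        raise ValueError("current year must be later than historical year")
    return (soc_curr - soc_hist) / years


def percent_change(soc_hist, soc_curr):
    """Relative SOC change (%) with respect to the historical stock."""
    return 100.0 * (soc_curr - soc_hist) / soc_hist


def gain_loss_shares(changes):
    """Return (% of sites with gain, % of sites with loss)."""
    n = len(changes)
    gains = sum(1 for c in changes if c > 0)
    losses = sum(1 for c in changes if c < 0)
    return 100.0 * gains / n, 100.0 * losses / n


# Hypothetical site: 3,000 g m^-2 in 1988 and 3,420 g m^-2 in 2008.
rate = soc_seq_rate(3000.0, 3420.0, 1988, 2008)  # 420 / 20 = 21 g m^-2 yr^-1
```

Dividing by the site-specific number of years is what makes sites first sampled in 1965 comparable with sites first sampled in 1996.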
Figure 5-6. Soil organic carbon change between 1965-1996 and 2008-2009 at the 194 collocated sites that were within 30 m distance of the observations from the Florida Soil Characterization Database (1965-1996), grouped by land use and land cover (LULC) and LULC change, respectively. For the horizontal axis label, the LULC before a hyphen is the historical LULC and the LULC after the hyphen is the LULC upon sampling (2008-2009). Barren = barren land, ImPasture = improved pasture, Mesic = mesic upland forest, Xeric = xeric upland forest. The blue boxes denote the first and third quartiles, black dots and dashed lines in boxes denote the medians, and the whiskers extend to the most extreme point that is no more than 1.5 times the box length from the box. The number above the horizontal axis is the number of sites in each group.
Figure 5-7. The effect of maximum annual temperature on soil organic carbon sequestration rate in four land use and land cover (LULC) and LULC change types. Filled red circles denote crop, filled blue triangles mesic upland forest, open dark green circles pineland, and light green crosses change from pineland to urban between the 1970s and 2008-2011.
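The interaction plotted in Figure 5-7 amounts to letting each LULC class have its own intercept and its own slope of sequestration rate on temperature. A linear model with LULC, MAT and LULC × MAT terms yields the same fitted lines as separate ordinary least-squares fits per class, which is what the Python sketch below computes. The records are made-up illustrative numbers, not data from this study, and the function names are hypothetical.

```python
from collections import defaultdict


def ols_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def lines_by_class(records):
    """Fit one line per LULC class.

    `records` holds (lulc, temperature, sequestration_rate) tuples; the
    result maps each class to its (intercept, slope).
    """
    groups = defaultdict(lambda: ([], []))
    for lulc, temperature, rate in records:
        groups[lulc][0].append(temperature)
        groups[lulc][1].append(rate)
    return {lulc: ols_line(xs, ys) for lulc, (xs, ys) in groups.items()}


# Hypothetical records: (LULC class, MAT in deg C, rate in g m^-2 yr^-1).
records = [
    ("crop", 25.0, -10.0), ("crop", 27.0, 10.0), ("crop", 29.0, 30.0),
    ("pineland", 25.0, 5.0), ("pineland", 27.0, 15.0), ("pineland", 29.0, 25.0),
]
fitted = lines_by_class(records)
```

Different slopes across classes are what a significant LULC × MAT term indicates; if all classes shared one slope, the interaction term would add nothing beyond the main effects.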
Figure 5-8. The effect of mean annual precipitation on soil organic carbon sequestration rate in two land use and land cover (LULC) types. Filled red circles denote crop and open dark green circles pineland without change between the 1970s and 2008-2011.
Figure 5-9. General land use and land cover change between the 1970s and 2008-2011 in Florida. Data for the 1970s are derived from the United States Geological Survey (USGS), and data for 1995, 2004 and 2008-2011 from the Florida Department of Environmental Protection (FDEP).
CHAPTER 6
SUMMARY AND SYNTHESIS

The soil carbon pool has great potential to impact the climate system due to its enormous size compared with the atmospheric and terrestrial carbon pools. Florida soils store the most soil carbon among the conterminous U.S. states, primarily in the prevalent Histosols and Spodosols. The SOC in Florida is subject to change imposed by LULC and climate change. Understanding how SOC varies in time and space and how it relates to environmental factors and stressors is an important step in assessing the role SOC plays in the whole earth system. This dissertation aims to address some of the questions related to the spatial and temporal variation of SOC in a southeastern U.S. state, Florida.

This dissertation addresses several questions in digital soil mapping and modeling of topsoil SOC in Florida. Chapter 2 explored a new strategy for developing factorial soil models based on the STEP-AWBH conceptual model. An exhaustive set of 210 potential environmental variables was compiled to characterize Florida's soil landscape based on state-of-the-art pedological knowledge and technical and computational capabilities. Models were developed to predict SOC (0-20 cm) using the comprehensive predictor set rooted in factorial soil landscape conceptual modeling paradigms, under constraints to select the best-performing models that also strive towards parsimony. Our approach differs from current DSM documented in the literature, which commonly selects a few environmental variables based on domain expertise and then develops models utilizing those variables to predict a soil property. The latter DSM studies may be biased by the subjective opinions of experts, whereas our approach is more objective in the sense
that it uses strategic, machine learning data reduction techniques to identify soil models that optimize predictive capabilities (fitting), accuracy, and parsimony. It has the obvious advantage of impartiality in selecting predictor variables compared with predetermining the variables to be used in modeling purely on the basis of domain knowledge. Our approach also allowed objective identification of the relevant variables to explain SOC variation and hence enabled inferences about major SOC processes at the regional scale in Florida, U.S. Results confirmed that vegetation and the soil water gradient were the driving factors that control SOC variation. The paramount importance of LULC in explaining SOC variation raises concern about the potential impacts of human-induced LULC changes on future SOC storage. Categorical variables showed great relevance to SOC, but they can be difficult to use as predictors when sampling of nominal classes is sparse or if there are missing end-member cases. These issues can limit modeling and the application of the model to make a prediction map. However, it was possible to reduce their use at limited or no cost to model performance, because the continuous variables contained similar information that could capture the factors of SOC processes. These findings open a new view of selecting and utilizing variables for predicting SOC and other properties. This research has relevance for large-region DSM and modeling, specifically soil property predictions at continental and global scales, where a balance between pedological rooting of soil prediction models, computational effort, model performance, and parsimony is needed.

Chapter 3 applied the Bayesian geostatistical method to modeling and predicting SOC in a large region. The Bayesian geostatistical model incorporating six covariates representing soil moisture, vegetation, and topography had better prediction accuracy
than the Bayesian model with only a spatial random effect in terms of the independent validation. In addition, the former had narrower prediction intervals, indicating better prediction precision. Generally, the Bayesian prediction intervals were narrow where the posterior mean SOC predictions were low (e.g., the Panhandle area in northern Florida) and relatively wide where the SOC predictions were high (e.g., the Everglades Agricultural Area in southern Florida). The validation of prediction intervals with independent validation data confirmed the effectiveness of uncertainty assessment from Bayesian inference. The uncertainty in model parameter estimation varied among parameters. Generally, the spatial random effect parameters from both models did not exhibit much uncertainty, while most of the fixed effect parameters from the Bayesian model with covariates, such as LAI, Landsat PC1 and OM, showed considerable uncertainty. Our results also confirmed the usefulness of conventional geostatistical methods in model parameter estimation, especially REML. The findings are important for quantifying the SOC in the southeastern USA, where the soils are enriched in carbon, especially in the prevalent wetland soils in Florida. The SOC assessment with uncertainty enables not only the assessment of soil in terms of its functions in ecosystems and the threats posed to it, but also the assessment of the ability of soils to sequester carbon to mitigate global climate change.

This dissertation presents a few geospatial models assessing SOC, primarily in Chapters 2 and 3, with various machine learning and statistical methods. The merits of the machine learning models in Chapter 2 lie in their flexibility to account for various data types, from categorical to continuous predictors, and in their optimal prediction accuracy. Instead of giving single realizations of a given parameter and predictions, the
Bayesian models in Chapter 3 yield the distributions of parameters and predictions and thus allow one to assess the uncertainty of the predictions in DSM. The wide prediction intervals given by both geostatistical models, with and without accounting for the environmental factors, show that considerable uncertainty remained from various sources such as model inadequacy and data measurement errors. However, the method presented in Chapter 3 does not allow one to partition the uncertainty into specific types, which would be worth studying in future research. The SOC maps produced in Chapters 2 and 3 advance the previous ones (SSURGO, STATSGO) in terms of both accuracy and resolution and allow for regional-scale land resource management. The products have been used in a statewide conservation program, the Florida Forever Program led by the Florida Department of Environmental Protection, which uses these maps to better inventory Florida's natural resources, to better assess the function of soils in the whole ecosystem, and to plot strategies and policies for natural resource conservation.

Chapter 4 investigated the fine-scale variability (~500 m) of SOC at 0-20 cm in five prevalent LULC types in Florida, USA under a uniform framework (the same sampling design, size, support and data analysis methods), allowing intra- and inter-comparison among fields. The spatial variability of SOC under the five LULC types showed different behavior but also shared some similarities. Hardwood Hammock and Forest had the highest overall variance, to which the variation at > 200 m contributed the most, and the variation at 2 m was also marked. Similarly, Improved Pasture demonstrated large variation at both coarse scales (67 and > 200 m) and a very fine scale (2 m). Sandhill, Pineland, and Dry Prairie were dominated by variation at very fine
scales (2 and 7 m). All five sites showed large variability at very fine scales, indicating the substantial impact of individual plants (biomass, residue, and roots) and land use management activities on the spatial distribution of topsoil SOC. Semivariogram analysis identified short ranges of spatial correlation of topsoil SOC in Sandhill and Dry Prairie and relatively long ranges in Hardwood Hammock and Forest and in Improved Pasture. In contrast, Pineland showed almost no spatial autocorrelation, which suggests large variability in SOC in Pineland. By accounting for these spatial correlations, a general linear model fitted with GLS gave better estimates of LULC effects on SOC than OLS, which treats samples as independent. This study demonstrated that scale-dependent behavior of SOC prevails and is tightly coupled to LULC type and its vegetation composition and structure. In conclusion, SOC stocks in different ecosystems varied remarkably across a spectrum of fine scales. Lumping and aggregation of this fine-scale variability and behavior of SOC can lead to severely biased soil carbon assessment, specifically if upscaled to larger regions.

Linking the SOC variation at fine scale (Chapter 4) and regional scale (Chapter 2) shows that fine-scale variation (< 500 m, variance: 1.99 kg² m⁻⁴) accounted for about 10% of the overall variation at the state scale (19.2 kg² m⁻⁴). This suggests that the contribution of fine-scale information (2 to 500 m) to DSM at the Florida scale would be marginal. However, this does not depreciate the value of understanding fine-scale variation, which can be valuable in precision agriculture that requires high-resolution soil information. The average fine-scale spatial structures of SOC obtained in this dissertation showed some universality across geographic regions (compared to the
average semivariogram by McBratney and Pringle (1999)). Therefore, they have the potential to be generalized to describe other sites with similar soil-landscape settings.

Chapter 5 showed that LULC was a strong descriptor of SOC variation in Florida. Generally, sugarcane and wetland contained the highest SOC stocks, followed by mesic upland forest, pineland, rangeland, improved pasture and urban, while crop, citrus and xeric upland forest had the lowest. Our comparative analyses of current and historical SOC datasets showed a significant SOC accumulation between 1965-1996 and 2008-2009, and the amount of SOC change depended on LULC and LULC change types. In most LULC types overall sequestration of SOC was observed, except in sugarcane and citrus. Marked SOC losses accompanied the conversion of wetland to other LULC types; conversely, the recreation of wetlands restored SOC stock. Urban soils contained moderately high SOC stock, and conversions of crop, pineland and barren land to urban land resulted in SOC accumulation, which suggests that urban soils can serve as a promising SOC sink if managed properly. In general, LULC change in Florida over the last four decades followed a trend that favored SOC accumulation: the wetland and urban areas increased and the agricultural area decreased, as shown by different LULC databases. It should be noted that our data may have underestimated the SOC loss in organic soils in the Everglades Agricultural Area, where a huge amount of SOC was reported to have been lost due to subsidence. The SOC sequestration rate was also LULC dependent and controlled by climate factors interacting with LULC. Warm conditions tended to accelerate SOC accumulation, while high precipitation reduced the SOC sequestration rate. These major findings provide insights into how LULC and
LULC change have impacted SOC change over the past four decades and have significance for SOC change in the context of global climate change.

Given the effects of LULC and climate gradients on SOC and SOC sequestration, the SOC stock seems likely to increase in recreated or restored wetlands. Elevated temperature may not result in the loss of carbon in Florida soils; on the contrary, it may increase the SOC stock in some LULC types such as cropland, mesic upland forest and pineland. However, any conclusion from this dissertation about how the overall carbon pool in Florida responds to LULC and climate change alone is constrained, because the pool is subject to other disturbing factors that may confound the overall SOC balance (e.g., fires). The SOC loss through soil subsidence still needs to be estimated accurately to understand how it impacts the overall SOC dynamics. Highly impervious areas created by urbanization also halt carbon sequestration in the vegetation, and their impact on SOC remains unknown. In the future, more studies are needed to address these questions, which are critical to better understanding whether soils act as a sink or a source of carbon in Florida.

In retrospect, this dissertation successfully addressed its objectives. First, the effects of natural and anthropogenic factors on observed patterns of SOC stocks were determined at regional scale. The relationship between SOC spatial patterns and the factors was expressed in the form of soil-landscape models, i.e., STEP-AWBH, which can be used both to conceptually express and to quantitatively predict the spatial patterns of SOC. This allows for determining what the main drivers are (Chapter 2), describing the spatial behavior of SOC (Chapter 3), and determining the spatial scales over which these drivers operate (Chapters 3 and 4) and the rate of change
(Chapter 5). The overall hypothesis of this dissertation is accepted, with uncertainty in SOC models being recognized. This uncertainty is probably due to the divergent spatial scales at which soil processes operate. Further work can be directed at the limitations of soil factorial models, with a special emphasis on the scaling effects of soil processes. This dissertation has made an attempt at linking SOC spatial patterns and processes across scales (Chapters 2 and 3). However, more work is needed to model the multi-scale behavior of soil carbon and other properties to advance our understanding of soil processes from a hierarchical point of view and to improve the interpolatability and extrapolatability of soil models.
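
As a worked check of the cross-scale comparison made earlier in this chapter, the share of state-scale SOC variance attributable to fine-scale variation can be computed directly from the two variances quoted in the text. The sketch below is illustrative only: the function name `variance_share` and the constant names are hypothetical, and it assumes that both quoted figures are variances expressed in the same units (kg² m⁻⁴).

```python
def variance_share(component_variance: float, total_variance: float) -> float:
    """Return the fraction of a total variance contributed by one component."""
    if total_variance <= 0:
        raise ValueError("total_variance must be positive")
    if component_variance < 0:
        raise ValueError("component_variance must be non-negative")
    return component_variance / total_variance


# Values quoted in this chapter; both are assumed to be in kg^2 m^-4.
FINE_SCALE_VARIANCE = 1.99   # SOC variance at separations below 500 m
STATE_SCALE_VARIANCE = 19.2  # overall SOC variance at the Florida scale

share = variance_share(FINE_SCALE_VARIANCE, STATE_SCALE_VARIANCE)
print(f"Fine-scale share of state-scale variance: {share:.1%}")
```

Under these assumptions the ratio is close to one tenth, which is consistent with the "about 10%" stated in the text.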
LIST OF REFERENCES
Aelion, C.M., Davis, H.T., Liu, Y., Lawson, A.B., and McDermott, S., 2009. Validation of Bayesian kriging of arsenic, chromium, lead, and mercury surface soil concentrations based on internode sampling. Environ. Sci. Technol. 43, 4432-4438.
Armentano, T.V., 1980. Drainage of organic soils as a factor in the world carbon cycle. BioScience 30, 825-830.
Batjes, N.H., and Sombroek, W.G., 1997. Possibilities for carbon sequestration in tropical and subtropical soils. Glob. Change Biol. 3, 161-173.
Bellamy, P.H., Loveland, P.J., Bradley, R.I., Lark, R.M., and Kirk, G.J.D., 2005. Carbon losses from all soils across England and Wales 1978-2003. Nature 437, 245-248.
Bernardo, J.M., and Smith, A.F.M., 2000. Bayesian theory. 1st ed. Wiley, New York.
Bond-Lamberty, B., and Thomson, A., 2010. Temperature-associated increases in the global soil respiration record. Nature 464, 579-582.
Bouma, J., 1989. Using soil survey data for quantitative land evaluation. Adv. Soil Sci. 9, 177-213.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24, 123-140.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5-32.
Burgess, T.M., and Webster, R., 1980a. Optimal interpolation and isarithmic mapping of soil properties (Part I) The semi-variogram and punctual kriging. J. Soil Sci. 31, 315-331.
Burgess, T.M., and Webster, R., 1980b. Optimal interpolation and isarithmic mapping of soil properties (Part II) Block kriging. J. Soil Sci. 31, 333-341.
Cambardella, C.A., Moorman, T.B., Parkin, T.B., Karlen, D.L., Novak, J.M., Turco, R.F., and Konopka, A.E., 1994. Field-scale variability of soil properties in Central Iowa soils. Soil Sci. Soc. Am. J. 58, 1501-1511.
Cao, B., Grunwald, S., and Xiong, X., 2012. Cross-regional digital soil carbon modeling in two contrasting soil ecological regions in the US. p. 103-108. In Digital Soil Assessments and Beyond. CRC Press, London, UK.
Carré, F., McBratney, A.B., Mayr, T., and Montanarella, L., 2007. Digital soil assessments: Beyond DSM. Geoderma 142, 69-79.
Cerri, C.E.P., Bernoux, M., Chaplot, V., Volkoff, B., Victoria, R.L., Melillo, J.M., Paustian, K., and Cerri, C.C., 2004. Assessment of soil property spatial variation in an Amazon pasture: basis for selecting an agronomic experimental area. Geoderma 123, 51-68.
Certini, G., 2005. Effects of fire on properties of forest soils: a review. Oecologia 143, 1-10.
Cormen, T.H., Leiserson, C.E., and Rivest, R.L., 1990. Introduction to algorithms. MIT Press.
Cox, P.M., Betts, R.A., Jones, C.D., Spall, S.A., and Totterdell, I.J., 2000. Acceleration of global warming due to carbon-cycle feedbacks in a coupled climate model. Nature 408, 184-187.
Cressie, N.A.C., 1993. Statistics for spatial data. Revised edition. Wiley, New York.
Davidson, E.A., and Janssens, I.A., 2006. Temperature sensitivity of soil carbon decomposition and feedbacks to climate change. Nature 440, 165-173.
Davidson, E.A., Trumbore, S.E., and Amundson, R., 2000. Biogeochemistry: Soil warming and organic carbon content. Nature 408, 789-790.
Davies, R.B., 1987. Hypothesis testing when a nuisance parameter is present only under the alternatives. Biometrika 74, 33-43.
De'ath, G., and Fabricius, K., 2000. Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81, 3178-3192.
DeGryze, S., Six, J., Paustian, K., Morris, S.J., Paul, E.A., and Merckx, R., 2004. Soil organic carbon pool changes following land-use conversions. Glob. Change Biol. 10, 1120-1132.
Deverel, S.J., and Rojstaczer, S., 1996. Subsidence of agricultural lands in the Sacramento-San Joaquin Delta, California: role of aqueous and gaseous carbon fluxes. Water Resour. Res. 32, 2359-2367.
Diggle, P.J., Tawn, J.A., and Moyeed, R.A., 1998. Model-based geostatistics. J. R. Stat. Soc. Ser. C Appl. Stat. 47, 299-350.
Don, A., Schumacher, J., and Freibauer, A., 2011. Impact of tropical land-use change on soil organic carbon stocks - a meta-analysis. Glob. Change Biol. 17, 1658-1670.
Dorrepaal, E., Toet, S., van Logtestijn, R.S.P., Swart, E., van de Weg, M.J., Callaghan, T.V., and Aerts, R., 2009. Carbon respiration from subsurface peat accelerated by climate warming in the subarctic. Nature 460, 616-619.
Eberhardt, R.W., and Latham, R.E., 2000. Relationships among vegetation, surficial geology and soil water content at the Pocono mesic till barrens. J. Torrey Bot. Soc. 127, 115-124.
Fierer, N., Craine, J.M., McLauchlan, K., and Schimel, J.P., 2005. Litter quality and the temperature sensitivity to decomposition. Ecology 86, 320-326.
Finley, A.O., Banerjee, S., and Carlin, B.P., 2007. spBayes: an R package for univariate and multivariate hierarchical point-referenced spatial models. J. Stat. Soft. 19, 1-24.
Florida Department of Environmental Protection, Florida land use and land cover datasets. Available at http://www.nass.usda.gov/research/Cropland/SARS1a.htm
Florida Fish and Wildlife Conservation Commission (FFWCC), 2003. Florida vegetation and land cover data derived from Landsat ETM+ imagery. Available at http://myfwc.com/research/gis/data-maps/terrestrial/fl-vegetation-land-cover/.
Friedman, J.H., 2002. Stochastic gradient boosting. Comput. Stat. Data An. 38, 367-378.
Fromm, H., Winter, K., Filser, J., Hantschel, R., and Beese, F., 1993. The influence of soil type and cultivation system on the spatial distributions of the soil fauna and microorganisms and their interactions. Geoderma 60, 109-118.
Grunwald, S., 2009. Multi-criteria characterization of recent digital soil mapping and modeling approaches. Geoderma 152, 195-207.
Grunwald, S., Goovaerts, P., Bliss, C.M., Comerford, N.B., and Lamsal, S., 2006. Incorporation of auxiliary information in the geostatistical simulation of soil nitrate nitrogen. Vadose Zone J. 5, 391-404.
Grunwald, S., Thompson, J.A., and Boettinger, J.L., 2011. Digital soil mapping and modeling at continental scales: finding solutions for global issues. Soil Sci. Soc. Am. J. 75, 1201.
Grünzweig, J.M., Sparrow, S.D., Yakir, D., and Stuart Chapin, F., 2004. Impact of agricultural land-use change on carbon storage in boreal Alaska. Glob. Change Biol. 10, 452-472.
Guo, L.B., and Gifford, R.M., 2002. Soil carbon stocks and land use change: a meta-analysis. Glob. Change Biol. 8, 345-360.
Guo, Y., Gong, P., Amundson, R., and Yu, Q., 2006. Analysis of factors controlling soil carbon in the conterminous United States. Soil Sci. Soc. Am. J. 70, 601.
Guyon, I., and Elisseeff, A., 2003. An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157-1182.
Handcock, M.S., and Stein, M.L., 1993. A Bayesian analysis of kriging. Technometrics 35, 403-410.
Hernández, T., García, C., and Reinhardt, I., 1997. Short-term effect of wildfire on the chemical, biochemical and microbiological properties of Mediterranean pine forest soils. Biol. Fert. Soils 25, 109-116.
Houghton, R.A., Hackler, J.L., and Lawrence, K.T., 1999. The U.S. carbon budget: contributions from land use change. Science 285, 574-578.
Jenny, H., 1941. Factors of soil formation, a system of quantitative pedology. McGraw-Hill, New York.
Jobbágy, E.G., and Jackson, R.B., 2000. The vertical distribution of soil organic carbon and its relation to climate and vegetation. Ecol. Appl. 10, 423-436.
Jones, C., McConnell, C., Coleman, K., Cox, P., Falloon, P., Jenkinson, D., and Powlson, D., 2005. Global climate change and soil carbon stocks; predictions from two contrasting models for the turnover of organic carbon in soil. Glob. Change Biol. 11, 154-166.
Kamara, A., Rhodes, E.R., and Sawyerr, P.A., 2007. Organic carbon dynamics along a toposequence in a peri-urban site. Comm. Soil Sci. Plant Anal. 38, 2371-2379.
Kautz, R., Stys, B., and Kawula, R., 2007. Florida vegetation 2003 and land use change between 1985-89 and 2003. Fla. Scientist 70, 12.
Kennedy, M.C., and O'Hagan, A., 2001. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 63, 425-464.
Knipling, E.B., Schroder, V.N., and Duncan, W.G., 1970. CO2 evolution from Florida organic soils. p. 320-326. In Soil and Crop Science Society of Florida Proceedings.
Kohavi, R., and John, G.H., 1997. Wrappers for feature subset selection. Artif. Intel. 97, 273-324.
Kursa, M.B., and Rudnicki, W.R., 2010. Feature selection with the Boruta Package. J. Stat. Soft. 36, 1-13.
Kutsch, W., Bahn, M., and Heinemeyer, A., 2010. Soil carbon relations: an overview. p. 1-15. In Soil carbon dynamics: an integrated methodology. Cambridge University Press, Cambridge, UK.
Laganière, J., Angers, D.A., and Paré, D., 2010. Carbon accumulation in agricultural soils after afforestation: a meta-analysis. Glob. Change Biol. 16, 439-453.
Lal, R., 2003. Global potential of soil carbon sequestration to mitigate the greenhouse effect. Crit. Rev. Plant Sci. 22, 151-184.
Lark, R.M., 2000. Estimating variograms of soil properties by the method-of-moments and maximum likelihood. Eur. J. Soil Sci. 51, 717-728.
Lark, R.M., 2011. Spatially nested sampling schemes for spatial variance components: Scope for their optimization. Comput. Geosci. 37, 1633-1641.
Lark, R.M., and Cullis, B.R., 2004. Model-based analysis using REML for inference from systematically sampled data on soil. Eur. J. Soil Sci. 55, 799-813.
Liu, D., Wang, Z., Zhang, B., Song, K., Li, X., Li, J., Li, F., and Duan, H., 2006. Spatial distribution of soil organic carbon and analysis of related factors in croplands of the black soil region, Northeast China. Agriculture, Ecosystems & Environment 113, 73-81.
Lugo, A.E., Sanchez, M.J., and Brown, S., 1986. Land use and organic carbon content of some subtropical soils. Plant Soil 96, 185-196.
Maia, S.M.F., Ogle, S.M., Cerri, C.E.P., and Cerri, C.C., 2010. Soil organic carbon stock change due to land use activity along the agricultural frontier of the southwestern Amazon, Brazil, between 1970 and 2002. Glob. Change Biol. 16, 2775-2788.
Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58, 1246-1266.
McBratney, A.B., Mendonça Santos, M.L., and Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3-52.
McBratney, A.B., and Pringle, M.J., 1999. Estimating average and proportional variograms of soil properties and their potential use in precision agriculture. Precis. Agric. 1, 125-152.
Melillo, J.M., Steudler, P.A., Aber, J.D., Newkirk, K., Lux, H., Bowles, F.P., Catricala, C., Magill, A., Ahrens, T., and Morrisseau, S., 2002. Soil warming and carbon-cycle feedbacks to the climate system. Science 298, 2173-2176.
Minasny, B., and McBratney, A.B., 2005. The Matérn function as a general model for soil variograms. Geoderma 128, 192-207.
Minasny, B., McBratney, A.B., Malone, B.P., and Wheeler, I., 2013. Digital mapping of soil carbon. p. 1-47. In Donald L. Sparks (ed.), Advances in Agronomy. Academic Press.
Minasny, B., Vrugt, J.A., and McBratney, A.B., 2011. Confronting uncertainty in model-based geostatistics using Markov Chain Monte Carlo simulation. Geoderma 163, 150-162.
Mulkey, S., 2007. Climate change and land use in Florida: Interdependencies and opportunities. Department of Botany, University of Florida, Gainesville, FL.
Multi-Resolution Land Characteristics Consortium (MRLC), 1992. National land cover data (NLCD). Available at http://www.epa.gov/mrlc/nlcd.html
Murty, D., Kirschbaum, M.U.F., Mcmurtrie, R.E., and Mcgilvray, H., 2002. Does conversion of forest to agricultural land change soil carbon and nitrogen? A review of the literature. Glob. Change Biol. 8, 105-123.
Myers, D.B., Grunwald, S., Vasques, G.M., and Harris, W.G., 2009. Pedotransfer functions for carbon methods and bulk density estimation in Florida soils. In Pittsburgh, PA.
Myers, D.B., Kitchen, N.R., Sudduth, K.A., Miles, R.J., Sadler, E.J., and Grunwald, S., 2011. Peak functions for modeling high resolution soil profile data. Geoderma 166, 74-83.
National Climatic Data Center (NCDC), National Oceanic and Atmospheric Administration (NOAA), 2008. Monthly surface data. Available at http://www.ncdc.noaa.gov
Natural Resources Conservation Service (NRCS), U.S. Department of Agriculture, 2006. U.S. general soil map (STATSGO2). Available at http://soils.usda.gov/survey/geography/ssurgo/description_statsgo2.html
Natural Resources Conservation Service (NRCS), U.S. Department of Agriculture, 2009. Soil survey geographic database (SSURGO). Available at http://soils.usda.gov/survey/geography/ssurgo/
Nelson, D.W., and Sommers, L.E., 1996. Total carbon, organic carbon, and organic matter. p. 961-1010. In Methods of soil analysis. Part 3. Chemical methods. Soil Science Society of America, Madison, WI, USA.
Nemani, R.R., Keeling, C.D., Hashimoto, H., Jolly, W.M., Piper, S.C., Tucker, C.J., Myneni, R.B., and Running, S.W., 2003. Climate-driven increases in global terrestrial net primary production from 1982 to 1999. Science 300, 1560-1563.
Nilsson, R., Peña, J.M., Björkegren, J., and Tegnér, J., 2007. Consistent feature selection for pattern recognition in polynomial time. J. Mach. Learn. Res. 8, 589-612.
Olson, K.R., 2010. Impacts of tillage, slope, and erosion on soil organic carbon retention. Soil Sci. 175, 562-567.
Ovalles, F.A., and Collins, M.E., 1988. Evaluation of soil variability in northwest Florida using geostatistics. Soil Sci. Soc. Am. J. 52, 1702-1708.
Paul, K.I., Polglase, P.J., Nyakuengama, J.G., and Khanna, P.K., 2002. Change in soil carbon following afforestation. Forest Ecol. Manag. 168, 241-257.
Percival, H.J., Parfitt, R.L., and Scott, N.A., 2000. Factors controlling soil carbon levels in New Zealand grasslands: is clay content important? Soil Sci. Soc. Am. J. 64, 1623-1630.
Phillips, D.L., and Marks, D.G., 1996. Spatial uncertainty analysis: propagation of interpolation errors in spatially distributed models. Ecological Modelling 91, 213-229.
Poeplau, C., Don, A., Vesterdal, L., Leifeld, J., Van Wesemael, B., Schumacher, J., and Gensior, A., 2011. Temporal dynamics of soil organic carbon after land-use change in the temperate zone - carbon response functions as a model approach. Glob. Change Biol. 17, 2415-2427.
Poggio, L., Gimona, A., and Brewer, M.J., 2013. Regional scale mapping of soil properties and their uncertainty with a large number of satellite-derived covariates. Geoderma 209-210, 1-14.
Post, W.M., Emanuel, W.R., Zinke, P.J., and Stangenberger, A.G., 1982. Soil carbon pools and world life zones. Nature 298, 156-159.
Post, W.M., and Kwon, K.C., 2000. Soil carbon sequestration and land-use change: processes and potential. Glob. Change Biol. 6, 317-327.
Post, W.M., Peng, T., Emanuel, W.R., King, A.W., Dale, V.H., and DeAngelis, D.L., 1990. The global carbon cycle. American Scientist 78.
Pouyat, R.V., Yesilonis, I.D., and Nowak, D.J., 2006. Carbon storage by urban soils in the United States. J. Environ. Qual. 35, 1566.
PRISM Climate Group, 2000. PRISM spatial climate layers. Available at http://www.prism.oregonstate.edu/
Quinlan, J.R., 1993. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
R Development Core Team, 2011. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
R Development Core Team, 2012. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Reid, H., Haque, U., Clements, A.C.A., Tatem, A.J., Vallely, A., Ahmed, S.M., Islam, A., and Haque, R., 2010. Mapping malaria risk in Bangladesh using Bayesian geostatistical models. Am. J. Trop. Med. Hyg. 83, 861-867.
Ribeiro, P.J., and Diggle, P.J., 2001. geoR: a package for Geostatistical analysis. R News 1, 15-18.
Richter, D. deB., Bacon, A.R., Megan, L.M., Richardson, C.J., Andrews, S.S., West, L., Wills, S., Billings, S., Cambardella, C.A., Cavallaro, N., DeMeester, J.E., Franzluebbers, A.J., Grandy, A.S., Grunwald, S., Gruver, J., Hartshorn, A.S., Janzen, H., Kramer, M.G., Ladha, J.K., Lajtha, K., Liles, G.C., Markewitz, D., Megonigal, P.J., Mermut, A.R., Rasmussen, C., Robinson, D.A., Smith, P., Stiles, C.A., Tate, R.L., Thompson, A., Tugel, A.J., van Es, H., Yaalon, D., and Zobeck, T.M., 2011. Human-soil relations are changing rapidly: Proposals from SSSA's cross-divisional soil change working group. Soil Sci. Soc. Am. J. 75, 2079.
Richter, D.D., and Markewitz, D., 2001. Understanding soil change: soil sustainability over millennia, centuries, and decades. Cambridge University Press, Cambridge, UK.
Rivero, R.G., Grunwald, S., and Bruland, G.L., 2007. Incorporation of spectral data into multivariate geostatistical models to map soil phosphorus variability in a Florida wetland. Geoderma 140, 428-443.
Rossi, J., Govaerts, A., De Vos, B., Verbist, B., Vervoort, A., Poesen, J., Muys, B., and Deckers, J., 2009. Spatial structures of soil organic carbon in tropical forests - A case study of Southeastern Tanzania. Catena 77, 19-27.
Rossini, A., Li, N., and Tierney, L., 2007. Simple parallel statistical computing in R. J. Comput. Graph. Stat., 399-420.
Russell, S.J., Norvig, P., and Davis, E., 2010. Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River, NJ.
Saiz, G., Bird, M.I., Domingues, T., Schrodt, F., Schwarz, M., Feldpausch, T.R., Veenendaal, E., Djagbletey, G., Hien, F., Compaore, H., Diallo, A., and Lloyd, J., 2012. Variation in soil carbon stocks and their determinants across a precipitation gradient in West Africa. Glob. Change Biol. 18, 1670-1683.
Sanchez, P.A., Ahamed, S., Carre, F., Hartemink, A.E., Hempel, J., Huising, J., Lagacherie, P., McBratney, A.B., McKenzie, N.J., Mendonca-Santos, M. de L., Minasny, B., Montanarella, L., Okoth, P., Palm, C.A., Sachs, J.D., Shepherd, K.D., Vagen, T.G., Vanlauwe, B., Walsh, M.G., Winowiecki, L.A., and Zhang, G.L., 2009. Digital soil map of the world. Science 325, 680-681.
Schimel, D.S., Braswell, B.H., Holland, E.A., McKeown, R., Ojima, D.S., Painter, T.H., Parton, W.J., and Townsend, A.R., 1994. Climatic, edaphic, and biotic controls over storage and turnover of carbon in soils. Global Biogeochem. Cy. 8, 279-293.
Schmidberger, M., Morgan, M., Eddelbuettel, D., Yu, H., Tierney, L., and Mansmann, U., 2009. State of the art in parallel computing with R. J. Stat. Soft. 47. Available at http://epub.ub.uni-muenchen.de/8991 (verified 12 March 2013).
Schuur, E.A.G., Vogel, J.G., Crummer, K.G., Lee, H., Sickman, J.O., and Osterkamp, T.E., 2009. The effect of permafrost thaw on old carbon release and net carbon exchange from tundra. Nature 459, 556-559.
Simonne, E., Hutchinson, C., DeValerio, J., Hochmuth, R., Treadwell, D., Wright, A., Santos, B., Whidden, A., McAvoy, G., Zhao, X., Olczyk, T., Gazula, A., and Ozores-Hampton, M., 2010. Current knowledge, gaps, and future needs for keeping water and nutrients in the root zone of vegetables grown in Florida. HortTechnology 20, 143-152.
Smith, J., Smith, P., Wattenbach, M., Zaehle, S., Hiederer, R., Jones, R.J.A., Montanarella, L., Rounsevell, M.D.A., Reginster, I., and Ewert, F., 2005. Projected changes in mineral soil carbon of European croplands and grasslands, 1990-2080. Glob. Change Biol. 11, 2141-2152.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and Van Der Linde, A., 2002. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 64, 583-639.
Stein, M.L., 1999. Interpolation of spatial data: some theory for kriging. Springer, New York.
Stephens, J.C., 1956. Subsidence of organic soils in the Florida Everglades. Soil Sci. Soc. Am. J. 20, 77-80.
Stephens, J.C., and Speir, W.H., 1970. Subsidence of organic soils in the USA. p. 523-534. In Publ. int. Ass. scient. Hydrol. Symp. Tokyo. Tokyo, Japan.
Stone, E.L., Harris, W.G., Brown, R.B., and Kuehl, R.J., 1993. Carbon storage in Florida Spodosols. Soil Sci. Soc. Am. J. 57, 179-182.
Sun, X.L., Wu, S.C., Wang, H.L., Zhao, Y.G., Zhang, G.L., Man, Y.B., and Wong, M.H., 2013. Dealing with spatial outliers and mapping uncertainty for evaluating the effects of urbanization on soil: A case study of soil pH and particle fractions in Hong Kong. Geoderma 195-196, 220-233.
Terra, J.A., Shaw, J.N., Reeves, D.W., Raper, R.L., Van Santen, E., and Mask, P.L., 2004. Soil carbon relationships with terrain attributes, electrical conductivity, and a soil survey in a coastal plain landscape. Soil Sci. 169, 819-831.
Thornley, J.H.M., and Cannell, M.G.R., 2001. Soil carbon storage response to temperature: an hypothesis. Ann. Bot. 87, 591-598.
Trangmar, B.B., Yost, R.S., Wade, M.K., Uehara, G., and Sudjadi, M., 1987. Spatial variation of soil properties and rice yield on recently cleared land. Soil Sci. Soc. Am. J. 51, 668-674.
Trumbore, S.E., 1997. Potential responses of soil organic carbon to global environmental change. Proc. Natl. Acad. Sci. USA 94, 8284-8291.
U.S. Fish and Wildlife Service, 2009. National wetland inventory (NWI). Available at http://www.fws.gov/wetlands/Data/State-Downloads.html
United States Census Bureau, 2000. The boundary of the State of Florida. Available at http://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html
United States Geological Survey (USGS), 1970s. Historical land use and land cover dataset. Available at http://www.nass.usda.gov/research/Cropland/SARS1a.htm
United States Geological Survey (USGS), 1999. National elevation dataset (NED). Available at http://ned.usgs.gov/
Vasques, G.M., Grunwald, S., Comerford, N.B., and Sickman, J.O., 2010. Regional modelling of soil carbon at multiple depths within a subtropical watershed. Geoderma 156, 326-336.
Vasques, G.M., Grunwald, S., and Myers, D.B., 2012a. Influence of the spatial extent and resolution of input data on soil carbon models in Florida, USA. J. Geophys. Res.: Biogeosci. 117, 1-12.
Vasques, G.M., Grunwald, S., and Myers, D.B., 2012b. Associations between soil carbon and ecological landscape variables at escalating spatial scales in Florida, USA. Landscape Ecol. 27, 355-367.
Vasques, G.M., Grunwald, S., and Sickman, J.O., 2009. Modeling of soil organic carbon fractions using visible-near-infrared spectroscopy. Soil Sci. Soc. Am. J. 73, 176-184.
Vasques, G., Grunwald, S., Sickman, J., and Comerford, N., 2010c. Upscaling of dynamic soil organic carbon pools in a north-central Florida watershed. Soil Sci. Soc. Am. J. 74, 870-879.
Vesterdal, L., Leifeld, J., Poeplau, C., Don, A., and van Wesemael, B., 2011. Land-use change effects on soil carbon stocks in temperate regions - development of carbon response functions. p. 33-48. In Jandl, R., Rodeghiero, M., Olsson, M. (eds.), Soil Carbon in Sensitive European Ecosystems. John Wiley & Sons, Ltd.
Webster, R., and Burgess, T.M., 1980. Optimal interpolation and isarithmic mapping of soil properties (Part III) Changing drift and universal kriging. J. Soil Sci. 31, 505-524.
Webster, R., Welham, S.J., Potts, J.M., and Oliver, M.A., 2006. Estimating the spatial scales of regionalized variables by nested sampling, hierarchical analysis of variance and residual maximum likelihood. Computers & Geosciences 32, 1320-1333.
Wehrens, R., 2011. Chemometrics with R: multivariate data analysis in the natural sciences and life sciences. 2011th ed. Springer.
West, T.O., and Post, W.M., 2002. Soil organic carbon sequestration rates by tillage and crop rotation. Soil Sci. Soc. Am. J. 66, 1930.
Wilkinson, D.J., 2005. Parallel Bayesian computation. p. 481-512. In Statistics Textbook and Monographs. Handbook of Parallel Computing and Statistics. CRC Press, New York.
Williams, P., 1987. Variables affecting near p. 143-167. In Near infrared technology in the agricultural and food industries. American Association of Cereal Chemists, St. Paul, Minnesota.
Wösten, J.H.M., Pachepsky, Y.A., and Rawls, W.J., 2001. Pedotransfer functions: bridging the gap between available basic soil data and missing soil hydraulic characteristics. Journal of Hydrology 251, 123-150.
Wright, A.L., and Snyder, G.H., 2009. Soil subsidence in the Everglades Agricultural Area. Available at http://ufdcimages.uflib.ufl.edu/UF/00/08/73/99/00541/Binder2.pdf (verified 20 May 2013).
Wu, H., Guo, Z., and Peng, C., 2003. Land use induced changes of organic carbon storage in soils of China. Glob. Change Biol. 9, 305-315.
Wynn, J.G., Bird, M.I., Vellen, L., Grand-Clement, E., Carter, J., and Berry, S.L., 2006. Continental-scale measurement of the soil organic carbon pool with climatic, edaphic, and biotic controls. Global Biogeochem. Cy. 20, GB1007.
Xiong, X., Grunwald, S., Myers, D.B., Kim, J., Harris, W.G., and Comerford, N.B., 2012. Which covariates are needed for soil carbon models in Florida? p. 109-114. In Digital Soil Assessments and Beyond. CRC Press, London, UK.
Yaalon, D.H., 1975. Conceptual models in pedogenesis: Can soil forming functions be solved? Geoderma 14, 189-205.
Youden, W.J., and Mehlich, A., 1937. Selection of efficient methods for soil sampling. Contributions of the Boyce Thompson Institute for Plant Research 9, 59-70.
Zhang, H., 2002. On estimation and prediction for spatial generalized linear mixed models. Biometrics 58, 129-136.
Zhang, W., Weindorf, D.C., Zhu, Y., Haggard, B.J., and Bakr, N., 2012. Anthropogenic Management Impact on Soil Organic Carbon Variability: A Case Study in Louisiana, USA. Soil Horizons 53, 18-22.
Zimmerman, D.L., and Zimmerman, M.B., 1991. A comparison of spatial semivariogram estimators and corresponding ordinary kriging predictors. Technometrics 33, 77-91.
Zinn, Y.L., Lal, R., and Resck, D.V.S., 2005. Changes in soil organic carbon stocks under agriculture in Brazil. Soil and Tillage Research 84, 28-40.
BIOGRAPHICAL SKETCH
Xiong Xiong was born and raised in Tongshan, a town in central China's Hubei Province. He received his Bachelor of Engineering with honors in Environmental Engineering in 2009 at China Agricultural University, Beijing, China. In 2009, Xiong obtained his Master of Science in Environmental Science at the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. In 2009, Xiong came to the University of Florida to pursue his Ph.D. degree in Soil and Water Science.