SPATIAL AND SPECTRAL MODELS OF SOIL CARBON AT MULTIPLE SCALES IN FLORIDA

By

GUSTAVO DE MATTOS VASQUES

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2009
© 2009 Gustavo de Mattos Vasques
To all Floridians and environmental enthusiasts
ACKNOWLEDGMENTS

I thank the University of Florida, the Soil and Water Science Department, the Alumni Fellowship program, and my parents for giving me the opportunity and the resources to conduct this study. I thank my major advisor, Sabine Grunwald, for her wise words of advice and guidance for research and life, and my supervisory committee members Nicholas Comerford, Willie Harris, Timothy Fik, and Wendell Cropper, as well as James Sickman, for their guidance and support. I also thank the Information Technology experts Brandon Hoover, Steve Bloom, and William Deich IV, and my working colleagues from the Geographic Information Systems Laboratory, Jinseok Hong, Brent Myers, Sanjay Lamsal, Rosanna Rivero, Deoyani Sarkhot, Mi-youn Ann, Jongsung Kim, and Ho-young Kwon, for their friendship and help.

I also thank my cousins Robert, Sílvia, and Luís Cláudio for the love and support that made my stay in the U.S. so pleasant, my friends in Gainesville for the wonderful talks, parties, trips, laughs, and cries, and my family in Brazil, who always encouraged me to pursue my dreams, and whom I miss so much. I give special thanks to my wife and best friend, Patricia, whose love, friendship, enthusiasm, guidance, and support were essential in everything that I have accomplished. I cannot express enough my love for her.

Funding for this doctoral research was provided from various projects, including "Linking Experimental and Soil Spectral Sensing for Prediction of Soil Carbon Pools and Carbon Sequestration at Landscape Scales" (Cooperative Ecosystem Studies Unit, Natural Resources Conservation Service, U.S. Department of Agriculture), and "Rapid Assessment and Trajectory Modeling of Changes in Soil Carbon across a Southeastern Landscape" (National Research Initiative, U.S. Department of Agriculture).
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    Rationale and Significance
    Overview

2 COMPARISON OF MULTIVARIATE METHODS FOR INFERENTIAL MODELING OF SOIL CARBON USING VISIBLE/NEAR-INFRARED SPECTRA
    Summary
    Introduction
    Materials and Methods
        Study Area
        Field Sampling
        Laboratory Analysis
        Spectroscopy
        Pre-processing Transformations
        Multivariate Techniques
    Results and Discussion
        Descriptive Statistics
        Stepwise Multiple Linear Regression
        Principal Components Regression
        Partial Least Squares Regression
        Regression Tree
        Committee Trees
        Variable Selection
        Pre-processing Transformations
    Conclusions

3 MODELING OF SOIL ORGANIC CARBON FRACTIONS USING VISIBLE/NEAR-INFRARED SPECTROSCOPY
    Summary
    Introduction
    Materials and Methods
        Field and Laboratory Measurements
        Pre-treatment of Soil Spectra and Multivariate Methods
    Results and Discussion
        Descriptive Statistics
        Visible/Near-Infrared Spectroscopy Models of Soil Organic Carbon Properties
    Conclusions

4 BUILDING A SPECTRAL LIBRARY TO ESTIMATE SOIL ORGANIC CARBON IN FLORIDA
    Summary
    Introduction
    Materials and Methods
        Study Area
        Field and Laboratory Measurements
        Soil Scanning and Data Preparation
        Multivariate Calibration
    Results and Discussion
        Descriptive Statistics
        Performance of the Different Multivariate Calibration Methods
        Effect of the Inclusion of Soil Order Data, or Stratification by Soil Order
        Explanatory Wavelengths for Soil Organic Carbon
    Conclusions

5 REGIONAL MODELING OF SOIL CARBON AT MULTIPLE DEPTHS WITHIN A SUBTROPICAL WATERSHED
    Summary
    Introduction
    Materials and Methods
        Study Area
        Field Sampling
        Laboratory Analysis
        Comparison of Soil Total Carbon at Different Depths
        Relationship between Soil Total Carbon and Environmental Landscape Factors
        Scaling up of Soil Total Carbon in the Santa Fe River Watershed
    Results and Discussion
        Descriptive Statistics
        Relationship between Soil Total Carbon and Environmental Landscape Factors
        Scaling up of Soil Total Carbon in the Santa Fe River Watershed
    Conclusions

6 UPSCALING OF DYNAMIC SOIL ORGANIC CARBON POOLS IN A NORTH-CENTRAL FLORIDA WATERSHED
    Summary
    Introduction
    Materials and Methods
        Study Area
        Field Sampling and Laboratory Methods
        Upscaling Methods
    Results and Discussion
        Descriptive Statistics
        Upscaling of Soil Organic Carbon Properties
    Conclusions

7 INFLUENCE OF GRAIN, EXTENT, AND GEOGRAPHIC REGION ON SOIL CARBON MODELS IN FLORIDA, USA
    Summary
    Introduction
    Materials and Methods
        Study Areas, Sampling Designs, and Laboratory Methods
            The state of Florida
            The Santa Fe River watershed
            The University of Florida Beef Cattle Station
        Conversion of Soil Organic Carbon Measurements to Soil Total Carbon
        Calculation of Profile Soil Total Carbon at 0-100 cm
        Regression Modeling of Soil Total Carbon
            Preparation of soil total carbon data
            Preparation of environmental Geographic Information System layers
            Evaluating the influence of grain
            Evaluating the influence of extent
            Evaluating the influence of geographic regions
    Results and Discussion
        Descriptive Statistics
        Influence of Grain on Soil Total Carbon Regression Models
        Transferability of Soil Total Carbon Regression Models across Grains
        Influence of Extent on Soil Total Carbon Regression Models
        Transferability of Soil Total Carbon Regression Models across Extents
        Influence of Geographic Region on the Distribution of Soil Total Carbon
        Transferability of Soil Total Carbon Regression Models across Geographic Regions
    Conclusions

8 MULTI-SCALE BEHAVIOR OF SOIL CARBON AT NESTED REGIONS IN FLORIDA, USA
    Summary
    Introduction
    Materials and Methods
        Study Area
        Field Sampling and Laboratory Analysis
        Characterization of the Spatial Dependence of Soil Total Carbon
    Results and Discussion
        Descriptive Statistics
        Spatial Dependence of Soil Total Carbon
            Variogram analysis
            Fractal analysis
    Conclusions

9 SYNTHESIS AND OUTLOOK

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table

2-1 Investigations on diffuse reflectance spectroscopy of soil carbon and soil organic matter.
2-2 Pre-processing transformations applied to the spectral curves of soil samples.
2-3 Soil total carbon (TC) and log-transformed TC (LogTC) descriptive statistics for the whole dataset, calibration set, and validation set.
2-4 Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by stepwise multiple linear regression (SMLR).
2-5 Descriptive statistics of predicted log-transformed soil total carbon (LogTC) for the calibration and validation sets for the best models obtained from the five multivariate calibration techniques tested.
2-6 Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by principal components regression (PCR).
2-7 Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by partial least squares regression (PLSR).
2-8 Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by regression tree (RT).
2-9 Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by committee trees (CT).
3-1 Descriptive statistics of measured soil organic carbon properties.
3-2 Pearson's correlation coefficients between the measured soil organic carbon properties.
3-3 Summary statistics of the models obtained for each soil organic carbon property by the different multivariate methods, associated with their respective best pre-processing transformations.
3-4 Summary statistics of the models obtained for each soil organic carbon fraction by simple linear regression, and by partial least squares regression, using both soil reflectance and LogTC as predictors.
4-1 Descriptive statistics of soil organic carbon (SOC) and ln-transformed SOC (LnSOC) for the whole dataset, and stratified soil horizons.
4-2 Summary statistics for the spectral models of soil organic carbon (SOC) produced by committee trees (CT).
4-3 Summary statistics for the spectral models of ln-transformed soil organic carbon (LnSOC) produced by partial least squares regression (PLSR).
5-1 Environmental data and sources used to model the global spatial trend of log-transformed soil total carbon (LogTC).
5-2 Descriptive statistics of observed soil total carbon (TC) and log-transformed TC (LogTC) at different depths.
5-3 Pair-wise comparison of log-transformed soil total carbon (LogTC) at different depths.
5-4 Analyses of variance (ANOVA) between log-transformed soil total carbon at 0-100 cm (LogTC100) and selected environmental variables.
5-5 Homogeneous groups of log-transformed soil total carbon at 0-100 cm (LogTC100) based on land use/land cover, soil order, soil drainage class, and geologic unit, respectively, according to Dunnett's T3 test at the 0.05 confidence level.
5-6 Comparative results of the three geostatistical methods used to model soil total carbon (TC) at different depths.
5-7 Semivariogram parameters of the fitted exponential model of the best geostatistical method identified for soil total carbon (TC) at each depth.
6-1 Descriptive statistics of measured soil organic carbon (C) properties.
6-2 Comparative results of the three geostatistical methods used to model soil organic carbon (C) properties at 0-30 cm.
6-3 Stepwise multiple linear regression (SMLR) models, and variables selected by regression tree (RT) models of the global trend of soil organic carbon (C) properties.
6-4 Semivariogram parameters of the fitted exponential models of the soil organic carbon (C) properties.
7-1 Environmental Geographic Information Systems (GIS) layers used as explanatory variables in the stepwise multiple linear regression models of soil total carbon.
7-2 Descriptive statistics of soil total carbon (TC) and ln-transformed TC (LnTC) at the three study areas.
7-3 Regression coefficients and goodness-of-fit statistics of the stepwise multiple linear regression model of ln-transformed soil total carbon in ln% derived at different grains in the Santa Fe River watershed.
7-4 Regression coefficients and goodness-of-fit statistics of the stepwise multiple linear regression model of ln-transformed soil total carbon derived at different extents.
7-5 Descriptive statistics of soil total carbon (TC) and ln-transformed TC (LnTC) by hydrologic unit (i.e., geographic region) in Florida.
8-1 Descriptive statistics of soil total carbon (TC), and ln-transformed TC (LnTC) at the three nested scales, and for the pooled dataset across scales.
8-2 Variogram and fractal parameters of ln-transformed soil total carbon (LnTC) over short, medium, and long distances at the three nested scales, and using the pooled dataset across scales.
LIST OF FIGURES

Figure

2-1 Diffuse reflectance curves of different soil orders present in the dataset, along with important absorbance regions related to soil carbon in the visible/near-infrared region and their responsible chemical groups.
2-2 Sampling locations and soil orders within the Santa Fe River watershed (SFRW).
2-3 Predicted versus observed log-transformed soil total carbon (LogTC) for the Savitzky-Golay 1st derivative using a 1st-order polynomial with search window 9 (SGF 1 9) stepwise multiple linear regression (SMLR) model.
2-4 Predicted versus observed log-transformed soil total carbon (LogTC) for the normalization by the range (NRA) principal components regression (PCR) model.
2-5 Predicted versus observed log-transformed soil total carbon (LogTC) for the Savitzky-Golay 1st derivative using a 1st-order polynomial with search window of 9 (SGF 1 9) partial least squares regression (PLSR) model.
2-6 Predicted versus observed log-transformed soil total carbon (LogTC) for the Norris gap derivative with a search window of 5 (NGD 5) regression tree (RT) model.
2-7 Predicted versus observed log-transformed soil total carbon (LogTC) for the Norris gap derivative with a search window of 7 (NGD 7) committee trees (CT) model.
2-8 Important wavelengths used by the best models of log-transformed soil total carbon (LogTC) obtained by the different multivariate techniques and corresponding pre-processing transformations.
3-1 Estimated versus measured values in the validation of the best visible/near-infrared spectroscopy models of the soil organic carbon properties.
3-2 Cumulative percent of explained variance as a function of the number of partial least squares (PLS) factors for the total organic carbon model estimated by partial least squares regression using log(1/reflectance) transformation.
3-3 Coefficients used in the partial least squares regression (PLSR) models of soil organic carbon (C) properties.
3-4 Important wavelengths used in the total organic carbon models produced by four multivariate methods, associated with their best pre-processing transformations.
4-1 Distribution of soil profiles and soil orders within the state of Florida.
4-2 Estimated versus observed plots of the validation of soil organic carbon (SOC) models derived by committee trees (CT).
4-3 Estimated versus observed plots of the validation of ln-transformed soil organic carbon (LnSOC) models derived by partial least squares regression (PLSR).
4-4 Regression coefficients of the partial least squares regression (PLSR) models of ln-transformed soil organic carbon (LnSOC).
5-1 Land use/land cover and sampling sites in the Santa Fe River watershed (SFRW), Florida.
5-2 Soil total carbon (TC) output maps obtained by the best geostatistical method at each depth in the Santa Fe River watershed (SFRW), Florida.
5-3 Regression tree of the log-transformed soil total carbon at 30-60 cm (LogTC2) global trend model.
6-1 Sampling design, and elevation in the Santa Fe River watershed (SFRW), Florida.
6-2 Output maps of estimated soil organic carbon (C) properties.
6-3 Regression tree of the log-transformed recalcitrant organic carbon (LogRC) global trend model.
7-1 Sampling locations at three nested extents (i.e., study areas) in Florida.
7-2 Separation of Florida samples into geographic regions (i.e., hydrologic units) and extent subsets.
7-3 Overview of the framework used to test the influence of grain, extent, and geographic region on the quality of stepwise multiple linear regression models of ln-transformed soil total carbon.
7-4 Prediction quality of the stepwise multiple linear regression models of ln-transformed soil total carbon in ln% derived at specific grains, and evaluated at the other six grains in the Santa Fe River watershed.
7-5 Output maps of ln-transformed soil total carbon (LnTC) from the stepwise multiple linear regression models derived at seven grains, respectively, in the Santa Fe River watershed.
7-6 Prediction quality of the stepwise multiple linear regression models of ln-transformed soil total carbon derived at a specific extent, and evaluated at the other two extents.
7-7 Prediction quality of the stepwise multiple linear regression models of ln-transformed soil total carbon (LnTC) derived at the University of Florida Beef Cattle Station (BCS), Santa Fe River watershed (SFRW), and state of Florida (FL), and evaluated at 10 geographic regions (i.e., hydrologic units) in FL.
8-1 Three nested scales within the state of Florida, with their respective sample distributions of soil total carbon (TC).
8-2 Variograms of ln-transformed soil total carbon (LnTC) over short, medium, and long distances, respectively, at the field scale (FS), watershed scale (WS), state scale (SS), and for the pooled dataset across scales.
8-3 Fitted variograms of ln-transformed soil total carbon (LnTC) over short, medium, and long distances, respectively, at the field scale (FS), watershed scale (WS), and state scale (SS) up to 10,000 m.
8-4 Log-log plots of the variograms of ln-transformed soil total carbon (LnTC) over short, medium, and long distances, respectively, at the field scale (FS), watershed scale (WS), state scale (SS), and for the pooled dataset across scales.
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SPATIAL AND SPECTRAL MODELS OF SOIL CARBON AT MULTIPLE SCALES IN FLORIDA

By

Gustavo de Mattos Vasques

August 2009

Chair: Sabine Grunwald
Major: Soil and Water Science

Soil carbon (C) is an important indicator of ecosystem quality. In Florida, soil C is influenced by hydrologic gradients, vegetation, land use and management, and other environmental variables acting in synergy. However, these influences are understood only partially within limited geographic conditions and domains, raising the need to assess the spatial patterns of soil C using more holistic approaches. Our objectives were to develop spectral and spatial models to estimate soil C efficiently and accurately.

First, we estimated soil total C (TC) and soil organic C (SOC) fractions using visible/near-infrared spectroscopy within a north-central Florida watershed (3585 km², 544 observations), and TC across Florida (150,000 km², 7122 observations). Spectral models explained up to 86% of the variability of TC and up to 82% (recalcitrant C) of the variability of SOC fractions in independent validation (Rv²). For SOC fractions, the quality of spectral models decreased in the following order: recalcitrant C > hot-water-soluble C > mineralizable C > hydrolyzable C. Second, we derived spatial models of TC across the watershed, estimating a total stock of 39.29 Tg of soil C in the upper 1 m. Spatial patterns of SOC fractions resembled that of TC, and the most important factor to account for TC's variability was its spatial autocorrelation. Third, we tested the influence of scale on soil-landscape relationships in TC spatial models among seven grains (30 to 1920 m), three extents (Florida, watershed, and field: 5.58 km², 152 observations), and ten geographic regions in Florida. Hydrologic variables imparted major control on TC estimates across grains and extents. Transferability of TC spatial models was best among grains up to 60 m, and between the Florida and watershed extents, and varied among geographic regions. Finally, we characterized the spatial dependence of TC at multiple scales in Florida, identifying ranges of scale-invariant (i.e., self-similar) TC variation (< 1500 m, 1500-30,000 m, and > 30,000 m). Our spectral and spatial models improved the knowledge about the variability of soil C in Florida, with implications for land management, environmental quality and conservation, policy making, assessment of ecosystem services, global climate change, and C cycling and sequestration.
CHAPTER 1
INTRODUCTION
Rationale and Significance
In Florida and all over the world, soils are a valuable resource for the production of food, fiber, and energy to support people and sustain life at all levels of terrestrial systems, participating in the biogeochemical cycle of principal nutrient elements (e.g., carbon, nitrogen, and phosphorus) and water (Lal et al., 1998; Jacobson et al., 2004). Soils are composed of organic and inorganic materials that result from their interaction with the atmosphere, lithosphere, hydrosphere, and biosphere. Carbon (C), being part of organisms and soil organic matter (SOM), directly influences the physical, chemical, and biological properties and processes in the soil, enhancing soil quality through the regulation of nutrients and toxic substances, storage of water, stabilization of soil aggregates and structure, and regulation of microbial activity, ultimately affecting the biodiversity and sustainability of whole ecosystems (Kay, 1998; Ernst, 2004). Hence, the analysis and prediction of the distribution and dynamics of soil C is an essential requirement for sustainable land management (McBratney et al., 2000). Globally, soils store about 3250 Pg (petagrams) of C including wetlands and permafrost, which is about five times the biotic pool (650 Pg) and about four times the atmospheric pool (Field et al., 2007), and have been indicated as a potential reservoir to sequester atmospheric C dioxide and mitigate global warming (Follett et al., 2000; Smith and Heath, 2004). In Florida soils, however, information about the quantity and quality of C is still sparse, as research has usually focused on specific ecotypes, land uses, and soil classes occurring within specific geographic domains. Thus, at the landscape and regional scales, there are still knowledge gaps about the amount of soil C and the influence of diverse environmental conditions on its spatial distribution.
This type of knowledge can be generated by upscaling soil C, which essentially means estimating its regional spatial distribution based on site-specific (point) measurements; in other words, interpolating soil C spatially. Soil-landscape analysis is used to derive such upscaling models, which explain the influence of environmental gradients on the global spatial patterns (trends) of soil C across the region of interest, but also account for the local spatial variation of soil C based on its spatial dependence (i.e., spatial autocorrelation) within relatively short distances. However, many factors can influence these soil-landscape models, and ultimately the quality of regional soil C estimates. Scale is one of the factors that can potentially influence upscaling models of soil C. Scale translates the notion of how much detail can be discerned in a map by the eyes of the observer. It can be perceived in different ways, as it relates to the amount of spatial detail inherent in the map (i.e., the grain, or spatial resolution), the total area covered by the map (i.e., the extent), the relative geographic position of the map (i.e., the region), and the level of feature detail contained in the map (i.e., the range of continuous or thematic information). The relationship between soil C and environmental properties depends on all these scaling properties, which determine the quality of upscaling models and estimates of soil C. Thus, understanding the influence of scale on upscaling models of soil C is critical, as it helps to identify ideal scales at which to produce and apply the models, as well as the potential and limitations of available ancillary environmental data to characterize soil C spatial patterns at different scales.
Another factor that dictates the spatial distribution of soil C, and influences its upscaling behavior across large regions, is its inherent spatial dependence that generates spatially autocorrelated patterns of soil C, i.e., the tendency that soil C observations clustered together are
more similar (positive spatial autocorrelation), or more dissimilar (negative spatial autocorrelation) than observations further apart. The formation and accumulation of soil C is the result of many soil and environmental processes acting jointly over a range of spatial and temporal scales. Because of this, spatial patterns of soil C inherently include structured spatial dependences over many scales, so that the perceived spatial variation of soil C depends on the scale at which it is characterized. This means that soil C is likely to have multi-scale spatial dependence, which needs to be characterized to better guide upscaling of soil C at specific scales (e.g., at specific grain and extent combinations), so that it can incorporate the spatial dependence of soil C, or at least be conducted within adequate spatial ranges. In addition, knowing soil C's spatial dependence structure helps to elaborate sampling designs that more efficiently capture its spatial variation, which is of great advantage since field sampling is one of the most limiting components of regional assessments of soil C due to its high costs. Currently, research about the influence of scale on spatial patterns of soil C (as well as other soil and environmental properties) is still in its infancy, especially in soil science, with very little documented in the literature. In this respect, our research helps to unveil the intricacies among spatial scale, environmental properties, and soil C patterns, and how these relationships generate spatial models of soil C, controlling its spatial distribution across scales. One alternative technique that has been proposed to reduce costs of field and laboratory analyses of soil properties aiming to populate spatial soil databases is visible/near-infrared spectroscopy (VNIRS).
Compared to conventional laboratory methods, VNIRS is nondestructive, requires less sample preparation with fewer or no chemical reagents, is highly adaptable to automated and in situ use, and can analyze various soil properties simultaneously
(McCarty et al., 2002; Viscarra Rossel et al., 2006). It has the potential to be faster and cheaper than conventional methods and to estimate soil properties with high accuracy (e.g., Reeves III et al., 2002). Given these potential benefits, it is worthwhile to test the feasibility of VNIRS to estimate soil C in Florida soils, which until now has not been done. The overarching objective of this dissertation was to advance the knowledge about modeling of soil C in Florida. Our first objective was to test the feasibility of VNIRS to estimate soil total C (TC) and soil organic C (SOC) fractions using samples collected within a north-central Florida watershed, and samples spanning the state of Florida. Our second objective was to compare soil-landscape methods to upscale TC and SOC fractions within the same watershed, identifying the most important environmental explanatory factors of the spatial distribution of soil C. Our third objective was to investigate the influence of grain, extent, and region on the characteristics and predictive quality of upscaling models of TC. Finally, our fourth objective was to characterize the spatial dependence structure of TC using samples collected at three nested scales in Florida.
Overview
The contents of this dissertation are linked by the common theme of modeling of soil C in Florida. The following chapters present the findings of investigations conducted separately to improve the knowledge about different aspects under this theme. They were written to be as standalone as possible, and the reader should need to refer to other chapters only to complement information about study areas, laboratory methods, and sampling designs, or to compare results. The last chapter (9) presents a synthesis of the main findings of the dissertation, and discusses some of their implications for the study of soil C in Florida and, in a broader context, for the conservation of soil resources and advancement of soil C science.
Limitations of the study are discussed, and some recommendations for future research are provided.
The first part of the dissertation (Chapters 2, 3, and 4) investigates the feasibility of modeling soil C using VNIRS as a means to provide data to support the spatial assessment of TC and SOC fractions. Chapters 2 and 3 focus on soils of the Santa Fe River watershed (SFRW) that were collected at four depth intervals (0–30, 30–60, 60–120, and 120–180 cm) and analyzed for TC at these four depths. Soils were also analyzed for four SOC fractions at the first depth (0–30 cm), namely recalcitrant organic C (RC), hydrolyzable organic C (HC), hot-water-soluble organic C (SC), and mineralizable organic C (MC). In Chapter 2, five multivariate calibration methods and thirty pre-processing transformations of soil spectra are compared to derive VNIRS predictive models for TC at the four depth intervals. In Chapter 3, five multivariate calibration methods and six selected pre-processing transformations of soil spectra are compared to derive VNIRS predictive models at 0–30 cm for TC and SOC fractions. Chapter 4 evaluates the feasibility of VNIRS to estimate TC in soils across Florida. Using the knowledge obtained in Chapters 2 and 3, three multivariate calibration methods are compared to derive VNIRS models for mineral and organic soil horizons spanning Florida. In addition, the effect of adding soil taxonomic information (i.e., soil order) on the quality of VNIRS models is evaluated. The second part of the dissertation (Chapters 5 and 6) assesses the spatial distribution of TC and SOC fractions across the SFRW, comparing three soil-landscape upscaling methods. Important environmental explanatory factors of the spatial distribution of TC and SOC fractions are identified, as well as the influence of TC's spatial dependence on the quality of the upscaled models and resulting spatial patterns of soil C. For consistency with Chapters 2 and 3, Chapter 5 focuses on TC at the four depth intervals, and Chapter 6 focuses on TC and SOC fractions at 0–30 cm.
The third part of the dissertation (Chapters 7 and 8) investigates the influence of scale on the spatial distribution of TC at a depth of 0–100 cm and on the quality and transferability of the upscaled models. In Chapter 7, the influence of grain, extent, and geographic region on the upscaling of TC is investigated at three nested scales in Florida using soil-landscape regression modeling. The effect of grain on upscaling models of TC is evaluated within the SFRW by comparing models derived using environmental explanatory variables resampled to seven different grains. The effect of extent is evaluated by comparing upscaling models of TC derived using samples collected independently at three extents within Florida. To finalize the investigation on the influence of scale on TC models, the transferability of upscaled models is evaluated across grains, extents, and regions in Florida. Finally, Chapter 8 characterizes the spatial dependence structure of TC at three nested scales in Florida using variogram and fractal analyses. Differences and similarities among scales are highlighted to identify multi-scale spatial patterns, ranges of spatial dependence, and other important scaling characteristics of TC.
CHAPTER 2
COMPARISON OF MULTIVARIATE METHODS FOR INFERENTIAL MODELING OF SOIL CARBON USING VISIBLE/NEAR-INFRARED SPECTRA¹
¹ Published in Geoderma 146, 14–25, 2008.
Summary
In order to reduce cost and time in the analysis of soil properties, visible/near-infrared diffuse reflectance spectroscopy (VNIRS) has been proposed. Since various pre-processing transformations and calibration techniques are in use to analyze soil spectral data, much uncertainty still exists about predictive soil modeling. We investigated the feasibility of VNIRS to determine the concentration of carbon in soils collected in the Santa Fe River watershed, Florida. A total of 554 soil samples (400 for calibration, and 154 for validation) were collected at depths from 0 to 180 cm. Total carbon was measured by dry combustion after sieving (2 mm), air drying, and ball milling, and is reported in mg kg⁻¹. Reflectance measurements from 350 to 2500 nm were collected in a controlled laboratory environment. Five multivariate techniques (stepwise multiple linear regression, principal components regression, partial least squares regression, regression tree, and committee trees) and thirty pre-processing transformations (including derivatives, normalization, and nonlinear transformations) of spectral data were compared with the aim of identifying the best combination to predict soil carbon. The coefficient of determination (R²), the root mean square error (RMSE), and the residual prediction deviation (RPD) were used to evaluate the models. The combination of multivariate technique and pre-processing transformation that provided the highest coefficient of determination for the validation set (Rv²) and RPD, and lowest root mean square error for the validation set (RMSEv), was committee trees associated with Norris gap derivative with a search window of 7 measurements (Rv² = 0.86, RMSEv = 0.170, RPD = 2.68). When considering the overall results
of the multivariate techniques across all tested pre-processing transformations, partial least squares regression performed best (lowest average RMSEv across all pre-processing transformations), followed by stepwise multiple linear regression, and committee trees. In terms of pre-processing transformations, Savitzky-Golay derivatives consistently improved the models of soil carbon, being among the five best pre-processing transformations for all of the multivariate techniques tested. Norris gap derivative was the preferred data preparation for the tree-based techniques. Except for standard normal variate transformation, normalization techniques performed worse than expected. The RPDs of the best VNIRS models were higher than 2.50, which suggests that the VNIRS models produced in this study are robust and stable enough to be applied to similar soils.
Introduction
The analysis and forecast of the distribution and dynamics of soil carbon is an essential requirement for sustainable land management (McBratney et al., 2000; Florinsky et al., 2002). Digital soil mapping provides a cost-efficient tool to map soil properties across large areas, but requires comprehensive sampling in the field to provide data to train the models. Thus, there is tremendous need for new techniques to measure soil properties that are faster and cheaper than traditional methods, but offer comparable accuracy (Shepherd and Walsh, 2002). In recent years, visible/near-infrared diffuse reflectance spectroscopy (VNIRS) has proven to be a promising technique for the investigation of soil carbon, soil organic matter (SOM), and various other soil properties. Compared to conventional analytical methods, VNIRS is faster, cheaper, and nondestructive, requires less sample preparation with fewer or no chemical reagents, is highly adaptable to automated and in situ measurements, and has the potential to analyze various soil properties simultaneously (McCarty et al., 2002; Viscarra Rossel et al., 2006).
Models of soil carbon obtained using VNIRS benefit from the interaction between soil carbon, i.e., organic matter, and soil reflectance in the visible/near-infrared (VNIR) region (Gaffey et al., 1993), and often predict soil carbon with high accuracy, explaining more than 80% of its variability (e.g., Chang and Laird, 2002; Reeves III et al., 2002; Shepherd and Walsh, 2002). Although the fundamental absorption features of the main soil elements occur in the mid- and far-infrared regions, overtones and combinations with predictive ability are present in the VNIR region (Hunt, 1977; Gaffey et al., 1993). Factors that influence the reflectance of soils also include moisture, particle size, and mineral composition, especially the presence of iron (Bowers and Hanks, 1965; Hunt, 1977; Torrent et al., 1983; Gaffey et al., 1993; Lobell and Asner, 2002; Reeves III et al., 2002). Typical soil reflectance curves in the VNIR region for various soil types are shown in Figure 2-1, along with the most important absorbance regions related to soil carbon, and their responsible chemical groups. Several calibration techniques have been used for predictive modeling of soil properties using VNIRS. 
Partial least squares regression (PLSR) is the most common (Masserschmidt et al., 1999; Reeves III et al., 2001, 2002; Chang and Laird, 2002; Dunn et al., 2002; McCarty et al., 2002; Kooistra et al., 2003; Udelhoven et al., 2003; Brown et al., 2005; Viscarra Rossel et al., 2006), but the list extends to other least squares methods, including principal components regression (PCR) (Chang et al., 2001; Islam et al., 2003), and multiple linear regression (MLR) (Al-Abbas et al., 1972; Krishnan et al., 1980; Dalal and Henry, 1986; Meyer, 1989), as well as nonparametric data mining techniques, including artificial neural networks (Fidêncio et al., 2002a; Daniel et al., 2003), regression trees (RT) (Brown et al., 2006), and multivariate adaptive regression splines (Shepherd and Walsh, 2002), which have been more recently incorporated.
Table 2-1 summarizes various VNIRS investigations that have been conducted to infer soil carbon and SOM. Most VNIRS studies are conducted under controlled laboratory conditions, but investigations done in situ (Daniel et al., 2003; Kooistra et al., 2003), or using airborne remote sensing (Palacios-Orueta and Ustin, 1998; Chen et al., 2000; Hill and Schütt, 2000; Galvão et al., 2001; Fox and Sabbagh, 2002), have produced promising results, with coefficients of determination (R²) ranging from 0.45 (Kooistra et al., 2003) to 0.93 (Chen et al., 2000). Outdoor conditions such as sunlight, atmospheric condition, soil moisture, particle size, shade, and vegetation cover influence soil spectra, introducing noise and posing complications to the use of VNIRS in the field. Different pre-processing transformations have been applied in numerous studies to transform the soil spectral data, remove noise, accentuate features, and prepare them for chemometric modeling. Pre-processing transformations of spectral data constitute an important step in multivariate calibration (Stark, 1988) and have been shown to improve the accuracy of prediction models (Dunn et al., 2002; McCarty et al., 2002; Kooistra et al., 2003). The most common pre-processing transformations include smoothing, averaging, normalization, scatter correction, baseline correction, and derivatives. Although numerous pre-processing transformations have been proposed in VNIRS, the choice of which one to use is somewhat arbitrary, and the degree to which that choice affects the final predictions of soil properties is little known. While comparative modeling studies have been presented to infer soil carbon (Fidêncio et al., 2002a; Brown et al., 2006), the selection of pre-processing transformations is less well documented.
Therefore, the main objective of this study was to compare various multivariate techniques and pre-processing transformations of spectral data to determine their suitability for modeling soil carbon using VNIR spectra. We aim to elucidate the choices of multivariate techniques and pre-processing transformations of spectral data available for VNIRS modeling of soil carbon. Our specific objectives were the following: (i) to compare thirty pre-processing transformations of VNIR spectra for the development of soil total carbon (TC) models; (ii) to compare five multivariate techniques to predict TC using VNIRS; and (iii) to validate the derived soil carbon models using an independent dataset of soil TC. We expected that nonparametric regression models would perform better than parametric regression models to predict soil TC. The study was conducted using a dataset of diverse soils collected within the Santa Fe River watershed, in north-central Florida.
Materials and Methods
Study Area
The study was conducted in the Santa Fe River watershed (SFRW), which is located between latitudes 29.63° and 30.21° N and longitudes 82.88° and 82.01° W. The study area is approximately 3585 km², and spans nine counties in north-central Florida. The climate is subtropical, with mean annual precipitation of 1224 mm and mean annual temperature of 20.5 °C (1997–2008) (National Climatic Data Center, 2008). Dominant soil orders of the watershed are Ultisols (47%), Spodosols (27%), and Entisols (17%). Histosols, Inceptisols, and Alfisols occupy the remaining areas. Overall, soils in the SFRW have sandy eluvial horizons commonly overlain by loamy to clayey illuvial horizons. The most frequent soil series are Sapelo, Blanton, Ocilla, Mascotte, and Foxworth, and the most frequent soils are Ultic Alaquods, Grossarenic Paleudults, Aquic Arenic Paleudults, and Typic Quartzipsamments. Hydric soils are present in 12% of the sampling sites.
Predominant land use/land cover consists of pine plantations (30%), wetlands (14%), pasture (13%), rangeland (12%), and upland forest (11%) (Florida Fish and Wildlife Conservation Commission, 2003a). Urban areas occupy less than 7% of the watershed, and crops around 5%. The topography consists of level to slightly undulating slopes varying from 0 to 5%, with elevations ranging from around 1.5 to 92 m above mean sea level (United States Geological Survey, 1999). The geology is dominated by limestone and karst terrain, capped by Miocene, Pliocene, and Pleistocene-Holocene sediments (Randazzo and Jones, 1997; Brown et al., 1990).
Field Sampling
Soil samples were collected during five sampling events between September 2003 and January 2005. Field sampling consisted of 141 sites spread over the study area in a stratified random design, with stratification based on soil orders and land cover/land use (Figure 2-2). At each site, composite soil samples were collected at four depths: 0–30, 30–60, 60–120, and 120–180 cm, totaling 554 samples: 141 at the top layer, 141 at 30–60 cm, 139 at 60–120 cm, and 133 at 120–180 cm.
Laboratory Analysis
Basic sample preparation consisted of sieving using a 2-mm mesh, air drying, and ball milling. Total carbon was determined by dry combustion on a FlashEA 1112 Elemental Analyzer (Thermo Electron Corp., Waltham, MA), and is reported in mg kg⁻¹. The laboratory TC measurements showed a positively skewed distribution. Thus, a log10 transformation was applied to TC in order to normalize the distribution before running the regressions. This provided better predictions for all multivariate techniques tested. Only the results obtained with the log-transformed variables are discussed here. The four soil depths were combined with the aim of producing a more robust model of TC, given that pooled together they represent a higher diversity of soil characteristics with more
variable TC content. The whole dataset (n = 554) was split randomly into 400 samples (~70%) for calibration (model development) and 154 samples (~30%) for validation. Levene's test for equality of variances (Levene, 1960) and Student's t-test of equality of means were performed between the calibration and validation sets to make sure the validation set was truly representative.
Spectroscopy
In order to remove the effect of moisture, the soil samples were dried for 12 h at 40–45 °C. They were scanned using a QualitySpec Pro spectroradiometer (Analytical Spectral Devices Inc., Boulder, CO). The instrument measures reflectance in the wavelength range of 350–2500 nm, at 1-nm intervals. The soil samples were scanned four times, with replicates collected at angles of 90°. A reference spectrum using Spectralon (LabSphere, North Sutton, NH) was collected prior to the first scan and every 25 samples thereafter. An average spectral curve was calculated for each sample (from the four scans) and was further used for transformations and chemometric modeling.
Pre-processing Transformations
Two pre-processing transformations were applied as a standard preparation of the soil reflectance curves, before the other pre-processing transformations were applied. First, to reduce random noise, the reflectance curves were smoothed across a moving window of 9 nm using the Savitzky-Golay algorithm with a third-order polynomial (Savitzky and Golay, 1964). Second, to reduce the dimensionality of the data, and to match the spectral resolution of the spectroradiometer (Analytical Spectral Devices Inc., 2008), the reflectance values were averaged across a 10-nm window. This two-step pre-treatment (SAV) reduced the soil spectral curves to 214 reflectance values, and prepared them for further pre-processing.
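The two-step pre-treatment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function names (`savgol_smooth_9pt_cubic`, `block_average`, `sav_pretreatment`) are hypothetical, the smoothing uses the standard tabulated Savitzky-Golay convolution weights for a 9-point cubic fit, and both the discarding of the four edge points at each end and the dropping of the trailing partial 10-nm window are assumptions. Under those assumptions, a 350–2500 nm spectrum at 1-nm intervals (2151 values) reduces to 214 values.

```python
# 9-point Savitzky-Golay smoothing weights for a cubic (or quadratic) fit,
# taken from the standard tabulated coefficients; they sum to 231.
SG9_CUBIC = (-21, 14, 39, 54, 59, 54, 39, 14, -21)
SG9_NORM = 231


def savgol_smooth_9pt_cubic(values):
    """Smooth with a 9-point cubic Savitzky-Golay filter.

    Only positions with a full window are returned, so the output is
    8 points shorter than the input (assumption: edges are discarded).
    """
    half = len(SG9_CUBIC) // 2
    smoothed = []
    for center in range(half, len(values) - half):
        window = values[center - half:center + half + 1]
        total = sum(w * v for w, v in zip(SG9_CUBIC, window))
        smoothed.append(total / SG9_NORM)
    return smoothed


def block_average(values, width=10):
    """Average consecutive, non-overlapping blocks of `width` values.

    A trailing partial block is dropped (assumption).
    """
    n_blocks = len(values) // width
    return [
        sum(values[b * width:(b + 1) * width]) / width
        for b in range(n_blocks)
    ]


def sav_pretreatment(reflectance):
    """Two-step pre-treatment: smooth, then average over 10-nm windows."""
    return block_average(savgol_smooth_9pt_cubic(reflectance), width=10)


# Synthetic (hypothetical) spectrum: one value per nm from 350 to 2500 nm.
wavelengths = list(range(350, 2501))
spectrum = [0.2 + 0.0001 * (w - 350) for w in wavelengths]
reduced = sav_pretreatment(spectrum)
print(len(spectrum), len(reduced))
```

A cubic Savitzky-Golay filter reproduces straight-line data exactly, which makes the synthetic linear spectrum a convenient sanity check for the weights.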
In this study, a total of 30 pre-processing transformations were compared to prepare the soil spectral curves for multivariate calibration. These included Savitzky-Golay (Savitzky and Golay, 1964) and Norris (Norris and Williams, 1984) derivatives, Kubelka-Munk transformation (Kubelka and Munk, 1931), reflectance to absorbance transformation, baseline offset, standardization, and normalizations (CAMO Technologies Inc., 2006). Table 2-2 shows the complete list of pre-processing transformations tested, with their respective optional parameters.
Multivariate Techniques
Five multivariate techniques were tested to relate the soil spectral curves to the TC values measured on the elemental analyzer using the calibration set. These were stepwise multiple linear regression (SMLR); PCR (Martens and Næs, 1989); PLSR (Martens and Næs, 1989); RT (Breiman et al., 1984); and committee trees (CT; Breiman, 1996). The best model was chosen based on the coefficient of determination of validation (Rv²; Equation 2-1). Complementary error statistics were also provided, including the root mean square error (RMSE; Equation 2-2), and the residual prediction deviation (RPD; Equation 2-3; Williams, 1987).
\[ R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{2-1} \]
\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}} \tag{2-2} \]
\[ \mathrm{RPD} = \frac{s.d._{val}}{\mathrm{RMSE}_v \sqrt{n / (n - 1)}} \tag{2-3} \]
Where: $\hat{y}$ = predicted values; $\bar{y}$ = mean of observed values; $y$ = observed values; $n$ = number of predicted/observed values, with $i = 1, 2, \ldots, n$; $s.d._{val}$ = standard deviation of the validation set; $\mathrm{RMSE}_v$ = root mean square error of validation.
A robust model to predict TC should be capable of accurately predicting both very high and very low TC values. Thus, outliers were eliminated from the sample only when laboratory measurement errors were evident. Very high and very low TC values were checked against soil
type and land use information in order to identify if they were justifiable from an environmental perspective. If not, they were considered outliers, and removed from the dataset. Since no evidence was present of laboratory errors, no outliers were removed. The complete set containing 554 measurements was therefore used in the analysis. The SMLR algorithm uses a combination of the forward and backward selection techniques, in which the variables are added and removed according to a tolerance significance level, based on the F probability, which was set to 0.05. The assumptions about the distribution of the SMLR model residuals were checked to assure that the predictions obtained by the least squares approximation are valid and constitute the best linear unbiased estimators of the regression parameters (Berry and Feldman, 1985). Stepwise MLR was implemented in SPSS 11.0 (SPSS Inc., Chicago, IL). Principal components are commonly used to reduce the dimensionality of a large number of potentially correlated variables, avoiding problems of multicollinearity, with minimal loss of information of the original variables (Næs et al., 1986; Martens and Næs, 1989). The PCR algorithm was performed as a two-step operation (CAMO Software Inc., 2006). First, the independent variables were decomposed into orthogonal Principal Components (PCs), using the NIPALS (Non-linear Iterative Partial Least Squares) algorithm (Martens and Næs, 1989) and full cross-validation on the calibration set. The number of PCs was chosen as the one that minimized the RMSE of cross-validation (RMSEcv), according to Martens and Næs (1989), up to a maximum of 20 PCs. The predictions were obtained by multiple linear regression of the target variable on the selected PCs. Partial least squares regression has the same general structure as PCR, with the advantage that it also takes into account the dependent variables when calculating the PCs (Geladi and
Kowalski, 1986; Martens and Næs, 1989). Partial least squares regression reduces the data, noise, and computation time, with minor loss of the information contained in the original variables. Partial least squares regression was performed on the calibration set using the orthogonalized PLSR algorithm for one Y variable (PLS-1) and full cross-validation. The number of partial least squares (PLS) factors was chosen to minimize the RMSEcv. Both PCR and PLSR were implemented in the Unscrambler 9.5 software (CAMO Technologies Inc., Woodbridge, NJ). Regression tree analysis fundamentally performs a binary recursive partitioning of the dataset (Breiman et al., 1984; Steinberg and Colla, 1997). At each terminal node, a predicted value is obtained as the average of all the measurements that were grouped in that node. Committee trees use the same principles as RT, with the difference that a bootstrap algorithm, in this case bagging (Breiman, 1996), is implemented to provide an aggregated predictor based on a committee of trees. The predicted value calculated by CT is an average over the multiple versions of predictors formed by making bootstrap replicates of the learning set. A committee of 100 trees was used to calibrate the soil carbon equations using the calibration set, with a maximum number of sample redraws of three. The optimal number of tree nodes was identified in both RT and CT by minimizing the least squares error. Regression tree and CT were implemented in CART 5.0 (Salford Systems, San Diego, CA). After the models were generated on the calibration set by the combinations of multivariate technique and pre-processing transformation, they were validated using the separate validation set.
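The validation statistics used to compare models (R², RMSE, and RPD; Equations 2-1 to 2-3) can be sketched as follows. This is an illustrative sketch, not the code used in the study: the function names and the small example vectors are hypothetical, R² is assumed to be the ratio of the regression sum of squares to the total sum of squares, RMSE is assumed to use n in the denominator, and the standard deviation in RPD is assumed to be the sample standard deviation (n − 1 denominator) of the observed validation values.

```python
import math


def r_squared(observed, predicted):
    """Regression sum of squares over total sum of squares (Equation 2-1)."""
    mean_obs = sum(observed) / len(observed)
    explained = sum((p - mean_obs) ** 2 for p in predicted)
    total = sum((o - mean_obs) ** 2 for o in observed)
    return explained / total


def rmse(observed, predicted):
    """Root mean square error with n in the denominator (Equation 2-2)."""
    n = len(observed)
    squared_errors = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def rpd(observed, predicted):
    """Residual prediction deviation (Equation 2-3).

    The standard deviation of the observed values is divided by the RMSE
    rescaled by sqrt(n / (n - 1)).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Sample standard deviation (assumption: n - 1 denominator).
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    return sd / (rmse(observed, predicted) * math.sqrt(n / (n - 1)))


# Hypothetical log10-transformed TC values for a tiny validation set.
observed = [2.5, 3.0, 3.5, 4.0, 4.5]
predicted = [2.6, 3.1, 3.5, 3.9, 4.4]

r2_value = r_squared(observed, predicted)
rmse_value = rmse(observed, predicted)
rpd_value = rpd(observed, predicted)
print(r2_value, rmse_value, rpd_value)
```

Defined this way, R² is not guaranteed to stay at or below 1 for arbitrary predictions, which is one reason to report RMSE and RPD alongside it.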
Results and Discussion
Descriptive Statistics
Considering the whole dataset, TC showed a positively skewed distribution, with mean 7235 mg kg⁻¹, median 2903 mg kg⁻¹, and range between 169 mg kg⁻¹ and 268,995 mg kg⁻¹. The log10 transformation confirmed that TC had a lognormal frequency distribution. The minimum and maximum values of LogTC were 2.2279 and 5.4297, respectively, and the mean and median were 3.5118 and 3.4643, respectively. Table 2-3 lists the descriptive statistics of TC and LogTC with respect to the whole dataset, calibration set, and validation set. The highest TC values occurred in wetland soils, mainly Histosols, Mollisols, and Inceptisols. These soils are frequently saturated with water, which promotes the accumulation of carbon due to the less efficient anaerobic decomposition of organic matter. Levene's test indicated homogeneity of variance of LogTC values between the calibration and validation sets (p = 0.432). Comparison between the mean LogTC values for the calibration and validation sets did not show a significant difference between them, according to Student's t-test (p = 0.725) at the 0.05 significance level. This similarity between the calibration and validation sets indicates that the randomly separated validation samples appropriately represent the population under study.
Stepwise Multiple Linear Regression
Overall, SMLR yielded Rv² ranging from 0.66 to 0.85, with an average Rv² of 0.82 for TC (Table 2-4). The RMSEv varied from 0.176 to 0.269, with an average of 0.199. The number of predictors selected by the SMLR varied from 6 to 35, with an average of 19. Except for KMT, all models had a reasonable RPD (≥ 1.97). Five pre-processing transformations produced the highest Rv²: SNV, SGF-1-9, SGF-2-9, NRA, and LOG. The SGF-1-9 transformation was selected because it produced the most parsimonious model, with 12 selected predictors. The error
statistics for the SGF-1-9 SMLR model were: RMSEc = 0.178 and RMSEv = 0.178, and the coefficients of determination were: Rc² = 0.88 and Rv² = 0.85. Figure 2-3 shows the predicted versus observed values for the SGF-1-9 SMLR model. Table 2-5 presents the descriptive statistics of predicted LogTC for the best models obtained with the five multivariate techniques tested.

The most important variables in the best-fit SMLR model are indicated by the regression coefficients listed in Equation 2-4, and are located around wavelengths 430, 460, 950, 1320, 1330, and 2220 nm. Wavelengths of secondary importance, i.e., those with smaller coefficients in the model, were 590, 1020, 1850, 1990, 2070, and 2400 nm.

\[
\begin{aligned}
\mathrm{LogTC} = {} & 4.52 - 366.90\,R_{430} + 391.44\,R_{460} - 148.55\,R_{590} - 358.36\,R_{950} \\
& + 176.47\,R_{1020} - 545.59\,R_{1320} + 469.10\,R_{1330} - 72.80\,R_{1850} \\
& - 155.25\,R_{1990} + 159.28\,R_{2070} - 379.54\,R_{2220} - 72.13\,R_{2400}
\end{aligned}
\tag{2-4}
\]

where $R_w$ is the reflectance at wavelength $w$ in nm.

The best SMLR model produced in this study performed worse than some models reported in similar studies with MLR or SMLR (Krishnan et al., 1980; Meyer, 1989; Chen et al., 2000). Nevertheless, the Rv² of 0.85 is comparable to that obtained by Dalal and Henry (1986), and superior to the one obtained by Al-Abbas et al. (1972), both of which predicted SOC. The results obtained by SMLR are comparable to those of the other multivariate techniques tested here and elsewhere (e.g., Dunn et al., 2002; Shepherd and Walsh, 2002; Islam et al., 2003; Kooistra et al., 2003; Brown et al., 2006), contradicting our expectation that data mining techniques would perform better than ordinary least squares methods.

Principal Components Regression

The models obtained by PCR using different pre-processing transformations are summarized in Table 2-6. Best results were derived by NRA, SAV, and SNV (Rv² = 0.84),
however, PCR generally produced slightly worse results than SMLR. Two possible explanations for this behavior are, first, that the PCR models are over-fitted compared to the SMLR models, and second, that the wavelengths selected by the SMLR models explained most of the variability of TC, leaving no significant variability to be explained by the rest of the wavelengths in the PCR models. The average Rv² was 0.76, and the worst pre-processing transformation was SGS-2-3, with an Rv² of 0.56. The number of principal components (PCs) used in the model varied from 3 to 20, with a mean of 12 PCs. The pre-processing transformation selected for PCR was NRA, since it produced the lowest RMSEv (0.183), using 18 PCs. The average RMSEv was 0.224, and the worst RMSEv (0.307) was obtained after SGS-2-3.

Principal components regression suffers from the same problem as other least squares techniques, which reduce the variability and the range of the predicted variable when compared to the original values, producing a smoother frequency distribution. The predicted versus observed values for the NRA PCR model are plotted in Figure 2-4, while the descriptive statistics of predicted LogTC values are listed in Table 2-5. The goodness of fit and error statistics for the NRA PCR model were as follows: Rcv² = 0.85; Rv² = 0.84; RMSEcv = 0.196; and RMSEv = 0.183; these results are comparable to those of similar studies (Chang et al., 2001; Islam et al., 2003). The RPD of 2.49 indicated that VNIRS combined with PCR can be used as a rapid and accurate technique for the assessment of soil carbon.

Partial Least Squares Regression

The TC models produced using PLSR are summarized in Table 2-7. Partial least squares regression performed better than PCR and SMLR, with an average Rv² of 0.82. Based on the similarity between PCR and PLSR, it was expected that these two multivariate techniques would behave similarly across the series of pre-processing transformations. The highest Rv² (0.86) was
obtained after LOG, followed closely by SGF-1-9, SGF-2-9, SAV, SGF-3-9, and SNV, all of them with an Rv² of 0.85. The number of PLS factors varied from 6 to 19, with a mean of 9. The worst pre-processing transformation was KMT (Rv² = 0.68; RMSEv = 0.259). The average RMSEv was 0.193, and the lowest was 0.177, obtained after SGF-1-9, SGF-2-9, and SGF-3-9. The pre-processing transformation selected for PLSR was SGF-1-9, because it produced a more parsimonious model than LOG, with 7 PLS factors, and had the lowest RMSEv (0.177).

Figure 2-5 shows the predicted versus observed values for the SGF-1-9 PLSR model. Descriptive statistics of predicted LogTC from the SGF-1-9 PLSR model are listed in Table 2-5. The goodness of fit and error statistics for the SGF-1-9 PLSR model were as follows: Rcv² = 0.84; Rv² = 0.85; RMSEcv = 0.200; and RMSEv = 0.177. This result agrees with results obtained by other studies of VNIRS of soil carbon with PLSR (e.g., Fidêncio et al., 2002a; McCarty et al., 2002; refer to Table 2-1).

Overall, PLSR performed consistently well no matter which pre-processing transformation was applied to the data. The RPD varied from 1.76 to 2.57, with a mean of 2.37, which indicates that the models are robust enough to predict carbon from similar soils, i.e., from the same geographical region and within the same range (Brown et al., 2005).

Regression Tree

The RT models produced the worst results among all multivariate techniques tested, with only one model exceeding an RPD of 2. The Rv² varied from 0.51 to 0.76, with an average of 0.67. The number of variables selected by the model varied from 2 to 16, with a mean of 7. The number of terminal nodes, which defines the number of groups of predicted values, varied from 3 to 22, with an average of 10 (Table 2-8). The pre-processing transformation that gave the highest Rv² was NGD-5, and this was the pre-processing adopted to create the RT model of LogTC. The resulting NGD-5 RT model
selected 9 variables and had 12 terminal nodes. The goodness of fit and error statistics for this model were as follows: Rc² = 0.86; Rv² = 0.76; RMSEc = 0.191; and RMSEv = 0.226. The predicted versus observed LogTC values obtained by the NGD-5 RT model are shown in Figure 2-6.

One explanation of why RT was not as good as the other multivariate techniques is that it produces discrete outputs, predicting a single value at each terminal node (Breiman et al., 1984). Given that soil TC is a continuous property, it is more appropriate to choose one of the other multivariate techniques used in this study, which produce continuous outputs. Nevertheless, the Rv² value of 0.76 is higher than some results found in the literature (compare Table 2-1).

The predicted values of the NGD-5 RT model show a characteristic pattern of RT models: all the samples in a specific terminal node receive the same predicted value. This creates the stepped pattern seen in Figure 2-6. The descriptive statistics of predicted LogTC produced by the NGD-5 RT model are listed in Table 2-5. In contrast to the other multivariate techniques, the NGD-5 RT model produced the same range of predicted TC for the calibration and validation sets. This is again related to how the method is implemented, and to the discrete output options for the predicted value.

Table 2-8 shows two interesting patterns. First, similar to the other multivariate techniques, the models based on the Savitzky-Golay derivatives selected the same variables and split positions regardless of the polynomial order. The difference is that, for RT, the degree of the derivative was a less important factor in the final results than for the other multivariate techniques. Second, the same tree was created based on differently pre-processed data, as was the case with SAV,
KMT and LOG. This indicates that the most informative wavelengths for TC are still maintained in the transformed spectral curves after the pre-processing transformations.

It is worth noting that RT does not make predictions based on a regression model. Instead, the predicted value at each terminal node is the mean across the group of samples that fell under that node (Breiman et al., 1984). Even if it is not used to predict the variable of interest, the tree model is at least a useful technique for separating the sample population into clusters with some degree of similarity, offering an alternative way to identify implicit relationships between the dependent and independent variables.

Committee Trees

Overall, CT was superior to RT and PCR, but slightly inferior to SMLR and PLSR (Table 2-9). The average Rv² for the CT models was 0.79, and the minimum was 0.62. The RMSEv varied from 0.170 to 0.287, with an average of 0.207. In terms of model stability, the majority of the pre-processing transformations tested generated an RPD > 2, indicating that the models can be used if the same conditions persist.

The best model was obtained after NGD-7, giving the highest Rv² and lowest RMSEv. Individually, this model (NGD-7 CT) offered the best combination of multivariate technique and pre-processing transformation when compared to the models discussed up to this point. The coefficients of determination and RMSE were: Rc² = 0.97; Rv² = 0.86; RMSEc = 0.087; and RMSEv = 0.170. Table 2-5 lists the descriptive statistics of predicted LogTC, and Figure 2-7 shows the predicted versus observed LogTC for the NGD-7 CT model. The results obtained by the NGD-7 CT model demonstrate that CT can be used to predict TC with high accuracy. The main drawback of this method is its lack of transparency. The use of bagging to develop the tree committee prevents the display of the model structure.
Overall, all multivariate techniques produced good predictive models of soil carbon, which confirms the suitability of visible/near-infrared spectroscopy to predict soil carbon at the landscape scale. Except for RT, all multivariate techniques tested had comparable results.

Variable Selection

The wavelengths that were selected or important for the best models obtained by the combination of multivariate techniques and pre-processing transformations are presented in Figure 2-8. Overall, the best models consistently selected variables in the spectral regions of the absorption features of C-H, N-H and O-H groups. Variables in the visible portion of the spectrum were also important due to the sensitivity of the regression models to chromophorous constituents present in the soil; these include humic substances, iron oxides, and other soil minerals, which jointly confer distinct colors to the soil (Bowers and Hanks, 1965; Hunt, 1977; Torrent et al., 1983). Similarly, organic pigments have absorption features close to 960 nm (Gaffey et al., 1993).

Reflectance values between 1100 and 1400 nm, and from 2100 to 2500 nm, are in the region of overtones and combinations of the fundamental vibrations of C-H groups, including CH, CH2 and CH3 (Goddu and Delker, 1960; Gaffey et al., 1993; Analytical Spectral Devices Inc., 2003). Amine and amide absorbance peaks are accounted for by the model coefficients between 2000 and 2200 nm. Hydroxyl overtones and combinations contribute to the coefficients at around 1700 and 1860 nm. The strongest absorption features of O-H groups, including water, occur at around 1400 and 1900 nm (Siesler et al., 2002). All the multivariate techniques tested had important variables close to 1900 nm, and only RT did not select variables around 1400 nm.

Pre-processing Transformations

Some studies reported improvements of the VNIRS models by using pre-processing transformations, such as 1st and 2nd derivatives (Dunn et al., 2002), normalization of the data
(McCarty et al., 2002), and scatter corrections (Kooistra et al., 2003), while others found better results with untransformed reflectance data (Kooistra et al., 2001). Considering all the multivariate techniques we tested, Savitzky-Golay derivatives were consistently among the five best pre-processing transformations. The degree of the derivative was the most important factor in determining the regression performance, followed by the search window size. Except for RT, all multivariate techniques showed improvement in predictions using the first-degree derivative and a search window with nine measurements, as compared to other Savitzky-Golay alternatives. The order of the polynomial seldom had any influence on the final model.

The best data normalization technique was SNV, followed by NRA. However, data normalization was only justifiable for SMLR and PCR, since it produced poorer results than simply using SAV for both tree-based techniques and for PLSR. The other normalization techniques (NME and NMX) did not improve the models obtained by any of the multivariate techniques relative to SAV. Baseline offset, LOG and KMT performed worse than we expected, given that these techniques are commonly listed in the VNIRS literature for data preparation.

The tree-based techniques benefited from the NGD pre-processing transformation, and the size of the search window was a relevant factor. We cannot explain why tree-based models were more sensitive to NGD, and parametric models were more sensitive to Savitzky-Golay derivatives. However, we believe this is useful information in the practical sense to help narrow the choices of pre-processing transformations when the multivariate technique is already defined.

Our study indicates that pre-processing transformation of spectral data generally improves the TC models obtained with VNIRS, and is more important in the context of tree-based techniques than for the parametric multivariate techniques. The Savitzky-Golay derivative,
especially SGF, is probably the most suitable technique to be used for data preparation, since both parametric and nonparametric multivariate techniques improved when it was applied. Norris gap derivative was the best pre-processing transformation for the nonparametric multivariate techniques.

Conclusions

The best multivariate technique to predict soil total carbon content from soil VNIR spectra was PLSR. Overall, parametric multivariate techniques outperformed tree-based ones. Based on the mean Rv² and mean RMSEv, the predictive ability of the multivariate techniques tested decreased in the following order: PLSR > SMLR > CT > PCR > RT. The combination of pre-processing transformation and multivariate technique that produced the best TC model was NGD-7 CT (Rv² = 0.86, RMSEv = 0.170), but the gain in predictive accuracy over PLSR and SMLR was small.

Spectral data preparation improved the VNIRS models of TC, and the choice of pre-processing transformation depended on the multivariate technique used. The best pre-processing transformation for parametric multivariate techniques was SGF, while NGD was preferred for tree-based modeling.

Visible/near-infrared spectroscopy associated with multivariate calibration offered a rapid and accurate approach to predict TC for the Santa Fe River watershed. The methodology used in this study can be extended to areas other than the SFRW, offering a cost-effective technique to assess soil carbon in Florida and elsewhere.

As the demand for accurate soil data increases, it becomes necessary to evaluate new techniques of data collection and analysis that offer robust results with lower cost and time requirements. Visible/near-infrared spectroscopy can provide relatively cheap data for inference systems, helping reduce the costs of digital soil mapping endeavors.
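The ordering given in the Conclusions can be reproduced from the mean validation statistics in Tables 2-4 and 2-6 to 2-9. The sketch below is illustrative only; the tie-breaking rule (higher mean Rv² first, then lower mean RMSEv) and the names used are assumptions, since the text does not state how the two criteria were combined.

```python
# Mean validation statistics copied from the "Mean" rows of
# Tables 2-4 (SMLR), 2-6 (PCR), 2-7 (PLSR), 2-8 (RT) and 2-9 (CT).
mean_stats = {
    "SMLR": {"r2v": 0.82, "rmsev": 0.199},
    "PCR": {"r2v": 0.76, "rmsev": 0.224},
    "PLSR": {"r2v": 0.82, "rmsev": 0.193},
    "RT": {"r2v": 0.67, "rmsev": 0.265},
    "CT": {"r2v": 0.79, "rmsev": 0.207},
}


def rank_techniques(stats):
    """Sort by mean Rv2 (descending), breaking ties by mean RMSEv (ascending).

    The tie-breaking rule is an assumption made for this sketch.
    """
    return sorted(stats, key=lambda k: (-stats[k]["r2v"], stats[k]["rmsev"]))


ranking = rank_techniques(mean_stats)
print(" > ".join(ranking))
```

PLSR and SMLR share the same mean Rv² (0.82), so their relative position depends entirely on the mean RMSEv.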
42 Table 2 1. Investigations on diffuse reflectance spectroscopy of soil car bon and soil organic matter. Source Spectral range (m) Soil carbon Location/ area (ha) Soil depth (cm) Pre processing Method Calib/ valid1 Range (g kg-1) RMSE (g kg-1)/ RPD BestR 2 Al Abbas et al. (1972) 0.42.6 SOC Indiana/ 25 02 N/A SMLR 134 7.562 0.57 Krishnan et al. (1980) 0.42.4 SOM Illinois N/A LOG, 1 st and 2 nd DER SMLR 10 1151 0.98 Dalal and Henry (1986) 1.12.5 SOC Australia 0120 LOG MLR 72 2.725.1 2.2 0.86 Meyer (1989) 1.52.4 SOM South Africa 025 LOG MLR 96 370.9 4.6 2 0.90 Palacios Orueta and Ustin (1998) 0.42.5 SOM California 03 N/A PCR 74 7.661.6 0.51 Masserschmidt et al. (1999) 2.525 SOM Brazil N/A LOG, KM, NORM, MSC, 1st DER, PCA PLSR 31 6.86.3 0.96 120.6 Chen et al. (2000) Visible SOC Georgia/ 115 015 Log arithm MLR 28/31 0.93 Hill and Schtt (2000) 0.51.7 SOC Spain N/A NORM, smoothing Polynomial 91 815.7 0.79 0.80 3 Chang et al. (2001) 1.32.5 TC USA 030 LOG, smoothing, 1st DER PCR 30 1.3285.8 7.9/2.8 0.87 Reeves III et al. (2001) 2.525 TC Maryland 020 LOG PLSR 120/60 6.133.9 1.3 0.93 Chang and Laird (2002) 1.12.5 TC Iowa N/A N/A PLSR 76/32 15.4145.1 6.5/4.4 0.91 SOC 15.4144.9 6.2/4.2 0.89 SIC 0.035.7 1.5/5.5 0.96 Dunn et al. (2002) 0.42.5 SOC Australia 010 LOG, 1 st and 2 nd DER PLSR 270/90 6.430.0 2.5 2 /1.7 0.66 Fidncio et al. (2002a) 1.02.5 SOM Brazil LOG, 2 nd DER, NORM PLSR 140/60 4.048.8 4.3 0.77 MLP 3.2 0.88 RBFN 2.5 0.92
43 Table 2 1. Continued. Source Spectral range (m) Soil carbo n Location/ area (ha) Soil depth (cm) Pre processing Method Calib/ valid1 Range (g kg-1) RMSE (g kg-1)/ RPD BestR 2 Fidncio et al. (2002b) 1.02.5 SOM Brazil 0100 LOG, 2 nd DER PLSR 70/30 0.810.7 0.2 0.90 Fox and Sabbagh (2002) 0.41.0 SOM Iowa/32.4 02.5 N/A Exponential 123 1489 0.74 SOM Iowa/43.9 113 1229 0.76 McCarty et al. (2002) 1.12.5 TC Central USA 0200 LOG, 1 st and 2 nd DER, MSC, SNV, NORM PLSR 177/60 0.98104 5.4 0.86 SIC 0.065.4 3.1 0.87 SOC 0.2398 5.5 0.82 2.525 TC 0.98104 3.4 0.95 SIC 0.065.4 1.2 0.98 SOC 0.2398 3.2 0.97 Reeves III et al. (2002) 0.42.5 TC Maryland 020 LOG, NORM, 1st DER PLSR 179 6.133.9 0.8 0.97 2.525 6.133.9 0.7 0.98 Shepherd and Walsh (2002) 0.42.5 SOC Africa 020 Smoothing, 1 st DER, PCA MARS 674/337 2.355.8 3.1 0.80 Daniel et al. (2003) 0.41.2 SOM Thailand N/A Averaging GMDH 23/10 0.86 0.41.1 SOM 0.85 4 Islam et al. (2003) 0.32.5 SOC Australia N/A N/A PCR 121/40 0.649.5 4.4 2 /1.7 0.76 K ooistra et al. (2003) 0.42.5 SOM The Netherlands 03 1 st and 2 nd DER, SNV, MSC, GA PLSR 70/35 112.8 18.1 0.69 SOM 112.8 24.6 0.45 4 Udelhoven et al. (2003) 0.42.5 SIC Germany/13 1530 NORM, 1 st DER, C H PLSR 165 2.4 0.93 SOC 1.4 0.60 Brown et al. (2005) 0.42.5 SIC Montana 0100 Smoothing, 1 st DER, PCA PLSR 198/85 0.026.1 1.6/4.5 0.96 5 SOC 1.9315.8 1.1/2.6 0.86 5 SIC 235237/47 0.018.9 1.2/6.0 0.98 6 SOC 5.315.8 3.5/0.9 0.85 6 Brown et al. (2006) 0.42.5 SO C Global N/A Smoothing, 1 st DER, PCA BRT 3793 0.0536.8 9 0.82 SIC 4184 0.0128.8 6.2 0.83
44 Table 2 1. Continued. Source Spectral range (m) Soil carbon Location/ area (ha) Soil depth (cm) Pre processing Method Calib/ valid1 Range (g kg-1) RMSE (g kg-1)/ RPD BestR 2 Vgen et al. (2006) 0.42.5 SOC Madagascar 020 Smoothing, 1 st DER, MSC PLSR N/A 3.3120.8 8.4 0.92 Viscarra Rossel et al. (2006) 0.40.7 SOM Australia/ 17.5 020 LOG PLSR 118 8.119.8 1.8 0.60 7 0.72.5 SOM 1.8 0.60 7 2.525 SO M 1.5 0.73 7 0.425 SOM 1.5 0.72 7 Abbreviations: BRT = boosted regression trees; C H = convexhull computation; Calib = calibration sample; DER = derivative; GA = genetic algorithm; GMDH = advanced group method of data handling; K M = Kubelka Munk transformation; LOG = log (1/Reflectance); MARS = multivariate adaptive regression splines; MLP = multi layer perceptron networks; MLR = multiple linear regression; MSC = multiplicative scatter correction; NORM = normalization; PCA = principal compon ents analysis; PCR = principal components regression; PLSR = partial least squares regression; RBFN = radial basis function networks; RMSE = root mean squar e error; RPD = residual prediction deviation; SIC = soil inorganic carbon; SMLR = stepwise multiple linear regression; SNV = standard normal variate transformation; SOC = soil organic carbon; SOM = soil organic matter; TC = soil total carbon; Valid = validation sample. 1 Values not shown indicate that either calibration or cross validation R2 is shown. 2 Standard error. 3 Landsat TM bands used for calibration. 4 Soil reflectance collected in the field. 5 Validation set taken randomly from the same population. 6 Validation set taken from an independent population. 7 Adjusted R2
Table 2-2. Pre-processing transformations applied to the spectral curves of soil samples.

Abbreviation  Search window²  Pre-processing transformation¹
SAV           9               Savitzky-Golay smoothing, and averaging
BLO           1               Baseline offset
KMT           1               Kubelka-Munk transformation
LOG           1               Log (1/Reflectance)
NMX           1               Normalization by the maximum value
NME           1               Normalization by the mean
NRA           1               Normalization by the range
NGD-3         3               Norris gap derivative
NGD-5         5               Norris gap derivative
NGD-7         7               Norris gap derivative
NGD-9         9               Norris gap derivative
SGF-1-3       3               Savitzky-Golay 1st derivative using a 1st order polynomial
SGF-1-5       5               Savitzky-Golay 1st derivative using a 1st order polynomial
SGF-1-7       7               Savitzky-Golay 1st derivative using a 1st order polynomial
SGF-1-9       9               Savitzky-Golay 1st derivative using a 1st order polynomial
SGF-2-3       3               Savitzky-Golay 1st derivative using a 2nd order polynomial
SGF-2-5       5               Savitzky-Golay 1st derivative using a 2nd order polynomial
SGF-2-7       7               Savitzky-Golay 1st derivative using a 2nd order polynomial
SGF-2-9       9               Savitzky-Golay 1st derivative using a 2nd order polynomial
SGF-3-5       5               Savitzky-Golay 1st derivative using a 3rd order polynomial
SGF-3-7       7               Savitzky-Golay 1st derivative using a 3rd order polynomial
SGF-3-9       9               Savitzky-Golay 1st derivative using a 3rd order polynomial
SGS-2-3       3               Savitzky-Golay 2nd derivative using a 2nd order polynomial
SGS-2-5       5               Savitzky-Golay 2nd derivative using a 2nd order polynomial
SGS-2-7       7               Savitzky-Golay 2nd derivative using a 2nd order polynomial
SGS-2-9       9               Savitzky-Golay 2nd derivative using a 2nd order polynomial
SGS-3-5       5               Savitzky-Golay 2nd derivative using a 3rd order polynomial
SGS-3-7       7               Savitzky-Golay 2nd derivative using a 3rd order polynomial
SGS-3-9       9               Savitzky-Golay 2nd derivative using a 3rd order polynomial
SNV           1               Standard normal variate transformation

¹ Savitzky-Golay smoothing, and averaging, were used as a standard preparation of the soil spectral curves. This standard curve was used as the input to all other pre-processing transformations.
² A search window of 1 indicates that only the reflectance value at the corresponding wavelength is used in the transformation.
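Several of the single-wavelength and whole-spectrum transformations in Table 2-2 have simple closed forms. The sketch below uses their textbook definitions, which are assumed here and may differ in detail from the Unscrambler implementation used in this study; the function names and the three-band example spectrum are hypothetical, and the derivative-based transformations are not covered.

```python
import math
from statistics import mean, stdev


def log_inverse(r):
    """LOG: apparent absorbance, log10(1/R), per wavelength."""
    return [math.log10(1.0 / v) for v in r]


def kubelka_munk(r):
    """KMT: Kubelka-Munk function (1 - R)^2 / (2R), per wavelength."""
    return [(1.0 - v) ** 2 / (2.0 * v) for v in r]


def snv(r):
    """SNV: center each spectrum on its mean and scale by its standard
    deviation (sample standard deviation assumed)."""
    m, s = mean(r), stdev(r)
    return [(v - m) / s for v in r]


def normalize_mean(r):
    """NME: divide the spectrum by its mean."""
    m = mean(r)
    return [v / m for v in r]


def normalize_max(r):
    """NMX: divide the spectrum by its maximum value."""
    mx = max(r)
    return [v / mx for v in r]


def normalize_range(r):
    """NRA: divide the spectrum by its range (maximum minus minimum)."""
    rng = max(r) - min(r)
    return [v / rng for v in r]


# Hypothetical reflectance values at three wavelengths (illustrative only).
spectrum = [0.2, 0.4, 0.6]
print(snv(spectrum))
print(normalize_range(spectrum))
```

LOG and KMT act on each wavelength independently, which matches the search window of 1 in Table 2-2, while SNV and the three normalizations rescale each value using a statistic of the whole spectrum.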
Table 2-3. Soil total carbon (TC) and log-transformed TC (LogTC) descriptive statistics for the whole dataset, calibration set, and validation set.

Statistic   TC (mg kg-1)                           LogTC [log (mg kg-1)]
              Whole set  Calibration   Validation    Whole set  Calibration   Validation
N                   554          400          154          554          400          154
Mean               7235         7840         5662       3.5072       3.5118       3.4953
SEM                 805         1070          805       0.0210       0.0254       0.0368
Median             2903         2913         2711       3.4628       3.4643       3.4331
SD               18,942       21,393         9990       0.4939       0.5080       0.4565
Variance    358,791,447  457,678,614   99,808,802       0.2439       0.2580       0.2084
Skewness           8.82         8.13         7.78         0.49         0.55         0.24
Kurtosis          96.18        79.54        77.62         0.55         0.69         0.18
Range           268,826      268,826      109,407       3.2019       3.2019       2.5977
Minimum             169          169          277       2.2279       2.2279       2.4425
Maximum         268,995      268,995      109,684       5.4297       5.4297       5.0401

Abbreviations: N = number of observations; SD = standard deviation; SEM = standard error of the mean.
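Two relationships tie the columns of Table 2-3 together: LogTC is the base-10 logarithm of TC, and the standard error of the mean is SD divided by the square root of N. The following sketch checks both against tabulated values; the function names are hypothetical.

```python
import math


def log_tc(tc_mg_per_kg):
    """Log-transform a total carbon value given in mg kg-1."""
    return math.log10(tc_mg_per_kg)


def sem(sd, n):
    """Standard error of the mean from the standard deviation and sample size."""
    return sd / math.sqrt(n)


# Minimum and maximum TC of the whole dataset (Table 2-3).
print(round(log_tc(169), 4), round(log_tc(268995), 4))

# SEM of LogTC for the whole, calibration and validation sets (Table 2-3).
for sd, n in [(0.4939, 554), (0.5080, 400), (0.4565, 154)]:
    print(round(sem(sd, n), 4))

# Back-transforming the mean LogTC gives the geometric mean of TC,
# which is much lower than the arithmetic mean for a skewed distribution.
print(round(10 ** 3.5072))
```

The back-transformed mean of LogTC is well below the arithmetic mean of 7235 mg kg-1, which is the expected behavior for a positively skewed variable.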
Table 2-4. Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by stepwise multiple linear regression (SMLR). Rc² and RMSEc refer to calibration; Rv², RMSEv and RPD refer to validation. RMSEc and RMSEv are in log (mg kg-1); n = number of predictors.

Transformation   n  Rc²   RMSEc  Rv²   RMSEv  RPD
SNV             23  0.91  0.149  0.85  0.176  2.59
SGF-1-9         12  0.88  0.178  0.85  0.178  2.56
SGF-2-9         12  0.88  0.178  0.85  0.178  2.56
NRA             14  0.88  0.174  0.85  0.178  2.56
LOG             14  0.87  0.182  0.85  0.179  2.54
SAV             16  0.86  0.187  0.84  0.187  2.43
SGF-1-7         17  0.88  0.173  0.84  0.187  2.43
SGF-2-7         17  0.88  0.173  0.84  0.187  2.43
SGF-3-7         18  0.89  0.172  0.84  0.187  2.43
NGD-5           12  0.87  0.181  0.84  0.189  2.41
BLO             15  0.86  0.187  0.83  0.188  2.42
SGF-3-9         23  0.90  0.164  0.83  0.190  2.39
NGD-7           16  0.88  0.175  0.83  0.195  2.33
NME             12  0.85  0.196  0.82  0.193  2.36
NMX             15  0.87  0.186  0.82  0.193  2.36
SGF-1-3         23  0.89  0.165  0.82  0.199  2.29
SGF-2-3         23  0.89  0.165  0.82  0.199  2.29
NGD-3           28  0.90  0.158  0.82  0.202  2.25
SGS-2-7         27  0.90  0.161  0.81  0.200  2.28
SGS-3-7         27  0.90  0.161  0.81  0.200  2.28
SGF-3-5         20  0.89  0.169  0.81  0.203  2.24
NGD-9            9  0.86  0.193  0.81  0.207  2.20
SGF-1-5         18  0.89  0.172  0.80  0.209  2.18
SGF-2-5         18  0.89  0.172  0.80  0.209  2.18
SGS-2-9         20  0.88  0.176  0.80  0.211  2.16
SGS-3-9         20  0.88  0.176  0.80  0.211  2.16
SGS-2-5         32  0.90  0.158  0.77  0.224  2.03
SGS-3-5         32  0.90  0.158  0.77  0.224  2.03
SGS-2-3         35  0.89  0.169  0.75  0.231  1.97
KMT              6  0.69  0.283  0.66  0.269  1.69
Minimum          6  0.69  0.149  0.66  0.176  1.69
Maximum         35  0.91  0.283  0.85  0.269  2.59
Mean            19  0.88  0.177  0.82  0.199  2.30
Std. deviation   7  0.04  0.024  0.04  0.020  0.20

Abbreviations: Rc² = coefficient of determination of calibration; RMSEc = root mean square error of calibration; RMSEv = root mean square error of validation; RPD = residual prediction deviation; Rv² = coefficient of determination of validation.
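The RPD column can be related to the other tabulated statistics. The sketch below takes RPD as the ratio of the standard deviation of observed validation LogTC to RMSEv, which is the usual definition but is assumed here. Rescaling the sample SD of Table 2-3 (0.4565, N = 154) from an n - 1 to an n denominator appears to reproduce the tabulated RPD values; this rescaling is an inference from the tables, not a procedure stated in the text, and the function names are hypothetical.

```python
import math


def population_sd(sample_sd, n):
    """Convert a sample SD (n - 1 denominator) to a population SD (n denominator)."""
    return sample_sd * math.sqrt((n - 1) / n)


def rpd(sd_observed, rmse_validation):
    """Residual prediction deviation: SD of observed values over RMSE of validation."""
    return sd_observed / rmse_validation


# Validation-set SD of observed LogTC and sample size, from Table 2-3.
sd_val = population_sd(0.4565, 154)

# RMSEv values copied from Table 2-4 for three SMLR models.
for name, rmsev in [("SNV", 0.176), ("SAV", 0.187), ("KMT", 0.269)]:
    print(name, round(rpd(sd_val, rmsev), 2))
```

Because RPD is inversely proportional to RMSEv for a fixed validation set, ranking models by RPD within a table is equivalent to ranking them by RMSEv.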
Table 2-5. Descriptive statistics of predicted log-transformed soil total carbon (LogTC) for the calibration and validation sets for the best models obtained from the five multivariate calibration techniques tested. All values except N, skewness and kurtosis are in log (mg kg-1).

Statistic       Observed  SGF-1-9 SMLR       NRA PCR  SGF-1-9 PLSR      NGD-5 RT      NGD-7 CT
Calibration set
N                    400           400           400           400           400           400
Mean              3.5118        3.5118        3.5118        3.5118        3.5118        3.5128
SEM               0.0254        0.0238        0.0234        0.0233        0.0235        0.0233
Median            3.4643        3.4216        3.4085        3.4270        3.5010        3.4413
SD                0.5080        0.4756        0.4685        0.4669        0.4706        0.4661
Variance          0.2580        0.2262        0.2195        0.2180        0.2214        0.2173
Skewness            0.55          0.61          0.71          0.56          0.70          0.43
Kurtosis            0.69          0.08          0.25          0.01          0.52          0.07
Range             3.2019        2.6185        2.8260        2.9250        2.1634        2.5706
Minimum           2.2279        2.5753        2.5040        2.4580        2.8119        2.4350
Maximum           5.4297        5.1938        5.3300        5.3830        4.9753        5.0056
Validation set
N                    154           154           154           154           154           154
Mean              3.4953        3.4789        3.4723        3.4706        3.4837        3.4851
SEM               0.0368        0.0355        0.0342        0.0356        0.0350        0.0334
Median            3.4331        3.3727        3.3730        3.4155        3.5010        3.4284
SD                0.4565        0.4400        0.4240        0.4418        0.4339        0.4147
Variance          0.2084        0.1936        0.1798        0.1952        0.1883        0.1720
Skewness            0.24          0.32          0.45          0.35          0.34          0.32
Kurtosis            0.18          0.34          0.46          0.47          0.20          0.70
Range             2.5977        2.3485        1.9560        2.1900        2.1634        1.9434
Minimum           2.4425        2.3816        2.7160        2.5470        2.8119        2.6738
Maximum           5.0401        4.7301        4.6720        4.7370        4.9753        4.6172

Abbreviations: N = number of observations; SD = standard deviation; SEM = standard error of the mean.
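The reduced variability of predicted values noted for least squares techniques can be checked against Table 2-5. For a least squares fit with an intercept, the ratio of the variance of the fitted values to the variance of the observed values on the calibration set equals the calibration R². The sketch below applies this identity to the two models for which a calibration Rc² is reported (SMLR in Table 2-4 and RT in Table 2-8); the variable and function names are hypothetical.

```python
# Calibration-set variances of LogTC copied from Table 2-5.
VAR_OBSERVED = 0.2580
var_predicted = {
    "SGF-1-9 SMLR": 0.2262,
    "NGD-5 RT": 0.2214,
}

# Calibration Rc2 reported in Table 2-4 (SMLR) and Table 2-8 (RT).
reported_rc2 = {
    "SGF-1-9 SMLR": 0.88,
    "NGD-5 RT": 0.86,
}


def variance_ratio(var_pred, var_obs):
    """Share of the observed variance retained by the fitted values."""
    return var_pred / var_obs


for model, v in var_predicted.items():
    ratio = variance_ratio(v, VAR_OBSERVED)
    print(model, round(ratio, 3), reported_rc2[model])
```

The identity applies to the regression tree as well, because predicting the mean of each terminal node is a least squares fit on node membership. It is not expected to hold for the bagged committee, whose predictions are averages over trees fitted to different bootstrap replicates.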
Table 2-6. Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by principal components regression (PCR). Rcv² and RMSEcv refer to calibration (cross-validation); Rv², RMSEv and RPD refer to validation. RMSEcv and RMSEv are in log (mg kg-1); n = number of principal components.

Transformation   n  Rcv²  RMSEcv Rv²   RMSEv  RPD
NRA             18  0.85  0.196  0.84  0.183  2.49
SAV             19  0.85  0.198  0.84  0.185  2.46
SNV             18  0.87  0.185  0.84  0.184  2.47
SGF-1-9         12  0.83  0.212  0.83  0.193  2.36
SGF-2-9         12  0.83  0.212  0.83  0.193  2.36
SGF-3-7         14  0.81  0.224  0.82  0.195  2.34
SGF-1-7         12  0.81  0.222  0.82  0.197  2.31
SGF-2-7         12  0.81  0.222  0.82  0.197  2.31
SGF-1-5         15  0.83  0.212  0.82  0.200  2.28
SGF-2-5         15  0.83  0.212  0.82  0.200  2.28
NGD-5           16  0.83  0.212  0.82  0.201  2.26
SGF-3-9         13  0.82  0.216  0.81  0.199  2.29
LOG             13  0.77  0.242  0.81  0.201  2.26
NGD-9           14  0.80  0.229  0.81  0.207  2.20
NMX             11  0.75  0.253  0.80  0.202  2.25
NGD-7           12  0.78  0.236  0.80  0.211  2.16
BLO             10  0.75  0.253  0.80  0.206  2.21
NME             10  0.73  0.263  0.79  0.207  2.20
NGD-3           16  0.81  0.220  0.79  0.216  2.11
SGS-2-9         14  0.75  0.254  0.74  0.233  1.96
SGS-3-9         14  0.75  0.254  0.74  0.233  1.96
SGF-1-3          6  0.68  0.285  0.72  0.246  1.85
SGF-2-3          6  0.68  0.285  0.72  0.246  1.85
SGF-3-5          5  0.67  0.293  0.71  0.250  1.82
SGS-2-7         12  0.67  0.293  0.64  0.278  1.64
SGS-3-7         12  0.67  0.293  0.64  0.278  1.64
KMT              3  0.60  0.319  0.62  0.284  1.60
SGS-2-5          6  0.53  0.347  0.58  0.297  1.53
SGS-3-5          6  0.53  0.347  0.58  0.297  1.53
SGS-2-3         20  0.57  0.331  0.56  0.307  1.48
Minimum          3  0.53  0.185  0.56  0.183  1.48
Maximum         20  0.87  0.347  0.84  0.307  2.49
Mean            12  0.75  0.251  0.76  0.224  2.08
Std. deviation   4  0.09  0.046  0.09  0.038  0.31

Abbreviations: Rcv² = coefficient of determination of cross-validation; RMSEcv = root mean square error of cross-validation; RMSEv = root mean square error of validation; RPD = residual prediction deviation; Rv² = coefficient of determination of validation.
Table 2-7. Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by partial least squares regression (PLSR). Rcv² and RMSEcv refer to calibration (cross-validation); Rv², RMSEv and RPD refer to validation. RMSEcv and RMSEv are in log (mg kg-1); n = number of PLS factors.

Transformation   n  Rcv²  RMSEcv Rv²   RMSEv  RPD
LOG             13  0.88  0.173  0.86  0.180  2.53
SGF-1-9          7  0.84  0.200  0.85  0.177  2.57
SGF-2-9          7  0.84  0.200  0.85  0.177  2.57
SAV             10  0.85  0.194  0.85  0.178  2.55
SGF-3-9          7  0.85  0.195  0.85  0.177  2.57
SNV              9  0.87  0.182  0.85  0.178  2.56
SGF-3-7          7  0.86  0.192  0.84  0.181  2.52
SGF-3-5          7  0.86  0.189  0.84  0.182  2.50
SGF-1-7          7  0.85  0.197  0.84  0.184  2.47
SGF-2-7          7  0.85  0.197  0.84  0.184  2.47
NRA             10  0.86  0.190  0.84  0.183  2.49
NGD-3            9  0.86  0.188  0.84  0.186  2.44
NMX             10  0.85  0.197  0.84  0.187  2.43
SGF-1-5          8  0.86  0.188  0.83  0.191  2.38
SGF-2-5          8  0.86  0.188  0.83  0.191  2.38
NGD-5            7  0.83  0.208  0.83  0.193  2.36
BLO             11  0.86  0.190  0.83  0.190  2.39
SGF-1-3          7  0.85  0.195  0.83  0.193  2.36
SGF-2-3          7  0.85  0.195  0.83  0.193  2.36
NGD-7            9  0.85  0.197  0.83  0.196  2.32
NME              9  0.83  0.210  0.83  0.192  2.37
SGS-2-9          8  0.85  0.198  0.82  0.193  2.36
SGS-3-9          8  0.85  0.198  0.82  0.193  2.36
NGD-9            7  0.82  0.218  0.82  0.200  2.28
SGS-2-7          7  0.84  0.205  0.79  0.207  2.20
SGS-3-7          7  0.84  0.205  0.79  0.207  2.20
SGS-2-5          9  0.84  0.204  0.78  0.214  2.13
SGS-3-5          9  0.84  0.204  0.78  0.214  2.13
SGS-2-3         19  0.87  0.183  0.78  0.217  2.10
KMT              6  0.68  0.289  0.68  0.259  1.76
Minimum          6  0.68  0.173  0.68  0.177  1.76
Maximum         19  0.88  0.289  0.86  0.259  2.57
Mean             9  0.84  0.199  0.82  0.193  2.37
Std. deviation   3  0.03  0.019  0.04  0.017  0.18

Abbreviations: PLS = partial least squares; Rcv² = coefficient of determination of cross-validation; RMSEcv = root mean square error of cross-validation; RMSEv = root mean square error of validation; RPD = residual prediction deviation; Rv² = coefficient of determination of validation.
Table 2-8. Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by regression tree (RT). Rc² and RMSEc refer to calibration; Rv², RMSEv and RPD refer to validation. RMSEc and RMSEv are in log (mg kg-1); n = number of predictors; TN = number of terminal nodes.

Transformation   n  TN  Rc²   RMSEc  Rv²   RMSEv  RPD
NGD-5            9  12  0.86  0.191  0.76  0.226  2.01
NGD-7            6  22  0.91  0.149  0.75  0.233  1.95
NGD-9           12  16  0.98  0.177  0.74  0.238  1.91
SGS-2-7         13  15  0.87  0.183  0.74  0.238  1.91
SGS-3-7         13  15  0.87  0.183  0.74  0.238  1.91
SGF-1-9          8  11  0.84  0.200  0.74  0.239  1.90
SGF-2-9          8  11  0.84  0.200  0.74  0.239  1.90
SGF-3-5         12  15  0.88  0.174  0.73  0.243  1.87
SGF-1-7         16  18  0.89  0.165  0.70  0.270  1.69
SGF-2-7         16  18  0.89  0.165  0.70  0.270  1.69
SAV              7   9  0.77  0.241  0.68  0.260  1.75
KMT              7   9  0.77  0.241  0.68  0.260  1.75
LOG              7   9  0.77  0.241  0.68  0.260  1.75
SGS-2-9          9  11  0.84  0.206  0.68  0.263  1.73
SGS-3-9          9  11  0.84  0.206  0.68  0.263  1.73
SGS-2-5          9  11  0.82  0.214  0.66  0.271  1.68
SGS-3-5          9  11  0.82  0.214  0.66  0.271  1.68
SGS-2-3          5   7  0.70  0.280  0.65  0.272  1.67
SGF-3-7          3   4  0.66  0.295  0.65  0.273  1.67
SNV              2   3  0.59  0.325  0.65  0.274  1.66
NME              7   9  0.81  0.224  0.64  0.273  1.67
SGF-1-5          3   4  0.66  0.295  0.64  0.277  1.64
SGF-2-5          3   4  0.66  0.295  0.64  0.277  1.64
SGF-3-9          3   4  0.66  0.295  0.64  0.277  1.64
SGF-1-3          4   5  0.72  0.271  0.64  0.281  1.62
SGF-2-3          4   5  0.72  0.271  0.64  0.281  1.62
NGD-3            3   4  0.66  0.298  0.63  0.282  1.61
BLO              4   5  0.66  0.298  0.63  0.284  1.60
NMX              7  18  0.86  0.189  0.63  0.288  1.58
NRA              6   8  0.72  0.271  0.51  0.324  1.40
Minimum          2   3  0.59  0.149  0.51  0.226  1.40
Maximum         16  22  0.98  0.325  0.76  0.324  2.01
Mean             7  10  0.78  0.232  0.67  0.265  1.73
Std. deviation   4   5  0.10  0.051  0.05  0.021  0.14

Abbreviations: Rc² = coefficient of determination of calibration; RMSEc = root mean square error of calibration; RMSEv = root mean square error of validation; RPD = residual prediction deviation; Rv² = coefficient of determination of validation.
Table 2-9. Summary statistics for the spectral models of log-transformed soil total carbon (LogTC) produced by committee trees (CT). Rc² and RMSEc refer to calibration; Rv², RMSEv and RPD refer to validation. RMSEc and RMSEv are in log (mg kg-1).

Transformation  Rc²   RMSEc  Rv²   RMSEv  RPD
NGD-7           0.97  0.087  0.86  0.170  2.68
SGF-1-9         0.97  0.090  0.86  0.172  2.65
SGF-2-9         0.97  0.090  0.86  0.172  2.65
NGD-9           0.97  0.092  0.85  0.178  2.56
NGD-5           0.97  0.093  0.84  0.181  2.51
SGF-1-7         0.97  0.095  0.84  0.181  2.51
SGF-2-7         0.97  0.095  0.84  0.181  2.51
SGF-3-7         0.97  0.095  0.84  0.184  2.47
NGD-3           0.97  0.096  0.83  0.186  2.45
SGF-1-5         0.97  0.097  0.83  0.187  2.43
SGF-2-5         0.97  0.097  0.83  0.187  2.43
SGF-3-9         0.97  0.098  0.83  0.190  2.39
SGS-2-9         0.97  0.095  0.82  0.192  2.37
SGS-3-9         0.97  0.095  0.82  0.192  2.37
SGF-1-3         0.97  0.097  0.81  0.201  2.26
SGF-2-3         0.97  0.097  0.81  0.201  2.26
SGF-3-5         0.97  0.097  0.79  0.207  2.20
SGS-2-5         0.97  0.103  0.79  0.209  2.18
SGS-3-5         0.97  0.103  0.79  0.209  2.18
SGS-2-7         0.97  0.093  0.79  0.211  2.16
SGS-3-7         0.97  0.093  0.79  0.211  2.16
SNV             0.94  0.125  0.78  0.216  2.11
SGS-2-3         0.96  0.115  0.77  0.222  2.05
BLO             0.94  0.130  0.73  0.237  1.92
NME             0.95  0.124  0.73  0.241  1.89
KMT             0.95  0.122  0.71  0.246  1.85
NMX             0.94  0.129  0.71  0.248  1.83
LOG             0.95  0.122  0.70  0.253  1.80
SAV             0.95  0.122  0.70  0.253  1.80
NRA             0.93  0.142  0.62  0.287  1.59
Minimum         0.93  0.087  0.62  0.170  1.59
Maximum         0.97  0.142  0.86  0.287  2.68
Mean            0.96  0.104  0.79  0.207  2.24
Std. deviation  0.01  0.015  0.06  0.030  0.29

Abbreviations: Rc² = coefficient of determination of calibration; RMSEc = root mean square error of calibration; RMSEv = root mean square error of validation; RPD = residual prediction deviation; Rv² = coefficient of determination of validation.
Figure 2-1. Diffuse reflectance curves of different soil orders present in the dataset, along with important absorbance regions related to soil carbon in the visible/near-infrared region and their responsible chemical groups.
Figure 2-2. Sampling locations and soil orders within the Santa Fe River watershed (SFRW). Abbreviations: SSURGO = Soil Survey Geographic Database; FDEP = Florida Department of Environmental Protection.
Figure 2-3. Predicted versus observed log-transformed soil total carbon (LogTC) for the Savitzky-Golay 1st derivative using a 1st-order polynomial with search window 9 (SGF-1-9) stepwise multiple linear regression (SMLR) model.
Figure 2-4. Predicted versus observed log-transformed soil total carbon (LogTC) for the normalization by the range (NRA) principal components regression (PCR) model.
Figure 2-5. Predicted versus observed log-transformed soil total carbon (LogTC) for the Savitzky-Golay 1st derivative using a 1st-order polynomial with search window of 9 (SGF-1-9) partial least squares regression (PLSR) model.
Figure 2-6. Predicted versus observed log-transformed soil total carbon (LogTC) for the Norris gap derivative with a search window of 5 (NGD-5) regression tree (RT) model.
Figure 2-7. Predicted versus observed log-transformed soil total carbon (LogTC) for the Norris gap derivative with a search window of 7 (NGD-7) committee trees (CT) model.
Figure 2-8. Important wavelengths used by the best models of log-transformed soil total carbon (LogTC) obtained by the different multivariate techniques and corresponding pre-processing transformations: Savitzky-Golay 1st derivative using a 1st-order polynomial with search window 9 (SGF-1-9) stepwise multiple linear regression (SMLR) model; normalization by the range (NRA) principal components regression (PCR) model; Savitzky-Golay 1st derivative using a 1st-order polynomial with search window of 9 partial least squares regression (PLSR) model; and Norris gap derivative with a search window of 5 (NGD-5) regression tree (RT) model. Relevant absorbance regions related to soil carbon in the visible/near-infrared region and their responsible chemical groups are shown.
CHAPTER 3
MODELING OF SOIL ORGANIC CARBON FRACTIONS USING VISIBLE/NEAR-INFRARED SPECTROSCOPY¹

Summary

There is a pressing need for rapid and cost-effective tools to estimate soil carbon across larger landscapes. Visible/near-infrared diffuse reflectance spectroscopy (VNIRS) offers comparable levels of accuracy to conventional laboratory methods for estimating various soil properties. We used VNIRS to estimate soil total organic carbon (TC) and four organic carbon fractions in 141 samples collected in the Santa Fe River watershed of Florida. The carbon fractions measured were (in order of decreasing potential residence time in soils): recalcitrant carbon (RC), hydrolyzable carbon (HC), hot-water soluble carbon (SC), and mineralizable carbon (MC). Soil samples were scanned in the visible/near-infrared spectral range. Six pre-processing transformations were applied to soil reflectance, and five multivariate techniques were tested to model soil TC and organic carbon fractions: stepwise multiple linear regression (SMLR), principal components regression (PCR), partial least squares regression (PLSR), regression tree (RT), and committee trees (CT). Total organic carbon was estimated with highest accuracy, obtaining a coefficient of determination using a validation set (Rv²) of 0.86, followed by RC (Rv² = 0.82), both using PLSR. The SC fraction was modeled best by SMLR (Rv² = 0.70), while PLSR produced the best models of MC (Rv² = 0.65) and HC (Rv² = 0.40). The addition of TC as a predictor improved the VNIRS models of the soil organic carbon fractions. Our study indicates the suitability of VNIRS to quantify soil organic carbon pools with widely varying turnover times in soils, which are important in the context of carbon sequestration and climate change.

¹ Published in the Soil Science Society of America Journal 73, 176-184, 2009.
Introduction

Soil organic carbon sequestration has received much attention recently as the concentration of carbon dioxide rises in the atmosphere, intensifying climate change (Keeling et al., 1995; Carbon Dioxide Information Analysis Center, 2008; Grunwald, 2008b). Long-term sequestration of carbon in soils typically involves the decomposition of biologically labile organic matter into more recalcitrant macromolecules through the process of humification (Quideau, 2006). Several discrete organic carbon pools can be identified on the basis of size, turnover rate, and ecosystem function. The smallest pool with the most rapid turnover rate is typically designated labile (e.g., residence time of days to years), and the larger pool, with the longest residence time (e.g., decades to thousands of years), is described as recalcitrant (McLauchlan and Hobbie, 2004; Rice, 2006). Thus, quantifying the different compartments of soil organic carbon improves understanding of how, and at what rate, stable forms of carbon are being formed or lost in soils. Moreover, the dynamics of soil organic carbon across a landscape are strongly controlled by environmental determinants such as temperature and moisture, which are sensitive to climate change. Recent evidence suggests that soil carbon is being lost over wide regions in response to climate change (Bellamy et al., 2005). Better tools are needed to monitor changes in soil organic carbon over large regions and to provide inputs to process-based models of carbon dynamics (Parton et al., 1983; Coleman and Jenkinson, 1996).

Characterizing organic carbon pools across large regions is critical to understanding the dynamics of soil carbon in the context of climate change. However, measurement of discrete soil organic carbon fractions is time-consuming and requires intensive field sampling and costly laboratory analyses.
An alternative approach to estimate soil carbon in a cost-effective manner is to build soil spectral libraries using visible/near-infrared diffuse reflectance spectroscopy (VNIRS) and chemometric modeling (Shepherd and Walsh, 2002; Brown et al., 2006).
Visible/near-infrared diffuse reflectance spectroscopy has been used in the last decades for the rapid characterization of various materials (McClure, 2003). Numerous soil carbon models derived from visible/near-infrared spectra have been presented and validated (Chang and Laird, 2002; Dunn et al., 2002; McCarty et al., 2002; Shepherd and Walsh, 2002; Islam et al., 2003; Brown et al., 2005). Although many soil properties have been investigated using VNIRS, less attention has been given to VNIRS modeling of soil physical and organic carbon fractions. Our objective was to estimate total soil organic carbon (TC) and four soil carbon fractions using VNIRS. These fractions are, in order of decreasing stability and residence time in soils: recalcitrant organic carbon (RC), hydrolyzable organic carbon (HC), hot-water soluble organic carbon (SC), and mineralizable organic carbon (MC).

Materials and Methods

Field and Laboratory Measurements

Soil samples were collected in the Santa Fe River watershed, a 3585-km² watershed in north-central Florida. A total of 141 soil samples were collected from the surface to a depth of 30 cm, and sieved through a 2-mm mesh. The soil samples proportionally represent all soil orders and land uses that occur in the watershed. The stratified-random sampling design we used was presented in Chapter 2. Thirty-six percent of the samples occurred in Ultisols, 28% in Spodosols, and 22% in Entisols. Other soil orders accounted for the remaining 14% of the samples, and included Alfisols (11%), Mollisols (2%), and Inceptisols (1%). Major land uses where the samples were collected were pine plantations (28%), improved pasture (15%), upland forest (14%), and wetlands (13%). Remaining land uses included urban (11%), agriculture (10%), and rangelands (9%). The soil samples were dried for 12 hours at 45 °C and scanned using a QualitySpec Pro spectroradiometer (Analytical Spectral Devices Inc., Boulder, CO). The instrument collects
diffuse reflected light in the wavelength range of 350-2500 nm, with ten co-added scans averaged at 1-nm intervals. An average spectral curve was calculated from four scans of each soil sample, with the sample rotated by 90 degrees between scans. The soil samples were sieved through a 2-mm mesh and ball-milled prior to chemical analysis of organic carbon fractions. In our study area, organic carbon represents more than 98% of the soil total carbon (Guo et al., 2006a; N.B. Comerford, personal communication); therefore, soils were not pretreated with acid to remove carbonates. All analytical methods used to measure organic carbon concentrations in the samples are well documented in the literature and are in routine use. Total organic carbon was determined by high-temperature combustion on a FlashEA 1112 Elemental Analyzer (Thermo Electron Corp., Waltham, MA). Recalcitrant organic carbon was measured on the elemental analyzer using soil samples that were refluxed with 6 N HCl for 16 hours, following the methods of Paul et al. (2001) and McLauchlan and Hobbie (2004). Hydrolyzable organic carbon was computed as the difference between TC and RC. Soluble organic carbon was extracted using hot water, according to Sparling et al. (1998) and Gregorich et al. (2003), and measured on a Shimadzu TOC-5050 Analyzer (Shimadzu Scientific Instruments Inc., Columbia, MD) employing Pt-catalyzed combustion and nondispersive infrared detection of CO2. The extracts were centrifuged, decanted, and filtered using a 0.2-μm filtration membrane prior to the determination of organic carbon.

Soil organic carbon mineralization rate was estimated based on soil respiration inside an incubation chamber. Before incubation, 1 g of soil was wetted daily to 100% water holding capacity for the first 5 days in 12-mL vials, and pre-incubated in the open system at 35 °C in the dark. After the pre-incubation, the vials were filled with CO2-free air, sealed with rubber septa, and incubated at 35 °C in the dark.
The first measurement of CO2 concentration was taken after 3
days of incubation (8th day), using a CO2 Coulometer (UIC Inc., Joliet, IL) with CO2-free air used as a purge and carrier gas. Subsequent CO2 concentration measurements were taken on a weekly basis until the 36th day of incubation. After the first incubation period of 3 days, during which CO2 release was relatively high, mineralization rates became constant during the remainder of the incubation period. The rate of CO2 release was modeled from the 8th until the 36th day of incubation using linear regression (R² = 0.98; data not shown). Mineralizable organic carbon was calculated by integrating these measurements between the 15th and 29th day of incubation.

The TC and carbon fraction measurements covered a wide variety of soils and land uses within the watershed. Our aim was to represent this environmental variability in the spectral dataset; thus, high and low values (e.g., high TC in wetlands; low TC in Entisols) were not excluded from the dataset. All soil organic carbon properties were positively skewed, and had a lognormal frequency distribution. Thus, the VNIRS models were developed based on log-transformed values that approximated a Gaussian distribution.

Pretreatment of Soil Spectra and Multivariate Methods

The collected soil spectral curves were composed of 2151 reflectance measurements (bands) for each sample. The average soil spectral curves, obtained from the four rotations, were smoothed across the spectral bands (wavelengths) using a Savitzky-Golay smoothing algorithm (Savitzky and Golay, 1964) with a 3rd-order polynomial across a 9-band window, and then averaged (pooled) across 10-nm intervals to match the spectral resolution of the spectroradiometer in the near-infrared region (Analytical Spectral Devices Inc., 2008). This resulted in the reduction of the soil spectra to 214 reflectance values.
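As a rough illustration of the smoothing and pooling step described above, the following Python sketch applies the tabulated 9-point Savitzky-Golay smoothing weights for a quadratic/cubic polynomial and then averages non-overlapping 10-band bins. It is not the Unscrambler implementation. The function names, the synthetic spectrum, and the edge handling (dropping bands that lack a full window, and dropping an incomplete final bin) are assumptions made for this sketch.

```python
# 9-point Savitzky-Golay smoothing weights for a 2nd/3rd-order polynomial
# (as tabulated by Savitzky and Golay, 1964); the weights sum to 231.
SG9_WEIGHTS = [-21, 14, 39, 54, 59, 54, 39, 14, -21]
SG9_NORM = 231


def savgol_smooth_9(spectrum):
    """Smooth a spectrum with a 9-band cubic Savitzky-Golay filter.

    Assumption: the 4 bands at each end, which lack a full window, are dropped.
    """
    half = len(SG9_WEIGHTS) // 2
    smoothed = []
    for center in range(half, len(spectrum) - half):
        window = spectrum[center - half:center + half + 1]
        smoothed.append(sum(w * v for w, v in zip(SG9_WEIGHTS, window)) / SG9_NORM)
    return smoothed


def average_bins(values, width=10):
    """Average consecutive non-overlapping bins; an incomplete last bin is dropped."""
    n_bins = len(values) // width
    return [sum(values[i * width:(i + 1) * width]) / width for i in range(n_bins)]


if __name__ == "__main__":
    # Hypothetical spectrum: 2151 bands (350-2500 nm at 1-nm steps).
    raw = [0.1 + 1e-4 * band - 2e-8 * band ** 2 for band in range(2151)]
    reduced = average_bins(savgol_smooth_9(raw), width=10)
    print(len(reduced))  # 214 under the edge-handling assumptions above
```

Under these assumptions a 2151-band curve reduces to 214 values, which agrees with the count reported in the text, although the software may have treated the spectrum edges differently.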
We compared six pre-processing transformations to prepare soil spectra for analysis, and five multivariate methods to develop the predictive models. The six pre-processing transformations were assembled to represent an array of different types of pre-treatments that can be employed to transform spectral data, and included smoothing, standardization, normalization, and derivation routines. They were selected based on a comprehensive comparative analysis described in Chapter 2. The six pre-processing transformations tested were: Savitzky-Golay smoothing across a 9-band window, followed by averaging across a 10-band window (SAV); log(1/reflectance) (LOG); normalization by the range (NRA); Norris gap derivative across a 7-band window (NGD); Savitzky-Golay 1st derivative using a 1st-order polynomial, across a 9-band window (SGD); and standard normal variate transformation (SNV). Savitzky-Golay smoothing across a 9-band window, followed by averaging across a 10-band window, was used as a standard preparation of the soil spectral curves; the SAV-transformed curves served as input to all other pre-processing transformations tested. All pre-processing transformations were implemented in the Unscrambler 9.5 software (CAMO Software Inc., Woodbridge, NJ).

The five multivariate methods we compared to model soil organic carbon fractions using VNIRS were: stepwise multiple linear regression (SMLR), principal components regression (PCR), partial least squares regression (PLSR), regression tree (RT), and committee trees (CT). All the models were developed using a calibration dataset, comprising 102 measurements that were randomly selected from the whole dataset. A validation set of 39 measurements was used to evaluate the accuracy of the different multivariate methods and pre-processing transformations.
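Three of the per-spectrum transformations listed above (LOG, NRA, and SNV) are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not the Unscrambler routines: the logarithm base (10), the form of the range normalization (dividing each spectrum by its maximum minus its minimum), the use of the sample standard deviation in SNV, and all function names are choices made here rather than details given in the text.

```python
import math
import statistics


def log_inverse_reflectance(spectrum):
    """LOG: log10(1/R) for each band (base 10 is an assumption)."""
    return [math.log10(1.0 / r) for r in spectrum]


def normalize_by_range(spectrum):
    """NRA (assumed form): divide every band by the spectrum's range (max - min)."""
    span = max(spectrum) - min(spectrum)
    return [r / span for r in spectrum]


def standard_normal_variate(spectrum):
    """SNV: center each spectrum on its own mean and scale by its own standard deviation."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1), an assumption
    return [(r - mean) / sd for r in spectrum]


if __name__ == "__main__":
    # Hypothetical reflectance values for a handful of bands.
    reflectance = [0.10, 0.20, 0.40, 0.50]
    print(log_inverse_reflectance(reflectance))
    print(normalize_by_range(reflectance))
    print(standard_normal_variate(reflectance))
```

Each function operates on one spectrum at a time, which mirrors the row-wise nature of these transformations: the result for one sample does not depend on any other sample in the calibration or validation set.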
The coefficient of determination (R²) was used to compare the models, but other error statistics were provided, including the root mean square error (RMSE) and the residual prediction deviation (RPD; Williams, 1987) (Equations 3-1 through 3-3, respectively).
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{3-1}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}} \tag{3-2}$$

$$\mathrm{RPD} = \frac{\mathrm{s.d.}_{val}}{\mathrm{RMSE}_v \sqrt{n/(n-1)}} \tag{3-3}$$

where $\hat{y}_i$ = predicted values; $\bar{y}$ = mean of observed values; $y_i$ = observed values; $n$ = number of predicted/observed values, with $i = 1, 2, \ldots, n$; $\mathrm{s.d.}_{val}$ = standard deviation of the validation set; and $\mathrm{RMSE}_v$ = root mean square error of validation.

Stepwise multiple linear regressions used a cutoff F threshold of 0.05 to include and exclude variables from the models. Principal components regression and PLSR were developed using full (leave-one-out) cross-validation on the calibration set. The optimum number of principal components (PCs) was chosen based on the RMSE of cross-validation (RMSEcv) (Martens and Næs, 1989), up to a maximum of 20 PCs. Stepwise regression was performed in SPSS 11.0 (SPSS Inc., Chicago, IL), while PCR and PLSR were developed in the Unscrambler 9.5 software (CAMO Software Inc., Woodbridge, NJ).

Regression tree (Breiman et al., 1984) and CT (Breiman, 1996) are nonparametric data mining techniques, which have recently been incorporated in soil science (Fidêncio et al., 2002a; Shepherd and Walsh, 2002; Brown et al., 2006). They have no assumptions about the distribution of the data, and can identify nonlinear relationships in the data, offering an alternative to linear methods to analyze soil properties. The CT models were generated by bagging, using a committee of 100 trees, with a maximum number of sample redraws of 3. The trees were pruned based on the least squares error. The estimated values were calculated by averaging multiple versions of predictors generated by bootstrapping, using 10-fold cross-validation (Breiman,
1996). Regression tree and CT were implemented in CART 5.0 (Salford Systems, San Diego, CA).

In order to improve the estimations of soil organic carbon fractions, TC was added as an additional predictor along with the VNIR reflectance values, and PLSR was used to derive the models. Complementarily, a simple linear regression model was derived for each soil organic carbon fraction using TC as the only explanatory variable.

Results and Discussion

Descriptive Statistics

The descriptive statistics of all soil organic carbon properties measured, both in the original units and in log10 units, are shown in Table 3-1. Total organic carbon varied from 2670 to 201,988 mg kg⁻¹, with a mean of 14,828 mg kg⁻¹ and a median of 10,529 mg kg⁻¹. Except for HC, the range of values in the validation set was encompassed by the range of the calibration set. Mean and median values were smaller in the validation sets for all soil organic carbon properties due to the presence of carbon-rich wetland soils in the calibration set, which accounted for the extremely high values.

The SC and MC fractions represent the most labile carbon fractions in the soil, i.e., the most readily available to heterotrophic microorganisms. The SC fraction is composed of water-soluble organic compounds, including simple organic molecules. The HC fraction is less labile than SC, but still can be utilized by organisms employing enzymes and hydrolytic mechanisms to acquire soil nutrients. The remaining fraction, RC, is the most stable soil carbon pool, and is composed of complex organic molecules of low decomposability by microorganisms (e.g., humic and fulvic acids) (Rice, 2006). This fraction is associated with the long-term accumulation of soil organic matter, and is thus the most important fraction from a carbon sequestration perspective.
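The relationship between the original and log10 units reported in the descriptive statistics can be checked with a short Python sketch. It uses only the whole-set TC values quoted in the text and in Table 3-1; the variable and function names are illustrative choices, not part of the original analysis.

```python
import math


def coefficient_of_variation(mean, std_dev):
    """Coefficient of variation in percent."""
    return 100.0 * std_dev / mean


# Whole-set TC values (mg kg^-1) as reported in the text and Table 3-1.
tc_min, tc_max = 2670, 201988
tc_mean, tc_std = 14828, 21993

log_tc_min = math.log10(tc_min)
log_tc_max = math.log10(tc_max)
log_tc_range = log_tc_max - log_tc_min
tc_cv = coefficient_of_variation(tc_mean, tc_std)

print(f"LogTC minimum: {log_tc_min:.4f}")
print(f"LogTC maximum: {log_tc_max:.4f}")
print(f"LogTC range:   {log_tc_range:.4f}")
print(f"TC coefficient of variation: {tc_cv:.2f}%")
```

The printed values (3.4265, 5.3053, 1.8788, and 148.32%) agree with the corresponding whole-set entries of Table 3-1. Note that the mean of LogTC is the mean of the individual log values, so it cannot be obtained by taking the logarithm of the mean TC.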
There was an inverse relationship between lability of the organic carbon fractions and their relative sizes in the Santa Fe soils (Table 3-1). On average, MC was the smallest organic carbon fraction, followed by SC, then HC; and RC was the largest fraction. Average MC was 111 mg kg⁻¹, and minimum and maximum MC values were, respectively, 18 and 1036 mg kg⁻¹. During the MC incubations, soil carbon showed a steady rate of mineralization between the 15th and 29th day of incubation, with an average of 7.91 mg kg⁻¹ day⁻¹. This steady carbon mineralization rate was confirmed until the 36th day of incubation, the day when the experiment was terminated.

The labile HC and SC fractions accounted for an average of 25 and 5% of TC, respectively. The HC concentrations ranged from 37 to 29,399 mg kg⁻¹, with an average of 3707 mg kg⁻¹. The SC concentrations varied from 221 to 8995 mg kg⁻¹, with an average of 809 mg kg⁻¹. The RC fraction accounted for an average of 75% of the organic carbon in the samples. It ranged from 1150 to 181,738 mg kg⁻¹, with an average of 11,122 mg kg⁻¹. As the most stable organic carbon fraction, RC is composed mainly of complex humic substances, with relatively high lignin content and C/N ratio (Bouchard and Cochran, 2006; Rice, 2006). Recalcitrant organic carbon in the region of study possibly originates mainly from the humification of pine litter and residues of pasture and crops. Saturated soil conditions also foster the accumulation of stable carbon forms due to reduced oxidation of organic materials (Bouchard and Cochran, 2006).

All soil organic carbon properties were highly variable across the Santa Fe River watershed, with coefficients of variation of at least 89% for the whole sample sets. The sampling design covered an extensive variety of soil types and land uses, from Histosols to Entisols, and
from wetlands to urban areas, which explains the large range of values for the soil organic carbon properties. The Pearson's correlations between the soil organic carbon properties were all significant at the 0.01 significance level (Table 3-2). All correlation coefficients were above 0.50, with highest values between TC, RC, and SC. Mineralizable organic carbon had good linear correlation with all properties except HC; and HC had the lowest correlations with the other soil organic carbon properties. High and significant linear correlations between TC and the soil organic carbon fractions justified the addition of TC as an auxiliary predictor in the VNIRS models of the soil organic carbon fractions.

Visible/Near-Infrared Spectroscopy Models of Soil Organic Carbon Properties

Amongst the five soil organic carbon properties investigated, TC was estimated with highest accuracy. The maximum coefficient of determination of validation (Rv²) obtained for TC, 0.86, was from the PLSR model after LOG transformation, and the LOG PLSR model also had the highest residual prediction deviation of the models tested (RPD = 2.64) (Table 3-3). Chang et al. (2001) categorized the accuracy and stability of their spectroscopy models based on the RPD values. Values above 2.0 were considered stable and accurate predictive models; RPD values between 1.4 and 2.0 indicated fair models that could be improved by more accurate predictive techniques; and RPD values below 1.4 indicated poor predictive capacity. The TC models for the Santa Fe soils had comparable accuracy to VNIRS models developed elsewhere (Chang et al., 2001; Chang and Laird, 2002; McCarty et al., 2002). However, we could not find VNIRS models of discrete soil carbon fractions in the literature; thus, we could not compare our results for these properties.
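The RPD classes described above can be expressed as a small Python sketch together with the RMSE and RPD statistics used for validation. The function names are hypothetical, and the RPD form used here (standard deviation of the observed validation values divided by RMSE times the square root of n/(n - 1)) is an assumption about how Equation 3-3 reads; the class boundaries follow the description of Chang et al. (2001) given in the text.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def rpd(observed, predicted):
    """Residual prediction deviation.

    Assumed form: s.d. of the observed validation values divided by
    RMSE * sqrt(n / (n - 1)).
    """
    n = len(observed)
    scaled_error = rmse(observed, predicted) * math.sqrt(n / (n - 1))
    return statistics.stdev(observed) / scaled_error


def classify_rpd(value):
    """RPD classes as described in the text after Chang et al. (2001)."""
    if value > 2.0:
        return "stable and accurate"
    if value >= 1.4:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # RPD of the LOG PLSR model of TC reported in the text.
    print(classify_rpd(2.64))
```

With the value reported for the LOG PLSR model of TC (RPD = 2.64), the sketch returns the "stable and accurate" class, while any value between 1.4 and 2.0 falls in the "fair" class.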
Because the VNIRS models were validated with an independent set and some of them produced RPD values above 2.0, these models can be considered reliable and offer good generalization potential. In the case
of HC, SC, and MC, RPD values below 2.0 indicate that there is room for improvement of these models (Chang et al., 2001).

The models obtained for each soil organic carbon property by the different multivariate methods, associated with their respective best pre-processing transformations amongst the six tested, are summarized in Table 3-3. Usually, pre-processing transformations of spectral data improve the accuracy of regression models. Some studies report improvements of the regression models by using 1st and 2nd derivatives (Dunn et al., 2002), normalization of the data (McCarty et al., 2002), and scatter corrections (Kooistra et al., 2003), while others found better results with untransformed reflectance data (Kooistra et al., 2001). In this study, the preferred pre-processing transformation associated with the different multivariate methods varied according to the soil organic carbon property investigated. Only NRA and SGD were not selected as the best pre-processing transformations for any of the multivariate methods or soil organic carbon properties investigated.

When considering the calibration quality, all soil organic carbon properties were estimated with high accuracy, even HC, whose CT model had a coefficient of determination of calibration (Rc²) of 0.89. In terms of calibration, CT provided the best results, with Rc² varying from 0.87, for the MC model, to 0.93, for the TC and RC models. However, when validated using the independent validation set, CT models performed poorly and were only better than the RT models. The main reason for the poor performance of RT and CT is that these models are data mining techniques that require large datasets for robust model predictions. The 102 measurements contained in the calibration set may have been too few to produce models that have good generalization capacity, in other words, that have high Rv² and RPD, and low RMSEv.
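The bagging principle behind the committee trees discussed above can be sketched in Python. This is a simplified stand-in and not the CART 5.0 procedure: each committee member here is a one-split regression tree (a stump) on a single hypothetical predictor, there is no pruning or cross-validation, and all names and data are invented for illustration. Only the committee size of 100 and the idea of averaging predictions from bootstrap resamples come from the text.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single predictor by least squares."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (sse, threshold, mean_l, mean_r)
    if best is None:  # all x values identical: predict the overall mean
        mean_y = sum(ys) / len(ys)
        return (float("inf"), mean_y, mean_y)
    return best[1:]


def predict_stump(stump, x):
    threshold, mean_l, mean_r = stump
    return mean_l if x <= threshold else mean_r


def fit_committee(xs, ys, n_trees=100, seed=0):
    """Bagging: fit one stump per bootstrap resample of the calibration data."""
    rng = random.Random(seed)
    n = len(xs)
    committee = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        committee.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return committee


def predict_committee(committee, x):
    """Average the predictions of all committee members."""
    return sum(predict_stump(s, x) for s in committee) / len(committee)


if __name__ == "__main__":
    # Hypothetical step-shaped relationship between one predictor and the target.
    xs = list(range(20))
    ys = [0.0] * 10 + [1.0] * 10
    committee = fit_committee(xs, ys)
    print(predict_committee(committee, 9.5))
```

Because each member sees a different resample, the split thresholds vary between members, and the averaged prediction near the step is smoother than that of any single stump. With few calibration samples, the resamples differ little from one another, which is one way to picture why such ensembles may generalize poorly on small datasets.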
In validation mode, all but the HC models of soil organic carbon properties were robust, explaining at least 65% of the variability of the validation set in the case of MC, and up to 86% of the variability of the validation set for the TC model. Biplots of estimated and measured soil organic carbon concentrations, derived using the validation set, are presented in Figure 3-1. The plots show trends of underestimation of high values, and overestimation of low values, for HC, SC, and MC. The estimated TC and RC values, however, closely approximated the 1:1 line and show little bias. This indicates that the TC and RC models can be reasonably generalized to estimate total organic carbon, as well as the most stable organic carbon fraction. It is worth noting that the low HC value of 1.5646 log mg kg⁻¹ in the validation set (Figure 3-1c) was lower than the HC values found in the calibration set. Because this low range of HC values was not covered in the calibration set, the model had to extrapolate in validation mode, which explains the overestimation of this HC value.

Among the multivariate methods tested, PLSR provided the best validation results. Similar to PCR, PLSR is a robust statistical method that uses all the available reflectance data to build the models. Both methods can deal with collinearity, and are fairly robust to nonlinearity and data outliers. The main advantage of PLSR relative to PCR is that it takes into account the variability of the target variable when calculating the factors, which is not the case with PCR (Martens and Næs, 1989). The PLSR models generated for the different soil organic carbon properties selected a minimum of 5 factors (RC model), and a maximum of 12 (TC model). The number of partial least squares (PLS) factors was chosen based on the Rv². Figure 3-2 shows the cumulative percent of explained variance by the number of PLS factors for the LOG PLSR TC model.
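The way PLSR uses the target variable when building its factors can be illustrated with a compact single-response PLS (PLS1) sketch based on the NIPALS algorithm. This is a textbook-style illustration in plain Python, not the Unscrambler implementation used in this study; the function names and the toy data are hypothetical. The key line is the weight vector, which is computed from the covariance between the predictors and the target, whereas PCR would derive its components from the predictors alone.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_factors):
    """Fit a PLS1 model with NIPALS on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                 # y residuals
    factors = []
    for _ in range(n_factors):
        # Weights: direction in predictor space with maximum covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        t = [_dot(E[i], w) for i in range(n)]                   # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt                                     # y loading
        # Deflate X and y by the part explained by this factor.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors


def predict_pls1(model, x):
    """Predict one sample by repeating the same deflation steps on it."""
    x_mean, y_mean, factors = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


if __name__ == "__main__":
    # Toy data: two predictors and an exactly linear target.
    X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
    y = [1.0 + 2.0 * a - b for a, b in X]
    model = fit_pls1(X, y, n_factors=2)
    print(predict_pls1(model, [6.0, 4.0]))
```

In this toy case the number of factors equals the number of predictors, so the model coincides with ordinary least squares. With spectral data the point is the opposite: a small number of factors summarizes hundreds of collinear bands.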
According to Figure 3-2, one can observe that, since the TC model was developed using 12 PLS
factors, it explained virtually 100% of the variability of both dependent (TC) and independent (reflectance data) variables. Thus, the PLSR model reduced the number of predictors from 214 reflectance bands to 12 factors, while keeping almost all of the variability information contained in the 214 bands.

The PLSR coefficients used in the models of the five soil organic carbon properties are shown in Figure 3-3. In both TC and RC models (Figures 3-3a and 3-3b), important wavelengths were concentrated around 400, 1000, 1400, 1900, and 2100 nm, and beyond 2200 nm. Since RC represented the greatest part of total organic carbon, the RC and TC models were closely related, and were sensitive to similar spectral regions. Models of HC, SC, and MC (Figures 3-3c, 3-3d, and 3-3e) had important wavelengths approximately in the same regions, as shown by the relatively large coefficients around 1400 nm, and between 2050 and 2400 nm. The SC and MC models also had important wavelengths close to 1900 nm, which is the region of absorbance features of OH and water.

Stepwise multiple linear regression and PCR were the second best multivariate methods. Stepwise multiple linear regression is a relatively rapid and easy technique to analyze multivariate data and requires that linear relationships exist between the target and predictor variables. Therefore, SMLR usually selects those predictors that have the strongest linear correlations with the target variable, which will reflect the highest predictive capacity. The fact that SMLR did not use all the reflectance bands in the models suggests that the predictive information contained in the soil spectral curves is actually concentrated in a subset of important wavelengths. Based on this observation, one would expect that the different multivariate methods would consistently select the same spectral regions in the models. This is confirmed in Figure 3-4, which shows the most important wavelengths used to estimate TC by
four multivariate methods, associated with their respective best pre-processing transformations of soil spectra. Stepwise multiple linear regression, PCR, and PLSR captured approximately the same regions of absorbance features of the main constituents of soil organic matter. The main absorbance regions selected in the models, and the respective soil organic constituents associated with them, were: ~400 nm, chromophorous groups; ~960 nm, organic pigments; ~1400 and 1900 nm, OH groups, including water; ~2000 to 2200 nm, CH and NH groups; and ~2200 to 2400 nm, CH groups (Goddu and Delker, 1960; Gaffey et al., 1993; Siesler et al., 2002; Analytical Spectral Devices Inc., 2003). Since the effect of moisture content, as well as particle size, was standardized by sieving and oven-drying of the soil samples, one can expect that the reflectance values, especially at 1400 and 1900 nm, actually reflect the interaction of soil organic matter with water and soil particles, and not the presence of water per se, or differences in particle size.

Regression trees use a different approach to select the most informative predictors in a model. Tree-based models partition the data, separating the target variable recursively into more homogeneous classes. The wavelengths selected by the RT model were mainly in the visible part of the spectrum, which suggests that RT estimated TC mainly based on the color of the soil. Because RT estimates the target variables as discrete values, or classes, RT was not as suitable for estimating TC and soil organic carbon fractions, and had the worst performance among the multivariate methods tested for all the soil organic carbon properties investigated.

As for TC, the best models of the soil organic carbon fractions (HC, RC, and SC) also consistently captured the regions of absorbance features (including overtones and combinations of the fundamental vibrations) of important chemical groups related to soil organic matter. This
confirms that our VNIRS models were sensitive to soil organic components, and were not a mere consequence of loading the models with multiple predictors.

The models developed using TC as an auxiliary explanatory variable are presented in Table 3-4. When simple linear regression was used to estimate soil organic carbon fractions as a function of TC alone, the Rv² varied from 0.30, for the HC model, to 0.91, for the RC model, whereas the RMSEv varied from 0.069 to 0.304. Recalcitrant organic carbon was estimated with high accuracy probably because it constituted the major part of TC, and thus had the most relevant chemical constituents of soil organic matter with absorbance features in the VNIR region.

When TC was added as an additional predictor in the PLSR models of soil organic carbon fractions, all models improved, except the HC model. The RC model showed a substantial response to the addition of TC, with the Rv² improving from 0.82 to 0.91. The SC model showed improvement of the Rv² from 0.69 to 0.81, whereas the MC model had improvement of the Rv² from 0.65 to 0.73. The RPD of all soil organic carbon fraction models improved with the addition of TC as predictor, including the one from the HC model. The improvement of the PLSR models with addition of TC as a predictive variable was expected, since all soil organic carbon fractions were highly correlated with TC (Table 3-2). Determination of total soil organic carbon in the laboratory is relatively easy and cheap, making it a very good auxiliary variable to VNIRS models of soil carbon fractions. More improvement could be achieved, especially to the HC models, if other properties correlated to HC were also included in the models. Ideally, these properties would be cheaply and easily measurable.

Conclusions

Our modeling study indicated that, besides TC, other ecologically relevant organic carbon fractions in soils can be estimated from soil spectra in the visible/near-infrared range. Total soil
organic carbon and the most stable and largest pool of carbon in the soils (RC) were estimated with high accuracy. In contrast, the organic carbon pool with moderate lability, HC, was difficult to model using soil spectra. Visible/near-infrared spectroscopy models of the smallest and most labile fractions of soil organic carbon, SC and MC, had intermediate predictive power. Separate validation of the models provided evidence to support the use of the VNIRS models developed in this study to estimate soil total organic carbon and soil organic carbon fractions in soils with similar characteristics in Florida.

Soil carbon fractions are important components of the soil carbon cycle, and are essential inputs in various process-based soil carbon modeling systems. Given that soil carbon fractions are more costly and laborious properties to measure, VNIRS models offered good estimates of these properties in a cost-effective way. The VNIRS technique offers the possibility for measuring important soil organic carbon pools over large areas and may be useful for monitoring changes in carbon sequestration and storage in the context of climate change. Furthermore, addition of TC as an explanatory variable improved the residual prediction deviation of the VNIRS models of all the soil organic carbon fractions analyzed. Soil total organic carbon is relatively cheap to measure, justifying its inclusion in the VNIRS models of soil organic carbon fractions. Our results demonstrate the effectiveness of using VNIRS to estimate ecologically relevant soil organic carbon fractions in a mixed-use landscape in north-central Florida. Further research is needed to validate the models developed in this study in other places in Florida, and in the southeastern United States.
Table 3-1. Descriptive statistics of measured soil organic carbon properties.

| Statistic | Whole set | Calibration | Validation | Whole set | Calibration | Validation |
|---|---|---|---|---|---|---|
| | TC (mg kg-1) | | | LogTC (log mg kg-1) | | |
| Mean | 14,828 | 16,625 | 10,128 | 4.0327 | 4.0631 | 3.9532 |
| Std. error of mean | 1852 | 2518 | 909 | 0.0242 | 0.0305 | 0.0335 |
| Median | 10,529 | 10,885 | 9438 | 4.0224 | 4.0368 | 3.9749 |
| Std. deviation | 21,993 | 25,427 | 5675 | 0.2879 | 0.3082 | 0.2093 |
| Coeff. of variation | 148.32 | 152.94 | 56.03 | 7.14 | 7.59 | 5.29 |
| Skewness | 6.35 | 5.51 | 2.01 | 1.33 | 1.31 | 0.42 |
| Kurtosis | 46.74 | 34.53 | 4.95 | 3.98 | 3.58 | 0.29 |
| Range | 199,318 | 199,318 | 25,870 | 1.8788 | 1.8788 | 0.9213 |
| Minimum | 2670 | 2670 | 3523 | 3.4265 | 3.4265 | 3.5469 |
| Maximum | 201,988 | 201,988 | 29,393 | 5.3053 | 5.3053 | 4.4682 |
| | RC (mg kg-1) | | | LogRC (log mg kg-1) | | |
| Mean | 11,122 | 12,634 | 7165 | 3.8704 | 3.9026 | 3.7862 |
| Std. error of mean | 1616 | 2200 | 761 | 0.0276 | 0.0348 | 0.0381 |
| Median | 7382 | 7730 | 6387 | 3.8682 | 3.8882 | 3.8053 |
| Std. deviation | 19,194 | 22,223 | 4751 | 0.3277 | 0.3517 | 0.2381 |
| Coeff. of variation | 172.58 | 175.90 | 66.31 | 8.47 | 9.01 | 6.29 |
| Skewness | 6.64 | 5.75 | 2.24 | 1.03 | 0.96 | 0.50 |
| Kurtosis | 51.49 | 38.03 | 5.98 | 3.28 | 2.97 | 0.30 |
| Range | 180,587 | 180,587 | 23,069 | 2.1986 | 2.1986 | 1.0508 |
| Minimum | 1150 | 1150 | 2253 | 3.0609 | 3.0609 | 3.3527 |
| Maximum | 181,738 | 181,738 | 25,322 | 5.2594 | 5.2594 | 4.4035 |
| | HC (mg kg-1) | | | LogHC (log mg kg-1) | | |
| Mean | 3707 | 3991 | 2963 | 3.4619 | 3.4900 | 3.3884 |
| Std. error of mean | 277 | 369 | 240 | 0.0275 | 0.0305 | 0.0582 |
| Median | 2892 | 2921 | 2749 | 3.4612 | 3.4655 | 3.4392 |
| Std. deviation | 3292 | 3725 | 1502 | 0.3261 | 0.3079 | 0.3635 |
| Coeff. of variation | 88.80 | 93.34 | 50.69 | 9.42 | 8.82 | 10.73 |
| Skewness | 4.58 | 4.19 | 1.05 | 1.41 | 0.18 | 3.43 |
| Kurtosis | 30.09 | 23.98 | 2.57 | 8.51 | 2.03 | 16.65 |
| Range | 29,362 | 29,143 | 8022 | 2.9037 | 2.0607 | 2.3417 |
| Minimum | 37 | 256 | 37 | 1.5646 | 2.4076 | 1.5646 |
| Maximum | 29,399 | 29,399 | 8059 | 4.4683 | 4.4683 | 3.9063 |
| | SC (mg kg-1) | | | LogSC (log mg kg-1) | | |
| Mean | 809 | 869 | 655 | 2.8287 | 2.8465 | 2.7824 |
| Std. error of mean | 69 | 93 | 44 | 0.0196 | 0.0248 | 0.0274 |
| Median | 664 | 697 | 563 | 2.8218 | 2.8431 | 2.7501 |
| Std. deviation | 818 | 942 | 272 | 0.2324 | 0.2504 | 0.1710 |
| Coeff. of variation | 101.11 | 108.40 | 41.53 | 8.22 | 8.80 | 6.15 |
| Skewness | 7.57 | 6.73 | 0.95 | 1.01 | 0.97 | 0.33 |
| Kurtosis | 72.65 | 55.86 | 0.19 | 3.31 | 3.07 | 0.88 |
| Range | 8774 | 8774 | 1048 | 1.6104 | 1.6104 | 0.6218 |
| Minimum | 221 | 221 | 329 | 2.3436 | 2.3436 | 2.5170 |
| Maximum | 8995 | 8995 | 1377 | 3.9540 | 3.9540 | 3.1388 |
| | MC (mg kg-1) | | | LogMC (log mg kg-1) | | |
| Mean | 111 | 120 | 86 | 1.9450 | 1.9737 | 1.8699 |
| Std. error of mean | 9 | 12 | 8 | 0.0231 | 0.0283 | 0.0368 |
| Median | 90 | 94 | 71 | 1.9564 | 1.9744 | 1.8519 |
| Std. deviation | 107 | 120 | 51 | 0.2745 | 0.2856 | 0.2299 |
| Coeff. of variation | 96.39 | 99.94 | 59.80 | 14.12 | 14.47 | 12.29 |
| Skewness | 5.39 | 5.01 | 1.62 | 0.50 | 0.44 | 0.35 |
| Kurtosis | 41.36 | 33.98 | 2.22 | 1.28 | 1.38 | 0.06 |
| Range | 1018 | 1018 | 205 | 1.7614 | 1.7614 | 0.9667 |
| Minimum | 18 | 18 | 25 | 1.2541 | 1.2541 | 1.3936 |
| Maximum | 1036 | 1036 | 229 | 3.0154 | 3.0154 | 2.3603 |

Abbreviations: HC = hydrolyzable organic carbon; LogHC = log10 of hydrolyzable organic carbon; LogMC = log10 of mineralizable organic carbon; LogRC = log10 of recalcitrant organic carbon; LogSC = log10 of hot water soluble organic carbon; LogTC = log10 of total organic carbon; MC = mineralizable organic carbon; RC = recalcitrant organic carbon; SC = hot water soluble organic carbon; TC = total organic carbon.
Table 3-2. Pearson's correlation coefficients between the measured soil organic carbon properties.

| | LogTC | LogRC | LogHC | LogSC | LogMC |
|---|---|---|---|---|---|
| LogTC | 1.00 | 0.98** | 0.69** | 0.90** | 0.79** |
| LogRC | 0.98** | 1.00 | 0.54** | 0.87** | 0.78** |
| LogHC | 0.69** | 0.54** | 1.00 | 0.64** | 0.53** |
| LogSC | 0.90** | 0.87** | 0.64** | 1.00 | 0.79** |
| LogMC | 0.79** | 0.78** | 0.53** | 0.79** | 1.00 |

Abbreviations: LogHC = log10 of hydrolyzable organic carbon; LogMC = log10 of mineralizable organic carbon; LogRC = log10 of recalcitrant organic carbon; LogSC = log10 of hot water soluble organic carbon; LogTC = log10 of total organic carbon.
** Correlation is significant at the 0.01 significance level.
Table 3-3. Summary statistics of the models obtained for each soil organic carbon property by the different multivariate methods, associated with their respective best pre-processing transformations.

| Property | Multivariate method | Pre-processing transformation | Number of predictors/factors¹ | Calibration Rc² | Calibration RMSEc (log mg kg-1) | Validation Rv² | Validation RMSEv (log mg kg-1) | RPD |
|---|---|---|---|---|---|---|---|---|
| LogTC | SMLR | SNV | 6 | 0.82 | 0.132 | 0.77 | 0.102 | 2.02 |
| LogTC | PCR | LOG | 16 | 0.90 | 0.098 | 0.79 | 0.095 | 2.17 |
| LogTC | PLSR | LOG | 12 | 0.93 | 0.082 | 0.86 | 0.078 | 2.64 |
| LogTC | RT | SAV | 9 | 0.92 | 0.085 | 0.68 | 0.131 | 1.58 |
| LogTC | CT | NGD | 214 | 0.93 | 0.087 | 0.72 | 0.129 | 1.60 |
| LogRC | SMLR | LOG | 4 | 0.84 | 0.140 | 0.73 | 0.124 | 1.90 |
| LogRC | PCR | SNV | 13 | 0.80 | 0.157 | 0.72 | 0.125 | 1.88 |
| LogRC | PLSR | SAV | 11 | 0.90 | 0.109 | 0.82 | 0.108 | 2.17 |
| LogRC | RT | SAV | 5 | 0.86 | 0.133 | 0.55 | 0.181 | 1.30 |
| LogRC | CT | SAV | 214 | 0.93 | 0.096 | 0.69 | 0.142 | 1.65 |
| LogHC | SMLR | SAV | 2 | 0.47 | 0.223 | 0.23 | 0.315 | 1.14 |
| LogHC | PCR | NGD | 8 | 0.48 | 0.222 | 0.40 | 0.283 | 1.27 |
| LogHC | PLSR | SAV | 5 | 0.49 | 0.218 | 0.40 | 0.285 | 1.26 |
| LogHC | RT | SAV | 7 | 0.72 | 0.163 | 0.16 | 0.338 | 1.06 |
| LogHC | CT | SNV | 214 | 0.89 | 0.120 | 0.23 | 0.316 | 1.13 |
| LogSC | SMLR | SNV | 7 | 0.88 | 0.087 | 0.70 | 0.095 | 1.77 |
| LogSC | PCR | SNV | 12 | 0.78 | 0.118 | 0.65 | 0.101 | 1.67 |
| LogSC | PLSR | SNV | 6 | 0.81 | 0.110 | 0.69 | 0.100 | 1.68 |
| LogSC | RT | SAV | 4 | 0.80 | 0.113 | 0.44 | 0.146 | 1.16 |
| LogSC | CT | SAV | 214 | 0.92 | 0.072 | 0.52 | 0.135 | 1.25 |
| LogMC | SMLR | SAV | 1 | 0.56 | 0.188 | 0.54 | 0.157 | 1.44 |
| LogMC | PCR | SNV | 10 | 0.59 | 0.182 | 0.65 | 0.141 | 1.61 |
| LogMC | PLSR | SNV | 6 | 0.69 | 0.159 | 0.65 | 0.137 | 1.66 |
| LogMC | RT | SAV | 4 | 0.55 | 0.191 | 0.53 | 0.161 | 1.41 |
| LogMC | CT | LOG | 214 | 0.87 | 0.106 | 0.51 | 0.164 | 1.38 |

Abbreviations: CT = committee trees; LOG = log(1/reflectance); LogHC = log10 of hydrolyzable organic carbon; LogMC = log10 of mineralizable organic carbon; LogRC = log10 of recalcitrant organic carbon; LogSC = log10 of hot water soluble organic carbon; LogTC = log10 of total organic carbon; NGD = Norris gap derivative across a 7-band window; PCR = principal components regression; PLSR = partial least squares regression; Rc² = coefficient of determination of calibration; RMSEc = root mean square error of calibration; RMSEv = root mean square error of validation; RPD = residual prediction deviation; RT = regression tree; Rv² = coefficient of determination of validation; SAV = Savitzky-Golay smoothing across a 9-band window, followed by averaging across a 10-band window; SMLR = stepwise multiple linear regression; SNV = standard normal variate transformation.
¹ Predictors refer to the number of reflectance bands used by the SMLR, RT, and CT models; factors refer to the number of principal components or partial least squares factors used by the PCR and PLSR models. Note that PCR, PLSR, and CT use all the available reflectance bands to calibrate the models, but in PCR and PLSR these reflectance bands are first converted to factors, and the factors are then used as predictors in the models.
Table 3-4. Summary statistics of the models obtained for each soil organic carbon fraction by simple linear regression, and by partial least squares regression, using both soil reflectance and LogTC as predictors.

| Soil organic carbon fraction | Number of predictors/factors¹ | Calibration Rc² | Calibration RMSEc (log mg kg-1) | Validation Rv² | Validation RMSEv (log mg kg-1) | RPD |
|---|---|---|---|---|---|---|
| Simple linear regression using LogTC as predictor | | | | | | |
| LogRC | 1 | 0.96 | 0.071 | 0.91 | 0.069 | 3.40 |
| LogHC | 1 | 0.55 | 0.206 | 0.30 | 0.304 | 1.18 |
| LogSC | 1 | 0.83 | 0.104 | 0.75 | 0.086 | 1.97 |
| LogMC | 1 | 0.64 | 0.170 | 0.47 | 0.166 | 1.36 |
| PLSR using SAV-transformed soil reflectance and LogTC as predictors | | | | | | |
| LogRC | 2 | 0.96 | 0.071 | 0.91 | 0.072 | 3.25 |
| LogHC | 7 | 0.60 | 0.194 | 0.36 | 0.280 | 1.28 |
| LogSC | 4 | 0.87 | 0.091 | 0.81 | 0.075 | 2.26 |
| LogMC | 4 | 0.73 | 0.149 | 0.73 | 0.122 | 1.86 |

Abbreviations: LogHC = log10 of hydrolyzable organic carbon; LogMC = log10 of mineralizable organic carbon; LogRC = log10 of recalcitrant organic carbon; LogSC = log10 of hot water soluble organic carbon; LogTC = log10 of total organic carbon; PLSR = partial least squares regression; Rc² = coefficient of determination of calibration; RMSEc = root mean square error of calibration; RMSEv = root mean square error of validation; RPD = residual prediction deviation; Rv² = coefficient of determination of validation; SAV = Savitzky-Golay smoothing across a 9-band window, followed by averaging across a 10-band window.
¹ Predictors refer to the number of predictors used by the simple linear regression models, only LogTC in this case; factors refer to the number of partial least squares factors used by the PLSR models.
Figure 3-1. Estimated versus measured values in the validation of the best visible/near infrared spectroscopy models of the soil organic carbon properties: A) total organic carbon (LogTC) estimated by partial least squares regression (PLSR) using log(1/reflectance) transformation; B) recalcitrant organic carbon (LogRC) estimated by PLSR using Savitzky-Golay smoothing across a 9-band window, followed by averaging across a 10-band window (SAV) transformation; C) hydrolyzable organic carbon (LogHC) estimated by PLSR using SAV transformation; D) hot water soluble organic carbon (LogSC) estimated by stepwise multiple linear regression using standard normal variate (SNV) transformation; and E) mineralizable organic carbon (LogMC) estimated by PLSR using SNV transformation.
Figure 3-2. Cumulative percent of explained variance as a function of the number of partial least squares (PLS) factors for the total organic carbon model estimated by partial least squares regression using log(1/reflectance) transformation.
Figure 3-3. Coefficients used in the partial least squares regression (PLSR) models of soil organic carbon (C) properties: A) total organic C using log(1/reflectance) transformation; B) recalcitrant organic C using Savitzky-Golay smoothing across a 9-band window, followed by averaging across a 10-band window (SAV) transformation; C) hydrolyzable organic C using SAV transformation; D) hot water soluble organic C using standard normal variate (SNV) transformation; and E) mineralizable organic C using SNV transformation.
Figure 3-4. Important wavelengths used in the total organic carbon models produced by four multivariate methods, associated with their best pre-processing transformations. Abbreviations: SMLR SNV = stepwise multiple linear regression using standard normal variate transformation; PCR LOG = principal components regression using log(1/reflectance) transformation; PLSR LOG = partial least squares regression using log(1/reflectance) transformation; RT SAV = regression tree using Savitzky-Golay smoothing across a 9-band window, followed by averaging across a 10-band window, transformation.
CHAPTER 4
BUILDING A SPECTRAL LIBRARY TO ESTIMATE SOIL ORGANIC CARBON IN FLORIDA

Summary

Visible/near infrared spectroscopy (VNIRS) has been applied to quantify numerous soil properties, offering reasonable accuracy. Our objective was to derive VNIRS models for soil organic carbon (SOC) in mineral (6982 samples) and organic soil horizons (140 samples) in Florida, USA, using committee trees (CT) and partial least squares regression (PLSR). The VNIRS models were validated using independent data sets, and explained up to 71 and 35% of the variability of SOC in mineral and organic horizons, respectively, and up to 77% of the variability of all horizons analyzed together. We stratified the mineral horizons into seven soil orders, and derived PLSR models for each order, which explained from 53 (Inceptisols) to 85% (Mollisols) of the variability of SOC in validation mode. Soil organic carbon estimations from all models were noticeably scattered along the regression lines, especially for high SOC values. Moreover, the slopes of the regression lines were generally smaller than 1, as VNIRS models tended to underestimate high SOC values and overestimate low SOC values. This tendency was more pronounced for organic horizons, indicating that the VNIRS models derived in this study are not ideal for organic horizons in Florida. They could lead to incorrect SOC estimations that could escalate to erroneous regional assessments of SOC in the vast areas of peat soils in the state. On the other hand, despite some lack of correlation, overall the VNIRS models had reasonable accuracy for mineral horizons, given the heterogeneity of soils and environmental conditions in Florida, and are suitable for a rapid assessment of SOC.

Introduction

Conventional laboratory analysis of the large number of samples needed for accurate assessment of spatial and temporal variability of soil organic carbon (SOC) is time-consuming
and expensive, limiting the degree to which this variability can be characterized. This has created a data crisis, in which lack of information is the major constraint for monitoring SOC across larger landscapes. Because of this, visible/near infrared diffuse reflectance spectroscopy (VNIRS) has gained much attention in recent years as a relatively cheap and rapid technique to estimate various soil properties, including SOC (e.g., Dunn et al., 2002; Shepherd and Walsh, 2002; Brown et al., 2005, 2006; Vasques et al., 2008, 2009).

Visible/near infrared spectroscopy offers some advantages over conventional laboratory methods, which include less sample preparation, less or no use of chemical reagents, nondestructiveness, and potential to estimate various soil properties simultaneously. Furthermore, VNIRS can potentially be combined with other proximal (i.e., in situ) (e.g., Lahoche et al., 2002; Adamchuk et al., 2004) and remote sensors (e.g., Robson et al., 2004) to produce data to support assessment and monitoring of soil properties at the field to continental scales.

In order for VNIRS to be efficiently used to estimate soil properties, it is necessary to calibrate models that are specific for every soil property within a given geographic range of interest. In other words, estimation of a soil property at a new location requires calibration of a custom VNIRS model at that location. Because of this, VNIRS still depends on conventional laboratory soil analysis to provide baseline data, which greatly affects the cost and time of soil estimation, constraining the adoption of VNIRS in preference to conventional laboratory methods. To address these constraints, it has been proposed to develop soil spectral libraries covering large geographic domains (so-called global spectral libraries).
Because such libraries comprise heterogeneous soils and the variances of their soil attributes are large, they may not perform as well as local libraries customized for a specific geographic or attribute domain. Our understanding of how to define boundary conditions for spectral libraries is still limited.
We have estimated SOC fractions (Chapter 3), and soil total carbon (0.02 to 26.90%) at different depths (Chapter 2) in a north-central Florida watershed using VNIRS, and showed that accurate (R² = 0.86 for validation of total C) VNIRS models can be derived even with small sample sizes (102 calibration observations in Chapter 3). However, the derived VNIRS models a priori are only valid for similar soils under similar conditions (Brown et al., 2005), thus limiting the use of the derived models for more variable soils and environments.

To expand the scope of regional VNIRS models, and enhance their representativeness across larger regions, creation of soil spectral libraries has been proposed. Shepherd and Walsh (2002), for example, used multivariate adaptive regression splines (MARS) to estimate various properties of African soils. Their VNIRS model for SOC (0.23 to 5.58%) was derived using samples collected from multiple countries, and explained 91% of the variability in the calibration set (674 samples), and 80% of the variability in the validation set (337 samples). Brown et al. (2006) constructed a global soil spectral library containing 4184 soil samples collected from the U.S. (3768 samples), Africa (125), Asia (104), the Americas (75) and Europe (112) to estimate various soil properties. Their VNIRS model derived using boosted regression trees explained 82% of the variability of SOC (3793 samples; 0.00 to 24.16%) using 6-fold cross-validation.

At present, there is no such spectral library encompassing the subtropical soil landscape in Florida (~150,000 km²) and representing the variability of its soil properties. There is a need to accurately assess SOC in large subtropical regions such as the southeastern U.S., and VNIRS could contribute to reducing costs in these large-scale assessments.
Florida represents a mix of distinct environmental conditions, including vast areas of wetlands (28% of the state) (e.g., the Everglades) and C-rich subsoils (i.e., Spodosols) (32% of the state), complex geology, diverse plant communities, and diverse land uses, including
agroecosystems, natural and planted forests, and conservation and urbanized areas. Florida soils vary from very sandy, highly permeable Entisols to constantly saturated peat soils, with Ultisols, Alfisols, and Spodosols lying between these two extremes. There is uncertainty, given this diversity, about the range of soils that could be evaluated for SOC using the same VNIRS model. This study addresses that uncertainty by evaluating the advantages of stratifying soil populations based on various criteria.

Our objectives were to: (i) build a comprehensive spectral library covering the attribute space of soils found in the whole state of Florida; (ii) derive a robust VNIRS model of SOC, including both mineral and organic soils found in the state; (iii) compare multivariate methods and soil population stratification criteria to derive the most accurate VNIRS model of SOC; and (iv) validate all derived VNIRS SOC models using independent samples. Given the great number of observations in our database, we expected that data mining calibration methods would provide better estimates relative to the parametric method tested.

Materials and Methods

Study Area

We conducted our study in Florida, USA. The state is about 150,000 km², located mainly in the subtropical climatic zone between latitudes 24.55° and 31.00° N, and longitudes 80.03° and 87.63° W (Figure 4-1). Mean annual precipitation is 1373 mm, and mean annual temperature is 22.3 °C (National Climatic Data Center, 2008). Florida is located in the common trajectory of hurricanes formed in the Intertropical Convergence Zone (ITCZ). The state is also susceptible to wildfires, mainly during the driest months (October to May), and to the effects of El Niño and La Niña Southern Oscillations (Florida Division of Emergency Management, 2009).

Dominant soils in the state are Spodosols (32%), Entisols (22%), Ultisols (19%), Alfisols (13%), and Histosols (11%). Mollisols and Inceptisols occupy together less than 3% of the state
(Natural Resources Conservation Service, 2009) (Figure 4-1). Land use/land cover consists mainly of wetlands (28%), pinelands (18%), and urban and barren lands (15%). Agriculture, rangelands, and improved pasture occupy 9%, 9%, and 8% of the state, respectively (Florida Fish and Wildlife Conservation Commission, 2003a). The topography consists of gentle slopes varying from 0 to 5% in almost the whole state, with moderate slopes of 5 to 19% occurring in less than 1% of the state, along an escarpment in north-central Florida (i.e., the Cody Scarp) and in the Florida Panhandle. Elevation ranges from sea level up to approximately 114 m in the Panhandle (United States Geological Survey, 1984). Most soils in regions north of Lake Okeechobee form in sandy and loamy sediments. Soils south of the lake are formed mainly in sapric organic materials or secondary carbonates (marl) overlying shallow limestone bedrock.

Field and Laboratory Measurements

Field sampling was conducted from 1965 to 1996 as part of the Florida Soil Characterization project (Florida Soil Characterization Database, 2009), held jointly by the University of Florida Soil and Water Science Department, and the Natural Resources Conservation Service, U.S. Department of Agriculture. Soil information retrieved from archived documents pertaining to 1288 soil profiles throughout the state was digitized into a spreadsheet, and contained data for 8269 soil horizons, including either taxonomic description, or detailed physical and chemical characterization, or both. Of the 8269 horizons, 7716 had SOC measurements, and belonged to 1252 soil profiles.

The original sampling design was established at each county separately to account for local soil variability. Sampling locations were chosen ad hoc by soil survey crews with the help of aerial photographs, map unit delineations, and supporting county and state maps.
The site geographic locations (x,y coordinates) recovered from the soil archive were georeferenced based
on available geographic information provided by personnel who described the soils, the Public Land Survey System, and sampling sites identified on orthophotos. Of the total 1252 soil profiles with SOC measurements, geographic coordinates could be recovered for 1229 locations, which are shown in Figure 4-1.

At each site, soils were routinely described and sampled by horizon to 2 m or more. Samples were analyzed chemically, physically, and mineralogically for classification and interpretive purposes. Aliquots of soil samples were stored for future reference in an archive maintained by the Soil and Water Science Department of the University of Florida.

In the laboratory, SOC was measured in mineral soils using the Walkley-Black modified acid dichromate method (Walkley and Black, 1934; Natural Resources Conservation Service, 1996). In organic horizons, soil organic matter (SOM) was measured by loss on ignition (LOI), and SOC was calculated by multiplying SOM by the van Bemmelen factor (0.58) (Natural Resources Conservation Service, 1996).

Soil Scanning and Data Preparation

Before scanning, we sieved the samples through a 2-mm mesh, and oven-dried them for 12 h at 40 to 45 °C to standardize moisture. Each soil sample was scanned in the visible/near infrared region (VNIR; 350 to 2500 nm) four times, with replications rotated 90°. An average spectral curve was calculated for each sample based on the four replicate scans, and was further processed to be used as input for model development. Reference measurements of white Spectralon (LabSphere, North Sutton, NH) were collected prior to the first scan and at every 25 samples (100 scans). We used a QualitySpec Pro spectroradiometer (Analytical Spectral Devices, Inc., Boulder, CO), and measured soil reflectance at 1 nm intervals, based on the average of ten internal readings per wavelength.
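The two arithmetic steps described above, averaging the four replicate scans of a sample and converting SOM from loss on ignition to SOC, can be sketched as follows. This is a minimal illustration with synthetic three-band "spectra" rather than the 2151-band curves of the study; the function names `average_replicates` and `soc_from_som` are hypothetical, and only the factor 0.58 comes from the text.

```python
VAN_BEMMELEN_FACTOR = 0.58  # SOM-to-SOC factor quoted in the text


def average_replicates(scans):
    """Average replicate scans band by band; each scan is a list of reflectances."""
    n_scans = len(scans)
    n_bands = len(scans[0])
    if any(len(scan) != n_bands for scan in scans):
        raise ValueError("all replicate scans must have the same number of bands")
    return [sum(scan[b] for scan in scans) / n_scans for b in range(n_bands)]


def soc_from_som(som_percent):
    """Convert soil organic matter (%) to soil organic carbon (%)."""
    return som_percent * VAN_BEMMELEN_FACTOR


# Synthetic (hypothetical) replicate scans of one sample, one per 90-degree rotation.
scans = [
    [0.10, 0.20, 0.30],
    [0.12, 0.22, 0.32],
    [0.08, 0.18, 0.28],
    [0.10, 0.20, 0.30],
]
mean_scan = average_replicates(scans)
print(mean_scan)
print(soc_from_som(50.0))  # SOC (%) for a hypothetical horizon with 50% SOM
```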
The collected soil spectral curves, composed of 2151 reflectance measurements, were smoothed by a Savitzky-Golay 3rd-order polynomial across a moving window of 9 bands (Savitzky and Golay, 1964), and then averaged across a 10 nm window to reduce dimensionality by a factor of 10. Savitzky-Golay first derivatives (Savitzky and Golay, 1964) were calculated on the resulting soil spectra (with 214 reflectance measurements) using a 1st-order polynomial across a 9-band window. The first derivative curves containing 206 reflectance bands were used for VNIRS modeling.

We could recover and scan 7122 soil samples (1172 soil profiles) from the stored soil archive, out of the 7719 samples (1252 profiles) containing SOC values in the database. We stratified the 1252 soil profiles by soil taxonomic order, then randomly split samples within each soil order into calibration (~70% of the profiles) and validation subsets (~30%). These separation criteria avoided correlations between calibration and validation samples, assured a balanced representation of different types of soils in both subsets, and provided an independent validation dataset. After separation, calibration and validation sets were compared using Student's t-test for equal means, and Levene's F test for equal variances (Levene, 1960), to assure unbiasedness in their random assignments. In total, there were 4761 calibration samples and 2361 validation samples.

Multivariate Calibration

We compared three methods to derive VNIRS models of SOC: partial least squares regression (PLSR) (Martens and Naes, 1989), because it is a standard method to analyze VNIR spectral data, and two variations of committee trees (CT), using bagging predictors (BAG) (Breiman, 1996), and ARCing classifiers (ARC) (Breiman, 1998), respectively. Tree-based data mining methods are powerful tools for extracting complex, nonlinear relationships in large datasets (4761 calibration samples).
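The pre-processing chain described above (Savitzky-Golay smoothing, 10-band averaging, Savitzky-Golay first derivative) can be sketched in plain Python. The convolution weights are the standard Savitzky-Golay coefficients for a 9-point window. How the spectrum edges were handled is not stated in the text, so this sketch assumes that only fully covered window positions are kept and that a trailing partial block is dropped when averaging; under that assumption the band counts match those reported (2151, 214, 206). All function names are hypothetical and the spectrum is synthetic.

```python
# Standard Savitzky-Golay convolution weights for a 9-point window.
SG_SMOOTH_CUBIC_9 = [w / 231 for w in (-21, 14, 39, 54, 59, 54, 39, 14, -21)]
SG_DERIV1_LINEAR_9 = [w / 60 for w in (-4, -3, -2, -1, 0, 1, 2, 3, 4)]


def convolve_valid(values, weights):
    """Moving weighted sum, keeping only positions fully covered by the window."""
    k = len(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + k]))
        for i in range(len(values) - k + 1)
    ]


def block_average(values, size):
    """Average consecutive non-overlapping blocks; a trailing partial block is dropped."""
    n_blocks = len(values) // size
    return [sum(values[i * size:(i + 1) * size]) / size for i in range(n_blocks)]


def preprocess(reflectance):
    """Return the smoothed-and-averaged spectrum and its first derivative."""
    smoothed = convolve_valid(reflectance, SG_SMOOTH_CUBIC_9)   # 2151 -> 2143 bands
    averaged = block_average(smoothed, 10)                      # 2143 -> 214 bands
    derivative = convolve_valid(averaged, SG_DERIV1_LINEAR_9)   # 214 -> 206 bands
    return averaged, derivative


# Synthetic spectrum with 2151 bands (350 to 2500 nm at 1 nm steps); not real data.
spectrum = [0.2 + 1e-4 * i for i in range(2151)]
averaged, derivative = preprocess(spectrum)
print(len(averaged), len(derivative))  # 214 206
```

The derivative here is expressed per averaged band (that is, per 10 nm step), which is one plausible reading of the procedure rather than a confirmed detail.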
ARCing stands for adaptive resampling and combining, and follows the same principles as bagging (Breiman, 1996), with the difference that, instead of redrawing samples at random from the dataset for growing every consecutive regression tree (Breiman et al., 1984), ARCing preferentially redraws those samples that were poorly estimated in the previous tree (Breiman, 1998), with the aim of increasing the overall estimation accuracy of the CT. ARCing was applied to build a committee of 100 trees, using an exponent of 4.0, 10-fold cross-validation, and a maximum number of sample redraws of 3. Similarly, BAG was applied to build a committee of 100 trees, using 10-fold cross-validation, and a maximum number of sample redraws of 3. Partial least squares regression used 10-fold cross-validation, and the number of partial least squares (PLS) factors was chosen to minimize the root mean square error of cross-validation (RMSEcv).

The frequency distribution of SOC was positively skewed, thus we applied natural log transformation to normalize the data for model development using PLSR. The CT methods are nonparametric, and do not assume an approximate Gaussian distribution of the target variable, thus we used SOC in the original units (%) to derive both CT models.

We derived VNIRS models using CT/BAG, CT/ARC, and PLSR for the whole dataset (7122 observations: 4761 for calibration, and 2361 for validation), but because of the great variability of soils encompassed by the Florida Soil Characterization database, we also stratified our soil dataset using different criteria, aiming to improve model accuracy. First, we stratified the dataset into mineral (6982 observations: 4676 for calibration, and 2306 for validation) and organic (140 observations: 85 for calibration, and 55 for validation) horizons to reflect their distinct characteristics, and the different laboratory methods used to measure SOC. Predictive models were derived by PLSR and CT, using mineral and organic horizons separately.
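The difference between the two resampling schemes can be illustrated with a short sketch of the selection probabilities. Bagging draws every sample with equal probability, whereas the arc-x4 rule of Breiman (1998) weights sample i in proportion to 1 + m_i^4, where m_i counts how often the sample was badly handled by earlier trees. How "poorly estimated" is counted for regression, and how the maximum number of redraws is enforced, depend on the software used and are not specified in the text, so the miss counts below are hypothetical inputs and the function names are illustrative only.

```python
import random


def bagging_probabilities(n_samples):
    """Bagging: every sample has the same chance of being redrawn."""
    return [1.0 / n_samples] * n_samples


def arcing_probabilities(miss_counts, exponent=4.0):
    """Arc-x4 style weights: probability proportional to 1 + m_i ** exponent."""
    raw = [1.0 + m ** exponent for m in miss_counts]
    total = sum(raw)
    return [r / total for r in raw]


def draw_resample(probabilities, rng):
    """Draw a resample (with replacement) of sample indices of the original size."""
    n = len(probabilities)
    return rng.choices(range(n), weights=probabilities, k=n)


# Hypothetical counts of how often each of five samples was poorly estimated.
miss_counts = [0, 0, 1, 2, 0]
p_bag = bagging_probabilities(len(miss_counts))
p_arc = arcing_probabilities(miss_counts)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
indices = draw_resample(p_arc, rng)
print(p_bag)
print(p_arc)
print(indices)
```

With an exponent of 4, a sample missed twice is already 17 times as likely to be redrawn as one never missed, which is why the committee concentrates on the hardest samples.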
Second, we evaluated the effect of including taxonomic information in VNIRS models of SOC. Only mineral horizons were considered for this analysis, since the number of organic horizons was limiting. Different approaches were applied to integrate soil taxonomy, according to the calibration method used. In PLSR, the dataset was stratified by soil order, and a separate model was derived for each order. In the CT methods, we included a categorical variable representing soil orders, and derived only one model for the whole dataset of mineral horizons. Our database included 8 soil orders, namely (with the respective number of soil profiles in parentheses): Alfisols (210), Entisols (281), Histosols (54), Inceptisols (40), Mollisols (42), Spodosols (295), Ultisols (320), and Vertisols (2), plus 8 profiles without taxonomic data. These 8 profiles and the 2 Vertisols had 49 mineral horizons that were eliminated from the dataset, leaving a total of 6933 observations, 4639 for calibration, and 2294 for validation.

In order to test the homogeneity between the groups formed by the two stratification schemes, we used Levene's F test (Levene, 1960) to check if variances were homogeneous among groups, and Welch's analysis of variance (ANOVA) (Welch, 1951) to test the effect of soil order on the variability of SOC. We further identified significant contrasts among soil orders using Dunnett's T3 post hoc test (Dunnett, 1980). Ln-transformed SOC (LnSOC) was used to perform the comparisons.

We assessed the quality of the models using the coefficient of determination (R²; Equation 4-1), mean square error (MSE; Equation 4-2), which was further decomposed into 3 additive components (Gauch et al., 2003), namely: squared bias (SB; Equation 4-3), nonunity slope (NU; Equation 4-4), and lack of correlation (LC; Equation 4-5), and residual prediction deviation (RPD; Equation 4-6; Williams, 1987). The coefficient of determination calculated based on
independent model validation (Rv²) was used to compare among methods, and stratification schemes.

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{4-1}$$

$$MSE = \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n} = SB + NU + LC \tag{4-2}$$

$$SB = (\bar{\hat{y}} - \bar{y})^2 \tag{4-3}$$

$$NU = (1 - b)^2 \, \sigma^2_{\hat{y}_i} \tag{4-4}$$

$$LC = (1 - R^2) \, \sigma^2_{y_i} \tag{4-5}$$

$$RPD = \frac{\sigma_{val}}{\sqrt{MSE_v \cdot n / (n - 1)}} \tag{4-6}$$

Where: $\hat{y}$ = estimated values; $\bar{y}$ = mean of observed values; $y$ = observed values; $n$ = number of estimated/observed values with $i = 1, 2, \ldots, n$; $\bar{\hat{y}}$ = mean of estimated values; $b$ = slope of the least squares regression of $y$ on $\hat{y}$; $\sigma^2$ = variance; $\sigma_{val}$ = standard deviation of the validation set; $MSE_v$ = mean square error of validation.

Results and Discussion

Descriptive Statistics

Soil organic carbon in Florida was highly variable, ranging from 0.01 to 59.2%, with a mean of 1.30%, and a median of 0.22% (Table 4-1). The majority of samples had low SOC concentration, while relatively few organic samples had very high SOC concentrations. Thus, the frequency distribution of SOC was highly skewed (skewness = 7.66). Natural log transformation brought the distribution closer to normal (skewness = 0.69), with the new mean (-1.2747 ln%) closer to the new median (-1.5141 ln%).

Soil organic C in mineral horizons ranged from 0.01 to 14.70%, with a mean of 0.63%, and median of 0.22% (Table 4-1). In organic horizons, SOC ranged from 11.26 to 59.20%, with a
mean of 35.15%, and median of 37.08%. However, there was some overlap of SOC concentrations between mineral and organic horizons. Organic horizons had significantly lower variance compared to mineral horizons (Levene's F test p-value ~ 0), which could be an artifact of their smaller sample size.

Stratification of mineral horizons by soil order produced heterogeneous groups, with the variance of LnSOC among soil orders significantly unequal (Levene's F test p-value ~ 0). Welch's ANOVA was significant (p-value ~ 0), indicating that soil orders explained in part the variability of LnSOC. We confirmed this finding using the post hoc Dunnett's T3 test, which identified homogeneous groups between Entisols and Ultisols (lowest SOC concentration), and between Histosols, Inceptisols, Mollisols, and Spodosols (medium to high SOC concentration) (Table 4-1), indicating that other contrasts between soil orders were significant at the 0.05 significance level.

Texture is an important determinant of SOC concentration. Sandy soils are the most predominant in Florida (e.g., Quartzipsamments), and soil horizons in our database had a mean sand content of 84.8%. Therefore, it is reasonable to assume that sand particles, and elements adsorbed thereon, are mainly responsible for the VNIR spectra of most soils. Moreover, it is expected that the predictive ability of the VNIR models is due to the association between SOC and sand particles, since the clay-SOC relationship is somewhat masked by the overwhelming sand content.

Ln-transformed calibration and validation sets based on the whole sample (7122 observations) had equal means (Student's t-test p-value = 0.382), and equal variances (Levene's F test p-value = 0.057) at the 0.05 significance level. For mineral horizons the range of SOC
values in the calibration set encompassed that of the validation set, but for organic horizons the validation set had a slightly smaller minimum, and a higher maximum SOC.

Performance of the Different Multivariate Calibration Methods

The CT method using ARC had slightly higher accuracy (higher Rv²) than BAG for both mineral and organic horizons, but slightly worse results for the whole dataset (Table 4-2). In both methods, mineral horizons were estimated with better accuracy than organic horizons. Lack of correlation represents the dispersion of the estimated values around the regression line obtained by least squares fitting of estimated values as a function of observed values. It was the most important source of error in the estimation of SOC in both BAG and ARC models, with the dispersion more pronounced for organic than mineral horizons (Figure 4-2). In addition, all CT models showed a trend of overestimating small SOC values, and underestimating large SOC values. This rotated the regression line in relation to the 1:1 correlation line, causing the NU. This trend can be partially explained by the fact that SOC estimations are averages of the samples that are grouped in the terminal nodes, thus the minimum and maximum estimated values are always closer to the center of the distribution than the observed ones in the nodes. Finally, the other source of error in the SOC estimations related to the translation of the regression line away from the 1:1 correlation line, and is represented by the SB, which was more important for organic than mineral horizons.

In comparison, the PLSR models of LnSOC had comparable accuracy (Rv²) to the CT models (Table 4-3), but considerably less dispersion (Figure 4-3). The relative contribution of LC to the total MSE was still the highest among the 3 components, but the shift of the regression line from the 1:1 correlation line was less strong than in the CT models.
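The decomposition of the MSE into squared bias, nonunity slope, and lack of correlation discussed above can be checked numerically with a short sketch. It follows the additive form of Gauch et al. (2003) and makes three assumptions that the text does not confirm: variances are population variances (divided by n), the R² in the LC term is the squared Pearson correlation between observed and estimated values, and the standard deviation in the RPD is the sample standard deviation. Function names and numbers are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _pvar(values):
    """Population variance (divides by n), assumed here for NU and LC."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mse_components(observed, estimated):
    """Return MSE and its additive parts: squared bias, nonunity slope, lack of correlation."""
    n = len(observed)
    mean_obs, mean_est = _mean(observed), _mean(estimated)
    var_obs, var_est = _pvar(observed), _pvar(estimated)
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated)) / n
    b = cov / var_est                       # slope of observed regressed on estimated
    r2 = cov ** 2 / (var_obs * var_est)     # squared Pearson correlation (assumption)
    mse = sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n
    return {
        "MSE": mse,
        "SB": (mean_est - mean_obs) ** 2,
        "NU": (1.0 - b) ** 2 * var_est,
        "LC": (1.0 - r2) * var_obs,
    }


def rpd(observed, estimated):
    """Residual prediction deviation: SD of the validation set over the bias-adjusted RMSE."""
    n = len(observed)
    mse = sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n
    sd_val = math.sqrt(_pvar(observed) * n / (n - 1))  # sample SD (assumption)
    return sd_val / math.sqrt(mse * n / (n - 1))


# Synthetic (hypothetical) validation values; NOT data from this study.
observed = [0.2, 0.5, 1.1, 2.4, 3.0, 4.8]
estimated = [0.6, 0.7, 1.0, 2.0, 2.9, 3.9]
parts = mse_components(observed, estimated)
print(parts)
print(rpd(observed, estimated))
```

Because SB, NU, and LC sum to the MSE under these assumptions, dividing each by the MSE gives the relative contribution of each error source, which is how the dominance of LC described above can be quantified.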
Based on these results, our expectation that data mining calibration methods (CT) would outperform PLSR was not met. In
contrast, both Shepherd and Walsh (2002) and Brown et al. (2006) obtained better results using data mining methods when tested against PLSR. The only VNIRS models with RPD > 2 were the ones derived for the whole dataset using CT. However, these CT models still showed considerable dispersion around the regression line, and we suspect that their R²v values were inflated by the leverage exerted on the regression line by the organic horizons, which contain much higher SOC. All other models had RPD < 2, which indicates that there is potential for improvement of the models. To address this problem, we tested the inclusion of soil order as a categorical predictor in the CT models of SOC for mineral horizons. In PLSR, we stratified the mineral horizons by soil order and derived separate models for each order. The results are discussed in the next section.

Considering the great variability of SOC concentration in Florida, the results obtained for the whole dataset and mineral horizons are comparable in terms of accuracy to other VNIRS studies of SOC and SOM (e.g., Dunn et al., 2002; Kooistra et al., 2003; Udelhoven et al., 2003; Viscarra Rossel et al., 2006). Interestingly, the SOC models produced in this study have lower estimation quality, in spite of the considerably larger number of samples, than models developed for a large watershed nested within Florida (Chapter 2), meaning that the global (more comprehensive) soil set did not perform as well as the local (smaller) soil set. This suggests that VNIRS models of SOC in Florida have a limited geographic scope that might only be extended if certain requirements are met, such as the number of observations, representativeness of the samples, method of laboratory SOC analysis, and other conditions that were not achieved in this study.
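The error statistics used above (MSE split into SB, NU, and LC, plus RPD) can be illustrated with a minimal sketch. The formulas below are assumptions, since this section does not state them: SB is taken as the squared difference of means, NU as (1 - b)² times the variance of the observed values (b being the slope of estimated regressed on observed), LC as (1 - r²) times the variance of the estimated values, and RPD as the standard deviation of the observed values divided by the RMSE. The function names and the sample numbers are hypothetical.

```python
import math


def mse_components(observed, estimated):
    """Split the mean square error into SB + NU + LC (assumed definitions).

    Estimated values are regressed on observed values, as in the text.
    Population (divide-by-n) moments are used so the parts sum to the MSE.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_est = sum((e - mean_est) ** 2 for e in estimated) / n
    cov = sum((o - mean_obs) * (e - mean_est)
              for o, e in zip(observed, estimated)) / n

    slope = cov / var_obs                # slope of estimated vs. observed
    r2 = cov ** 2 / (var_obs * var_est)  # coefficient of determination

    mse = sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n
    sb = (mean_est - mean_obs) ** 2      # translation of the regression line
    nu = (1 - slope) ** 2 * var_obs      # rotation of the regression line
    lc = (1 - r2) * var_est              # scatter around the regression line
    return {"MSE": mse, "SB": sb, "NU": nu, "LC": lc}


def rpd(observed, estimated):
    """Residual prediction deviation, assumed here to be SD(observed) / RMSE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    rmse = math.sqrt(sum((e - o) ** 2
                         for o, e in zip(observed, estimated)) / n)
    return sd / rmse


# Hypothetical validation values (SOC, %), for illustration only.
obs = [0.2, 0.5, 1.0, 2.0, 4.0]
est = [0.4, 0.6, 1.1, 1.8, 3.2]
parts = mse_components(obs, est)
print(parts, rpd(obs, est))
```

In this made-up example the estimates are pulled toward the center of the distribution, so the slope is below 1 and NU is nonzero, which mirrors the behavior described for the CT models.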
Nevertheless, assuming that soil conditions did not change significantly since the soil samples used in this study were collected, we are able to estimate SOC for mineral soils within
the state of Florida with some level of reliability (RPD > 1.82) by just measuring their reflectance. At this point, however, SOC models for organic horizons did not perform well. The lower accuracy of SOC estimates for organic soils may be explained by the small number of organic horizons in the database relative to their great variability.

Effect of the Inclusion of Soil Order Data, or Stratification by Soil Order

In CT, inclusion of the soil order variable as a predictor in the SOC models slightly degraded them relative to the original models containing only reflectance data (Table 4-2). Welch's test showed that SOC varies with soil order. This is not surprising for some of the orders whose definition includes SOC as a criterion (e.g., Histosols). However, it seems that the variability of SOC that would otherwise be explained by soil order was already explained by the VNIR reflectance data. In effect, soil reflectance depends on the characteristics of the soil (e.g., texture, SOM content, mineralogy), which in turn define the class of a specific soil. Thus, inclusion of soil order as a predictor apparently only added redundant information to explain SOC variability, and soil reflectance alone separated the mineral horizons into homogeneous groups (i.e., terminal nodes) more accurately than it did with the addition of soil order.

Stratification of mineral horizons by soil order improved the PLSR models for some soil orders (Mollisols, Spodosols, and Ultisols) relative to the one including all samples, but for other orders the quality of the model decreased. Based on the R²v, the quality of the PLSR models decreased in the following order: Mollisols > Spodosols > Ultisols > Entisols > Alfisols > Histosols > Inceptisols. Models for Mollisols and Spodosols (both soils with high SOC) had RPD values greater than 2, indicating that they can be used for soils under similar conditions (Brown et al., 2005), i.e., anywhere in Florida.
Neither the number of PLS factors nor the SOC concentration was related to the quality of the models. The number of observations, however, was weakly and positively related to the quality of the SOC models, with the exception of
Mollisols, where the high SOC concentration and the type of SOM would better explain the high accuracy of the model. Since stratification of mineral horizons by soil order produced more homogeneous groups (Table 4-1), the VNIRS models were expected to perform better for all soil orders, which was not the case. It is therefore possible that the portion of the variability explained by soil order had in part already been explained by the soil spectra in the VNIRS model. This is the case for soil color, which is a criterion used to classify soils and is explicitly included in the spectral model. Along these lines, other classification criteria that correlate with soil VNIR spectra could be explained by the spectra themselves, causing an overlap with the explanatory ability of soil order.

Our results highlight the issue that taxonomic domain boundaries sometimes coincide well with soil properties, while in other cases they do not. In essence, besides SOC, other soil properties (such as texture, pH, and moisture) are used to distinguish between soil orders/classes, and their interaction with SOC can either degrade or improve VNIR-based estimation models. Brown et al. (2006), for example, improved the VNIRS model of SOC by including sand content as a predictor variable.

Explanatory Wavelengths for Soil Organic Carbon

We identified the most important wavelengths for the PLSR models of the whole dataset, mineral horizons, and organic horizons by their regression coefficients, which are shown in Figure 4-4. Overall, important wavelengths in the PLSR models were in the regions of absorption features of water and of the main chemical groups found in SOM, including C-H, O-H, C-O, C-N, N-H, and S-H, but also in the visible wavelengths, indicating the presence of chromophorous groups related to SOC (Goddu and Delker, 1960; Gaffey et al., 1993). The PLSR model for mineral horizons resembled that of the whole dataset.
In the three PLSR models, the strongest regression coefficients were concentrated around 500 (green light), 700 (red light), 1400 (C-H, O-H, N-H, water), and 1800-2400 nm (all cited chemical groups). For mineral horizons, strong coefficients also appeared close to 1600 nm (C-H, O-H), and for organic soils close to 400 nm (blue light).

Conclusions

Our findings support the potential of a statewide soil spectral library to estimate SOC, but also indicate constraints. Addition of soil taxonomic data improved the VNIRS SOC models only in some cases (i.e., for some soil orders). Moreover, local, geographically constrained VNIR-based SOC estimates seem to perform better than global models spanning a larger soil landscape, such as Florida, which covers a wider range of SOC and other soil characteristics. This suggests that a geographic stratification would be more promising than a stratification based on taxonomy.

The soil spectral library already contains the 7122 samples used in this study, and can easily be augmented as data from new sampling are added, making estimation models of SOC more robust within the state. A major sampling campaign is about to finish for a U.S. Department of Agriculture funded project (Grunwald et al., 2007) to quantify soil C sequestration in Florida. Thus, another 1000 soil samples will soon be added to the library.

Given the great variability of soils and ecosystems in Florida, it is encouraging to produce VNIRS models that explain up to 71% of the variability of SOC in mineral soils, and up to 85% of the variability of SOC in selected soil orders. This level of accuracy suggests that our SOC VNIRS model can offer initial estimates of SOC in mineral soils for application in conservation, land management, and environmental policy. However, compared to the VNIRS model derived within the Florida watershed (Chapter 2), statewide VNIRS models offered worse results despite the greater number of observations, suggesting that a better (but not necessarily ideal) geographic
domain to derive VNIRS models of SOC in Florida might be somewhere between the state and the watershed extents. On the other hand, SOC models for organic horizons had limited accuracy (R²v < 0.35), which would affect their estimates throughout the state, since organic soils are prominent and store a vast amount of SOC. Thus, we recommend collecting more data, mainly in organic soils, to improve our current VNIRS models of SOC. Moreover, there is potential to improve SOC models by including other readily available data (e.g., soil texture, sum of bases) or easily measurable properties (e.g., pH, density). However, this may limit widespread application of VNIR models, because requiring more comprehensive sets of soil physical and chemical data defeats the purpose of VNIR spectral libraries, which is to allow rapid and cheap inferences based on scanning of soil samples.
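Table 4-1. Descriptive statistics of soil organic carbon (SOC) and ln-transformed SOC (LnSOC) for the whole dataset, and stratified soil horizons.

| Variable | Stratification | Stratum | N | Mean¹ (%) | SD (%) | CV (%) | Minimum (%) | Median (%) | Maximum (%) | Skewness |
|---|---|---|---|---|---|---|---|---|---|---|
| SOC | Whole dataset (not stratified) | Total | 7122 | 1.30 | 5.26 | 403.40 | 0.01 | 0.22 | 59.20 | 7.66 |
| SOC | Whole dataset stratified into mineral and organic horizons | Mineral | 6982 | 0.63 | 1.13 | 180.38 | 0.01 | 0.22 | 14.70 | 4.92 |
| SOC | Whole dataset stratified into mineral and organic horizons | Organic | 140 | 35.15 | 13.30 | 37.84 | 11.26 | 37.08 | 59.20 | 0.04 |
| SOC | Mineral horizons stratified by soil order | Alfisols | 1239 | 0.52 | 1.06 | 202.85 | 0.01 | 0.18 | 11.55 | 5.30 |
| SOC | Mineral horizons stratified by soil order | Entisols | 1334 | 0.51 | 1.21 | 236.43 | 0.01 | 0.15 | 14.35 | 6.99 |
| SOC | Mineral horizons stratified by soil order | Histosols | 82 | 1.72 | 2.86 | 166.31 | 0.02 | 0.43 | 14.70 | 2.58 |
| SOC | Mineral horizons stratified by soil order | Inceptisols | 186 | 1.05 | 1.59 | 151.92 | 0.03 | 0.42 | 8.93 | 2.85 |
| SOC | Mineral horizons stratified by soil order | Mollisols | 199 | 1.35 | 2.06 | 153.29 | 0.02 | 0.39 | 12.18 | 2.46 |
| SOC | Mineral horizons stratified by soil order | Spodosols | 2079 | 0.82 | 1.03 | 124.89 | 0.01 | 0.42 | 7.77 | 2.47 |
| SOC | Mineral horizons stratified by soil order | Ultisols | 1814 | 0.37 | 0.64 | 172.72 | 0.01 | 0.14 | 8.64 | 4.91 |
| SOC | Mineral horizons stratified by soil order | All orders | 6933 | 0.62 | 1.12 | 179.99 | 0.01 | 0.21 | 14.70 | 4.88 |
| LnSOC | Whole dataset (not stratified) | Total | 7122 | -1.2747 | 1.4761 | 115.80 | -4.6052 | -1.5141 | 4.0809 | 0.69 |
| LnSOC | Whole dataset stratified into mineral and organic horizons | Mineral | 6982 | -1.3699 | 1.3256 | 96.76 | -4.6052 | -1.5141 | 2.6878 | 0.25 |
| LnSOC | Whole dataset stratified into mineral and organic horizons | Organic | 140 | 3.4756 | 0.4314 | 12.41 | 2.4213 | 3.6129 | 4.0809 | 0.59 |
| LnSOC | Mineral horizons stratified by soil order | Alfisols | 1239 | -1.5272 b | 1.2364 | 80.96 | -4.6052 | -1.7148 | 2.4467 | 0.48 |
| LnSOC | Mineral horizons stratified by soil order | Entisols | 1334 | -1.6978 c | 1.3554 | 79.84 | -4.6052 | -1.8971 | 2.6637 | 0.40 |
| LnSOC | Mineral horizons stratified by soil order | Histosols | 82 | -0.5050 a | 1.4433 | 285.79 | -3.9120 | -0.8440 | 2.6878 | 0.38 |
| LnSOC | Mineral horizons stratified by soil order | Inceptisols | 186 | -0.7880 a | 1.3099 | 166.23 | -3.5066 | -0.8678 | 2.1894 | 0.19 |
| LnSOC | Mineral horizons stratified by soil order | Mollisols | 199 | -0.7173 a | 1.4824 | 206.66 | -3.9120 | -0.9416 | 2.4998 | 0.18 |
| LnSOC | Mineral horizons stratified by soil order | Spodosols | 2079 | -0.8878 a | 1.2532 | 141.16 | -4.6052 | -0.8675 | 2.0503 | 0.16 |
| LnSOC | Mineral horizons stratified by soil order | Ultisols | 1814 | -1.7545 c | 1.1765 | 67.06 | -4.6052 | -1.9661 | 2.1564 | 0.42 |
| LnSOC | Mineral horizons stratified by soil order | All orders | 6933 | -1.3726 | 1.3250 | 96.53 | -4.6052 | -1.5606 | 2.6878 | 0.25 |

Abbreviations: CV = coefficient of variation; N = number of observations; SD = standard deviation.
¹ Equal letters indicate homogeneous group means at the 0.05 significance level, according to Dunnett's T3 test.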
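Table 4-2. Summary statistics for the spectral models of soil organic carbon (SOC) produced by committee trees (CT).

| Stratification | Stratum | Method | N (cal.) | R²c | MSEc | SBc | NUc | LCc | N (val.) | R²v | MSEv | SBv | NUv | LCv | RPD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Whole dataset (not stratified) | Total | BAG | 4761 | 0.94 | 1.68 | 0.00 | 0.09 | 1.59 | 2361 | 0.77 | 7.46 | 0.01 | 0.19 | 7.26 | 2.07 |
| Whole dataset (not stratified) | Total | ARC | 4761 | 0.97 | 0.73 | 0.03 | 0.00 | 0.70 | 2361 | 0.76 | 7.57 | 0.01 | 0.00 | 7.55 | 2.05 |
| Whole dataset stratified into mineral and organic horizons | Mineral | BAG | 4676 | 0.93 | 0.09 | 0.00 | 0.01 | 0.08 | 2306 | 0.66 | 0.49 | 0.00 | 0.01 | 0.48 | 1.70 |
| Whole dataset stratified into mineral and organic horizons | Mineral | ARC | 4676 | 0.97 | 0.04 | 0.00 | 0.00 | 0.03 | 2306 | 0.71 | 0.42 | 0.00 | 0.01 | 0.41 | 1.82 |
| Whole dataset stratified into mineral and organic horizons | Organic | BAG | 85 | 0.89 | 28.53 | 0.07 | 9.30 | 19.15 | 55 | 0.30 | 137.41 | 1.81 | 0.05 | 135.56 | 1.18 |
| Whole dataset stratified into mineral and organic horizons | Organic | ARC | 85 | 0.95 | 20.92 | 0.02 | 13.02 | 7.88 | 55 | 0.32 | 137.65 | 3.35 | 1.09 | 133.21 | 1.18 |
| Mineral horizons with categorical variable representing soil orders included in the model | Mineral | BAG | 4639 | 0.92 | 0.11 | 0.00 | 0.01 | 0.10 | 2294 | 0.65 | 0.47 | 0.00 | 0.00 | 0.47 | 1.69 |
| Mineral horizons with categorical variable representing soil orders included in the model | Mineral | ARC | 4639 | 0.96 | 0.05 | 0.00 | 0.00 | 0.04 | 2294 | 0.67 | 0.46 | 0.00 | 0.01 | 0.45 | 1.72 |

Abbreviations: ARC = ARCing; BAG = bagging; c = calibration; LC = lack of correlation; MSE = mean square error; N = number of observations; NU = non-unity slope; R² = coefficient of determination; RPD = residual prediction deviation; SB = squared bias; v = validation.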
Table 4-3. Summary statistics for the spectral models of ln-transformed soil organic carbon (LnSOC) produced by partial least squares regression (PLSR).

| Stratification | Stratum | PLS factors | N (cal.) | R²c | MSEc | SBc | NUc | LCc | N (val.) | R²v | MSEv | SBv | NUv | LCv | RPD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Whole dataset (not stratified) | Total | 10 | 4761 | 0.74 | 0.5624 | 0.0000 | 0.0000 | 0.5624 | 2361 | 0.74 | 0.6075 | 0.0000 | 0.0001 | 0.6074 | 1.94 |
| Whole dataset stratified into mineral and organic horizons | Mineral | 10 | 4676 | 0.72 | 0.4932 | 0.0000 | 0.0000 | 0.4932 | 2306 | 0.71 | 0.5178 | 0.0000 | 0.0001 | 0.5177 | 1.87 |
| Whole dataset stratified into mineral and organic horizons | Organic | 7 | 85 | 0.62 | 0.0623 | 0.0000 | 0.0000 | 0.0623 | 55 | 0.35 | 0.1572 | 0.0071 | 0.0069 | 0.1432 | 1.18 |
| Mineral horizons stratified by soil order | Alfisols | 8 | 823 | 0.69 | 0.4732 | 0.0000 | 0.0000 | 0.4732 | 416 | 0.69 | 0.4687 | 0.0006 | 0.0018 | 0.4663 | 1.80 |
| Mineral horizons stratified by soil order | Entisols | 7 | 879 | 0.76 | 0.4177 | 0.0000 | 0.0000 | 0.4177 | 455 | 0.71 | 0.6332 | 0.0022 | 0.0080 | 0.6229 | 1.70 |
| Mineral horizons stratified by soil order | Histosols | 5 | 50 | 0.79 | 0.4462 | 0.0000 | 0.0000 | 0.4462 | 32 | 0.66 | 0.7099 | 0.0023 | 0.0056 | 0.7021 | 1.69 |
| Mineral horizons stratified by soil order | Inceptisols | 7 | 115 | 0.85 | 0.2476 | 0.0000 | 0.0000 | 0.2476 | 71 | 0.53 | 0.8445 | 0.0061 | 0.0024 | 0.8360 | 1.42 |
| Mineral horizons stratified by soil order | Mollisols | 5 | 140 | 0.86 | 0.3247 | 0.0000 | 0.0000 | 0.3247 | 59 | 0.85 | 0.3278 | 0.0025 | 0.0376 | 0.2877 | 2.57 |
| Mineral horizons stratified by soil order | Spodosols | 9 | 1388 | 0.77 | 0.3473 | 0.0000 | 0.0000 | 0.3473 | 691 | 0.77 | 0.3521 | 0.0000 | 0.0000 | 0.3521 | 2.11 |
| Mineral horizons stratified by soil order | Ultisols | 9 | 1244 | 0.67 | 0.4518 | 0.0000 | 0.0000 | 0.4518 | 570 | 0.74 | 0.3697 | 0.0002 | 0.0017 | 0.3678 | 1.93 |

Abbreviations: c = calibration; LC = lack of correlation; MSE = mean square error; N = number of observations; NU = non-unity slope; PLS = partial least squares; R² = coefficient of determination; RPD = residual prediction deviation; SB = squared bias; v = validation.
Figure 4-1. Distribution of soil profiles and soil orders within the state of Florida. Soil orders were derived from the State Soil Geographic (STATSGO) database (Natural Resources Conservation Service, 2006). Abbreviations: SOC = soil organic carbon.

Figure 4-2. Estimated versus observed plots of the validation of soil organic carbon (SOC) models derived by committee trees (CT) for the whole dataset using A) bagging (BAG), and B) ARCing (ARC), mineral horizons using C) BAG, and D) ARC, and organic horizons using E) BAG, and F) ARC. Dashed lines represent the 1:1 correlation lines.

Figure 4-3. Estimated versus observed plots of the validation of ln-transformed soil organic carbon (LnSOC) models derived by partial least squares regression (PLSR) for: A) the whole dataset, B) mineral horizons, and C) organic horizons. Dashed lines represent the 1:1 correlation lines.

Figure 4-4. Regression coefficients of the partial least squares regression (PLSR) models of ln-transformed soil organic carbon (LnSOC) for the whole dataset, and mineral and organic horizons, respectively.
CHAPTER 5
REGIONAL MODELING OF SOIL CARBON AT MULTIPLE DEPTHS WITHIN A SUBTROPICAL WATERSHED

Summary

Environmental factors that exert control over fine-scale spatial patterns of soil organic carbon (SOC) within profiles and across large regions differ by geographic location and landscape setting. Regions with large SOC storage and high variability can serve as natural laboratories to investigate how environmental factors generate vertical and horizontal SOC patterns across the landscape. This was investigated in the Santa Fe River watershed (SFRW), Florida, where we modeled the spatial distribution of total carbon (TC) at four depths, namely 0-30 cm (TC1), 30-60 cm (TC2), 60-120 cm (TC3), and 120-180 cm (TC4), and at the aggregated depth of 0-100 cm (TC100) using geostatistical techniques. A total of 141 sampling sites were distributed within the watershed in a stratified random design across land use and soil order trajectories, and soil samples were analyzed for TC by high-temperature combustion. Samples were separated at each depth into training (~70% of the samples) and validation sets (~30%). First, to examine the vertical trend of TC, we compared mean TC between the four depths using the paired Student's t-test. Second, we used analysis of variance (ANOVA) to test the significance of four environmental properties in explaining the variability of TC100, namely land use/land cover, soil type (taxonomic order), soil drainage class, and geologic unit. Third, to investigate the influence of diverse soil-forming factors on the spatial distribution of TC, we compared three geostatistical methods to model TC across the SFRW, namely lognormal kriging (LK), and two modalities of regression kriging (RK). All environmental factors used to explain the variability of TC100 were significant at the 0.05 significance level.
Lognormal kriging was the best geostatistical method to scale up TC1 (RMSEv = 3.34 kg m⁻²), TC3 (RMSEv = 2.02 kg m⁻²), and TC100 (RMSEv = 7.21 kg m⁻²), while RK performed best for TC2 (RMSEv = 6.20 kg m⁻²), and
TC4 (RMSEv = 2.75 kg m⁻²), using regression tree and stepwise multiple regression, respectively. The relative performance of the different geostatistical methods indicates to what degree the spatial distribution of TC is influenced locally by its spatial autocorrelation, or globally by its relationship with collocated environmental factors that were selected in the RK models at different depths. Our study showed that TC in the SFRW is influenced by soil depth, land use/land cover, soil type, soil drainage class, and geologic unit. Our models show that 39.3 Tg (teragrams) of organic carbon are held in the upper 1 m of soils in the SFRW, and significant amounts are stored in deeper layers. Our investigation identified the major factors responsible for regional spatial patterns of TC in Florida, highlighting the importance of accurately characterizing those factors to derive high-quality spatial models of TC. This study contributes to current efforts to conserve soil resources in Florida, under similar environmental conditions in the southeastern U.S., and elsewhere.

Introduction

Even though many environmental properties have been correlated with soil carbon (C) at the site-specific/plot scales, studies using comprehensive datasets comprising multiple environmental properties to model soil C are less common, both at the regional scale (e.g., Ryan et al., 2000; Henderson et al., 2005; Minasny et al., 2006) and the field scale (e.g., Terra et al., 2004; Simbahan et al., 2006). Moreover, according to Grunwald (2009), only a few studies have quantified soil C at multiple depths (e.g., Borůvka et al., 2007; Grimm et al., 2008); thus our understanding of C differences among soil layers, and our assessment of deep C stocks, is still limited over large extents. Previous estimates of soil organic carbon (SOC) based on State Soil Geographic (STATSGO) data indicated that Florida has the highest stock of SOC per unit area among U.S.
states (minimum: 12.4 kg m⁻², middle: 35.3 kg m⁻², maximum: 64.0 kg m⁻²). Furthermore,
Florida offers favorable conditions for the accumulation and long-term storage of SOC in the context of carbon sequestration programs, due to the widespread presence of wetlands and subsurface C-rich Spodosols. However, spatial patterns of SOC in Florida have only been estimated from generalized soil taxonomic data (Guo et al., 2006a), and more detailed studies have been limited to specific ecosystems and soil types. Stone et al. (1993), for example, assessed SOC in Florida Spodosols, observing 10.4 ± 0.8 kg m⁻² (mean ± standard error) from 0 to 1 m, and 18.3 ± 0.8 kg m⁻² from 0 to varying profile depths, of which 9.2 ± 0.6 kg m⁻² were stored in spodic horizons. In southern pine plantations on Spodosols, Shan et al. (2001) measured TC stocks to 1 m ranging from 9.5 to 14.1 kg m⁻² across different forest management treatments and a control. In the Florida Everglades, Wright et al. (2008) observed TC concentrations at 0-20 cm, including floc, in the range of 0 to 550 g kg⁻¹, and corresponding soil organic matter (SOM) from 0 to 1000 g kg⁻¹. Another study in the Everglades found similar TC concentrations, ranging from 176 to 505 g kg⁻¹ down to 20 cm, including floc (Bruland et al., 2006). These findings indicate the potential of Florida wetlands to store C in the soil.

In summary, even though some detailed soil C assessments are available in Florida, to the best of our knowledge they are limited to pinelands on Spodosols and to the Everglades, at a single depth range. As such, a comprehensive soil-landscape model for soil C has not yet been derived for a mixed land use watershed in Florida. It is still unknown to what extent hydrologic/topographic patterns, geology, climate, and land use/land cover have interacted to generate regional SOC patterns along soil profiles and across this distinct subtropical ecosystem, with its characteristic sandy soils and flat topography.
Therefore, the overall objective of this study was to measure the spatial distribution of soil C across a subtropical watershed located in north-central Florida using regression modeling and geostatistical techniques, and to determine how human land use, underlying geology, and other environmental factors influence these patterns. The specific objectives were to: (i) identify the environmental factors controlling the distribution of TC in the watershed; (ii) estimate TC stocks by comparing ordinary (lognormal) kriging with regression kriging; and (iii) validate the derived TC models using independent validation sets.

Materials and Methods

Study Area

The study was conducted in the Santa Fe River watershed (SFRW), a 3585 km² mixed-use watershed located in north-central Florida between latitudes 29.63° and 30.21° N and longitudes 82.88° and 82.01° W. The climate is predominantly wet and warm, with mean annual precipitation of 1224 mm and mean annual temperature of 20.5 °C (National Climatic Data Center, 2008). Dominant soil orders in the watershed are Ultisols (47%), Spodosols (27%), and Entisols (17%). Histosols, Inceptisols, and Alfisols occupy the remaining areas. Soil texture in the SFRW is predominantly sandy in the surface and sandy to loamy in the subsoil, with only 1% of the area in clayey soils. Sand content varies from 0 to 98%, silt content from 0 to 21%, and clay content from 0 to 51%. The most frequent soil series according to U.S. Soil Taxonomy are Sapelo, Blanton, Ocilla, Mascotte, and Foxworth, and the most frequent soils include Ultic Alaquods, Grossarenic Paleudults, Aquic Arenic Paleudults, and Typic Quartzipsamments (Natural Resources Conservation Service, 2009).

The SFRW comprises multiple land uses/land covers, predominantly pineland (30%), wetland (14%), improved pasture (13%), upland forest (13%), and rangeland (13%) (Florida Fish and Wildlife Conservation Commission, 2003a). Urban and barren areas (i.e., areas
of exposed soil, e.g., after mining or vegetation removal) occupy about 11% of the watershed, and water around 2% (Figure 5-1). The topography consists of level to slightly undulating slopes varying from 0 to 5% in 99% of the watershed, with moderate slopes of 5-12% occurring in less than 1% of the area, along the Cody Scarp. Elevations range from 1.5 to 92 m above mean sea level (United States Geological Survey, 1999). In the western portion of the watershed, the geology is dominated by Ocala Limestone from the Tertiary period and undifferentiated geology from the Quaternary period. The central portion of the watershed lies over the Coosawhatchie Formation, which originated in the Miocene. In the east, undifferentiated sediments from the Pliocene and Pleistocene dominate (Florida Department of Environmental Protection, 1998).

Field Sampling

Soil samples were collected at four depths down to 180 cm at 141 sites spread across the study area in a stratified random design, following soil order and land use/land cover (Figure 5-1). In order to account for local variability, composite samples were collected with an auger at each site within a 2 m radius. Soil samples collected at 0-30 and 30-60 cm were composed of four subsamples, while two subsamples were taken at depths 60-120 and 120-180 cm, respectively, in order to keep the sample support constant for scaling up at multiple depths. At each site, the composite samples were homogenized in a plastic bag, subsampled in the field, and transported to Gainesville, Florida, the same day. Samples were then air-dried in a greenhouse, sieved through a 2 mm mesh screen, and stored. At two locations, samples could not be collected for the 60-120 cm depth, and at eight sites samples could not be collected for the 120-180 cm depth, due to adverse field conditions.
Laboratory Analysis

The stored samples were ball-milled prior to analyses. Total carbon (TC) was determined by high-temperature combustion on a FlashEA 1112 Elemental Analyzer (Thermo Electron Corp., Waltham, MA). Soils were not pretreated with acid to remove carbonates. Free carbonate was not expected in these soils, as evidenced by soil pH values ranging well below 8.0; therefore, TC is considered to be SOC.

To derive TC stocks in kg m⁻² of soil, bulk density was estimated using a pedotransfer function based on historical soil characterization data (Florida Soil Characterization Database, 2009) collected by the staff of the Soil and Water Science Department at the University of Florida and the Natural Resources Conservation Service, comprising about 1300 soil profiles distributed throughout the state of Florida. Soil samples within a 100 km buffer surrounding the watershed were selected from the complete historical database to capture regional conditions, and their bulk densities were assessed. At each depth interval, the average bulk density observed for each soil series was assigned to the corresponding soil series encountered at the sampling sites in the SFRW.

Soil total carbon in areal units (kg m⁻²) was calculated at four depths by multiplying the measured TC in concentration units by the bulk density at specific depth increments, yielding the following datasets, with their respective numbers of observations in parentheses: TC1 at 0-30 cm (141), TC2 at 30-60 cm (141), TC3 at 60-120 cm (139), and TC4 at 120-180 cm (133). An aggregated measure of TC from 0 to 100 cm (TC100) was calculated by adding TC1 (30 cm), TC2 (30 cm), and 2/3 of TC3, corresponding to the top 40 cm of that layer (60-100 cm). Soil total carbon values were positively skewed; thus log10 was applied to normalize their frequency distributions.

A robust model to estimate TC at the watershed scale should be capable of handling both very high and very low TC values.
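The stock calculation and the 0-100 cm aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the units of the measured concentration or the bulk density, so g kg⁻¹ and g cm⁻³ are assumed here, and the function names and input values are hypothetical.

```python
def tc_stock(tc_g_per_kg, bulk_density_g_cm3, thickness_cm):
    """Carbon stock (kg m^-2) for one depth increment.

    Assumed units: concentration in g kg^-1, bulk density in g cm^-3,
    layer thickness in cm. The factor 1/100 combines the unit conversions:
    (g/kg -> kg/kg) * (g/cm3 -> kg/m3) * (cm -> m) = 1e-3 * 1e3 * 1e-2.
    """
    return tc_g_per_kg * bulk_density_g_cm3 * thickness_cm / 100.0


def tc100(tc1, tc2, tc3):
    """Aggregate stock for 0-100 cm: TC1 + TC2 + 2/3 of TC3 (60-120 cm)."""
    return tc1 + tc2 + (2.0 / 3.0) * tc3


# Hypothetical site: concentration and bulk density per layer.
tc1 = tc_stock(10.0, 1.5, 30)   # 0-30 cm
tc2 = tc_stock(5.0, 1.6, 30)    # 30-60 cm
tc3 = tc_stock(2.0, 1.6, 60)    # 60-120 cm
print(round(tc100(tc1, tc2, tc3), 2))
```

The 2/3 weight assumes carbon is spread evenly within the 60-120 cm layer, which is the assumption implied by taking the top 40 cm of a 60 cm increment.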
Outlier detection was conducted by checking very high and
very low TC values against soil type and land use/land cover information in order to identify whether they were justifiable from an environmental perspective. If not, they were considered outliers and removed from the dataset. Since no evidence of laboratory errors was found, no outliers were removed.

Comparison of Soil Total Carbon at Different Depths

The correlation of TC values at different depths was assessed by Pearson's product-moment correlation, calculated on log-transformed TC values (LogTC). The paired Student's t-test was used to compare the means of LogTC among the four depths, and the Bonferroni correction (Bonferroni, 1936) was applied to adjust the significance level because multiple comparisons were made on the same dataset. Since six pairwise comparisons were made, the significance level for each comparison became 0.0083 after dividing the desired overall significance level of 0.05 by 6. Levene's test for equality of variances (Levene, 1960) was performed to verify the assumption that TC had homogeneous variance at different depths.

Relationship between Soil Total Carbon and Environmental Landscape Factors

Four major environmental landscape factors (land use/land cover, soil type (taxonomic order), soil drainage class, and underlying geologic unit) were formally tested for the significance of their relationship with TC100. These factors were chosen to encompass key soil-forming factors in Florida, respectively: human activity, soil conditions, hydrology, and geology/parent material. Land use/land cover and soil orders were identified at the sampling locations at the time of sampling. Soil drainage class and geologic unit were obtained from GIS layers produced from different sources, which are detailed in Table 5-1.
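The Bonferroni adjustment described above for the depth comparisons can be reproduced with a short sketch. The depth labels come from the text; the helper name `is_significant` and the example p-values are hypothetical.

```python
from itertools import combinations

depths = ["TC1", "TC2", "TC3", "TC4"]

# All unordered pairs of the four depths: 4 choose 2 = 6 comparisons.
pairs = list(combinations(depths, 2))

alpha_overall = 0.05
alpha_per_test = alpha_overall / len(pairs)  # Bonferroni-adjusted level


def is_significant(p_value, alpha=alpha_per_test):
    """True if a single pairwise test passes the adjusted threshold."""
    return p_value < alpha


print(len(pairs), round(alpha_per_test, 4))
# Hypothetical p-values: 0.02 would pass at 0.05 but not after adjustment.
print(is_significant(0.02), is_significant(0.001))
```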
One-way analysis of variance (ANOVA) was used in the case of homogeneous variance between categories of environmental factors, followed by Tukey's Honestly Significant Difference (HSD) test (Tukey, 1953) to assess pairwise differences in TC100 among the categories. In the case of unequal variances, either Welch's test
(Welch, 1951) or the Brown-Forsythe test (Brown and Forsythe, 1974) was used to compare group (i.e., category) means. Myers and Well (2003) recommend Welch's test when the number of samples in every group is greater than 10, and the Brown-Forsythe test when groups of fewer than 10 samples are present. Post hoc pairwise group comparisons in the case of unequal variances were done using Dunnett's T3 test (Dunnett, 1980).

Scaling up of Soil Total Carbon in the Santa Fe River Watershed

Three geostatistical techniques were compared to model the spatial distribution of TC in the SFRW: lognormal kriging (LK), and two modalities of regression kriging (RK), one using stepwise multiple linear regression (SMLR) to model the global spatial trend (RK/SMLR), and the other using regression tree (RK/RT). At each depth, the dataset was split randomly into a training set (~70%), used for model development, and a validation set (~30%), used for model validation.

Lognormal kriging is recommended for cases where the target variable has a positively skewed distribution (Webster and Oliver, 2001), as was the case for TC. It was conducted in three steps: first, the positively skewed TC data were transformed using log10; second, ordinary kriging was used to interpolate log-transformed TC across the watershed; and third, the interpolated LogTC values were back-transformed to original units using the formula presented in Webster and Oliver (2001, p. 180). The resulting grids were then validated using the randomly separated independent validation set.

Similarly to LK, RK was applied to log-transformed TC values, and the resulting grids were back-transformed to original units. Regression kriging on LogTC was applied in three stages (Equation 5-1). First, a deterministic model was developed to explain the global spatial trend in the distribution of TC across the SFRW.
It was assumed that the distribution of TC is a function of collocated environmental properties on the landscape, suggesting that an environmental
correlation exists between TC and the landscape properties (Equation 5-2). This approach has been used successfully in a number of investigations (McKenzie and Austin, 1993; Moore et al., 1993; Odeh et al., 1994; McKenzie and Ryan, 1999; Ryan et al., 2000).

$$S(x_0) = m(x_0) + \varepsilon'(x_0) + \varepsilon'' \tag{5-1}$$

$$m(x_0) = \beta F + e \tag{5-2}$$

Where: $S(x_0)$ = soil property at location $x_0$; $m(x_0)$ = deterministic global spatial trend model describing the structural component of TC at location $x_0$; $\varepsilon'(x_0)$ = stochastic, spatially dependent residual from $m(x_0)$; $\varepsilon''$ = spatially independent residual; $F$ = environmental factors; $\beta$ = regression parameters; $e$ = global trend residuals, where $e = \varepsilon'(x_0) + \varepsilon''$.

The environmental variables used in this study were collected as Geographic Information Systems (GIS) data layers that covered the whole study area. The data layers were obtained from various sources (e.g., U.S. Department of Agriculture, U.S. Geological Survey, Florida Department of Environmental Protection), and included: soil survey maps, including physical and taxonomic soil properties; climate maps; vegetation maps; land use/land cover maps; satellite imagery and derived products (vegetation indices, tasseled cap indices, and principal components of the reflectance bands); a digital elevation model and derived topographic attributes (slope, aspect, catchment area, and compound topographic index); and geologic maps. A complete list of environmental ancillary data used to model the global spatial trend is presented in Table 5-1.

The theoretical basis for the environmental correlation of soil properties at the landscape scale comes from the seminal studies of Jenny (1941), who presented the well-known CLORPT equation of soil formation (Equation 5-3), later modified by McBratney et al. (2003) into the SCORPAN model (Equation 5-4) to include other soil variables and the explicit position in space.

$$S = f(cl, o, r, p, t) \tag{5-3}$$

$$S[x, y, \sim t] = f\left(s[x, y, \sim t],\; c[x, y, \sim t],\; o[x, y, \sim t],\; r[x, y, \sim t],\; p[x, y, \sim t],\; a[x, y, \sim t],\; n\right) \tag{5-4}$$
Where: S = soil property or class; cl or c = climate; o = organisms, including human activity; r = relief; p = parent material; t or a = time or age; s = soil property; n = spatial position defined by the x and y coordinates and other spatial distance measures.

The global spatial trend ($m(x_0)$; Equations 5-1 and 5-2) was modeled using either stepwise multiple linear regression (SMLR) or regression tree (RT). The global trend model obtained by SMLR used a combination of forward and backward selection, in which the variables are added and removed according to a tolerance significance level, based on the F probability, which was set to 0.05. Regression tree models were obtained by performing a binary recursive partitioning of the TC training set (Breiman et al., 1984; Steinberg and Colla, 1997), and an estimated TC value was obtained as the average of all observations that were grouped at each terminal node. The optimal number of tree nodes was identified by minimizing the sum of squared residuals.

The global trend model constitutes the deterministic structural component of the spatial variability of TC. The residuals from the global trend models constitute the local stochastic component of the spatial variability of TC. In LK, both global and local components are integrated in the same system of equations, i.e., the global trend component is assumed to be stationary across the study area. In both LK and RK, the local stochastic component ($\varepsilon'(x_0)$; Equation 5-1) was modeled by ordinary kriging, which assumes that the variable of interest (TC for LK, and TC residuals in the case of RK) has a positive spatial autocorrelation. The spherical and exponential models were interactively fitted and compared to visually approximate the experimental semivariograms of TC at each depth, and were used to calculate the kriging weights to interpolate TC values.
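The two semivariogram model families named above can be written out explicitly. The following is a minimal sketch, not the Isatis implementation used in this study; the function names, the parameter names (`nugget`, `partial_sill`, `a`), and the 3h/a convention for the exponential model (so that `a` is the effective range) are assumptions made for illustration.

```python
import math


def spherical(h, nugget, partial_sill, a):
    """Spherical semivariogram: reaches the sill exactly at range a."""
    if h <= 0:
        return 0.0  # semivariance is zero at zero lag by definition
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, partial_sill, a):
    """Exponential semivariogram: approaches the sill asymptotically.

    Assumed convention: the exponent is -3h/a, so a is the effective
    range at which about 95% of the partial sill is reached.
    """
    if h <= 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / a))


if __name__ == "__main__":
    # Hypothetical parameters (not taken from Table 5-7).
    for lag in (1000, 5000, 10000, 20000):
        print(lag,
              round(spherical(lag, 0.1, 0.4, 10000), 4),
              round(exponential(lag, 0.1, 0.4, 10000), 4))
```

Comparing both curves against the experimental semivariogram at the same lags mirrors the visual comparison described above: the spherical model flattens abruptly at the range, whereas the exponential model rises more gradually.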
Finally, the interpolated LogTC values from the global (SMLR or RT) and local trend models in RK were added to produce the final LogTC maps across the SFRW. After backtransformation, the final TC predictions were validated using the independent validation set. The
root mean square error calculated on the validation set (RMSEv; Equation 5-5) was used to choose the best TC model at each depth.

$$RMSE_v = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}} \qquad (5\text{-}5)$$

Where: $\hat{y}$ = predicted values; $y$ = observed values; $n$ = number of observed values, with i = 1, 2, ..., n.

The compilation of data and model development were conducted in ArcGIS 9.2 (Environmental Systems Research Institute, Redlands, CA). Satellite imagery and other raster materials were prepared in ERDAS Imagine 9.0 (Leica Geosystems Geospatial Imaging, LLC, Norcross, GA). Descriptive statistics, significance tests, and SMLR were conducted in SPSS 15.0 (SPSS Inc., Chicago, IL), regression trees were implemented in CART 5.0 (Salford Systems, San Diego, CA), and ordinary kriging was conducted in Isatis 4.1.5 (Geovariances Americas Inc., Houston, TX).

Results and Discussion

Descriptive Statistics

At all depths, TC showed a positively skewed frequency distribution, which approximated a normal distribution after the log10 transformation (Table 5-2). Overall, TC contents were variable, ranging from 0.16 kg m⁻² in the case of an Entisol sample at layer 4 (120-180 cm), to 129.92 kg m⁻² for a Histosol sample collected in a wetland at layer 2 (30-60 cm). Except for TC2 and TC4, the range of the training set encompassed the range of the validation set. On average, TC decreased with depth; thus, the highest TC was found at the surface layer (TC1, 0-30 cm), with values ranging from 1.17 to 63.88 kg m⁻², a mean of 6.26 kg m⁻², and a median of 4.57 kg m⁻². The second highest TC was found at 30-60 cm, especially due to the
presence of spodic horizons. At 30-60 cm, TC2 averaged 3.73 kg m⁻², had a median of 1.67 kg m⁻², and ranged from 0.43 to 129.92 kg m⁻². At 60-120 cm, TC3 varied from 0.36 to 112.66 kg m⁻², and had a mean of 3.59 kg m⁻² and a median of 1.71 kg m⁻². High TC3 values were found in water-saturated horizons, and also reflected the presence of clay horizons, which have a relatively higher capacity to bond with organic matter. The lowest TC was found in the deepest layer, from 120 to 180 cm, with TC4 ranging from 0.16 to 18.50 kg m⁻², with an average of 1.61 kg m⁻² and a median of 1.08 kg m⁻². It is worth noting that TC3 and TC4 have double the thickness (60 cm) of TC1 and TC2 (30 cm); therefore, their areal TC amount is inflated by a factor of 2 when compared to the first two layers, since the TC content in kg m⁻² was obtained by multiplying the volumetric content of TC (kg m⁻³) by the depth (m).

Overall, the highest TC values occurred in wetland soils, mainly Histosols and Inceptisols. These soils are frequently saturated with water, which promotes the accumulation of carbon due to the slower anaerobic decomposition of organic matter. Spodosols had the second highest TC values, due to the accumulation of iron- and aluminum-organic matter complexes in spodic horizons. The lowest TC values occurred in Entisols, mainly Quartzipsamments, which are sandy soils relatively depleted of weatherable minerals (Natural Resources Conservation Service, 1999) that are usually associated in this region with karst terrain, and occupied by natural upland forests (Florida Fish and Wildlife Conservation Commission, 2003a; Natural Resources Conservation Service, 2009).

Soil total carbon from 0 to 100 cm varied from 1.84 to 268.91 kg m⁻², with a mean of 11.79 kg m⁻² and a median of 7.49 kg m⁻². As the aggregation of TC1, TC2 and part of TC3, TC100 behaved similarly to the other layers, with very high TC in wetlands, and low values in upland
forests on karst terrain. The concomitant presence of wetlands and karst formations in the SFRW explains the large range of TC values, which is characteristic of this unique environment.

Soil total carbon was significantly correlated between all depths at the 0.05 significance level (Table 5-3), with the highest correlation between LogTC2 and LogTC3 (0.76), followed by LogTC1 and LogTC2 (0.70), and LogTC3 and LogTC4 (0.69). Although TC was correlated among depths, significant differences were still identified at the 0.05 significance level between all pairs of depths, except between LogTC2 and LogTC3, by the pairwise Student's t-tests (Table 5-3). Strong correlations in TC along soil profiles confirmed our expectations and underpin the importance of selecting validation sets for testing the scaling-up of TC by treating each layer separately.

Soil total carbon was highly variable in all depth intervals as a consequence of the sampling design that spanned multiple land uses/land covers and soil types. Layers 2 and 3 had coefficients of variation (CV) on the order of 300%, while TC1 and TC4 had CVs close to 100%. Despite their large variability, the assumption of homogeneity of variance of TC among depths was still met by Levene's test (1.60) at the 0.05 significance level (p-value = 0.19).

Relationship between Soil Total Carbon and Environmental Landscape Factors

The effects of land use/land cover, soil order, soil drainage class, and geologic unit on TC100 were all significant at the 0.05 significance level, according to one-way Welch's or Brown-Forsythe's ANOVA (Table 5-4). Among the four environmental factors tested, only soil drainage class had homogeneous variance between classes, according to Levene's test (2.21), but the significance of the test (p-value = 0.057) was very close to the 0.05 significance level.
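Because most factors failed the homogeneity-of-variance check, the group comparisons rely on tests that do not pool variances. As an illustration only, the following sketch computes Welch's F statistic from its textbook formula; it is not the SPSS routine used in this study, the function name `welch_anova` is hypothetical, and the sample values are invented.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    Each group needs at least two observations and a nonzero variance,
    because groups are weighted by n_i / s_i^2.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # precision weights
    w_total = sum(w)
    # Variance-weighted grand mean.
    grand = sum(wi * mi for wi, mi in zip(w, m)) / w_total
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


if __name__ == "__main__":
    # Invented TC100-like values (kg m^-2) for three hypothetical classes.
    wetland = [25.1, 40.3, 18.7, 60.2]
    pineland = [9.8, 12.4, 7.9, 11.0]
    upland = [5.2, 6.1, 4.8, 5.9]
    print(welch_anova([wetland, pineland, upland]))
```

Groups with large variance (such as wetlands) receive small weights, so a single highly variable class does not dominate the test as it would under the equal-variance F test.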
Building a quantitative soil-landscape model requires analyzing and formalizing relationships between environmental predictor variables (compare Equations 5-3 and 5-4) and
TC. We used a variety of tests to quantify the relationships to guide the scaling-up of TC to the watershed scale. Land use/land cover and geologic unit had more than 10 samples in all categories; thus, Welch's test was preferred over Brown-Forsythe's. Soil order and soil drainage class had at least one category with fewer than 10 observations, in which case the Brown-Forsythe's test was preferred. Table 5-5 shows the groups with homogeneous TC100 means at the 0.05 significance level, according to Dunnett's T3 test, based on land use/land cover, soil order, soil drainage class, and geologic unit.

Land use/land cover had a significant effect on TC100 (Welch's F = 5.11, Table 5-4). In relation to land use/land cover, mean TC100 overall decreased in the following order: wetland > rangeland > pineland > urban land > improved pasture > agriculture > upland forest (Table 5-5, group means). Wetlands had significantly higher TC100 than pineland, improved pasture, upland forest, and agriculture, according to Dunnett's T3 (Table 5-5). Rangeland had significantly higher TC100 than agriculture.

Compared to the other land uses/land covers, wetlands had higher TC100 because they accumulate more organic matter as a consequence of the relatively slower anaerobic decomposition of litter. High TC100 was also found in rangeland, but was only statistically higher than TC100 in agricultural areas. This difference can be caused by soil degradation in cropped fields, or by disturbances in rangelands, such as compaction and herbivory, promoting root and shoot growth.

Urban land had similar stocks of TC100 compared to wetland and rangeland. This can be explained by the great variability of urban soils masking any significant differences that could appear, but also because the urban sites that we sampled were from lawns located in rural areas.
Additionally, urban land had higher TC100 than improved pasture and agriculture, both of which are fertilized, stimulating mineralization. However, improved pasture and agricultural fields are managed for production, and thus their exports offset the fertilizer input. Pouyat et al. (2002), comparing mineral SOC pools in forest stands along an urban-rural gradient in the New York City metropolitan area, observed that the stands located in urban settings had significantly (p-value = 0.03) higher SOC stocks to 10 cm depth than the suburban and rural forest stands. The authors also provided an estimate of the average SOC value for the conterminous U.S. urban soils of 8.2 kg m⁻² to 1 m depth. Our study agrees with their findings, first because we found a comparable average SOC of 8.4 kg m⁻² to 1 m depth (TC100) for urban sites, and second because our urban soils had higher TC100 than agricultural soils, which was also evidenced in their comparative analysis (Pouyat et al., 2002, Table 53).

Upland forests are predominantly found on Entisols over karst terrain. These areas have characteristically deep sandy soils, which promote high water infiltration with relatively lower C buildup. On the other hand, pine growers prefer Spodosols, which have high organic matter content in the subsurface coinciding with an increase in subsoil tree roots, or Alfisols, which have high natural fertility, granting these areas a good productive potential. For cropping, Ultisols are the preferred soils in the SFRW. These soils have a moderate cation exchange capacity, but good structure for mechanized production.

Land use has been identified as an important factor to explain soil C trends. Guo and Gifford (2002) did a meta-analysis to assess the trends in SOC stock associated with land use change.
They included 537 observations from 74 investigations from 16 countries, and pointed out that on average SOC stocks decreased after transitions from pasture to forest plantation (-10%), native forest to plantation (-13%), native forest to crop (-42%), and pasture to crop (-59%). Conversely, SOC increased after conversions from native forest to pasture (+8%), crop to pasture (+19%), crop to plantation (+18%), and crop to secondary forest (+53%). Post and Kwon (2000) reviewed the literature on soil C accumulation after land use change from agriculture to forest, and from agriculture to grassland. Considering the change of agriculture to forest, the authors reported changes in soil C varying from a loss of 5 kg C ha⁻¹ yr⁻¹ to a gain of 30 kg C ha⁻¹ yr⁻¹. When agriculture was converted to permanent grassland, soil C change varied from 9 kg C ha⁻¹ yr⁻¹ to +11 kg C ha⁻¹ yr⁻¹. Another review by Murty et al. (2002) indicated an average decrease of 22% in soil C after conversion from forest to crop, but no significant trend when forests were converted to pasture.

When testing the significance of soil drainage class in relation to TC100, assuming equal variances, the one-way ANOVA (F = 9.55) was significant at the 0.05 significance level. Significant differences were identified by Tukey's HSD between Very poorly drained and all other drainage classes, and between Excessively drained and Poorly drained, Somewhat poorly drained, and Moderately well drained. Since Levene's test was very close to rejecting equal variances among drainage class categories (p-value = 0.057), the case of unequal variances was also tested. If unequal variances were assumed, significant differences would still persist (Brown-Forsythe's F = 9.21), specifically between Excessively drained and Very poorly drained, Poorly drained, Somewhat poorly drained, and Moderately well drained, according to Dunnett's T3 test. The interpretation of these results is straightforward, as areas of lower drainage accumulate more water and promote higher productivity and SOC accumulation than areas of excessive drainage, which experience not only lack of water for plant growth, but also rapid nutrient and dissolved organic matter leaching due to high water infiltration rates.
Accordingly, the highest TC100
values were found in Very poorly drained soils, which are at least seasonally waterlogged, fostering the accumulation of organic matter, whereas the lowest TC100 values were found in Excessively drained soils.

Soil order also had a significant relationship with TC100, according to Brown-Forsythe's test (F = 31.17). On average, TC100 decreased in soil orders in the following sequence: Inceptisols + Histosols > Spodosols > Alfisols > Ultisols > Entisols (Table 5-5, group means). Dunnett's T3 identified significant group differences in TC100 between Spodosols and Entisols, Spodosols and Ultisols, and Inceptisols + Histosols and all orders but Spodosols (Table 5-5). The Inceptisols + Histosols class had higher TC100 than the other orders because all soils classified as Inceptisols or Histosols were located in wetlands, being at least seasonally saturated with water. Spodosols also had high TC100 on an areal basis. In these soils, organic matter is accumulated in the spodic horizon, where it is stabilized in association with sesquioxides, preventing its loss to the environment (De Coninck, 1980; Harris and Hollien, 2000).

Alfisols had slightly higher average TC100 than Ultisols and Entisols, but the difference was not significant at the 0.05 significance level. In the case of Ultisols, this difference is probably due to the more reactive B horizon in Alfisols (base saturation > 35% in the control section) compared to Ultisols (base saturation < 35%), granting Alfisols a higher natural fertility. In the case of Entisols, most of these soils present in the study area are Quartzipsamments, quartz-rich sandy soils that are depleted in reactive minerals and organic matter. These soils are usually poor in nutrients, formed over karst terrain, and preferentially occupied by natural upland forests in this region, since more fertile soils are preferred for productive uses.
Finally, TC100 differed significantly between geologic units, according to Welch's test (4.81), but the post hoc test confirmed that only Undifferentiated geology had statistically smaller TC100 than both Undifferentiated sediments and Coosawhatchie formation. The Coosawhatchie formation is poorly to moderately consolidated, and has phosphatic sands and variable clay. Both Undifferentiated geology and Undifferentiated sediments are unconsolidated to poorly consolidated, with the difference that Undifferentiated sediments are located in areas where wetlands and Spodosols are concentrated, whereas Undifferentiated geology occurs mainly where Entisols were formed over karst terrain, explaining their lower TC stocks.

Based on group means, TC100 decreased in geologic classes in the following order: Ocala limestone > Coosawhatchie formation > Undifferentiated sediments > Other geology > Undifferentiated geology. Other geology was composed of the following geologic units: Beach ridge and dunes, Cypresshead formation, Hawthorn group, Statenville formation, and Trail ridge sands. The highest average TC in Ocala limestone can be explained by the presence of high TC100 observations in Histosols/wetland sites, even though saturated conditions are not prominent in this type of geology. However, TC100 in Ocala limestone was not significantly different from any other geologic units. This is due to the great variability of TC values observed on Ocala limestone, whose TC range encompassed that of all other geologic units.

Coosawhatchie formation and Undifferentiated sediments were associated with Spodosols and Alfisols, whereas Other geology and Undifferentiated geology were associated with Entisols, with relatively low organic matter content. Therefore, from the present analysis, it is not clear to what degree the differences in TC100 values were due to geologic differences or to covariance with other environmental factors previously shown to affect SOC.
Given the limited sample size (141
observations), more complex analyses (e.g., MANOVA) could not be performed to clarify this issue.

Scaling up of Soil Total Carbon in the Santa Fe River Watershed

Lognormal kriging produced better results (smaller RMSEv) than RK for three (i.e., TC1, TC3, and TC100) of the five depth intervals investigated in this study. Regression kriging outperformed LK for TC2 (RK/RT) and TC4 (RK/SMLR). The maps produced by the respective best methods for TC at different depths are shown in Figure 5-2, and the TC model results obtained by the three geostatistical methods at all depths are summarized in Table 5-6.

The spatial distribution of TC as modeled by LK reflects solely the TC values at sampled locations and the identified spatial dependence. In the case of TC2 and TC4, the best scaling-up models involved the characterization of the global trend by RT and SMLR, respectively. As a consequence, the maps show explicitly the influence of the environmental variables on the distribution of TC.

Compared to RK, LK generally produced smoother maps, and reflected the influence of environmental landscape factors only intrinsically. Regression kriging, on the other hand, incorporated explicitly the influence of the environmental variables on the distribution of TC. In other words, in the case of TC1, TC3 and TC100, which were best modeled by LK, the spatial dependence of TC was a more important factor than the environmental variables acting as global spatial determinants of TC. Grunwald (2008a) disaggregated spatial variation into global deterministic trend and spatial dependence structures using various simulated spatial fields, illustrating that cross-dependence exists between spatial and feature accuracy. Which component (trend or spatial dependence) is more pronounced differs among soil properties and environmental properties found in a given landscape, which is usually modeled using regression kriging (Odeh et al., 1995).
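The model comparison above rests on RMSEv (Equation 5-5), computed on backtransformed predictions. A minimal sketch of that calculation follows; the helper names are hypothetical, and the plain 10^x backtransformation is an assumption, since the text does not state whether a bias correction was applied.

```python
import math


def back_transform(log10_values):
    """Convert log10(TC) predictions back to TC (assumed: plain 10**x)."""
    return [10.0 ** v for v in log10_values]


def rmse(predicted, observed):
    """Root mean square error between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


if __name__ == "__main__":
    # Invented validation values, for illustration only.
    log_predictions = [0.70, 0.45, 1.10]
    observed_tc = [4.6, 3.1, 15.2]
    print(rmse(back_transform(log_predictions), observed_tc))
```

Computing the error after backtransformation, as done here, keeps RMSEv in kg m⁻², so values are directly comparable across LK, RK/SMLR, and RK/RT.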
The RT global trend model of LogTC2 included two explanatory environmental factors, namely the green band of Landsat 4 Thematic Mapper (TM) (band 2) and elevation, and contained three terminal nodes, thus providing only three output values, respectively 0.620, 1.312, and 0.212 kg m⁻² (Figure 5-3). Most of the TC2 observations were grouped in terminal node 3, which contained 96 samples out of the 102 available in the training dataset. Nevertheless, terminal node 3 constituted the most homogeneous group, with a standard deviation of 0.258 kg m⁻², comparable to the deviation of the whole data set of 0.356 kg m⁻², whereas terminal nodes 1 and 2 had standard deviations of 0.394 and 0.838 kg m⁻², respectively, and contained 3 observations each.

Selection of the green band of Landsat TM at layer 2 (30-60 cm) cannot be interpreted straightforwardly, since the sensor beam did not reach 30 cm below ground. Thus, correlation of LogTC2 with green reflectance was actually due to its indirect correlation with soils, vegetation, and water at the surface. Reflectance properties have been shown to correlate with soil C, both in the laboratory and in the field. In Chapter 2, we provided a review of the literature on the use of visible/near-infrared reflectance spectra to estimate soil C, and Viscarra Rossel et al. (2006) also included mid-infrared spectra and other soil properties in the review. López-Granados et al. (2005) compared different methods to scale up SOM, and identified as the best method RK using aerial photographs of bare soil to estimate local SOM means. Simbahan et al. (2006) identified RK as the best method among four methods tested to scale up SOC, with reflectance data derived from IKONOS satellite images as predictors. Their best method was stratified by soil series, and also included relative elevation and soil electrical conductivity.
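The way a regression tree arrives at a few terminal nodes, each predicting the mean of its observations, can be illustrated with a single binary split on one predictor. This is a simplified sketch of recursive partitioning, not the CART 5.0 procedure used here; the function names and the sample data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of values from their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on x that minimizes the summed SSE of both sides.

    Returns (threshold, total_sse, left_mean, right_mean), or None when
    x holds fewer than two distinct values.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total,
                    sum(left) / len(left), sum(right) / len(right))
    return best


if __name__ == "__main__":
    # Invented elevation (m) and LogTC values, for illustration only.
    elevation = [12, 15, 18, 40, 44, 47]
    log_tc = [1.2, 1.3, 1.1, 0.3, 0.2, 0.25]
    print(best_split(elevation, log_tc))
```

Applying the same search recursively to each side, over all candidate predictors, yields a tree; each terminal node then reports the mean of its training observations, which is why the LogTC2 model above produces only three distinct output values.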
At the deepest layer (120-180 cm), TC4 was also best modeled by RK, but in this case SMLR performed best to model the global trend, producing more accurate results than RK/RT.
According to its SMLR trend model (Equation 5-6), LogTC4 in the SFRW depends on soil characteristics, slope, land use, and geology. Being the deepest layer, it was expected that some geologic property would influence the distribution of TC, but not reflectance properties. The positive coefficient for Spodosols indicates the presence of subsurface organic matter contributing to TC; likewise, the negative coefficient for Entisols corrects for the relative depletion of organic matter in those soils. Albeit small, the negative coefficient for clay content is counterintuitive, and might be explained by the close association of the distribution of clay content with the distribution of soil orders in the watershed, with high clay contents present in Ultisols (average of 12.80%), and low contents in Entisols (average of 4.83%). Slope and agricultural land use were also significant explanatory variables for LogTC4; even though these properties should influence TC more strongly at the top layers, this influence is carried down through the soil profile to the deeper layers.

$$\mathrm{LogTC4} = 0.0901 + 0.1836\,\mathrm{ODS} \pm 0.0846\,\mathrm{SLOPE} - 0.4435\,\mathrm{ENTS} - 0.0146\,\mathrm{CLAY} \pm 0.1562\,\mathrm{GEOMFS} \pm 0.1227\,\mathrm{CROP} \qquad (5\text{-}6)$$

Where: LogTC4 = log10 of soil total carbon at 120-180 cm (TC4); ODS = indicator variable for Spodosols; SLOPE = average slope within a 3 x 3 window of 30-m pixels, in percent units; ENTS = indicator variable for Entisols; CLAY = clay content in the soil in percent units; GEOMFS = indicator variable for medium fine sand and silt environmental geology class; CROP = indicator variable for agricultural land use.

Other studies have also shown the influence of topography on the distribution of soil C. Mueller and Pierce (2003) found significant correlations at the 0.05 significance level at multiple scales between TC and elevation, slope, and plan, profile and tangential curvatures, as well as geographic coordinates. Creed et al.
(2002) used forest biomass and selected topographic attributes (elevation, slope, and plan curvature) to estimate SOC in the forest floor using regression trees (RT), obtaining a coefficient of determination using leave-one-out cross-validation ($R^2_{cv}$) of 0.29. Their multiple linear regression model ($R^2_{cv}$ = 0.20) had significant
coefficients (0.05 significance level) for slope and aspect. Other studies that have integrated topographic derivatives in models to estimate soil C include Florinsky et al. (2002), Kravchenko et al. (2006), who also included soil texture, and Kravchenko and Robertson (2007), who also included crop yield.

The semivariogram parameters for the LK models of TC1, TC3 and TC100, and for the residuals of the global trend models of TC2 and TC4, are presented in Table 5-7. The exponential model was the most appropriate to fit the semivariograms of TC at all depths. Among the depths that were modeled using LK, TC1 had the largest effective range (11,579 m), almost double the ranges of TC3 and TC100. The overall highest range of 17,887 m was achieved by the TC4 residuals from the global trend modeled using SMLR.

The local spatial structures of TC2 and TC4, modeled from the residuals of the global trends, were clearly distinct from those of the depths that were modeled using LK, as shown in Table 5-7. The spatial autocorrelation of TC2 and TC4 residuals was weaker than that of raw TC at the other depths, as indicated by the larger nugget-to-sill ratios. Moreover, their high nugget effects indicate that a large portion of the variance occurs over short distances. This is a result of the global spatial trend explaining part of the variability and spatial autocorrelation of TC, and leaving a weaker spatial dependence in the residuals of TC2 and TC4 to be kriged (compare Hengl et al., 2004). Similarly, Rivero et al. (2007) found that the nugget-to-sill ratio was much tighter for semivariogram models of residuals when compared to models that express the spatial dependence structure of raw soil phosphorus observations.
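The nugget-to-sill ratio used above to compare the strength of spatial dependence is a simple quotient. The sketch below shows the calculation under the assumption that the sill is reported as nugget plus partial sill; the function name and the parameter values are hypothetical and are not taken from Table 5-7.

```python
def nugget_to_sill(nugget, partial_sill):
    """Fraction of the total sill that is unstructured (nugget) variance.

    Assumes total sill = nugget + partial sill. Values near 0 indicate
    strong spatial dependence; values near 1 indicate weak dependence.
    """
    sill = nugget + partial_sill
    if sill <= 0:
        raise ValueError("total sill must be positive")
    return nugget / sill


if __name__ == "__main__":
    # Hypothetical parameters for a raw variable and for trend residuals.
    print("raw TC:", nugget_to_sill(0.02, 0.08))
    print("residuals:", nugget_to_sill(0.06, 0.04))
```

In this invented example the residual semivariogram has the larger ratio, which is the pattern described above for the TC2 and TC4 residuals.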
These findings indicate that the sampling design adopted to characterize the distribution of soil C could not entirely capture its short-range spatial dependence after the global variability was explained in part by the landscape characteristics, i.e., environmental factors. The
implications of these findings for the study of soil C are twofold. First, since the semivariogram of the residuals shows a longer range compared to the raw TC values, this could mean that some long-range trend of TC could not be fully explained by the available global trend predictors, suggesting the need for sampling explanatory variables that capture the underlying processes that generate this long-range trend. Second, because the spatial structure of TC was much weaker in the residuals, this suggests that over short distances, more samples need to be collected, creating a tighter sampling mesh to more accurately sample the variability of TC, with the aim of minimizing the nugget effect and producing a more robust semivariogram.

It was shown in the previous section that land use/land cover, soil order, soil drainage class, and geologic unit all had a significant effect on TC100. Even so, LK still obtained better estimates than RK, indicating that, on a spatially explicit basis, the sample distribution of TC100 explains more of the spatial variability of TC100 itself than any other environmental factor. The environmental variables are, however, implicitly linked to the distribution of TC100. This was also the case for TC1 and TC3, although they were not formally tested for significant relationships with environmental landscape factors as TC100 was.

Based on the results of the best models (Table 5-6, in italics), TC generally decreased with depth, with average values of 6.41, 3.25, 2.95, and 2.50 kg m⁻² in layers 1 through 4, respectively. When interpolated across the whole watershed, total estimated TC stock varied from 8.98 Tg (teragrams) at the deepest layer (TC4, 120-180 cm) to 23.01 Tg at the surface layer (TC1, 0-30 cm). Total stock to 1 m depth, estimated by LK, was 39.29 Tg.
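The link between the layer means and the watershed stocks reported above can be checked with simple unit arithmetic. The sketch below assumes the 3,585 km² watershed area stated in the Conclusions and treats the mean areal content as uniform over that area; the function name is hypothetical.

```python
KG_PER_TG = 1e9       # 1 teragram = 10^12 g = 10^9 kg
M2_PER_KM2 = 1e6      # 1 km^2 = 10^6 m^2


def stock_tg(mean_kg_m2, area_km2):
    """Total stock (Tg) from a mean areal content (kg m^-2) and an area (km^2)."""
    return mean_kg_m2 * area_km2 * M2_PER_KM2 / KG_PER_TG


if __name__ == "__main__":
    sfrw_area_km2 = 3585  # watershed area as given in the Conclusions
    for layer, mean_tc in (("TC1", 6.41), ("TC4", 2.50)):
        print(layer, round(stock_tg(mean_tc, sfrw_area_km2), 2), "Tg")
```

With these inputs the sketch gives roughly 23 Tg for TC1 and 9 Tg for TC4, close to the 23.01 and 8.98 Tg obtained from the interpolated maps; small differences are expected because the maps are summed cell by cell rather than from a rounded mean.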
The training and validation sample sets were randomly separated at each depth independently, which resulted in different locations being used as training sites for modeling TC at each depth. If the total TC stocks of the first three layers are summed up across the upper 1 m of the soil profile, based on
the results from the best models, the carbon stock (41.73 Tg) is similar to what was achieved by the TC100 LK model (39.29 Tg). This shows the consistency of the different TC scaling-up models in the first three layers, and offers indirect evidence to validate the scaled-up TC maps at these layers in relation to one another and TC100.

Other examples of regional studies of soil C within the same depth range include Ryan et al. (2000) and Minasny et al. (2006), who investigated soil C under mixed land use in southeastern Australia with comparable subtropical/tropical climate to Florida, estimating C ranges of 0-44 and 2-22 kg m⁻², respectively, in the upper 1 m. Batjes (2008) estimated SOC stocks to 1 m depth in Central Africa, and found significant differences between the warm savannah region (7.6-7.7 kg m⁻²) and the cool, humid mountains on soils formed in volcanic parent material (22.1-22.7 kg m⁻²). In the SFRW (this study), estimated TC100 (LK) ranged from 2.62 to 160.50 kg m⁻² (Figure 5-2). Krogh et al. (2003), on the other hand, estimated a closer range of 2.8-134.1 kg m⁻² for SOC at 0-100 cm depth under forest and cropland in Denmark. Relative to these studies, TC stock in the SFRW was higher and more variable, and was influenced by the presence of wetlands and Spodosols, which constituted favorable conditions for TC buildup.

Grimm et al. (2008) modeled soil organic carbon to 50 cm depth on Barro Colorado Island in the Panama Canal basin, with tropical climate, using Random Forests analysis. They found that topographic attributes explained most of the variation in soil organic carbon in the topsoil, whereas subsoil carbon was best explained by soil texture classes. Guo et al. (2006b) investigated the factors that impart most control on soil C in the conterminous U.S. and found that SOC decreases as elevation increases and that SOC increases as annual precipitation increases, but only up to values of 700-850 mm yr⁻¹. Our findings could not confirm the major
control of topographic attributes on TC, possibly because of the smooth relief within the Florida study area.

In the Lower Namoi Valley, Australia, land uses showed high influence on soil carbon storage, with values of 15-22 kg m⁻² for forested areas, and much lower values of 2-6 kg m⁻² in cultivated areas (Minasny et al., 2006). Differences in SOC stocks by land use were documented by Guo et al. (2006b) in the following order: forest > agriculture > wetland > grass > pasture > shrub ecosystems. In comparison, we found an overall decrease in TC100 in the order: wetland > rangeland > pineland > urban land > improved pasture > agriculture > upland forest; however, significant differences were found only between certain land uses/land covers (Table 5-5).

The influence of land use/land cover on surface soil carbon (TC1, 0-30 cm) in the SFRW was not strong enough to prevail over TC1's spatial autocorrelation, as evidenced by its LK model. Findings by Ahn et al. (2009) from the same watershed (SFRW) suggested that, in the topsoil, mineralizable carbon and TC are relatively homogeneous across various land use and soil types. Only wetland and upland forest soils, with the largest and smallest C pool size, respectively, were consistently different from the soils of other land uses. Variations in potential carbon mineralization were best explained by TC (62%) and hot-water-extractable carbon (59%), while acid-hydrolyzable carbon (32%) and clay content (35%) were generally not adequate indicators of C bioavailability. Generally, the sandy nature of these surface soils imparts a lack of protection against carbon mineralization and likely resulted in the lack of land use/soil order differences in the soil carbon pools.

At 0-30 cm, TC1 varied from 1.81 to 38.79 kg m⁻² (Figure 5-2a) across the SFRW, with an average of 6.41 kg m⁻². Similar average SOC values in the 0-30 cm depth were found by Simbahan et al. (2006) in agricultural fields in Nebraska, U.S. However, only one of their study
areas, a 48.7 ha no-till irrigated maize-soybean field, had a similar range of TC (3.2-29.4 kg m⁻²) when estimated using RK, whereas all other areas and methods reached a maximum SOC of 12.6 kg m⁻². More interestingly, at 0-30 cm TC1 in the SFRW equals or exceeds SOC found in the upper 1 m of soil in other studies (Homann et al., 1998; Ryan et al., 2000; Galbraith et al., 2003) and, in some cases, in the upper 2 m of soil in other U.S. locations (Guo et al., 2006a). This finding demonstrates the potential of Florida soils to store large amounts of organic C, especially considering that wetlands are spread out across the whole state, covering major proportions of Florida's land area (28%).

Conclusions

Overall, our study captured the typical trend of decreasing TC values down the soil profile, i.e., TC1 > TC2 > TC3 > TC4. In effect, formal pairwise comparisons of TC among the four depths revealed statistically significant differences at the 0.05 significance level in all pairs except between TC2 and TC3. However, even though TC varies among depths, significant correlations still exist between them, which corroborate a vertical redistribution of TC. The implications of these findings are that, on one hand, sampling TC at the surface layer may offer a crude estimate of TC at deeper layers; in other words, it may indicate what to expect of the horizontal distribution of TC at deeper layers across the SFRW. On the other hand, significant amounts of TC are present in the deeper layers (e.g., in spodic horizons); thus, quantifying TC only at the surface layer would not be sufficient to fully assess the total stock of TC and how it is distributed both vertically and horizontally across the watershed.

All environmental factors tested by ANOVA to explain the variability of TC100, i.e., land use/land cover, soil type (taxonomic order), soil drainage class, and geologic unit, were statistically significant at the 0.05 significance level.
In addition, Dunnett's T3 post hoc test identified differences in TC100 among group means for all environmental factors tested.
Among the geostatistical methods compared to scale up TC in the SFRW, LK was the most accurate at three of the five depth intervals, namely for TC1 (RMSEv = 3.34 kg m-2), TC3 (RMSEv = 2.02 kg m-2), and TC100 (RMSEv = 7.21 kg m-2). Regression kriging using RT was the best method to scale up TC2, with an RMSEv of 6.20 kg m-2. At 120–180 cm, TC4 was best modeled by RK using SMLR, producing an RMSEv of 2.75 kg m-2. These results illustrate the influence of different environmental factors on the spatial distribution of TC in the SFRW. According to our findings, depth, land use/land cover, soil type, soil drainage class, geologic unit, and TC's spatial autocorrelation all influenced the distribution of TC, and thus should be considered in future projects involving digital soil mapping in similar areas in Florida and in the southeastern U.S.

Soil total carbon from 0 to 100 cm was high (2.62–160.50 kg m-2) compared to other regions, indicating that Florida has great potential to store C in soils to mitigate the effects of global climate warming. In the 3585-km2 SFRW, soils store from 39.3 (based on LK) to 90.1 Tg of C (based on RK/RT) in the upper 1 m. If these estimates were extrapolated to the state of Florida, this could mean a total TC stock of about 1.6 to 3.8 billion tons, without accounting for the fact that the Everglades would contribute higher TC stocks than the remaining land. These numbers are conservative estimates because Florida soils store great amounts of C in the subsoil below 1 m that have not been completely assessed at this point. This is an important finding to support soil conservation in Florida considering the current land pressure on undeveloped areas.
Thus, sustainable land uses must be adopted to avoid loss of this important C resource, which has accumulated over millennia, and to guarantee the quality of life of future populations, while at the same time improving soil fertility for commercial uses and adding land value through soil C sequestration.
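
The watershed totals quoted in these conclusions can be approximated from the mean areal stocks. The following is a minimal sketch, assuming a single mean value applied uniformly over the 3585-km2 watershed; the function name `stock_tg` is illustrative and not part of the study, and the reported totals were derived from the output maps, so small differences are expected.

```python
# Sketch: convert a mean areal carbon stock (kg m-2) into a watershed total (Tg).
# Assumption: one mean value is applied uniformly over the whole area.

def stock_tg(mean_kg_m2: float, area_km2: float) -> float:
    """Total stock in teragrams for a mean areal stock over a given area."""
    area_m2 = area_km2 * 1.0e6       # 1 km2 = 1e6 m2
    total_kg = mean_kg_m2 * area_m2  # kg m-2 times m2 gives kg
    return total_kg / 1.0e9          # 1 Tg = 1e12 g = 1e9 kg


SFRW_AREA_KM2 = 3585.0
lk_total = stock_tg(10.94, SFRW_AREA_KM2)  # mean TC100 from LK (Table 5-6)
rt_total = stock_tg(25.11, SFRW_AREA_KM2)  # mean TC100 from RK/RT (Table 5-6)
print(f"LK: {lk_total:.1f} Tg, RK/RT: {rt_total:.1f} Tg")
```

With the mean TC100 values of 10.94 and 25.11 kg m-2, this gives about 39.2 and 90.0 Tg, close to the 39.3 and 90.1 Tg reported above.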
Table 5-1. Environmental data and sources used to model the global spatial trend of log-transformed soil total carbon (LogTC).

| Environmental property | Format | Data type | Source | Date | Original scale / spatial resolution (m) |
|---|---|---|---|---|---|
| Soil taxonomic order [1] | Vector | Categorical | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Soil drainage class | Vector | Categorical | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Soil hydric rating | Vector | Categorical | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Soil hydrologic group | Vector | Categorical | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Soil AWC | Vector | Continuous | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Depth to water table | Vector | Continuous | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| KSAT | Vector | Continuous | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Clay content | Vector | Continuous | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Silt content | Vector | Continuous | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| Sand content | Vector | Continuous | USDA/NRCS/SSURGO | 1995 | 1:24,000 |
| MAT | Raster | Continuous | NCDC/NOAA | 1993–2005 | N/A |
| MAP | Raster | Continuous | NCDC/NOAA | 1993–2005 | N/A |
| Ecological regions | Vector | Categorical | FDEP | 1995 | 1:250,000 |
| Physiographic divisions | Vector | Categorical | FDEP | 2000 | 1:2,000,000 |
| Land use/land cover [1] | Raster | Categorical | FFWCC | 2003 | N/A / 30 |
| Total population per census tract | Vector | Continuous | U.S. Census Bureau | 2000 | 1:100,000 |
| ETM+ reflectance | Raster | Continuous | USGS | 2004 | N/A / 30 |
| ETM+ NDVI | Raster | Continuous | USGS | 2004 | N/A / 30 |
| ETM+ TNDVI | Raster | Continuous | USGS | 2004 | N/A / 30 |
| ETM+ IR/R | Raster | Continuous | USGS | 2004 | N/A / 30 |
| ETM+ IR-R | Raster | Continuous | USGS | 2004 | N/A / 30 |
| ETM+ tasseled cap indices | Raster | Continuous | USGS | 2004 | N/A / 30 |
| ETM+ principal components | Raster | Continuous | USGS | 2004 | N/A / 30 |
| Elevation | Raster | Continuous | USGS/NED | 1999 | 1:24,000 / 30 |
| Slope | Raster | Continuous | USGS/NED | 1999 | 1:24,000 / 30 |
| Aspect | Raster | Categorical | USGS/NED | 1999 | 1:24,000 / 30 |
| Catchment area | Raster | Continuous | USGS/NED | 1999 | 1:24,000 / 30 |
| CTI | Raster | Continuous | USGS/NED | 1999 | 1:24,000 / 30 |
| Environmental geology | Vector | Categorical | FDEP | 2001 | 1:250,000 |
| Surficial geology | Vector | Categorical | FDEP | 1998 | 1:100,000 |
| Aquifer vulnerability index (DRASTIC) | Vector | Continuous | FDEP | 1998 | 1:100,000 |
| Hydrogeology | Vector | Categorical | FDEP | 1998 | 1:100,000 |

Abbreviations: AWC = available water capacity; CTI = compound topographic index; DRASTIC = depth to water, net recharge, aquifer media, soil media, topography, impact of the vadose zone, hydraulic conductivity of the aquifer; ETM+ = Landsat Enhanced Thematic Mapper Plus; FDEP = Florida Department of Environmental Protection; FFWCC = Florida Fish and Wildlife Conservation Commission; IR/R = infrared/red ratio; IR-R = infrared-red difference; KSAT = saturated hydraulic conductivity; MAP = mean annual precipitation; MAT = mean annual temperature; N/A = not applicable; NCDC = National Climatic Data Center; NDVI = normalized difference vegetation index; NED = National Elevation Dataset; NOAA = National Oceanic and Atmospheric Administration; NRCS = Natural Resources Conservation Service; SSURGO = Soil Survey Geographic database; TNDVI = transformed NDVI; USDA = United States Department of Agriculture; USGS = United States Geological Survey.

[1] These layers were used solely to apply the predictive models to upscale TC properties; the land use/land cover and soil order data used to derive the models were obtained in the field by the staff of the GIS Laboratory and by Wade Hurt, respectively, from the Soil and Water Science Department at the University of Florida.
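
The four ETM+ vegetation indices listed in Table 5-1 can be sketched from red and near-infrared reflectance. The text does not give the formulas, so this sketch assumes the commonly used definitions: NDVI = (IR - R)/(IR + R) and TNDVI = sqrt(NDVI + 0.5). The function name and the input values are hypothetical.

```python
import math

# Sketch of the four vegetation indices named in Table 5-1.
# Assumption: standard textbook definitions; the study's exact formulas
# are not stated in the text.

def vegetation_indices(red: float, nir: float) -> dict:
    """Return NDVI, TNDVI, IR/R, and IR-R for one pixel."""
    if red <= 0 or nir < 0:
        raise ValueError("red must be positive and nir non-negative")
    ndvi = (nir - red) / (nir + red)
    return {
        "NDVI": ndvi,
        "TNDVI": math.sqrt(ndvi + 0.5),  # defined because NDVI >= -0.5 here
        "IR/R": nir / red,               # infrared/red ratio
        "IR-R": nir - red,               # infrared-red difference
    }


# Hypothetical reflectance values for a vegetated pixel
indices = vegetation_indices(red=0.1, nir=0.5)
```

Because `nir` is non-negative and `red` is positive, NDVI cannot fall below -1, but values below -0.5 would make TNDVI undefined; a production version would need to handle that case explicitly.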
Table 5-2. Descriptive statistics of observed soil total carbon (TC) and log-transformed TC (LogTC) at different depths.

| Statistic | TC whole set | TC training | TC validation | LogTC whole set | LogTC training | LogTC validation |
|---|---|---|---|---|---|---|
| TC1 (kg m-2) / LogTC1 (log kg m-2) | | | | | | |
| Observations | 141 | 102 | 39 | 141 | 102 | 39 |
| Mean | 6.26 | 6.96 | 4.43 | 0.6719 | 0.7012 | 0.5952 |
| Std. error of the mean | 0.68 | 0.92 | 0.39 | 0.0236 | 0.0297 | 0.0330 |
| Median | 4.57 | 4.77 | 4.06 | 0.6598 | 0.6789 | 0.6090 |
| Std. deviation | 8.04 | 9.25 | 2.45 | 0.2808 | 0.3002 | 0.2064 |
| Coeff. of variation | 128.46 | 132.93 | 55.28 | 41.79 | 42.81 | 34.68 |
| Skewness | 5.43 | 4.74 | 2.02 | 1.19 | 1.15 | 0.43 |
| Kurtosis | 33.84 | 25.06 | 5.10 | 3.35 | 3.01 | 0.32 |
| Range | 62.71 | 62.71 | 11.49 | 1.7374 | 1.7374 | 0.9213 |
| Minimum | 1.17 | 1.17 | 1.56 | 0.0679 | 0.0679 | 0.1943 |
| Maximum | 63.88 | 63.88 | 13.05 | 1.8053 | 1.8053 | 1.1156 |
| TC2 (kg m-2) / LogTC2 (log kg m-2) | | | | | | |
| Observations | 141 | 102 | 39 | 141 | 102 | 39 |
| Mean | 3.73 | 3.86 | 3.41 | 0.2742 | 0.2559 | 0.3220 |
| Std. error of the mean | 1.01 | 1.35 | 0.99 | 0.0300 | 0.0354 | 0.0567 |
| Median | 1.67 | 1.56 | 2.04 | 0.2233 | 0.1942 | 0.3090 |
| Std. deviation | 12.02 | 13.62 | 6.21 | 0.3567 | 0.3577 | 0.3540 |
| Coeff. of variation | 321.92 | 353.32 | 181.98 | 130.09 | 139.78 | 109.94 |
| Skewness | 8.94 | 8.35 | 5.13 | 2.01 | 2.31 | 1.32 |
| Kurtosis | 89.46 | 75.07 | 28.51 | 7.10 | 9.00 | 3.59 |
| Range | 129.50 | 129.48 | 38.07 | 2.4817 | 2.4684 | 1.9535 |
| Minimum | 0.43 | 0.44 | 0.43 | -0.3680 | -0.3547 | -0.3680 |
| Maximum | 129.92 | 129.92 | 38.50 | 2.1137 | 2.1137 | 1.5854 |
| TC3 (kg m-2) / LogTC3 (log kg m-2) | | | | | | |
| Observations | 139 | 101 | 38 | 139 | 101 | 38 |
| Mean | 3.59 | 4.16 | 2.07 | 0.2669 | 0.2834 | 0.2233 |
| Std. error of the mean | 0.98 | 1.34 | 0.32 | 0.0299 | 0.0383 | 0.0397 |
| Median | 1.71 | 1.81 | 1.64 | 0.2321 | 0.2567 | 0.2156 |
| Std. deviation | 11.54 | 13.46 | 2.00 | 0.3526 | 0.3853 | 0.2447 |
| Coeff. of variation | 321.72 | 323.52 | 96.86 | 132.11 | 135.96 | 109.58 |
| Skewness | 8.24 | 7.08 | 3.59 | 1.86 | 1.76 | 1.61 |
| Kurtosis | 70.95 | 51.80 | 13.36 | 6.93 | 6.09 | 3.93 |
| Range | 112.29 | 112.29 | 10.21 | 2.4902 | 2.4902 | 1.1961 |
| Minimum | 0.36 | 0.36 | 0.69 | -0.4385 | -0.4385 | -0.1587 |
| Maximum | 112.66 | 112.66 | 10.90 | 2.0518 | 2.0518 | 1.0374 |
| TC4 (kg m-2) / LogTC4 (log kg m-2) | | | | | | |
| Observations | 133 | 95 | 38 | 133 | 95 | 38 |
| Mean | 1.61 | 1.44 | 2.02 | 0.0493 | 0.0356 | 0.0835 |
| Std. error of the mean | 0.18 | 0.14 | 0.51 | 0.0295 | 0.0327 | 0.0631 |
| Median | 1.08 | 1.09 | 1.03 | 0.0320 | 0.0372 | 0.0142 |
| Std. deviation | 2.05 | 1.37 | 3.16 | 0.3397 | 0.3192 | 0.3890 |
| Coeff. of variation | 127.39 | 94.62 | 156.47 | 689.05 | 896.63 | 465.87 |
| Skewness | 5.16 | 3.08 | 4.22 | 0.55 | 0.19 | 1.00 |
| Kurtosis | 36.16 | 11.70 | 20.62 | 1.08 | 0.62 | 1.28 |
| Range | 18.34 | 8.41 | 18.25 | 2.0545 | 1.7206 | 1.8623 |
| Minimum | 0.16 | 0.16 | 0.25 | -0.7873 | -0.7873 | -0.5951 |
| Maximum | 18.50 | 8.58 | 18.50 | 1.2672 | 0.9333 | 1.2672 |
| TC100 (kg m-2) / LogTC100 (log kg m-2) | | | | | | |
| Observations | 139 | 101 | 38 | 139 | 101 | 38 |
| Mean | 11.79 | 12.97 | 8.65 | 0.9065 | 0.9239 | 0.8602 |
| Std. error of the mean | 2.13 | 2.89 | 1.14 | 0.0238 | 0.0294 | 0.0377 |
| Median | 7.49 | 7.75 | 6.65 | 0.8744 | 0.8892 | 0.8225 |
| Std. deviation | 25.08 | 29.06 | 7.03 | 0.2800 | 0.2952 | 0.2323 |
| Coeff. of variation | 212.68 | 224.01 | 81.26 | 30.89 | 31.95 | 27.01 |
| Skewness | 8.58 | 7.52 | 3.05 | 2.08 | 2.15 | 1.40 |
| Kurtosis | 82.93 | 62.53 | 9.94 | 8.41 | 8.80 | 2.44 |
| Range | 267.07 | 267.07 | 34.70 | 2.1646 | 2.1646 | 1.0638 |
| Minimum | 1.84 | 1.84 | 3.28 | 0.2650 | 0.2650 | 0.5157 |
| Maximum | 268.91 | 268.91 | 37.98 | 2.4296 | 2.4296 | 1.5795 |

Abbreviations: LogTC1 = log10(TC1); LogTC2 = log10(TC2); LogTC3 = log10(TC3); LogTC4 = log10(TC4); LogTC100 = log10(TC100); TC1 = TC at 0–30 cm; TC2 = TC at 30–60 cm; TC3 = TC at 60–120 cm; TC4 = TC at 120–180 cm; TC100 = TC at 0–100 cm.
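
Two of the derived statistics in Table 5-2 can be recomputed from the reported means, standard deviations, and sample sizes. The sketch below assumes the usual definitions (coefficient of variation as 100 times the standard deviation over the mean, and standard error as the standard deviation over the square root of n); the function names are illustrative.

```python
import math

# Sketch: recompute derived statistics from values reported in Table 5-2.
# Assumption: CV and standard error follow their usual definitions.

def coeff_variation(mean: float, sd: float) -> float:
    """Coefficient of variation in percent."""
    return 100.0 * sd / abs(mean)


def std_error(sd: float, n: int) -> float:
    """Standard error of the mean."""
    return sd / math.sqrt(n)


cv_tc1 = coeff_variation(6.26, 8.04)         # TC1, whole set
se_tc1 = std_error(8.04, 141)                # TC1, whole set
cv_logtc4 = coeff_variation(0.0493, 0.3397)  # LogTC4, whole set
```

The LogTC4 case shows why a CV can rise after log transformation: the log-scale mean (0.0493) is close to zero, so the ratio becomes very large even though the standard deviation itself is small.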
Table 5-3. Pairwise comparison of log-transformed soil total carbon (LogTC) at different depths.

| Pair | Observations | Correlation | Student's t test: mean difference | Student's t test: t | Student's t test: degrees of freedom |
|---|---|---|---|---|---|
| LogTC1, LogTC2 | 141 | 0.70* | 0.40* | 18.30 | 140 |
| LogTC1, LogTC3 | 139 | 0.54* | 0.39* | 15.22 | 138 |
| LogTC1, LogTC4 | 133 | 0.32* | 0.58* | 19.84 | 132 |
| LogTC2, LogTC3 | 139 | 0.76* | 0.00 | 0.18 | 138 |
| LogTC2, LogTC4 | 133 | 0.60* | 0.18* | 7.41 | 132 |
| LogTC3, LogTC4 | 133 | 0.69* | 0.18* | 8.50 | 132 |

Abbreviations: LogTC1 = log10 of TC at 0–30 cm; LogTC2 = log10 of TC at 30–60 cm; LogTC3 = log10 of TC at 60–120 cm; LogTC4 = log10 of TC at 120–180 cm; TC = soil total carbon.

* Statistically significant at the 0.05 significance level. In the case of the Student's t test, this corresponds to a significance level of 0.0083 after Bonferroni correction for each pairwise comparison.
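
The Bonferroni-corrected level in the footnote follows from dividing 0.05 by the six pairwise comparisons among four depths. The sketch below shows that arithmetic together with a plain paired t statistic; the function names and the small data vectors are hypothetical and serve only to illustrate the calculation.

```python
import math
from itertools import combinations

# Sketch: Bonferroni-corrected significance level and a paired t statistic.
# Assumption: the sample values used below are illustrative, not study data.

def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


def paired_t(x, y):
    """Return (mean difference, t statistic, degrees of freedom)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / math.sqrt(var / n)
    return mean_diff, t_stat, n - 1


pairs = list(combinations(["LogTC1", "LogTC2", "LogTC3", "LogTC4"], 2))
alpha_per_pair = bonferroni_alpha(0.05, len(pairs))  # 0.05 / 6

# Hypothetical paired observations at two depths
mean_diff, t_stat, dof = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
```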
Table 5-4. Analyses of variance (ANOVA) between log-transformed soil total carbon at 0–100 cm (LogTC100) and selected environmental variables.

| Environmental factor | Levene's test statistic | ANOVA statistic | Welch's test statistic [1] | Brown-Forsythe's test statistic [1] |
|---|---|---|---|---|
| Land use/land cover | 8.10* | N/A | 5.11* | 9.89* |
| Soil taxonomic order | 7.50* | N/A | 23.04* | 31.17* |
| Soil drainage class | 2.21 | 9.55* | 14.81* | 9.21* |
| Geologic unit | 8.42* | N/A | 4.81* | 1.67 |

Abbreviations: N/A = not applicable.

[1] The preferred test for the respective environmental factor according to group sizes is shown in italics.

* Statistically significant at the 0.05 significance level.
Table 5-5. Homogeneous groups of log-transformed soil total carbon at 0–100 cm (LogTC100) based on land use/land cover, soil order, soil drainage class, and geologic unit, respectively, according to Dunnett's T3 test at the 0.05 significance level.

Group mean LogTC100 (log kg m-2)   Homogeneous groups according to Dunnett's T3 test

Land use/land cover
  1.3167   Wetland
  0.9422   Rangeland   Rangeland
  0.8856   Pineland   Pineland
  0.8786   Urban land   Urban land   Urban land
  0.8689   Improved pasture   Improved pasture
  0.7924   Agriculture
  0.7374   Upland forest   Upland forest

Soil taxonomic order
  2.1415   Inceptisols + Histosols
  1.0024   Spodosols   Spodosols
  0.9237   Alfisols   Alfisols
  0.8311   Ultisols
  0.7772   Entisols

Soil drainage class
  1.3611   Very poorly drained
  0.9604   Poorly drained
  0.9109   Somewhat poorly drained
  0.8977   Moderately well drained
  0.7955   Well drained   Well drained
  0.6251   Excessively drained

Geologic unit
  0.9746   Ocala limestone   Ocala limestone
  0.9518   Coosawhatchie formation
  0.9464   Undifferentiated sediments
  0.7863   Other   Other
  0.7767   Undifferentiated geology
Table 5-6. Comparative results of the three geostatistical methods used to model soil total carbon (TC) at different depths.

| Property | Average estimated TC (kg m-2) [1]: LK | RK/SMLR | RK/RT | Estimated total stock (Tg) [1]: LK | RK/SMLR | RK/RT | RMSEv (kg m-2) [1]: LK | RK/SMLR | RK/RT |
|---|---|---|---|---|---|---|---|---|---|
| TC1 | 6.41 | 7.06 | 7.73 | 23.01 | 25.36 | 27.74 | 3.34 | 6.96 | 9.99 |
| TC2 | 2.93 | 18.19 | 3.25 | 10.52 | 65.30 | 11.65 | 12.57 | 20.27 | 6.20 |
| TC3 | 2.95 | 10.39 | 5.73 | 10.60 | 37.29 | 20.57 | 2.02 | 19.32 | 3.42 |
| TC4 | 1.47 | 2.50 | 3.45 | 5.28 | 8.98 | 12.38 | 2.94 | 2.75 | 3.46 |
| TC100 | 10.94 | 23.01 | 25.11 | 39.29 | 82.59 | 90.13 | 7.21 | 16.67 | 15.82 |

Abbreviations: LK = lognormal kriging; RK/SMLR = regression kriging using stepwise multiple linear regression to map the global spatial trend; RK/RT = regression kriging using regression tree to map the global spatial trend; RMSEv = root mean square error calculated on the validation set; TC1 = TC at 0–30 cm; TC2 = TC at 30–60 cm; TC3 = TC at 60–120 cm; TC4 = TC at 120–180 cm; TC100 = TC at 0–100 cm.

[1] Results obtained from the best models are shown in italics.
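
Model selection in Table 5-6 amounts to choosing, for each depth, the method with the lowest RMSEv. The sketch below assumes the usual root mean square error definition and reuses the RMSEv values from the table; the names `rmse`, `RMSEV`, and `best_method` are illustrative.

```python
import math

# Sketch: root mean square error and selection of the best method per depth.
# The RMSEv values are copied from Table 5-6; names are illustrative.

def rmse(predicted, observed):
    """Root mean square error between paired predictions and observations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


RMSEV = {
    "TC1": {"LK": 3.34, "RK/SMLR": 6.96, "RK/RT": 9.99},
    "TC2": {"LK": 12.57, "RK/SMLR": 20.27, "RK/RT": 6.20},
    "TC3": {"LK": 2.02, "RK/SMLR": 19.32, "RK/RT": 3.42},
    "TC4": {"LK": 2.94, "RK/SMLR": 2.75, "RK/RT": 3.46},
    "TC100": {"LK": 7.21, "RK/SMLR": 16.67, "RK/RT": 15.82},
}


def best_method(scores: dict) -> str:
    """Return the method name with the smallest validation error."""
    return min(scores, key=scores.get)


BEST = {prop: best_method(scores) for prop, scores in RMSEV.items()}
```

Applied to the table, this selects LK for TC1, TC3, and TC100, RK/RT for TC2, and RK/SMLR for TC4.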
Table 5-7. Semivariogram parameters of the fitted exponential model of the best geostatistical method identified for soil total carbon (TC) at each depth.

| Property | Lag size (m) | Number of lags | Nugget effect [(log kg m-2)^2] | Sill [(log kg m-2)^2] | Effective range (m) | Nugget/sill (%) |
|---|---|---|---|---|---|---|
| LogTC1 | 2200 | 17 | 0.0159 | 0.0879 | 11,579 | 18.14 |
| LogTC2 [1] | 2500 | 16 | 0.0230 | 0.0869 | 8791 | 26.53 |
| LogTC3 | 2200 | 17 | 0.0208 | 0.1452 | 6975 | 14.34 |
| LogTC4 [2] | 3100 | 14 | 0.0302 | 0.0545 | 17,887 | 55.34 |
| LogTC100 | 2200 | 17 | 0.0064 | 0.0867 | 6490 | 7.41 |

Abbreviations: LogTC1 = log10 of TC at 0–30 cm; LogTC2 = log10 of TC at 30–60 cm; LogTC3 = log10 of TC at 60–120 cm; LogTC4 = log10 of TC at 120–180 cm; LogTC100 = log10 of TC at 0–100 cm.

[1] Residual LogTC2 from the global trend model obtained with regression tree.

[2] Residual LogTC4 from the global trend model obtained with stepwise multiple linear regression.
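
The fitted parameters in Table 5-7 can be turned into a semivariogram curve. This is a minimal sketch of an exponential model parameterized by its effective range; it assumes that the "Sill" column is the total sill (nugget plus partial sill), which the nugget/sill column suggests but the table does not state outright. The function name is illustrative.

```python
import math

# Sketch: exponential semivariogram evaluated from Table 5-7 parameters.
# Assumption: "sill" is the total sill, so partial sill = sill - nugget.

def exponential_semivariogram(h: float, nugget: float, sill: float,
                              effective_range: float) -> float:
    """Semivariance at lag distance h (same length units as the range)."""
    if h == 0:
        return 0.0  # by definition, semivariance is zero at zero lag
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / effective_range))


# LogTC1 parameters from Table 5-7
nugget, sill, eff_range = 0.0159, 0.0879, 11579.0
gamma_at_range = exponential_semivariogram(eff_range, nugget, sill, eff_range)

# Share of the partial sill reached at the effective range (about 95%)
share_reached = (gamma_at_range - nugget) / (sill - nugget)
```

The factor of 3 in the exponent is what makes the model reach roughly 95% of the partial sill at the effective range, since 1 - exp(-3) is about 0.95.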
Figure 5-1. Land use/land cover and sampling sites in the Santa Fe River watershed (SFRW), Florida.

Figure 5-2. Soil total carbon (TC) output maps obtained by the best geostatistical method at each depth in the Santa Fe River watershed (SFRW), Florida; A) TC at 0–30 cm (TC1) modeled by lognormal kriging (LK); B) TC at 30–60 cm (TC2) modeled by regression kriging using regression tree to model the global trend (RK/RT); C) TC at 60–120 cm (TC3) modeled by LK; D) TC at 120–180 cm (TC4) modeled by regression kriging using stepwise multiple linear regression to model the global trend (RK/SMLR); and E) TC at 0–100 cm (TC100) modeled by LK.

Figure 5-2. Continued.

Figure 5-3. Regression tree of the log-transformed soil total carbon at 30–60 cm (LogTC2) global trend model. Elevation in meters. Abbreviations: Avg = average of LogTC2 observations in the node in log kg m-2; N = number of observations in the node; STD = standard deviation of LogTC2 observations in the tree node in log kg m-2; TM04B2 = band 2 (green) of Landsat Enhanced Thematic Mapper Plus in digital numbers.
CHAPTER 6
UPSCALING OF DYNAMIC SOIL ORGANIC CARBON POOLS IN A NORTH-CENTRAL FLORIDA WATERSHED

Summary

Regional-scale assessment of soil carbon (C) pools is essential to provide information for C cycling models, land management, and policy decisions, and to elucidate the relative contribution of different C pools to total carbon (TC). We estimated TC and four soil C fractions, namely recalcitrant C (RC), hydrolyzable C (HC), hot-water-soluble C (SC), and mineralizable C (MC), at 0–30 cm across a 3585-km2 mixed-use watershed in north-central Florida. We used lognormal kriging (LK) and regression kriging (RK) to upscale soil C using 102 training samples, and compared models using 39 validation samples. Lognormal kriging produced the most accurate models for TC, HC, SC, and MC, while RC was best modeled by RK using a regression tree (RT) global trend model. Maps produced by LK showed similar spatial patterns, due to the strong correlation between soil C properties and the similarity of their spatial dependence. The distribution of RC, in turn, reflected the split of the population by RT based on the depth to water table. Soil total C amounted to 23.01 Tg (teragrams) across the watershed, indicating the potential of these soils to store C. Recalcitrant C totaled 21.77 Tg (94.6% of TC), suggesting that a large amount of TC could potentially be stored for centuries to millennia. Our estimates of soil C and its fractions within a mixed-use watershed in Florida highlight the importance of appropriately characterizing the inherent spatial dependence of soil C, as well as relevant regional environmental patterns (e.g., hydrology), to better explain the variability of soil C.

Introduction

The demand for maps of soil properties over large regions has increased considerably in recent decades, reflecting the growing concern for the conservation of soil resources for the provision of commodities and ecosystem services, including food and fiber, water quality, and
biodiversity. In this context, soil organic carbon (SOC) is a central property that relates to biological, chemical, and physical soil properties and processes; thus, regional estimates of SOC help in understanding the general spatial patterns of soils across landscapes. Given the current climate change (Carbon Dioxide Information Analysis Center, 2008), it is essential to quantify present SOC stocks and to understand how SOC behaves spatially across large regions. This is a prerequisite to assess the potential of soils to sequester and maintain carbon (C) for future generations.

Quantifying total amounts of SOC does not equate to understanding the recalcitrance/lability of C forms in soils, which depend on several environmental determinants, such as climate, topography, hydrology, land use, and other properties (Jenny, 1941; McBratney et al., 2003). Therefore, in order to understand the dynamics of SOC across large landscapes, fractionation of the total SOC may be used to clarify the influence of biochemical recalcitrance on the spatial behavior of SOC and its interaction with environmental landscape properties.

Many fractionation techniques have been applied to decompose total SOC or soil organic matter (SOM) into more or less stable C forms, with the objective of mapping the dynamics of SOC as it interacts with local (e.g., plants and microorganisms) and global (i.e., the lithosphere, hydrosphere, and atmosphere) components (e.g., Parton et al., 1983; Coleman and Jenkinson, 1996; Zimmermann et al., 2007). The most common fractionation techniques use physical methods, chemical methods, or a combination of both (von Lützow et al., 2007).
Physical fractionation usually involves separation based on particle density, aggregation or energy of aggregation, and/or size (e.g., Sohi et al., 2001; Six et al., 2002; Echeverría et al., 2004; Sarkhot et al., 2007a, 2007b), whereas chemical fractionation is commonly done by acidolysis, hydrolysis, and/or oxidation (e.g., Leavitt et al., 1996; Ghani et al., 2003; Silveira et al., 2008).
Silveira et al. (2008) extracted labile pools of SOC using acid hydrolysis, and explained 99% of the variability of 6 M HCl hydrolyzable SOC (HC) using microbial C biomass and hot-water-soluble C (SC), which demonstrated the association of the HC pool with indicators of SOC lability in the soil. In their 18 soil samples, HC comprised 18 to 32% of soil total C (TC), and SC comprised 1 to 4% of TC. Ghani et al. (2003) correlated different soil C and nitrogen (N) fractions with land use, grazing intensity, and N and phosphorus (P) fertilization, and found that SC was more sensitive than total SOC to land use and management practices. Moreover, SC was highly correlated with soil total carbohydrates (R2 = 0.88), mineralizable N after 7 days of incubation (R2 = 0.86), and soil microbial C (R2 = 0.84) and N (R2 = 0.72) biomass.

Incubation to quantify mineralization/immobilization rates is another method to assess the stability and lability (i.e., availability to plants and microorganisms) of SOC. Mineralization rates and mean residence times (MRT) of SOC have been associated with chemical and physical SOC fractions (e.g., Franzluebbers, 1999; Alvarez and Alvarez, 2000; Paul et al., 2006), and with related soil bio-physico-chemical properties and processes (e.g., Alvarez et al., 1995; Causarano et al., 2008). Paul et al. (2006) used 6 M HCl hydrolysis in 1100 samples to separate the nonhydrolyzable C pool, hereafter referred to as recalcitrant C (RC), determining that RC represented 30 to 80% of total SOC, depending on soil type, texture, depth, and management. In addition, they measured active and slow soil C pools after incubation, showing that active C represented 2 to 8% of SOC and had an MRT of days to months, whereas slow C comprised 45 to 65% of SOC, with an MRT of 10 to 80 years. Causarano et al.
(2008) identified differences between SOC and fractions among areas of pasture, conservation tillage, and conventional tillage based on 87 observations, estimating SOC stocks at 0–20 cm of 3.89, 2.79, and 2.22 kg m-2 for the three types of management, respectively. Management explained 41.6% of the variability of
SOC, but other significant explanatory variables included clay content (5.2%) and mean annual temperature (1.0%). Total SOC was significantly correlated with all measured SOC physical and chemical fractions, including particulate organic C, microbial C biomass, and mineralizable C after 24 days of incubation.

The great variability of SOC and distinct environmental conditions in Florida offer an ideal framework to characterize the relationship between soils and their respective soil-forming landscape factors. Well-known soil-forming conceptual frameworks were adopted in this study to evaluate these relationships, with the objective of assessing the spatial distribution of TC and dynamic SOC pools within a mixed-use 3585-km2 watershed in north-central Florida. We hypothesized that TC and SOC chemical fractions are a function of collocated environmental landscape properties, and thus can be spatially explained and estimated using hybrid geostatistical environmental correlation models.

Many soil properties have been modeled and upscaled regionally (McBratney et al., 2003). Of those properties, soil C is the most frequently studied, probably because it is the most comprehensive indicator of environmental quality and soil ecosystem services. Examples of soil C studies at the regional scale include Homann et al. (1998), McKenzie and Ryan (1999), Krogh et al. (2003), Hengl et al. (2004), Minasny et al. (2006), and our assessment at multiple depths in Chapter 5. However, upscaling of SOC chemical fractions has rarely been done due to the analytical and computational costs and labor required to implement such studies. To address this research gap, we conducted a thorough investigation of the spatial distribution of TC and dynamic SOC pools with the aim to: (i) estimate the patterns of TC and four SOC fractions, namely RC, HC, SC, and mineralizable organic carbon (MC), across a large subtropical watershed
using hybrid geostatistical upscaling methods, and (ii) validate the upscaled models using independent validation data sets.

Materials and Methods

Study Area

The study was conducted in the Santa Fe River watershed (SFRW), a 3585-km2 mixed-use watershed located in north-central Florida between latitudes 29.63° and 30.21° N and longitudes 82.88° and 82.01° W. A complete description of the SFRW was presented in Chapter 2. Briefly, the most frequent soil series in the SFRW are Sapelo, Blanton, Ocilla, Mascotte, and Foxworth, and the most frequent soil orders include Ultisols (47%), Spodosols (27%), and Entisols (17%) (Natural Resources Conservation Service, 2009). Major land uses include pineland (30%), wetland (14%), improved pasture (13%), upland forest (13%), and rangeland (13%) (Florida Fish and Wildlife Conservation Commission, 2003a).

Field Sampling and Laboratory Methods

A total of 141 soil samples (Figure 6-1) were collected between September 2003 and January 2005 at a fixed depth of 0 to 30 cm along soil order/land use trajectories, and analyzed for TC, RC, SC, and MC. Local variability was accounted for by composite sampling, with 4 subsamples collected within a 2-m radius at each site and homogenized for analysis. Soil organic carbon represents more than 98% of the soil total carbon in Florida (Guo et al., 2006a; N.B. Comerford, personal communication, 2005); therefore, soils in this study, all of which had pH below 6.0, were not pretreated with acid to remove carbonates. The detailed sampling design and laboratory procedures were described in Chapters 2 and 3. Total C was determined by dry combustion on a FlashEA 1112 Elemental Analyzer (Thermo Electron Corp., Waltham, MA); RC was also measured by dry combustion after samples were refluxed with 6 N HCl for 16 hours, according to Paul et al. (2001) and McLauchlan and Hobbie
(2004); SC was measured on a Shimadzu TOC 5050 Analyzer (Shimadzu Scientific Instruments Inc., Columbia, MD) after extraction using hot water (Sparling et al., 1998; Gregorich et al., 2003) and filtration through a 0.2-µm membrane; lastly, MC was measured during a period of 14 days of steady soil respiration inside an incubation chamber using a CO2 Coulometer (UIC Inc., Joliet, IL). Hydrolyzable organic carbon was calculated as the difference between TC and RC.

The TC and SOC fractions covered a great variability of environmental conditions within the SFRW. Since our objective was to represent this environmental variability in the upscaled models, we developed the models using the whole range of SOC values, which ranged from low values (minimum TC of 1.17 kg m-2) on Entisols over karst terrain to very high values (maximum TC of 63.88 kg m-2) on Histosols in wetland sites. To ensure that our laboratory data did not contain measurement errors or outliers, we checked them against selected environmental properties. We confirmed that all measurements were reasonable and that no outliers were present. As a consequence, all SOC properties had a positively skewed lognormal frequency distribution. Since the hybrid geostatistical methods (see next section) either require or benefit from an approximately normal distribution of the target variable, all SOC properties were log10-transformed prior to upscaling.

Upscaling Methods

The laboratory SOC measurements in concentration units (mg kg-1; Chapter 3) were converted to areal (i.e., stock) units (kg m-2) by multiplying the C concentration by the soil bulk density, depth (30 cm), and a unit conversion factor. Soil bulk density, in turn, was estimated using a class pedotransfer function linking historical bulk density measurements (Florida Soil Characterization Database, 2009) from sites around and within the SFRW to the corresponding soil series of the sampled soils (see Chapter 5).
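
The conversion from concentration to stock units described above can be sketched as follows. The text does not state the units of bulk density or the value of the conversion factor, so this sketch assumes bulk density in g cm-3 and depth in cm, which gives a factor of 1e-5; the function names and the sample values are hypothetical.

```python
import math

# Sketch: convert a C concentration (mg kg-1) to an areal stock (kg m-2),
# then apply the log10 transform used before upscaling.
# Assumption: bulk density in g cm-3 and depth in cm (not stated in the text).

def carbon_stock_kg_m2(conc_mg_kg: float, bulk_density_g_cm3: float,
                       depth_cm: float = 30.0) -> float:
    """Areal carbon stock for one soil layer."""
    conc_kg_kg = conc_mg_kg * 1.0e-6                  # mg kg-1 -> kg kg-1
    bulk_density_kg_m3 = bulk_density_g_cm3 * 1000.0  # g cm-3 -> kg m-3
    depth_m = depth_cm * 0.01                         # cm -> m
    return conc_kg_kg * bulk_density_kg_m3 * depth_m


def log10_stock(stock_kg_m2: float) -> float:
    """Log10 transform of a strictly positive stock value."""
    if stock_kg_m2 <= 0:
        raise ValueError("stock must be positive for a log transform")
    return math.log10(stock_kg_m2)


# Hypothetical sample: 1% C (10,000 mg kg-1), bulk density 1.5 g cm-3, 0-30 cm
stock = carbon_stock_kg_m2(10000.0, 1.5, 30.0)  # 4.5 kg m-2
log_max_tc = log10_stock(63.88)                 # maximum TC reported in the text
```

Combining the three unit conversions yields the single factor 1e-5, so the stock equals concentration times bulk density times depth times 1e-5 under these unit assumptions.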
The 141 samples were randomly split into a training set (102 observations) and a validation set (39 observations). Using the training samples, three upscaling approaches were compared to estimate TC and SOC fractions in the SFRW, namely: lognormal kriging (LK), regression kriging (RK) using stepwise multiple linear regression (SMLR) to map the global spatial trend, and RK using regression tree (RT; Breiman et al., 1984) to map the global trend. In LK, ordinary kriging (OK) was performed on the log-transformed SOC properties, and the output maps were converted back to original units (Webster and Oliver, 2001). In RK, the global trend was modeled by SMLR or RT and the residuals were kriged using LK. The spherical (Equation 6-1) and exponential (Equation 6-2) models were compared to best approximate the experimental semivariograms observed for the different SOC properties.

\[
\gamma(h) =
\begin{cases}
0, & h = 0 \\
c_0 + c\left[1.5\,\dfrac{h}{a} - 0.5\left(\dfrac{h}{a}\right)^{3}\right], & 0 < h \le a \\
c_0 + c, & h > a
\end{cases}
\tag{6-1}
\]

\[
\gamma(h) = c_0 + c\left[1 - e^{-3h/r}\right]
\tag{6-2}
\]

Where: \gamma(h) = semivariance at lag distance h; c_0 = nugget effect; c = partial sill; h = lag distance; a = range; e = natural exponential base; r = effective range, where \gamma(h) achieves 95% of the total sill (c_0 + c), at about 3a.

The global spatial trend was modeled based on diverse environmental landscape variables covering all soil-forming factors included in the CLORPT (Jenny, 1941) and SCORPAN (McBratney et al., 2003) models, which were obtained from various sources and used as, or converted to, GIS layers. The independent variables included: 10 soil properties (taxonomic order, drainage class, hydric rating, hydrologic group, available water capacity, depth to water table, saturated hydraulic conductivity, and sand, silt, and clay contents), 19 variables derived from Landsat Enhanced Thematic Mapper Plus (6 bands, 6 principal components, 3 tasseled cap
indices, and 4 vegetation indices), 5 topographic variables (elevation, slope, aspect, catchment area, and compound topographic index), and 5 geologic variables (geologic formation, epoch of geologic formation, environmental geologic class, hydrogeologic group, and aquifer vulnerability index), as well as mean annual temperature and precipitation, land use/land cover, physiographic region, ecological region, and population count (see Chapter 5).

Stepwise multiple linear regression used an F probability of 0.05 to include or exclude variables from the model, and was conducted in SPSS (SPSS Inc., Chicago, IL). The RT models used 10-fold cross-validation on the training samples, and were derived in CART (Salford Systems, San Diego, CA). Lognormal kriging was conducted in Isatis (Geovariances Americas Inc., Houston, TX). Accuracy assessment of the different SOC properties was conducted by comparing the estimated values in the final output maps produced by LK, RK/SMLR, and RK/RT, respectively, with the independent validation samples. The best model of each SOC property was identified based on the root mean square error of independent validation (RMSEv; Equation 6-3).

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}
\tag{6-3}
\]

Where: \hat{y} = predicted values; y = observed values; n = number of observations, with i = 1, 2, ..., n.

Results and Discussion

Descriptive Statistics

Total carbon varied from 1.17 to 63.88 kg m-2, with a mean of 6.26 kg m-2 and a median of 4.57 kg m-2 (Table 6-1). After log10 transformation, the mean and median values resembled each other, and the original skewness of 5.43 was reduced to 1.19, approximating a normal distribution. The range of the TC training sample (102 observations) encompassed the
range of the validation sample (39 observations), which is advisable to avoid extrapolation. However, their frequency distributions were dissimilar, as TC values in the training sample varied more and had a more skewed histogram.

Similar trends were observed for the measured SOC fractions. The range of the training sample encompassed that of the validation sample for all SOC fractions but HC, and its frequency distribution was more positively skewed and more variable (higher coefficient of variation; CV) than that of the validation sample for all SOC fractions. Log10 transformation reduced the variability (i.e., the CV) of all soil C properties except HC. Detailed descriptive statistics for all SOC properties are listed in Table 6-1.

First, we considered the correlations among TC and SOC fractions in concentration units (presented in Chapter 3), which avoid bias due to bulk density variations. In the next section, soil bulk density is implicitly incorporated in the upscaling models and estimated output maps of TC stocks and SOC fractions. In Chapter 3, the correlations between TC and SOC fractions were all significant at the 0.01 significance level; in this study, the correlations between SOC properties in stock units (kg m-2) were also all significant at the 0.01 significance level, and very similar to those reported there. In Chapter 3, we also pointed out that the concentration of SOC fractions in the soil was inversely related to their lability. Thus, the concentration (and recalcitrance) of SOC fractions decreased in the following order: RC > HC > SC > MC. Recalcitrant organic C comprised about 75% of TC, while HC comprised the remaining 25%. The most labile fractions, SC and MC, were present in the smallest amounts in the soil, comprising a maximum of 5% of TC in the case of SC.
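
Because HC is defined as the difference between TC and RC, the two fractions always sum to TC, and their shares follow directly. The sketch below uses hypothetical sample values chosen to match the approximate 75%/25% split, plus the watershed totals reported in the chapter summary; the function names are illustrative.

```python
# Sketch: hydrolyzable C by difference and the share of each fraction in TC.
# Assumption: the per-sample values below are hypothetical illustrations.

def hydrolyzable_c(tc: float, rc: float) -> float:
    """Hydrolyzable C as the difference between total and recalcitrant C."""
    if rc > tc:
        raise ValueError("recalcitrant C cannot exceed total C")
    return tc - rc


def share_percent(part: float, total: float) -> float:
    """Share of a fraction in the total, in percent."""
    return 100.0 * part / total


# Hypothetical sample with TC = 4.0 and RC = 3.0 kg m-2
hc = hydrolyzable_c(4.0, 3.0)
rc_share = share_percent(3.0, 4.0)  # 75%
hc_share = share_percent(hc, 4.0)   # 25%

# Watershed totals from the chapter summary: RC = 21.77 Tg, TC = 23.01 Tg
rc_share_watershed = share_percent(21.77, 23.01)
```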
From a C sequestration perspective, it is desirable to accrete RC, which is the most complex and resistant SOC fraction, lasting in the soil from decades to thousands of years (Quideau, 2006; Rice, 2006). Indeed, it has been shown that RC makes up a major proportion of TC (Paul et al., 2006; Silveira et al., 2008; Vasques et al., 2009). Labile SOC fractions, on the other hand, are less stable in the environment, as they are generally composed of simpler molecules, including carbohydrates, resins, and lipids. They can be converted to more recalcitrant forms through biochemical transformations (i.e., humification), be used up to sustain soil microbial communities and ultimately plant growth, or be lost by leaching.

Upscaling of Soil Organic Carbon Properties

Four of the five SOC properties analyzed, namely TC, HC, SC, and MC, were best modeled by LK according to the RMSEv (Table 6-2). Recalcitrant organic C had its most accurate interpolation map created by RK/RT. Due to the close correlation among SOC properties (Chapter 3), their maps produced by LK showed similar distribution patterns, which differed from those of RC (Figure 6-2). Nonetheless, the same C hotspots present in the LK maps were also evident in the RK/RT map of RC, corresponding to wetland areas of Histosols in the southeast and central east, and Spodosols in the central north and southwest, respectively. Had the RMSE calculated on the training samples (RMSEt) been used to select the best upscaling models instead of the RMSEv, the selection would have been the same in all cases but RC (Table 6-2).

The global spatial trend RT model of RC had only one splitting node, which divided the samples with depth to water table (DEPWATTBL) less than or equal to 9.50 cm from those with DEPWATTBL > 9.50 cm, separating seasonally waterlogged soils from drier ones (Figure 6-3).
All seven training samples grouped in terminal node 1 (DEPWATTBL <= 9.50 cm) were collected at wetland sites, and had on average higher RC stock (mean LogRC of 1.10 log kg m⁻²) than the 95 samples grouped in terminal node 2 (0.50 log kg m⁻²). This is because of higher accumulation of organic matter facilitated by slower decomposition in anaerobic conditions (Bouchard and Cochran, 2006). In addition, the variability of samples grouped in terminal node 1 (standard deviation of 0.42 log kg m⁻²) was greater than that of samples in terminal node 2 (0.30 log kg m⁻²), in spite of their reduced number and similar saturated soil conditions.

For the reference of the reader, Table 6-3 lists the SMLR global trend models of SOC properties and the variables selected in their RT models. Global trend models explained 18 to 69% of the variability of the SOC properties in training mode, but only up to 18% in validation, and not a single predictor had a strong correlation with the SOC properties (correlations not reported). All the assumptions of SMLR were met, except for some collinearity among predictors and heteroscedasticity in the residuals of some of the models. Correlation between residuals and dependent variables was also evident in some cases, due to the remaining unexplained variability in the dependent variables; however, this correlation was minimized by kriging the residuals.

The spatial autodependence identified by the LK models of SOC properties was more important in most cases (4 out of 5 properties) than their relationship with other environmental factors. Given the moderate to low explanatory power of the models (Table 6-3), the superiority of LK over RK was not surprising. The similar distribution patterns of TC, HC, SC and MC might be attributed to their similar spatial dependence structure, summarized in Table 6-4. First, the lag sizes chosen to approximate the experimental semivariogram were either 2150 or 2200 m, and the corresponding numbers of lags were 18 or 17, respectively. Second, the exponential model was chosen to fit the experimental semivariograms for all SOC properties. Third, their semivariogram ranges were on the same order of magnitude, varying from 9664 m (SC) to 12,053 m (HC).
And fourth, their spatial dependence was also similar, as evidenced by the small nugget to sill ratios, ranging from about 13% (SC) to about 19% (MC), indicating strong spatial dependence (Cambardella et al., 1994). The only exception was HC, whose nugget to sill ratio was much higher (39%), indicating moderate spatial autocorrelation (Cambardella et al., 1994). Since these four SOC properties were statistically correlated at the 0.05 significance level (Chapter 3), it was not surprising to observe similar geostatistical behavior among them. Raw RC had a spatial structure comparable to that of the other SOC properties, but residual RC from the RT trend model had smaller nugget, sill, and range, indicating that only residual short-range RC variation was left unexplained after RT modeling. This short-range variation of RC was, however, more structured than that of raw RC, based on the smaller nugget to sill ratio.

Compared to other studies, the effective ranges identified by the best upscaling models of SOC properties were usually longer than those observed for soil C. Hengl et al. (2004) found ranges of about 3000 m for topsoil SOM, and for SOM residuals from SMLR, in a 2500-km² region in central Croatia based on 100 observations. Wang et al. (2002a) observed a range of 3070 m for SOC at 0-30 cm in an 11,000-ha experimental forest in northeast Puerto Rico, based on 100 observations. These ranges are smaller than the smallest range observed in this study, 5039 m for residual RC from RT. However, much smaller ranges have been observed elsewhere, reflecting the smaller sizes of the study areas (field scale) and the corresponding sampling designs, which are discussed in the next paragraph.

At the field scale, Lark (2000) observed a range of 77.4 m for residual SOM in a 6-ha barley field in Bedfordshire, England, using 45 observations (7.5 samples/ha). López-Granados et al. (2005) observed a range of 44.8 m for SOM in a 6-ha wheat/sunflower field in southern Spain using 86 observations (14.3 samples/ha).
Mueller and Pierce (2003) identified a range of 198 m for TC in a 12.5-ha corn/soybean field in Michigan based on 134 observations (10.7 samples/ha).
All these studies mapped soil C or SOM within 0 to 20 cm. Simbahan et al. (2006) mapped soil C stocks at 0-30 cm in three crop fields of 48.7 to 65.4 ha in Nebraska, and observed ranges of 89 to 105 m, as well as a long-range spatial dependence up to 450 m in the largest field, using 202 to 265 observations (~4 samples/ha). Terra et al. (2004), also mapping SOC stock at 0-30 cm, in a 9-ha cotton field in central Alabama, observed ranges varying from 63.4 to 73 m, depending on the sampling intensity, which varied from 68 to 496 observations (7.6 to 55.1 samples/ha, respectively). Finally, McBratney and Pringle (1999, Figure 2d) compiled 9 studies of soil C to calculate an average range close to 250-300 m.

In contrast, the spatial dependence of SOC shows a much greater range across large regions. Zhang and McGrath (2004) found a range of 100 km mapping SOC at 0-10 cm in southeast Ireland (~15,460 km²), based on 220 observations. Liu et al. (2006) observed a range of 632 km for SOC at 0-20 cm in a 3435-km² cropland region in northeast China, using 354 observations. Recent findings from a multi-scale assessment of TC in Florida (~150,000 km²; work in progress) also indicate that there is a long-range component in the spatial structure of TC. These results confirm the trend that the larger the study area, the larger the correlation distance, i.e., the range. As the study area increases, landscape patterns occurring over longer distances that influence the spatial structure of SOC are identified by the semivariogram. Furthermore, as the number of samples collected per unit area usually decreases with an increase in the size of the study area, due to budget constraints, often there are not enough samples at close range to adequately characterize the short- to medium-range variability of SOC over larger areas.
As a consequence, the limited number of samples spread out across the study area can only capture the long-range variability of SOC, which was probably the case for this study.
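The sampling intensities quoted for the field-scale studies follow from dividing the number of observations by the study area. The sketch below is only an illustration of that arithmetic: the function name and dictionary layout are assumptions, the numbers are the ones quoted in the text, and the SFRW entry uses the 141 samples and approximately 3585 km² reported for the watershed.

```python
# Sampling density (observations per hectare) for studies cited in the text.
# Labels and the helper name are illustrative; numbers are quoted from the text.
HA_PER_KM2 = 100.0

def samples_per_ha(n_obs, area_ha):
    """Return the number of observations per hectare."""
    return n_obs / area_ha

studies = {
    "Lark (2000)": (45, 6.0),
    "Lopez-Granados et al. (2005)": (86, 6.0),
    "Mueller and Pierce (2003)": (134, 12.5),
    "Terra et al. (2004), sparsest design": (68, 9.0),
    "Terra et al. (2004), densest design": (496, 9.0),
    "SFRW (this study)": (141, 3585 * HA_PER_KM2),
}

densities = {name: samples_per_ha(n, area) for name, (n, area) in studies.items()}
for name, density in densities.items():
    print(f"{name}: {density:.4f} samples/ha")
```

Under these figures, the watershed is sampled several orders of magnitude more sparsely than the field-scale studies, which is consistent with the argument that only long-range variability could be captured.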
In this study, the spatial dependence structure was a stronger determinant of the spatial distribution of most of the SOC properties than their correlation with collocated environmental properties. In Chapter 5, we upscaled SOC at different depth intervals, and observed that LK was also preferred over RK for 3 out of 5 depth intervals. These findings do not agree with other studies that compared variations of kriging, and other upscaling methods, to interpolate soil C or SOM. López-Granados et al. (2002, 2005) obtained more accurate results using RK than using OK or simple linear regression. Mueller and Pierce (2003) preferred kriging with an external drift (KED) over SMLR, OK, co-kriging (COK), and universal kriging (UK) when using a more complete sampling grid (134 observations), but UK performed best when the sample size was reduced to 38 samples. They used elevation as the external drift variable. Finally, Simbahan et al. (2006) compared multiple variations of kriging (OK, COK, KED, and RK) to upscale SOC, and found that RK outperformed the other methods in all three fields. Kriging with an external drift was the second most accurate method, followed by COK, and OK produced the worst results in all fields.

On the other hand, our study agrees with Terra et al. (2004), who obtained the lowest RMSEv for SOC using OK, when compared to COK, RK, and multiple linear regression. They mapped SOC in a cotton field in central Alabama, and tested three sampling designs with increasing grid intervals, containing 496, 248, and 68 observations, respectively. Ordinary kriging provided the most accurate maps for all sampling designs, with RMSEv of 0.30 kg m⁻² for the sampling designs with 496 and 248 observations, and 0.36 kg m⁻² for the sampling design with 68 observations.

The SFRW comprises many types of soils and land uses; thus, a model to upscale SOC across this highly complex mixture of ecotypes should, at least in part, account for this variability. Therefore, we would expect RK to perform better than LK, which was not the case for most of the SOC properties investigated. Two explanations are possible. First, the variability of TC and SOC fractions may be inherently represented in the LK models. This would be reasonable because the upscaled maps agree on the hotspots of SOC, which were, as discussed before, concentrated around wetlands and Spodosols. In this sandy and flat watershed, it seems that the landscape determinants of TC and SOC fractions are hydrology, land use, and soil type (more specifically the presence of spodic horizons). The second explanation could be that the variability of environmental properties is so large in the SFRW that no environmental patterns were apparent, or linked strongly enough to the distribution of SOC properties. Along these lines, not a single environmental factor showed a high correlation with any of the SOC properties, resulting in a lack of strong soil-environmental trends within the watershed.

In spite of the weak correlations between SOC properties and environmental factors (Table 6-3), our upscaled maps depict clear spatial trends of SOC properties in north-central Florida, which have important implications for sustainability and environmental quality in similar regions in the southeastern US. Soil total C in the 0-30-cm depth amounted to 23.01 Tg (teragrams) (Figure 6-2A), of which 21.77 Tg consisted of RC (Figure 6-2B), which has a long residence time in the soil. If these contents were extrapolated to the entire state (150,000 km²), Florida soils would store about 963 Tg of C in the upper 30 cm, and could potentially store 911 Tg of RC for centuries. Readily mineralizable organic C sums up to a minimal amount (0.18 Tg C; Figure 6-2E) relative to the other fractions, and HC and SC amount to 6.21 and 1.30 Tg C, respectively (Figure 6-2C,D).
These labile C fractions represent what could potentially be lost, or transferred to other C pools, within a relatively short period of time. Because they are less protected than RC, they will be more sensitive to environmental change, including climate change, land use shifts, and changes in land management. Therefore, conservation of soil C will depend on fostering the conversion of labile C to RC, while minimizing its loss. Similarly, RC is stable as long as it stays protected from physically, biologically, and/or chemically induced decomposition. Disturbance of soils and ecosystems can disrupt the protection of RC, promoting its conversion to less stable forms, and ultimately its loss to the environment.

Conclusions

Soil total C in the SFRW summed up to 23.01 Tg (teragrams) in the upper 30 cm. Of this total, recalcitrant forms of C accounted for more than 75%, suggesting that the majority of C stored in the soils of this watershed could potentially remain stable for decades to thousands of years. If these estimates were regionalized, about 963 Tg of C would be stored in the first 30 cm of soils across the state of Florida, of which about 911 Tg would be in recalcitrant forms. Given the vast presence of wetlands and associated Histosols in the south of the state, these estimates are very conservative, as our results highlighted the importance of these soils to store C in hydrology-controlled environments like Florida.

Independent validation of the upscaled models indicated preference for LK for all SOC properties except RC, which was probably a consequence of the lack of strong predictive models of SOC properties based on environmental variables. These results do not agree with the results typically found in similar studies, where RK performs better than ordinary kriging alone. This relates to the fact that, also contrary to many investigations, correlations between SOC properties and environmental properties in the SFRW were weak.
However, in spite of the moderate to low quality of the regression models of SOC properties, they consistently selected environmental properties related to hydrology (e.g., wetlands, depth to water table, and saturated hydraulic conductivity); thus, they were of value in emphasizing that soil C patterns in the SFRW were influenced by hydrologic patterns. Some of the soil-landscape relationships in the SFRW were captured by our models; however, there is still much potential to study soil-landscape interactions in Florida (and by extension, in the subtropical southeastern US) to isolate the causes of soil C (de)stabilization, accretion, and depletion, aiming to promote its long-term storage.

In the context of current climate change trends, it is essential to assess how much of the soil C stored in large mixed-use regions is sensitive to these changes, as well as how much C can potentially be lost or transformed. Since SOC fractions are more costly and laborious to measure, the TC map could offer a good indication of the main trends and most important hotspots of C, as TC was strongly correlated with measured SOC fractions. In other words, the high correlation among TC and SOC fractions, and the consequent high resemblance among upscaled maps, imply that measuring one of the SOC fractions in the SFRW will result in at least some redundant information with the other fractions. This suggests that TC could be used as an alternative indicator in biogeochemical studies in lieu of SOC fractions.

Our assessment in the SFRW can provide some guidelines to estimate soil C pools in larger regions, including the state of Florida and the southeastern US. First, the importance of spatial autocorrelation, in other words local soil C variability, to upscale SOC properties in the SFRW was demonstrated when comparing upscaling methods. Second, it became apparent that soil-landscape correlations based on available data were not strong enough to derive high-quality prediction maps of SOC properties, as compared to other regions, where landscape gradients (e.g., topography, or parent material) control the variation of SOC.
Finally, if correlations between measured SOC properties are strong, the same spatial trends can be expected if the dominant driver of spatial variation is the autodependence (i.e., spatial autocorrelation) of the SOC property.
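The regionalized stocks quoted in the Discussion and Conclusions can be reproduced by scaling the watershed totals by the ratio of areas. The sketch below is a minimal illustration that assumes the same mean stock per unit area applies statewide; the function and constant names are hypothetical, while the areas (about 3585 km² for the SFRW and 150,000 km² for Florida) and the stocks are the values given in the text.

```python
# Area-proportional regionalization of soil C stocks (illustrative sketch).
SFRW_AREA_KM2 = 3585.0     # watershed area given in the text
FL_AREA_KM2 = 150_000.0    # state area given in the text

def regionalize(stock_tg, src_area_km2=SFRW_AREA_KM2, dst_area_km2=FL_AREA_KM2):
    """Scale a stock (Tg) by the area ratio, assuming equal mean stock per unit area."""
    return stock_tg * dst_area_km2 / src_area_km2

def mean_stock_kg_m2(stock_tg, area_km2):
    """Convert a total stock (Tg) over an area (km2) to a mean stock in kg per m2."""
    return stock_tg * 1e9 / (area_km2 * 1e6)   # 1 Tg = 1e9 kg; 1 km2 = 1e6 m2

tc_fl = regionalize(23.01)   # total C in the upper 30 cm
rc_fl = regionalize(21.77)   # recalcitrant C
print(round(tc_fl), round(rc_fl))                       # prints 963 911
print(round(mean_stock_kg_m2(23.01, SFRW_AREA_KM2), 2))
```

The implied mean TC stock of the watershed is close to the average TC reported for LK in Table 6-2, which suggests the watershed total was obtained by integrating the LK map over the watershed area.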
Table 6-1. Descriptive statistics of measured soil organic carbon (C) properties.

                      TC (kg m⁻²)                        LogTC (log kg m⁻²)
Statistic             Whole set  Training   Validation   Whole set  Training   Validation
Mean                  6.26       6.96       4.43         0.6719     0.7012     0.5952
Std. error of mean    0.68       0.92       0.39         0.0236     0.0297     0.0330
Median                4.57       4.77       4.06         0.6598     0.6789     0.6090
Std. deviation        8.04       9.25       2.45         0.2808     0.3002     0.2064
Coeff. of variation   128.46     132.93     55.28        41.79      42.81      34.67
Skewness              5.43       4.74       2.02         1.19       1.15       0.43
Kurtosis              33.84      25.06      5.10         3.35       3.01       0.32
Range                 62.71      62.71      11.49        1.7374     1.7374     0.9213
Minimum               1.17       1.17       1.56         0.0679     0.0679     0.1943
Maximum               63.88      63.88      13.05        1.8053     1.8053     1.1156

                      RC (kg m⁻²)                        LogRC (log kg m⁻²)
Statistic             Whole set  Training   Validation   Whole set  Training   Validation
Mean                  4.67       5.25       3.14         0.5096     0.5408     0.4282
Std. error of mean    0.58       0.79       0.33         0.0271     0.0341     0.0379
Median                3.30       3.37       2.85         0.5181     0.5273     0.4547
Std. deviation        6.91       7.96       2.06         0.3214     0.3444     0.2365
Coeff. of variation   148.12     151.58     65.80        63.06      63.68      55.22
Skewness              5.47       4.77       2.28         0.90       0.82       0.49
Kurtosis              34.16      25.22      6.40         2.75       2.49       0.30
Range                 54.01      54.01      10.24        2.0283     2.0283     1.0508
Minimum               0.51       0.51       1.00         -0.2918    -0.2918    0.0001
Maximum               54.52      54.52      11.24        1.7366     1.7366     1.0509

                      HC (kg m⁻²)                        LogHC (log kg m⁻²)
Statistic             Whole set  Training   Validation   Whole set  Training   Validation
Mean                  1.59       1.71       1.29         0.1011     0.1281     0.0305
Std. error of mean    0.11       0.15       0.10         0.0270     0.0298     0.0576
Median                1.28       1.30       1.25         0.1086     0.1150     0.0982
Std. deviation        1.33       1.51       0.64         0.3202     0.3013     0.3597
Coeff. of variation   83.86      88.29      49.41        316.67     235.14     1180.72
Skewness              4.74       4.39       0.95         1.54       0.30       3.49
Kurtosis              34.71      28.27      2.23         8.92       2.11       17.11
Range                 12.68      12.59      3.39         2.8889     2.0488     2.3177
Minimum               0.02       0.11       0.02         -1.7851    -0.9450    -1.7851
Maximum               12.70      12.70      3.41         1.1038     1.1038     0.5326

                      SC (kg m⁻²)                        LogSC (log kg m⁻²)
Statistic             Whole set  Training   Validation   Whole set  Training   Validation
Mean                  0.34       0.37       0.29         -0.5320    -0.5154    -0.5756
Std. error of mean    0.02       0.03       0.02         0.0189     0.0238     0.0269
Median                0.29       0.31       0.25         -0.5404    -0.5110    -0.6025
Std. deviation        0.27       0.31       0.12         0.2241     0.2408     0.1681
Coeff. of variation   79.33      84.81      41.04        42.13      46.73      29.21
Skewness              5.38       4.90       0.96         0.77       0.71       0.36
Kurtosis              41.16      32.71      0.12         2.05       1.89       0.85
Range                 2.61       2.61       0.43         1.4737     1.4737     0.5939
Minimum               0.09       0.09       0.15         -1.0426    -1.0426    -0.8288
Maximum               2.70       2.70       0.58         0.4311     0.4311     -0.2349

                      MC (kg m⁻²)                        LogMC (log kg m⁻²)
Statistic             Whole set  Training   Validation   Whole set  Training   Validation
Mean                  0.05       0.05       0.04         -1.4157    -1.3881    -1.4881
Std. error of mean    0.00       0.00       0.00         0.0228     0.0278     0.0370
Median                0.04       0.04       0.03         -1.4047    -1.3854    -1.5047
Std. deviation        0.04       0.04       0.02         0.2707     0.2804     0.2313
Coeff. of variation   80.85      82.80      60.17        19.12      20.20      15.54
Skewness              3.53       3.37       1.63         0.30       0.21       0.33
Kurtosis              18.45      15.89      2.26         0.69       0.78       0.13
Range                 0.30       0.30       0.09         1.6247     1.6247     0.9907
Minimum               0.01       0.01       0.01         -2.1321    -2.1321    -1.9801
Maximum               0.31       0.31       0.10         -0.5074    -0.5074    -0.9894

Abbreviations: HC = hydrolyzable organic C; LogHC = log10 of hydrolyzable organic C; LogMC = log10 of mineralizable organic C; LogRC = log10 of recalcitrant organic C; LogSC = log10 of hot water soluble organic C; LogTC = log10 of total organic C; MC = mineralizable organic C; RC = recalcitrant organic C; SC = hot water soluble organic C; TC = total organic C.
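As a reading aid for Table 6-1, the coefficient of variation and the standard error of the mean can be recomputed from the tabulated means and standard deviations. The sketch below does this for untransformed TC; it assumes sample sizes of 141 (whole set), 102 (training; 7 + 95 samples in the regression tree nodes) and 39 (validation), and the function names are illustrative. Small differences from the table are expected because the inputs are rounded to two decimals.

```python
import math

def cv_percent(std_dev, mean):
    """Coefficient of variation in percent."""
    return 100.0 * std_dev / mean

def std_error_of_mean(std_dev, n):
    """Standard error of the mean for n observations."""
    return std_dev / math.sqrt(n)

# (mean, standard deviation, assumed n) for TC in kg per m2, from Table 6-1.
tc = {
    "whole": (6.26, 8.04, 141),
    "training": (6.96, 9.25, 102),
    "validation": (4.43, 2.45, 39),
}

tc_cv = {k: cv_percent(sd, m) for k, (m, sd, n) in tc.items()}
tc_sem = {k: std_error_of_mean(sd, n) for k, (m, sd, n) in tc.items()}
for k in tc:
    print(f"{k}: CV = {tc_cv[k]:.2f}%, SEM = {tc_sem[k]:.2f} kg/m2")
```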
Table 6-2. Comparative results of the three geostatistical methods used to model soil organic carbon (C) properties at 0-30 cm.

            Average SOC (kg m⁻²)¹      RMSEt (kg m⁻²)¹            RMSEv (kg m⁻²)¹
Property    LK       RK/SMLR  RK/RT    LK       RK/SMLR  RK/RT    LK       RK/SMLR  RK/RT
TC          6.41     7.07     7.73     5.16     9.65     11.23    3.34     6.96     9.99
RC          4.82     5.63     6.07     4.74     8.32     6.30     2.85     6.36     2.57
HC          1.73     2.75     2.73     1.14     1.94     1.59     0.73     2.49     1.73
SC          0.36     12.74    1.49     0.10     1.43     1.16     0.14     10.88    1.19
MC          0.05     13.37    1.25     0.02     1.70     1.31     0.02     11.43    1.22

Abbreviations: HC = hydrolyzable organic C; LK = lognormal kriging; MC = mineralizable organic C; RC = recalcitrant organic C; RK/SMLR = regression kriging using stepwise multiple linear regression to map the global spatial trend; RK/RT = regression kriging using regression tree to map the global spatial trend; RMSEt = root mean square error calculated on the training set; RMSEv = root mean square error calculated on the validation set; SC = hot water soluble organic C; TC = total organic C.
¹ Results obtained from the best models are shown in italics.
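The model selection rule described in the text (lowest RMSE on the validation set) can be sketched as follows, using the RMSE values of Table 6-2. The dictionary layout and the helper name are illustrative assumptions; the sketch also repeats the selection with the training RMSE to show where the two criteria disagree.

```python
# RMSE values (kg per m2) transcribed from Table 6-2.
rmse_training = {
    "TC": {"LK": 5.16, "RK/SMLR": 9.65, "RK/RT": 11.23},
    "RC": {"LK": 4.74, "RK/SMLR": 8.32, "RK/RT": 6.30},
    "HC": {"LK": 1.14, "RK/SMLR": 1.94, "RK/RT": 1.59},
    "SC": {"LK": 0.10, "RK/SMLR": 1.43, "RK/RT": 1.16},
    "MC": {"LK": 0.02, "RK/SMLR": 1.70, "RK/RT": 1.31},
}
rmse_validation = {
    "TC": {"LK": 3.34, "RK/SMLR": 6.96, "RK/RT": 9.99},
    "RC": {"LK": 2.85, "RK/SMLR": 6.36, "RK/RT": 2.57},
    "HC": {"LK": 0.73, "RK/SMLR": 2.49, "RK/RT": 1.73},
    "SC": {"LK": 0.14, "RK/SMLR": 10.88, "RK/RT": 1.19},
    "MC": {"LK": 0.02, "RK/SMLR": 11.43, "RK/RT": 1.22},
}

def best_method(errors):
    """Return the method with the lowest RMSE."""
    return min(errors, key=errors.get)

best_by_validation = {p: best_method(e) for p, e in rmse_validation.items()}
best_by_training = {p: best_method(e) for p, e in rmse_training.items()}
disagreements = [p for p in best_by_validation
                 if best_by_validation[p] != best_by_training[p]]
print(best_by_validation)
print(disagreements)
```

With these values, LK is selected for TC, HC, SC and MC and RK/RT for RC on the validation set, while the training RMSE would select LK throughout, so RC is the only property for which the two criteria differ.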
Table 6-3. Stepwise multiple linear regression (SMLR) models, and variables selected by regression tree (RT) models of the global trend of soil organic carbon (C) properties.

TC
  SMLR model: 0.69 + 0.41 WETLAND* - 0.24 DRNEXC - 0.12 PLEISTOCENE (Rt² = 0.43; Rv² = 0.12)
  RT selected variables: LULC*, DEPWATTBL, EVGEOCL, DRN, GEOUNIT, SLOPE(3x3), ETMVEGIND(7x7), CTI(7x7), MAP (Rt² = 0.69; Rv² = 0.18)
RC
  SMLR model: 0.53 + 0.47 WETLAND* - 0.33 DRNEXC - 0.15 PLEISTOCENE (Rt² = 0.47; Rv² = 0.11)
  RT selected variables: DEPWATTBL (Rt² = 0.20; Rv² = 0.00)
HC
  SMLR model: 0.65 + 0.36 WETLAND* + 0.003 ETMB5(3x3) - 0.002 DEPWATTBL - 0.07 ETMPC6(3x3) + 0.03 ETMPC6 (Rt² = 0.31; Rv² = 0.01)
  RT selected variables: DEPWATTBL*, LULC (Rt² = 0.22; Rv² = 0.05)
SC
  SMLR model: -0.50 - 0.002 KSAT* + 0.20 WETLAND + 0.14 HYDRIC (Rt² = 0.43; Rv² = 0.14)
  RT selected variables: HYDGROUP (Rt² = 0.21; Rv² = 0.13)
MC
  SMLR model: -2.06 - 0.002 KSAT* + 0.01 ETMPC2(7x7) + 0.01 CTI - 0.14 FOREST + 0.13 HYDGEODEP (Rt² = 0.42; Rv² = 0.05)
  RT selected variables: CLAY (Rt² = 0.18; Rv² = 0.01)

Abbreviations: CLAY = clay content in the soil; CTI = compound topographic index; CTI(7x7) = compound topographic index averaged within a 7x7 pixel window; DEPWATTBL = depth to water table; DRNEXC = excessively drained soil; DRN = soil drainage class; ETMB5(3x3) = reflectance of band 5 (short wave infrared) of Landsat Enhanced Thematic Mapper Plus (ETM+) averaged within a 3x3 pixel window; ETMPC2(7x7) = principal component (PC) 2 derived from ETM+ averaged within a 7x7 pixel window; ETMPC6 = PC 6 derived from ETM+; ETMPC6(3x3) = PC 6 derived from ETM+ averaged within a 3x3 pixel window; ETMVEGIND(7x7) = vegetation index (infrared-red difference) derived from ETM+ averaged within a 7x7 pixel window; GEOUNIT = geologic unit; HC = hydrolyzable organic C; HYDGEODEP = hydrogeologic class (coastal deposits); HYDGROUP = soil hydrologic group; HYDRIC = hydric soil; KSAT = soil saturated hydraulic conductivity; LULC = land use/land cover; MAP = mean annual precipitation from 2003 to 2005; MC = mineralizable organic C; PLEISTOCENE = geologic unit originated during the Pleistocene; RC = recalcitrant organic C; SC = hot water soluble organic C; SLOPE(3x3) = slope averaged within a 3x3 pixel window; Rt² = coefficient of determination calculated using the training set; Rv² = coefficient of determination calculated using the validation set; TC = total organic C.
* Most important variables in the model. Variable importance was based on the standardized coefficients (not reported) in SMLR, and on an index of relative importance reported by the software in RT.
Table 6-4. Semivariogram parameters of the fitted exponential models of the soil organic carbon (C) properties.

          Lag options           Semivariogram parameters
Property  Size (m)  Number      Nugget effect     Sill              Effective  Nugget/sill
                                [(log kg m⁻²)²]   [(log kg m⁻²)²]   range (m)  (%)
LogTC     2200      17          0.0159            0.0879            11,579     18.14
LogRC     2200      17          0.0260            0.1162            11,700     22.38
LogRC¹    2200      17          0.0132            0.0920            5039       14.35
LogHC     2150      18          0.0363            0.0923            12,053     39.33
LogSC     2150      18          0.0070            0.0540            9664       12.96
LogMC     2150      18          0.0138            0.0738            10,293     18.70

Abbreviations: LogHC = log10 of hydrolyzable organic C; LogMC = log10 of mineralizable organic C; LogRC = log10 of recalcitrant organic C; LogSC = log10 of hot water soluble organic C; LogTC = log10 of total organic C.
¹ Residual LogRC from the global trend model obtained with regression tree.
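The nugget to sill ratios in Table 6-4 and the spatial dependence classes cited in the text (Cambardella et al., 1994: below 25% strong, 25 to 75% moderate, above 75% weak) can be reproduced with the sketch below. The exponential model function is an assumption about the parameterization: it treats the tabulated sill as the total sill and the effective range as the distance at which about 95% of the partial sill is reached, which is a common convention but is not stated in the table. All names are illustrative.

```python
import math

# (nugget, sill, effective range in m) transcribed from Table 6-4.
SEMIVARIOGRAMS = {
    "LogTC": (0.0159, 0.0879, 11579.0),
    "LogRC": (0.0260, 0.1162, 11700.0),
    "LogRC residual": (0.0132, 0.0920, 5039.0),
    "LogHC": (0.0363, 0.0923, 12053.0),
    "LogSC": (0.0070, 0.0540, 9664.0),
    "LogMC": (0.0138, 0.0738, 10293.0),
}

def nugget_sill_percent(nugget, sill):
    """Nugget to sill ratio in percent."""
    return 100.0 * nugget / sill

def dependence_class(ratio_percent):
    """Spatial dependence class after Cambardella et al. (1994)."""
    if ratio_percent < 25.0:
        return "strong"
    if ratio_percent <= 75.0:
        return "moderate"
    return "weak"

def exponential_semivariance(h, nugget, sill, effective_range):
    """Exponential model for lag h > 0 (assumed total-sill parameterization)."""
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / effective_range))

classes = {name: dependence_class(nugget_sill_percent(n, s))
           for name, (n, s, _) in SEMIVARIOGRAMS.items()}
for name, label in classes.items():
    print(name, label)
```

Under this classification only LogHC falls in the moderate class, in agreement with the discussion of Table 6-4.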
Figure 6-1. Sampling design and elevation in the Santa Fe River watershed (SFRW), Florida.

Figure 6-2. Output maps of estimated soil organic carbon (C) properties: A) total organic C (TC) modeled by lognormal kriging (LK) (from Chapter 5); B) recalcitrant organic C (RC) modeled by regression kriging using regression tree to map the global spatial trend; C) hydrolyzable organic C (HC) modeled by LK; D) hot water soluble organic C (SC) modeled by LK; and E) mineralizable organic C (MC) modeled by LK.

Figure 6-2. Continued.

Figure 6-3. Regression tree of the log-transformed recalcitrant organic carbon (LogRC) global trend model. Abbreviations: Avg = average of LogRC observations in the node in log kg m⁻²; DEPWATTBL = depth to water table in centimeters; N = number of observations in the node; STD = standard deviation of LogRC observations in the tree node in log kg m⁻².
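Because the regression tree of Figure 6-3 has a single split, its global trend prediction reduces to a two-valued rule. The sketch below is an illustration only: it assumes that the node averages of 1.10 and 0.50 quoted in the text are in log10 units (log kg m⁻²), as stated in the figure caption, and the function and variable names are hypothetical. The back-transformation shown is a simple power of ten without any bias correction.

```python
SPLIT_CM = 9.50                     # DEPWATTBL threshold of the single split
NODE_MEAN_LOG_RC = {1: 1.10, 2: 0.50}   # assumed log10 units (log kg per m2)
NODE_COUNT = {1: 7, 2: 95}          # training samples per terminal node

def predict_log_rc(depwattbl_cm):
    """Return the node average of LogRC for a given depth to water table (cm)."""
    node = 1 if depwattbl_cm <= SPLIT_CM else 2
    return NODE_MEAN_LOG_RC[node]

def back_transform(log_rc):
    """Naive back-transformation to kg per m2 (no bias correction)."""
    return 10.0 ** log_rc

# The sample-weighted mean of the node averages should be close to the
# training mean of LogRC in Table 6-1 (0.5408).
weighted_mean = (sum(NODE_MEAN_LOG_RC[k] * NODE_COUNT[k] for k in NODE_COUNT)
                 / sum(NODE_COUNT.values()))
print(round(weighted_mean, 4))
print(round(back_transform(predict_log_rc(5.0)), 2),
      round(back_transform(predict_log_rc(50.0)), 2))
```

In regression kriging with this trend model, the residuals (observed LogRC minus the node average) would then be kriged and added back to the trend, which is the role of the residual LogRC semivariogram in Table 6-4.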
CHAPTER 7
INFLUENCE OF GRAIN, EXTENT, AND GEOGRAPHIC REGION ON SOIL CARBON MODELS IN FLORIDA, USA

Summary

Isolating the causes of spatial variability of soil total carbon (TC) has been a challenge in Florida, where distinct environmental conditions (e.g., karst terrain, level topography, and high water table) generated unique spatial distribution patterns of TC. Scaling of TC may be influenced by grain (spatial resolution), extent (size) of the study area, the inherent variability of TC and environmental properties, and geographic region. Yet it is unknown how and to what extent these major scaling parameters impact TC modeling in the southeastern U.S. Thus, we evaluated the influence of grain, extent, and geographic region on the composition and accuracy of TC models. We collected soil samples and measured TC to 1 m in Florida (FL; 150,000 km²; 1193 samples), and at two nested study areas: the Santa Fe River watershed (SFRW; 3585 km²; 141 samples) and the University of Florida Beef Cattle Station (BCS; 5.58 km²; 152 samples). We evaluated the effect of grain in the SFRW, and observed a preferential selection of hydrologic properties in TC models regardless of grain, and an overall decrease in model accuracy and transferability at coarser grains, due to the smoothed variability of environmental predictors. In terms of extent, the FL TC model had the highest accuracy, followed by the SFRW model, while the BCS model performed poorly. The TC models transferred well between FL and the SFRW, but not to and from the BCS. Finally, geographic region influenced the spatial distribution of TC and model transferability, with better transferability of regional (SFRW) and global (FL) models when compared to field-scale models. Our results show that TC models are grain-, extent- and region-specific, but also indicate that a multi-scale TC model is feasible for FL since some transferability was observed.
Introduction

Soil is a very heterogeneous entity, as it is composed of organic and inorganic materials that result from direct interactions with the atmosphere, lithosphere, hydrosphere, and biosphere. Soil supports all biotic activity within terrestrial ecosystems and directly influences the biogeochemical cycles of principal nutrient elements (such as C, N, P, and S) and water (Lal et al., 1998; Jacobson et al., 2000). Carbon (C) is directly linked to the physical, chemical and biological processes in the soil, and through soil organic matter (SOM) contributes to overall soil quality through the regulation of nutrients and toxic substances, water holding capacity, soil structure and erodibility, vegetation communities, microbial activity, pedodiversity, and sustainability, to name a few (Kay, 1998; Ernst, 2004). Furthermore, soil has the potential to sequester atmospheric carbon dioxide and mitigate global warming (Follett et al., 2000; Smith and Heath, 2004). It has been estimated that the total global soil C pool including wetlands and permafrost (3250 Pg C) is about five times the biotic pool (650 Pg C) and about four times the atmospheric pool (780 Pg C) (Field et al., 2007). The geographic distribution of this global soil C resource varies according to numerous soil-forming factors, including other soil properties, climate, topography, parent material, vegetation, land use, and human influence (Jenny, 1941; Post et al., 1982; Olson et al., 1985; Burke et al., 1989; Jobbágy and Jackson, 2000; Post and Kwon, 2000; Guo and Gifford, 2002; McBratney et al., 2003; Tan et al., 2004), which operate at different spatial and temporal scales.

Detailed, spatially explicit assessment of soil total C (TC) at regional scale is needed to direct sustainable management measures and ecosystem service valuation. Rivero et al. (2007; 2009) demonstrated the use of remote sensing imagery to indirectly infer soil properties using statistical and geostatistical methods.
Similar approaches can be used to develop TC inferential prediction models.
It is well known that spatial patterns of environmental properties are influenced by scale parameters, including the grain (i.e., spatial resolution, or pixel size), extent (i.e., size) of the study area, and geographic region (e.g., Meentemeyer and Box, 1987; Woodcock and Strahler, 1987; Turner et al., 1989). Furthermore, it has been indicated that these spatial environmental patterns are governed by processes acting at multiple spatial and temporal levels, generating ranges of structural scale dependence (Mandelbrot, 1983) within hierarchical spatial variations (O'Neill et al., 1986). Soil properties exhibit in part such scaling behaviors (Burrough, 1983), which should be acknowledged in their spatial modeling over large regions (Ryan et al., 2000).

For some soil properties, it has been shown that their relationship with other soil and environmental properties changes as a function of scale (Bourennane et al., 2003; Lark, 2005; Powers and Schlesinger, 2002; Corstanje et al., 2007; Pringle and Lark, 2007; Martin and Bolstad, 2009). Based on these examples, it can be expected that the predictive capacity of environmental factors in relation to soil properties also changes as a function of scale. This would imply that the applicability of derived predictive models is restricted to a range of scales that is specific to the study region, and to the soil and environmental properties involved. For example, Foody et al. (2003) derived predictive models of forest biomass from Landsat Thematic Mapper imagery independently at three sites (Brazil, Malaysia, and Thailand) and then applied the model derived at each site to the other two sites. They observed differences in the strength and direction (i.e., sign) of regression parameters, and in the correlation coefficients between predicted and observed forest biomass. Similar studies are available in the literature for other environmental properties (e.g., Bian and Walsh, 1993; Townsend et al., 2003; Mykrä et al., 2008).
However, for multivariate spatial predictive models of TC, the influence of grain, extent, and geographic region has yet to be determined.
The overarching aim of this investigation was to elucidate the choice of scale parameters to model TC over large regions. Specifically, our objectives were to evaluate: (i) the influence of grain on TC models; (ii) the transferability of TC models across grains; (iii) the influence of extent on TC models; (iv) the transferability of TC models across extents; (v) the influence of geographic region on the distribution of TC; and (vi) the transferability of TC models across regions within Florida, USA. We hypothesized that the TC models are sensitive to all scale parameters tested (objectives (i) to (iv), and (vi)); in other words, we hypothesized that the TC models are scale-specific relative to grain, extent, and region, respectively. Our overall goal was to identify the range of scales at which TC models are less sensitive to scale parameters and, conversely, the critical scales at which TC models are sensitive to scale parameters. This alludes to the development of multi-scale TC models that explicitly take spatial scale into account. In addition, we hypothesized that the spatial distribution of TC was influenced by the geographic region within the state (objective (v)), and expected to find some correlation of TC with the presence of organic soils and wetlands in the area.

Materials and Methods

Study Areas, Sampling Designs, and Laboratory Methods

Our study was conducted at three nested study areas in the state of Florida, USA, including the whole state (FL; ~150,000 km²), the Santa Fe River watershed (SFRW; ~3585 km²), spanning 9 counties in the north-central portion of FL, and the University of Florida Beef Cattle Station (BCS; ~5.58 km²), located in the northern part of Alachua County in the central portion of the SFRW (Figure 7-1).
The state of Florida

The state of Florida is located in the subtropical climatic zone between latitudes 24.55 and 31.00 N, and longitudes 80.03 and 87.63 W. Mean annual precipitation is 1373 mm, and mean annual temperature is 22.3 °C (National Climatic Data Center, 2008). Dominant soils include Spodosols (32%), followed by Entisols (22%), Ultisols (19%), Alfisols (13%) and Histosols (11%) (Natural Resources Conservation Service, 2006). Land use/land cover (LULC) consists mainly of wetlands (28%), pinelands (18%), and urban and barren lands (15%), whereas agriculture, rangelands, and improved pasture occupy 9, 9, and 8% of the state, respectively (Florida Fish and Wildlife Conservation Commission, 2003a). The topography is relatively flat, with elevations below 114 m, and 0 to 5% slopes across most of the state, except along the Cody Scarp and Florida Panhandle, where slopes can reach 19% (United States Geological Survey, 1984). The most important geologic formations include undifferentiated geology from the Quaternary period, Ocala Limestone, and Cypresshead in the central west; Citronelle from the Pliocene in the northwest; and marine sediments of the Tertiary and Quaternary, coupled with Holocene sediments and Miami Limestone, in the south (Florida Department of Environmental Protection, 1998).

The FL extent (Figure 7-1) was considered to be the common area among the available Geographic Information Systems (GIS) layers, including soil survey data (Natural Resources Conservation Service, 2009), topographic data (United States Geological Survey, 1999), LULC data (Florida Fish and Wildlife Conservation Commission, 2003a), and Landsat Enhanced Thematic Mapper Plus (ETM+) imagery (Florida Fish and Wildlife Conservation Commission, 2003c). These GIS data are described in the Preparation of environmental Geographic Information System layers section in the Materials and Methods.
Field sampling in FL took place from 1965 to 1996 as part of the Florida Soil Characterization Project conducted jointly by the Soil and Water Science Department, University of Florida, and the Natural Resources Conservation Service, U.S. Department of Agriculture. Data from this project are available online (Florida Soil Characterization Database, 2009). Soil information retrieved from archived documents was digitized into a spreadsheet, and contained the taxonomic description of 8269 soil horizons, pertaining to 1288 soil profiles sampled throughout the state. Detailed physical and chemical characterization was available for 7875 soil horizons, 7716 of which included soil organic carbon (SOC) measurements, pertaining to 1252 profiles. Details about the sampling design can be found in Chapter 4. Briefly, sampling sites were chosen ad hoc by soil survey crews within each county with the help of aerial photographs, map unit delineations, and supporting maps. Site locations were georeferenced based on available geographic coordinates (latitude and longitude), and on reconstruction of coordinates using a combination of field notes, recorded Public Land Survey System (PLSS), and soil survey maps and site locations overplotted on orthophotos. At each site, soils were routinely described and sampled by horizon to 2 m or more. The maximum depth sampled in the field was 381 cm. Of the whole dataset, 1193 georeferenced sites were selected that had available SOC and were located within the FL boundary (Figure 7-1). In mineral horizons, SOC was measured in the laboratory using the Walkley-Black modified acid dichromate method (WB; Walkley and Black, 1934; Natural Resources Conservation Service, 1996). In organic horizons, soil organic matter (SOM) was measured by loss on ignition (LOI), and SOC was calculated by multiplying SOM by the van Bemmelen factor (0.58) (Natural
Resources Conservation Service, 1996). Air-dried and sieved (2 mm) samples were used for analysis.

The Santa Fe River watershed

The SFRW is located between latitudes 29.63° and 30.21° N and longitudes 82.88° and 82.01° W, in north-central Florida (Figure 7-1). The climate is slightly drier (1224 mm) and slightly cooler (20.5 °C) than the FL annual average (National Climatic Data Center, 2008). Dominant soil orders of the watershed include Ultisols (47%), Spodosols (27%), and Entisols (17%). Histosols, Inceptisols, and Alfisols occupy the remaining areas. The most frequent soil series are Sapelo, Blanton, Ocilla, Mascotte, and Foxworth (Natural Resources Conservation Service, 2009). Land use/land cover consists mainly of pine plantations (30%), wetlands (14%), improved pasture (13%), rangeland (13%), and upland forest (13%) (Florida Fish and Wildlife Conservation Commission, 2003a). Urban and barren areas occupy about 11% of the watershed, and crops around 5%. Wetlands and a few lakes are widespread in the watershed, while urban areas are sparsely distributed. The topography is level to slightly undulating, with slopes of less than 5% across almost the whole watershed and moderate slopes of 5-12% along the Cody Scarp, which cuts the watershed roughly in half in the northwest-southeast direction. Elevations range from around 1.5 m to 92 m above mean sea level (United States Geological Survey, 1999), and the geology is dominated by Ocala Limestone and undifferentiated geology in the west, the Coosawhatchie Formation from the Miocene in the center, and undifferentiated sediments from the Pliocene and Pleistocene in the east (Florida Department of Environmental Protection, 1998).

Field sampling was described in Chapter 2. Briefly, 141 sampling sites were chosen using a stratified random design based on land use and soil order combinations. At each site, composite
soil samples were collected within a 2 m radius at fixed depths (0-30, 30-60, 60-120, and 120-180 cm) using an auger. The sampling design is shown in Figure 7-1. Collected samples were air-dried, sieved (2 mm), and ball-milled. Total C was measured by high-temperature combustion (HTC) on a FlashEA 1112 Elemental Analyzer (Thermo Electron Corp., Waltham, MA).

The University of Florida Beef Cattle Station

The BCS is located within the SFRW between latitudes 29.91° and 29.94° N and longitudes 82.47° and 82.51° W (Figure 7-1). Climate is equivalent to that of the SFRW, but soils are largely dominated by Ultisols (78%). Other soil orders include Entisols (13%), Inceptisols (5%), Spodosols (2%), and Alfisols (1%). Land use/land cover consists of improved pasture (39%), wetlands (25%), and rangelands (13%), with the remaining areas classified as upland forest (10%), agriculture (8%), and pineland (5%) (Florida Fish and Wildlife Conservation Commission, 2003a). The topography consists of flat to gentle slopes in the northern portion of the BCS, with undulating slopes occurring in the center and south, and the steepest slopes found in the easternmost areas. Slopes vary from 0 to 21%, while the elevation spans from 13 to 43 m above sea level (United States Geological Survey, 1999). The parent material of the BCS was formed in the Miocene, and is confined within the Coosawhatchie Formation (Florida Department of Environmental Protection, 1998).

Similar to the SFRW, sampling sites were located according to a random design stratified by land use and soil order. A total of 152 sites were visited (Figure 7-1), and samples were collected using an auger at the same depth intervals adopted in the SFRW (0-30, 30-60, 60-120, and 120-180 cm). Soil samples were air-dried and sieved through a 2 mm mesh. Soil organic matter was measured using LOI, and SOC was calculated using the van Bemmelen factor (Natural Resources Conservation Service, 1996).
Conversion of Soil Organic Carbon Measurements to Soil Total Carbon

Soil C was measured using different laboratory methods in the three study areas. Thus, we calculated conversion factors based on pedotransfer functions (PTFs) to convert SOC measured by WB and LOI, respectively, to equivalent TC by HTC. Complete PTFs are being developed by our research team (Myers et al., 2009), and will include not only soil C, but also bulk density, and possibly other soil properties. At this point, only simple conversion factors were applied to standardize soil C to a common unit, i.e., TC.

Two PTFs were derived for this study using data from the Florida Soil Characterization database. Simple linear regressions were derived where the dependent variable was TC measured by HTC, and the independent variable was either SOC measured by WB (mineral horizons in the database), or SOC measured by LOI (organic horizons). Conversion factors were obtained by forcing the regression lines through the origin. The conversion models are shown in Equation 7-1 for WB SOC, and Equation 7-2 for LOI SOC, and explained 94% and 97% of the variability of WB SOC and LOI SOC, respectively.

$$TC_{HTC} = 0.98 \times SOC_{WB} \tag{7-1}$$

$$TC_{HTC} = 0.90 \times SOC_{LOI} \tag{7-2}$$

Where: $TC_{HTC}$ = TC measured by HTC in %; $SOC_{WB}$ = SOC measured by WB in %; $SOC_{LOI}$ = SOC measured by LOI in %.

Calculation of Profile Soil Total Carbon at 0-100 cm

In order to compile a seamless dataset using data from the three study areas, it was also necessary to establish a reference depth to develop the TC models, which was chosen to be 0-100 cm. First, TC at each horizon or depth interval was converted to a common unit (%, i.e., dag kg⁻¹) in all study areas. Then, a depth-weighted average TC was calculated in the 0-100 cm soil profile using Equation 7-3.
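As an illustration only, the two standardization steps described in this section (conversion to TC equivalents with the factors of Equations 7-1 and 7-2, and depth weighting over 0-100 cm as in Equation 7-3) can be sketched in Python. The function names, the `(top, bottom, value)` horizon layout, and the example profile are assumptions made for this sketch and are not taken from the study; `through_origin_slope` shows the generic least-squares slope of a zero-intercept regression, which is one way such a conversion factor can be obtained.

```python
# Conversion factors reported in Equations 7-1 and 7-2.
WB_TO_TC = 0.98   # Walkley-Black SOC (%) -> TC by HTC (%)
LOI_TO_TC = 0.90  # LOI-derived SOC (%)   -> TC by HTC (%)


def through_origin_slope(x, y):
    """Least-squares slope b of y = b * x (intercept forced to zero)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def soc_to_tc(soc_percent, method):
    """Convert SOC (%) to equivalent TC (%) for method 'WB' or 'LOI'."""
    factors = {"WB": WB_TO_TC, "LOI": LOI_TO_TC}
    return factors[method] * soc_percent


def profile_tc(horizons, max_depth=100.0):
    """Depth-weighted mean TC (%) within 0-max_depth cm.

    horizons: iterable of (top_cm, bottom_cm, tc_percent) tuples
    (hypothetical layout). Only the part of each horizon that lies
    inside 0-max_depth contributes to the average.
    """
    weighted_sum = 0.0
    total_depth = 0.0
    for top, bottom, tc in horizons:
        thickness = min(bottom, max_depth) - max(top, 0.0)
        if thickness <= 0:
            continue  # horizon lies entirely outside the reference depth
        weighted_sum += tc * thickness
        total_depth += thickness
    if total_depth == 0:
        raise ValueError("no horizon overlaps the reference depth")
    return weighted_sum / total_depth


# Hypothetical profile sampled at the fixed depths used in the SFRW and BCS.
example_profile = [(0, 30, 2.0), (30, 60, 1.0), (60, 120, 0.5), (120, 180, 0.2)]
example_tc = profile_tc(example_profile)
```

For the hypothetical profile, only 40 cm of the 60-120 cm interval counts toward the average, and the 120-180 cm interval is ignored.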
$$TC = \frac{\sum_{i=1}^{n} TC_i \times D_i}{\sum_{i=1}^{n} D_i} \tag{7-3}$$

Where: $TC$ = soil total carbon at 0-100 cm in %; $TC_i$ = soil total carbon at the i-th horizon or depth interval in %; $D_i$ = depth of the portion of the i-th horizon or depth interval constrained within 0-100 cm, in cm; $n$ = number of horizons or depth intervals containing at least a portion within 0-100 cm.

Regression Modeling of Soil Total Carbon

All hypotheses were tested using stepwise multiple linear regression (SMLR), with an F probability of 0.10 for both including and removing variables. Stepwise regression models of TC were derived based on the CLORPT model of soil formation proposed by Jenny (1941), later revised by McBratney et al. (2003) into the SCORPAN model, which are shown in Equations 7-4 and 7-5, respectively. Because TC had a positively skewed frequency distribution at the three study areas, we transformed TC using the natural log to approximate a Gaussian distribution. Thus, the SMLR models were derived using ln-transformed TC (LnTC). The SMLR models of TC were derived as a function of 24 collocated environmental properties represented as individual GIS layers, including soil survey data (9 layers), digital elevation model (DEM) and topographic derivatives (9 layers), LULC data (1 layer), and reflectance data and derivatives (13 layers) obtained from a mosaic of Landsat ETM+ images covering FL (Table 7-1), according to Equation 7-6. Categorical variables (e.g., LULC) were converted to category indicator (i.e., dummy) variables to be included in the models. Model accuracy was evaluated using the coefficient of determination (R²) in Equation 7-7, calculated using the training set (Rt²) or the validation set (Rv²), respectively.

$$TC = f(cl, o, r, p, t) \tag{7-4}$$

$$TC = f(s, c, o, r, p, a, n) \tag{7-5}$$
$$TC(x, y) = \beta_0 + \sum_{i=1}^{p} \beta_i F_i(x, y) + e \tag{7-6}$$

Where: $TC$ = soil total carbon; $cl$ or $c$ = climate; $o$ = organisms, including human activity; $r$ = relief; $p$ = parent material; $t$ or $a$ = time or age; $s$ = soil property; $n$ = spatial position defined by the $x$ and $y$ coordinates and other spatial distance measures; $\beta_0$ = model intercept; $\beta_i$ = regression coefficient of the i-th selected explanatory environmental factor; $F_i$ = i-th selected explanatory environmental factor; $p$ = number of selected explanatory environmental factors with $i = 1, 2, \ldots, p$; $e$ = model residuals.

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{7-7}$$

Where: $\hat{y}$ = predicted values; $\bar{y}$ = mean of observed values; $y$ = observed values; $n$ = number of observations with $i = 1, 2, \ldots, n$.

Influential outliers were identified by comparing Cook's distance with F(p, n - p) at a 0.50 significance level, where p is the number of regression parameters, and n is the number of observations (Cook, 1977). Regression outliers were identified for the comparison of models among extents (FL, SFRW, and BCS) and regions, but not for the comparison among grains, which was conducted only within the SFRW. In the latter case, we wanted to ensure that TC models had exactly the same number of observations at all grains. All other assumptions of SMLR were verified, except for some collinearity among predictors, heteroscedasticity in the residuals in some models, and correlation between residuals and observed TC, due to the remaining unexplained variability of TC.

Preparation of soil total carbon data

To be able to develop and validate the TC models within the same extent, we separated the soil samples into a training set comprising about 70% of the samples, which was used to derive the models, and a validation set containing about 30% of the samples, used exclusively to validate the models. In the SFRW and in the BCS, training and validation sets were separated randomly. In FL, we first stratified the samples by soil order, then randomly separated training (~
70%) and validation samples (~30%) within each stratum. The total number of training and validation samples was, respectively, 106 and 46 in the BCS, 102 and 39 in the SFRW, and 792 and 401 in FL. Note that the number of TC observations per unit area decreased with increasing extent: 27 samples km⁻² at the BCS, 0.04 samples km⁻² at the SFRW, and 0.008 samples km⁻² at FL.

To evaluate the transferability of TC models to different regions, we split the FL dataset into the 10 hydrologic units (Florida Department of Environmental Protection, 1997) comprising FL, which are shown in the background in Figure 7-2. This way, each hydrologic unit constituted one geographic region. Conversely, to evaluate the influence of extent on TC models, we split the FL dataset into 10 extent subsets in such a way that each extent subset spanned across FL. To accomplish this, each extent subset was created by randomly drawing about 12 samples from each geographic region and grouping them, so that each extent subset comprised about 119 observations distributed across the FL extent. To confirm that we had an unbiased selection of samples within extent subsets, we tested the effect of the selection on mean TC using analysis of variance (ANOVA). Levene's test (Levene, 1960) indicated that variances were unequal among extent subsets (p-value = 0.002); thus, we used Welch's ANOVA (Welch, 1951) to compare mean TC.

Preparation of environmental Geographic Information System layers

The first step in preparing the GIS layers for modeling was to assemble the environmental data. We collected GIS data from various sources (Table 7-1), and all sources, except for the LULC layer, provided data in separate layers. Soil survey data included 70 layers, the DEM included 18 rasters, and Landsat ETM+ data included 14 scenes to cover FL. Integrating the soil survey vector data, as well as the DEM data, only required merging the separate layers.
Integrating the Landsat ETM+ scenes required a two-step process. First, we matched the histograms of all
scenes to the histogram of the clearest scene using the Histogram Matching tool of ERDAS Imagine (Leica Geosystems Geospatial Imaging, LLC, Norcross, GA). Then we mosaicked the 14 scenes covering the whole state of FL.

For the whole state of FL, the 9 layers of soil properties were converted from vector to raster format to overlay with the other rasters. We used 30 m pixels, because that was the finest grain common to all of the original GIS layers. Areas excluded from the state of Florida included water bodies identified both in soil survey layers and in the LULC layer, areas where the chosen environmental properties were not available in one of the layers, and coastal areas, as the Florida boundary was slightly different among GIS data sources. Thus, the most restrictive layer was the soil survey data layer, since major portions of the Everglades are still not surveyed, and some of the soil properties were not available in all surveyed areas. Finally, we clipped all GIS layers within the defined FL boundary and used a 30 m pixel length to prepare the finest-grain GIS layers of all environmental properties. Modeling of TC within the SFRW and the BCS used the same layers of environmental properties, but different TC samples.

Evaluating the influence of grain

The influence of grain on TC models was tested in the SFRW. The environmental properties listed in Table 7-1 were resampled (or aggregated) to seven grains (i.e., pixel sizes), with 30, 60, 120, 240, 480, 960, and 1920 m pixel lengths, respectively. They were resampled using bilinear convolution (i.e., averaging within a 2x2 pixel window) for continuous properties, and nearest neighbor for categorical properties. Because the grain size was doubled at every resample level, bilinear convolution ensured data consistency, as the new value of the property matched the average of the four cells included in the resample window. To integrate TC
observations with environmental properties, the pixel values of the environmental layers were extracted to the TC point observations. At each grain, an SMLR model of TC was derived using the TC training set and the environmental GIS layers at that specific grain. Figure 7-3 shows an overview of the framework adopted to test grains, extents, and regions.

Our first objective was to examine the influence of the grain on the relationship between TC and environmental properties. To do this, we compared the explanatory variables selected by the TC models among different grains. Our second objective was to test the influence of the grain on the accuracy of TC models, to determine whether a model derived at one specific grain could be applied at (i.e., transferred to) another grain. To do this, we evaluated the TC models derived at each specific grain at all other grains using the respective independent validation set. In other words, we applied the TC regression model derived at one specific grain using the environmental properties resampled at another grain, to assess the accuracy of the model when the resolution of the explanatory variables changes. As a reference, we also validated the models at the same grain at which they were derived using the independent validation set.

Evaluating the influence of extent

At each extent (BCS, SFRW, and FL), SMLR models of TC were derived using the respective training sets (see Figure 7-3 for an overview of the methodology). At each extent, the prediction quality of the model was assessed using the independent validation set at the same extent. As in the evaluation of the influence of grain, we compared the selection of explanatory environmental factors among TC models derived at different extents. In addition, we also wanted to assess the transferability of the TC models among extents.
Thus, we evaluated the models derived using the training set at one specific extent on the whole sets at the other two extents. Since the FL TC observation set was much larger than the ones at the BCS and SFRW,
we applied a jackknife evaluation scheme using the 10 extent subsets spread across FL (see the Preparation of soil total carbon data section in the Materials and Methods) to avoid the influence of uneven sample sizes on model validation/evaluation.

Evaluating the influence of geographic regions

The influence of geographic region (each region corresponding to a hydrologic unit) on the distribution of TC was tested using Welch's ANOVA, followed by Dunnett's T3 post hoc test (Dunnett, 1980) to identify pairwise homogeneous TC means among regions. Levene's test indicated that variances were unequal among regions (p-value ~ 0), which motivated the use of these methods.

To evaluate the transferability of TC models to geographic regions in FL, we applied the TC models derived at the three extents (BCS, SFRW, and FL) (Influence of Extent on Soil Total Carbon Regression Models section in the Results and Discussion, Table 7-4) individually at the 10 geographic regions in FL, presented in the Preparation of soil total carbon data section in the Materials and Methods (Figure 7-3). Goodness-of-fit statistics were provided to compare the accuracy of the TC models among geographic regions.

Results and Discussion

Descriptive Statistics

The FL dataset had the largest range (0.03-54.59%) and the most variable TC among the three extents (Table 7-2), which was expected since the FL dataset spreads across a larger area, encompassing the other two extents. For the same reason, the SFRW had a larger range (0.12-17.03%) and more variable TC than the BCS (0.48-8.65%). The natural log transformation brought the TC datasets closer to a normal distribution, and also made the distributions of the three extents more similar to each other. Overall, training sets encompassed the range of the validation sets, and their frequency
distributions had very similar properties, except for the higher dispersion (i.e., higher standard deviation) of the training set in the SFRW.

Influence of Grain on Soil Total Carbon Regression Models

Grain influenced the selection of explanatory environmental factors by the SMLR models of TC, as well as their prediction quality and transferability. The SMLR TC models were significant (p-value ~ 0) at all grains, and their accuracy, based on the Rt², decreased with an increase in grain (Table 7-3). Independent validation of the models at the same grain at which they were derived indicated a preference for the 60 m grain (Rv² = 0.20), followed by 1920 m (Rv² = 0.14). At all other grains, however, the SMLR models could not explain 10% of the variability of TC in validation mode.

Selection of explanatory environmental factors by the SMLR models of TC also showed some patterns relative to grain (Table 7-3). First, hydrologic properties were selected by the models at all grains. Up to the 480 m grain, soil available water capacity (AWC) was selected, whereas soil drainage class (DRN) was selected at coarser grains. Moreover, according to the standardized coefficients, AWC was the factor most strongly related to LnTC at 30, 60, 120, and 480 m, and tasseled cap 3 (TC3) was the most important factor at 240 m. Even though TC3 is classified as a reflectance property, it is actually a wetness index (Huang et al., 2002), and thus essentially also related to hydrology. The most important environmental factors selected at 960 and 1920 m, respectively, were ASP (north-facing slopes) and LULC (pinelands). All classes of environmental properties were represented in at least 3 of the 7 grains, and topographic properties were selected at all grains. Soil properties were selected at 30, 60, and 480 m, and LULC was additionally selected at 1920 m. Reflectance properties were selected at all grains of 240 m and above, and also at 60 m.
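The ranking of factors by standardized coefficients mentioned above can be illustrated with a minimal Python sketch. It assumes the usual definition, in which an unstandardized regression coefficient is rescaled by the ratio of the predictor's standard deviation to the response's standard deviation; whether the statistical package used in the study applies exactly this definition is not stated in the text. The predictor names reuse the abbreviations AWC and ELEV from this chapter, but all numeric values are invented for illustration.

```python
from statistics import stdev


def standardized_coefficient(b, x, y):
    """Rescale an unstandardized slope b to standard-deviation units."""
    return b * stdev(x) / stdev(y)


def rank_by_importance(coefficients, predictors, response):
    """Return predictor names sorted by |standardized coefficient|, largest first."""
    scores = {
        name: abs(standardized_coefficient(b, predictors[name], response))
        for name, b in coefficients.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: LnTC response and two predictors on very different scales.
ln_tc = [1.0, 2.0, 3.0, 4.0]
layers = {"AWC": [0.1, 0.2, 0.3, 0.4], "ELEV": [10.0, 30.0, 20.0, 40.0]}
slopes = {"AWC": 8.0, "ELEV": 0.01}  # unstandardized coefficients (invented)

ranking = rank_by_importance(slopes, layers, ln_tc)
```

Because the rescaling removes the units of each predictor, coefficients of properties measured on different scales become comparable, which is what allows one factor to be called the most important at a given grain.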
The second most important environmental factors selected at specific grains included: LULC (wetlands) at 30, and 60 m; elevation (ELEV)
at 120 m; Landsat ETM+ band 6 (B6) at 240 m; sand content (SAND) at 480 m; TC3 at 960 m; and DRN (excessively drained) at 1920 m. Selection of environmental properties to predict LnTC at fine grains could indicate short-range spatial patterns related to the short-range variability of TC. Accordingly, variables selected at coarse grains capture the long-range variability of TC, associated with large-scale spatial patterns of the corresponding environmental properties.

The overall spatial trend of TC captured by the SMLR models at all grains up to 480 m was very similar (Figure 7-5), and reflected in part the distribution of AWC. However, at grains exceeding 240 m some areas of high and low TC became less differentiable due to the smoothing caused by aggregating environmental data into larger pixels. For example, some hotspot areas of TC disappeared after 240 m, mainly those along the Santa Fe River (clearly observable crossing the watershed in the northeast-southwest direction at the 30 and 60 m grains), and close to wetlands in the central north and southeast. At 960 m, even though the model did not select AWC, the spatial distribution of TC still showed some resemblance to that at smaller grains, suggesting that other explanatory variables (e.g., DRN and TC3) acted as surrogates for AWC to depict the major hydrologic trends. Only at the coarsest grain (1920 m) did the distribution of TC become so smooth that the major spatial trends associated with environmental landscape factors were no longer distinguishable. The overall agreement of output TC maps across grains indicated that the selection of distinct environmental predictors at specific grains captured the same general spatial patterns of TC.

Another effect of increasing grain was the loss of variability of TC in the output maps. At the finest grains up to 240 m, the output range of LnTC values was similar to the observed one.
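The smoothing effect of aggregation can be illustrated with a small Python sketch. It assumes aggregation by averaging non-overlapping 2x2 blocks, consistent with the description of the resampling in the Materials and Methods (pixel length doubled at each level, new value equal to the mean of four cells); the function names and the toy grid are hypothetical and do not come from the study data.

```python
def aggregate_2x2(grid):
    """Average non-overlapping 2x2 blocks, doubling the pixel length."""
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even")
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def value_range(grid):
    """Return (minimum, maximum) over all cells of a grid."""
    flat = [value for row in grid for value in row]
    return min(flat), max(flat)


# Toy 4x4 layer with two local hotspots on a uniform background.
fine = [
    [1, 1, 1, 1],
    [1, 9, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 5],
]
coarse = aggregate_2x2(fine)     # pixel length doubled once
coarser = aggregate_2x2(coarse)  # pixel length doubled twice
```

In this toy case the overall mean is preserved at every level, but the maximum drops from 9 to 3 and then to 1.75, so the hotspots fade while low values are barely affected. This mirrors the compression of high LnTC values described for the coarser grains.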
At grains of 480 m and higher, even though minimum output LnTC was close to the observed
one, high LnTC was lower than the maximum observed LnTC. Therefore, our results indicate that using grains coarser than 240 m may considerably underestimate the total TC content in the SFRW. This has important implications for regional models of TC in Florida, as it suggests that some models derived using coarse grains (e.g., global circulation models derived at 1 km resolution or more) can considerably underrepresent the variability of TC. Woodcock and Strahler (1987) observed an increase in the local variance within satellite images as the grain of the image approached the average size of spatial features in the scene; local variance peaked at grains a little smaller than the size of spatial features in the scene, after which it decreased as the grain size exceeded the size of spatial features. If the decrease in LnTC variability observed as the grain increased at the SFRW can be explained by the same reasoning, then it is possible that the smallest grain tested (30 m) was already larger than the average size of spatial features in the watershed, so that only the decreasing trend in LnTC variability could be observed.

In the SFRW, and in Florida in general, hydrologic patterns play an important role in controlling soil properties and other environmental factors. The flat topography associated with elevations close to sea level, together with high annual precipitation, creates widespread areas of wetlands, where accumulation of TC is fostered by the relatively slower anaerobic decomposition of organic matter. This was confirmed by several of the TC models derived at multiple grains, where hydrologic variables, such as AWC, TC3 and wetland LULC, were positively correlated with TC. A study in the Florida Everglades (Obeysekera and Rutchey, 1997) tested multiple grains from 20 to 1000 m to describe land cover variability extracted from SPOT imagery using spatial indices, with the aim of identifying ideal scales to derive spatial models in the region.
The authors observed an almost linear decrease in the diversity index when broadening the grain, indicating
loss of information as the grain increased (beyond 700 m tree islands virtually disappeared from the images). Furthermore, they observed self-similarity only in the range from 20 to 100 m grain, suggesting the adoption of grains smaller than 100 m to model the Everglades. Our study indicates that the grain of 60 m conveys adequate environmental variability from multiple sources of variation to model TC.

Transferability of Soil Total Carbon Regression Models across Grains

The transferability of the TC models across grains is demonstrated in Figure 7-4. The overall pattern in Figure 7-4 indicates that the transferability of TC models decreased from finer to coarser grains up to 480 m, after which it started to increase. It was expected that TC models would be more transferable to finer grains. Because the environmental properties at each grain were resampled from the previous (finer) grain, at each resample level the information from the finer grain was carried over to the coarser grain. As a consequence, when models derived at coarser grains were evaluated at finer grains, they were exposed to a more detailed (and thus more variable) version of the environmental properties included in the model (and from which they originated in the first place). In the opposite direction, models derived at finer grains were underrepresented by the explanatory properties at coarser grains, which degraded their quality when evaluated at the coarser grains. The increasing transferability after 480 m may indicate a sensitivity of the models to long-range spatial patterns of TC captured only by coarser grains. At 1920 m, TC models were as accurate as when evaluated using the 120 m grain, and it is possible that even coarser grains than 1920 m would provide good model transferability. A critical grain was observed at 60 m, where the Rv² was highest for the majority of models derived at the other grains.
We could not find a definitive explanation for this behavior, but speculate that resampling of the environmental properties to the 60 m grain smoothed some of the variability over a pixel area of 3600 m² in a way that favored inference on soil TC. In other words, the relationship of
the environmental properties selected by the SMLR models with TC was optimized at the 60 m grain.

Influence of Extent on Soil Total Carbon Regression Models

Similarly to grain, extent influenced the selection of explanatory factors and the accuracy of SMLR models of TC. All TC models were significant (p-value ~ 0), and the most accurate models were derived in the SFRW (Rt² = 0.61) and FL (Rt² = 0.60). However, independent validation of the models in the respective extent was considerably better in FL (Rv² = 0.52) than in the other extents. In contrast, TC models derived in the BCS had the lowest accuracy (Rt² = 0.25; Rv² = 0.07). The state of Florida was larger than the other two areas, and encompassed a higher variability of TC and explanatory environmental factors than the SFRW and BCS. In addition, the number of observations used to derive the models in FL was 8 times larger than in the other areas (Table 7-2), which provided a more comprehensive sample of TC, and a consequently broader representation of its correlations with the available environmental properties. Conversely, the BCS was 2 orders of magnitude smaller than the SFRW, and 4 orders smaller than FL. Thus, the variability of TC and collocated environmental properties was constrained by the limited size of the BCS, which is reflected in the poorer results obtained by SMLR relative to the other areas.

The increase of the R² obtained for the SFRW model in this section relative to the one obtained using the 30 m grain in the previous section (Influence of Grain on Soil Total Carbon Regression Models) was due to the removal of one outlier from the training set, and one outlier from the validation set. Although the Rt² improved only slightly, a considerable increase in the Rv² was observed, suggesting the sensitivity of the TC model to influential samples. The reader should not be concerned about this difference between the models because the investigation among
grains was conducted independently from the investigation among extents, and the results of the two analyses are not compared on any occasion.

The SMLR model of TC in FL selected 12 explanatory factors, compared to 7 factors in the SFRW, and 5 in the BCS. All classes of environmental properties (i.e., soil, hydrology, topography, LULC and reflectance) were represented in the models in at least one extent, but the most important one was again hydrology. In the SFRW and BCS, AWC was the most important explanatory factor (highest standardized coefficient), whereas Ultisols (ORD) and all other soil orders were the most important factors in FL. Vegetation properties (LULC and NDVI) were selected in all extents, but only one topographic property (north-facing slopes) was selected, specifically in the SFRW. Interestingly, soil properties explained a great portion of the variability of TC in FL, with two soil texture properties (SAND and SILT) selected, as well as all indicator variables of soil orders (ORD). This suggests that the spatial distribution of TC resembles at least in part the spatial pattern of soils in FL.

To gain a better understanding of the influence of soil orders on TC in FL, we used Dunnett's T3 post hoc test to compare TC among soil orders. Indeed, the test identified 3 significantly different groups at the 0.05 significance level, in decreasing order of TC content: Other soil orders (Histosols, Inceptisols, Mollisols, and Vertisols) > Spodosols > Alfisols = Ultisols = Entisols. This order of TC content is exactly what would be expected given the characteristics of these soils, thus confirming the correspondence between soil orders and TC.

Transferability of Soil Total Carbon Regression Models across Extents

With respect to transferability across extents (Figure 7-6), TC models were, to a certain extent, transferable between the SFRW and FL.
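The subset-wise evaluation used to assess transferability to FL (the scheme the text calls a jackknife, with results reported as a mean and spread of Rv² over the 10 extent subsets) can be sketched in Python as follows. This is an illustration under stated assumptions: R² is computed as the ratio of the model sum of squares to the total sum of squares, which is how Equation 7-7 appears to be defined; the spread is taken to be the sample standard deviation, which the text does not specify; and the `predict` callable, the row layout, and the toy data are hypothetical.

```python
from statistics import mean, stdev


def r_squared(observed, predicted):
    """R2 as model sum of squares over total sum of squares (assumed form)."""
    y_bar = mean(observed)
    ss_model = sum((p - y_bar) ** 2 for p in predicted)
    ss_total = sum((o - y_bar) ** 2 for o in observed)
    return ss_model / ss_total


def evaluate_across_subsets(predict, subsets):
    """Apply one fixed model to every subset; return (mean R2, SD of R2).

    predict: callable mapping a covariate row to a predicted value.
    subsets: list of (rows, observed_values) pairs, one per extent subset.
    """
    scores = [
        r_squared(observed, [predict(row) for row in rows])
        for rows, observed in subsets
    ]
    return mean(scores), stdev(scores)


def toy_model(row):
    """Hypothetical fitted model: LnTC = 1.0 + 0.5 * first covariate."""
    return 1.0 + 0.5 * row[0]


toy_subsets = [
    ([(1.0,), (2.0,), (3.0,)], [1.5, 2.0, 2.5]),  # perfectly predicted
    ([(1.0,), (2.0,), (3.0,)], [1.0, 2.0, 3.0]),  # partially predicted
]
mean_r2, sd_r2 = evaluate_across_subsets(toy_model, toy_subsets)
```

Keeping the model fixed and varying only the evaluation subset means the spread of the scores reflects differences among subsets rather than refitting noise, which is the reason equal-sized subsets were drawn across all regions.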
Models evaluated in FL had higher accuracy than those evaluated in the SFRW, regardless of whether they were derived in FL or in the SFRW. Our jackknife evaluation of the SFRW and BCS models in FL produced, respectively,
the following Rv²: 0.49 ± 0.03, and 0.01 ± 0.005. Random assignment of FL samples into extent subsets produced homogeneous groups according to Welch's test (p-value = 0.281). This offered a good indication that our jackknife scheme to evaluate TC models at the FL extent was not biased by any one extent subset. Comparatively, the TC model derived in FL, and validated using jackknifing across the 10 extent groups in FL, produced an Rv² of 0.54 ± 0.04. Validation outliers were identified and removed when evaluating the BCS model in FL (139 outliers), and the SFRW model in FL (87 outliers) and in the BCS (4 outliers).

As discussed in the previous section, because FL has a greater variability of TC and environmental properties, it was expected that models produced at that extent would have higher accuracy than models implemented at smaller extents. Similarly, because a large enough range of TC and environmental properties was observed in the SFRW, this same reasoning can be applied to explain the good transferability of the SFRW model to FL. In contrast, the BCS was not large enough to capture the overall variability of TC and environmental properties that would be found in the larger extents; thus, models derived or evaluated in the BCS produced poor results.

Influence of Geographic Region on the Distribution of Soil Total Carbon

The number of TC observations in each geographic region varied from 64 in the Kissimmee unit to 194 in the St. Johns unit, with an average of about 119 samples per geographic region. Mean TC by region varied from 0.40% in Apalachicola to 3.65% in Southern Florida. The spatial distribution of TC was significantly influenced by geographic region in FL (Table 7-5), according to Welch's ANOVA (p-value ~ 0). Moreover, Dunnett's T3 identified significant differences among specific regions at the 0.05 significance level (Table 7-5).
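For readers unfamiliar with Welch's ANOVA, the test statistic used here to compare mean TC among groups with unequal variances can be sketched in a few lines of Python. This is a generic textbook formulation, not the routine of the statistical package used in the study. The function returns the F statistic and its two degrees of freedom; the p-value would then be read from an F distribution with those degrees of freedom, a step omitted here because it requires a statistics library beyond the standard library. The two toy groups are invented.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA for groups with unequal variances.

    groups: list of lists of observations (each with n >= 2 and
    non-zero variance). Returns (F statistic, df1, df2).
    """
    k = len(groups)
    if k < 2:
        raise ValueError("at least two groups are required")
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n / s^2 so noisier groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_sum = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / w_sum
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_sum) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Invented example with clearly different spreads in the two groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
f_stat, df1, df2 = welch_anova([group_a, group_b])
```

With only two groups, the statistic reduces to the square of Welch's t statistic and the denominator degrees of freedom reduce to the Welch-Satterthwaite approximation, which gives a convenient sanity check on the implementation.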
Based on mean values, the overall trend of LnTC by hydrologic unit across FL was (Figure 7-7): high values in the south in the Everglades and around Lake Okeechobee, on the east coast, and in the west (Peace-Withlacoochee-Manatee-Myakka Rivers complex); medium values in the St. Johns
River basin and in the Florida Panhandle in the Escambia-Choctawhatchee basin; and lowest LnTC values in the south-central area around the Kissimmee River and in the north, mainly in the Apalachicola and Ochlocknee River basins. Southern Florida had the highest LnTC mean due to the vast presence of wetlands, which promote the formation of C-rich organic soils, mainly Histosols (i.e., peat soils), and also because of the contribution of the Everglades Agricultural Area, as TC in agriculture was the second highest among land uses. High TC values were also found in the east and west coastal regions, and are associated with the presence of Spodosols, which accumulate C in subsurface spodic horizons by association with iron and aluminum oxides (De Coninck, 1980; Harris and Hollien, 2000). The heterogeneous distribution of Florida ecosystems offers another source of variability to explain the somewhat unsystematic spatial patterns of TC in relation to geographic region. For example, regional patterns of natural upland forests are associated with the relatively low TC contents in the north and central portions of FL in the Apalachicola, Suwannee, and Altamaha-St. Marys hydrologic units. Pinelands were generally in close proximity to natural forests in north-central FL. Proximity to urban areas did not seem to affect TC; however, population density might have some indirect influence, which was not accounted for in this study.

Transferability of Soil Total Carbon Regression Models across Geographic Regions

The prediction quality of the TC models derived in the BCS, SFRW, and FL, respectively ("Influence of Extent on Soil Total Carbon Regression Models" section in the Results and Discussion), and evaluated at the 10 geographic regions (i.e., hydrologic units) in FL showed some interesting patterns (Figure 7-7). Overall, the Rv² varied from 0.0003 to 0.10 for the BCS model, from 0.08 to 0.66 for the SFRW model, and from 0.15 to 0.73 for the FL model, whereas
mean Rv² were 0.04 ± 0.01, 0.44 ± 0.06, and 0.50 ± 0.05 (mean ± standard error) for the BCS, SFRW, and FL models, respectively. The TC model derived in the BCS did not transfer well to other extents (compare the "Influence of Extent on Soil Total Carbon Regression Models" section in the Results and Discussion); thus, it was not expected to transfer to any particular geographic region in FL with reasonable accuracy. The highest Rv² values (0.10) were obtained in the Kissimmee and Altamaha-St. Marys areas, whose LULC somewhat resembles that of the BCS, dominated by rangeland and pastures. We correlated the Rv² obtained at different regions with basic descriptive statistics of LnTC (number of observations, minimum, maximum, mean, standard deviation, etc.) at those regions to clarify the results. For the BCS model, the only significant correlation found (p-value = 0.03) was a negative one (r = -0.67) with the number of observations. In effect, Kissimmee and Altamaha-St. Marys had the smallest numbers of observations among all regions and the highest Rv². The TC model derived in the SFRW performed best at the Choctawhatchee-Escambia unit (Rv² = 0.66), followed by Southern Florida (Rv² = 0.62) and East Florida Coastal (Rv² = 0.58), and performed poorly (Rv² < 0.12) only at Ochlocknee and Apalachicola, both in the north of FL. Our correlation analysis showed that the variability of LnTC within hydrologic units significantly (p-value < 0.05) influenced the overall quality of the SFRW model when evaluated in them. Specifically, the Rv² was positively correlated with the range of LnTC (r = 0.84), standard deviation (r = 0.75), and maximum (r = 0.79), and negatively correlated with minimum LnTC (r = -0.64). Moreover, the Rv² was also correlated with mean LnTC (r = 0.81; p-value = 0.81), indicating that the SFRW model was overall more transferable to areas of high TC content, which was
the case for Southern Florida, East Florida Coastal, and Peace-Tampa Bay, but not for Choctawhatchee-Escambia. We speculate that the poor evaluations of the SFRW model at Apalachicola and Ochlocknee, respectively, were due to the extreme AWC values in these areas. In the former, mean AWC (0.10 cm cm⁻¹) was the highest relative to other geographic regions (mean AWC among regions was 0.08 cm cm⁻¹), whereas in the latter, mean AWC (0.07 cm cm⁻¹) was the lowest. Because the most sensitive environmental property selected by the SFRW model was AWC, the SFRW model estimated TC contents less accurately when the AWC values were at the two extremes, and thus not well represented by the model. This interpretation is nonetheless limited by the fact that, in SMLR, looking at one explanatory factor at a time does not account for the effect of interactions among the factors selected in the model. Overall, the FL model transferred reasonably well (Rv² > 0.41) to all regions, except Apalachicola (Rv² = 0.15). The best results were obtained at Peace-Tampa Bay (Rv² = 0.73), followed by East Florida Coastal (Rv² = 0.61) and Suwannee (Rv² = 0.59). The transferability of the FL model was significantly correlated (p-value < 0.01) with the mean (r = 0.76), maximum (r = 0.80), and range (r = 0.80) of LnTC within the hydrologic unit, suggesting that the quality of the model depended on the TC content. Indeed, the Peace-Tampa Bay unit had the highest maximum and range of LnTC, whereas East Florida Coastal had the second highest mean LnTC. We anticipated that evaluation results at different geographic regions obtained using the model derived in FL would be best relative to the models derived at the other extents, simply because of the more representative variability of TC and explanatory environmental factors (12 in total) encompassed by the FL model. Compared to the SFRW model, this expectation held
true for only 6 out of the 10 regions evaluated. The 4 regions where the Rv² was higher using the SFRW model were the Altamaha-St. Marys, Choctawhatchee-Escambia, Southern Florida, and St. Johns units. Both the Altamaha-St. Marys and St. Johns units border the SFRW; thus, because of locality, in these units the SFRW model was a better fit than the broader FL model. On the other hand, the Choctawhatchee-Escambia and Southern Florida units do not border the SFRW, and we could not find a definitive explanation for the better performance of the SFRW model in these regions. However, we believe that these results were due to the random assignment of validation samples that agreed better with the SFRW than with the FL model. In summary, the TC models derived in the SFRW and in FL had some transferability across geographic regions in FL, but the model derived in the BCS did not. The most critical factors controlling model transferability observed in this study were the amount and variability of TC. In Apalachicola, for example, LnTC was the lowest and least variable (Table 7-5), degrading the quality of estimations. At the opposite extreme, LnTC was highest and highly variable in Southern Florida, East Florida Coastal, and Peace-Tampa Bay; accordingly, both the SFRW and FL models produced good results in these areas. Other than the amount and variability of TC, no other factors or clear regional patterns were evident to indicate which regions would be the most reliable for applying TC models derived elsewhere. It was observed, however, that hydrology greatly influenced the spatial distribution of TC at all scales; thus, accurately measuring hydrologic patterns within the state of Florida would offer the possibility to improve the TC models derived in this study and gain a better understanding of the scaling properties of TC.
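The correlation analysis described in this section (region-level Rv² against descriptive statistics of LnTC) can be sketched as follows. This is only an illustration: the helper `pearson_r` and the numbers in `rv2` and `lntc_range` are hypothetical placeholders, not values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Placeholder region-level values (made up for illustration only).
rv2 = [0.10, 0.25, 0.40, 0.55, 0.60]      # validation R² per region
lntc_range = [2.8, 3.9, 5.1, 6.4, 7.0]    # range of LnTC per region

r = pearson_r(rv2, lntc_range)
print(f"r = {r:.2f}")
```

With only 10 regions, a single unusual region can change such a coefficient considerably, so correlations of this kind are best read as indicative.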
Conclusions

In this study, we (i) compared SMLR models of TC derived in the SFRW using environmental variables observed at 7 different grains (i.e., spatial resolutions), (ii) compared SMLR models of TC derived using soil samples and environmental variables observed at 3 nested extents (BCS, SFRW, and FL), and (iii) evaluated the TC models derived at these 3 extents in 10 geographic regions (i.e., hydrologic units) encompassing FL. Our results showed that grain influenced the quality and transferability of the TC model. The quality of TC models (Rt²) overall decreased with an increase in grain, but the transferability of TC models showed only a weak preference toward smaller grains. A critical grain was observed at 60 m, at which virtually all TC models derived at other grains had their highest transferability, according to the Rv². Further research is needed to identify the cause of the preference for the 60 m grain and to confirm whether the 60 m grain could be an ideal grain to derive TC models in Florida. In terms of extent, the quality of TC models was similar (Rt² ~ 0.60) between FL and the SFRW, and worse in the BCS (Rt² = 0.25). The transferability of TC models was positively related to the extent at which they were evaluated. These patterns are related to the variability of TC and environmental properties used to derive or evaluate the models, respectively, which increased with the extent. As such, the FL model was derived using more comprehensive data than the models at the other two extents, and thus had a higher Rt². In turn, when models derived at the other two extents were evaluated in FL, the variability of TC and environmental properties of the BCS and SFRW was encompassed by that in FL, providing a higher Rv², whereas in the opposite direction, narrower variability degraded the Rv². Selection of explanatory environmental factors in the TC models was influenced by the grain and extent. Overall, larger extents produced more complex models, with more variables included, than smaller extents, but the same trend was not observed relative to grain. On the other
hand, regional TC models, such as the one derived for the SFRW, perform well if transferred to regions with similar soil-landscape characteristics. Overall, there was a consistent preference of the TC models for hydrologic variables, both across grains and extents. Specifically, AWC was the most important variable in 4 out of the 7 grains, and 2 out of the 3 extents. These findings suggest the need to better characterize spatial hydrologic patterns in Florida, since the distribution of TC was influenced by hydrology irrespective of the grain or extent. In addition, all TC models selected variables in other environmental classes (soil, topography, LULC, and reflectance), but no preference for any of these classes was apparent across grains or extents. Regional contextualization within FL was a significant factor (p-value ~ 0) controlling the variability of TC, which considerably influenced the quality of TC models. A general spatial trend of decreasing LnTC was observed from south to north Florida, with the exception of the westernmost extreme of the Panhandle (Choctawhatchee-Escambia hydrologic unit), which had high LnTC (-0.66 ln%), and the Kissimmee unit, which had low LnTC (-0.71 ln%) although located in the middle of high-LnTC regions. Overall, in terms of TC model transferability across regions in FL, regions that had higher and more variable TC (e.g., Southern Florida, East Florida Coastal, and Peace-Tampa Bay) produced better evaluation results than areas with low TC (e.g., Apalachicola) when evaluating the models derived in the SFRW or FL. The TC model derived in the BCS was not representative of any FL region. This highlights the concern that field or multi-field studies may not be representative of a region and may perform poorly when upscaled to coarser scales. Due to budget or labor constraints, TC assessment is often limited to specific fields, LULC, or soil types.
However, this imposes major limitations on transferring TC models from fine-scale studies to regional scales.

We identified the influence of major scale parameters on spatial models of TC in Florida, and provided essential information to aid in the design of future soil sampling campaigns and in establishing priorities for the collection and preparation of basic ancillary spatial data to support regional-scale estimations of TC. Our analysis demonstrated that hydrology was in great part responsible for the spatial distribution of TC. Thus, it is critical to better characterize Florida's hydrologic patterns through the preparation of more detailed (i.e., fine-scale) hydrologic maps for the state, and to assess how these patterns influence the distribution of coupled soil and environmental properties. Research to derive spatially explicit, fine-resolution layers of soil hydrologic properties is still in its infancy. Methods that provide accurate hydrologic soil datasets are still costly and labor intensive; thus, limitations exist in providing estimates across large regions that represent spatial and temporal patterns. Such datasets, however, could greatly improve the TC models derived in this study, as well as models of other soil and environmental properties.
Table 7-1. Environmental Geographic Information Systems (GIS) layers used as explanatory variables in the stepwise multiple linear regression models of soil total carbon.

Class | Abbreviation | Description | Source | Reference
Soil | CLAY | Clay content in % | NRCS (2006a) | NRCS (1996)
Soil | SAND | Sand content in % | NRCS (2006a) | NRCS (1996)
Soil | SILT | Silt content in % | NRCS (2006a) | NRCS (1996)
Soil | PH | Soil pH in 1:1 water | NRCS (2006a) | NRCS (1996)
Soil | ORD | Soil taxonomic order; 5 categories ¹ | NRCS (2006a) | NRCS (1999)
Hydrology | AWC | Soil available water capacity in cm cm⁻¹ | NRCS (2006a) | NRCS (1993)
Hydrology | KSAT | Soil saturated hydraulic conductivity in m s⁻¹ | NRCS (2006a) | NRCS (1993)
Hydrology | DRN | Soil drainage class; 5 categories ² | NRCS (2006a) | NRCS (1993)
Hydrology | HYG | Soil hydrologic group; 4 categories ³ | NRCS (2006a) | NRCS (1993)
Topography | ELEV | Elevation above mean sea level in m | USGS (1999) |
Topography | SLOPE | Highest slope within a 3x3 window in % | USGS (1999) |
Topography | CTI | Compound topographic index | USGS (1999) | Gessler et al. (1995)
Topography | ASP | Aspect; 4 categories ⁴ | USGS (1999) |
LULC | LULC | Land use/land cover; 6 categories ⁵ | FFWCC (2003a) | FFWCC (2003b)
Reflectance | B1-5, B7 | Landsat ETM+ bands 1-5 and 7 in DN | FFWCC (2003c) | FFWCC (2003b)
Reflectance | PC1-3 ⁶ | Principal components 1 through 3 in DN | FFWCC (2003c) |
Reflectance | TC1-3 ⁶ | Tasseled cap indices 1 through 3 in DN | FFWCC (2003c) | Huang et al. (2002)
Reflectance | NDVI ⁶ | Normalized difference vegetation index | FFWCC (2003c) | EO (2009)

Abbreviations: DN = digital number; EO = Earth Observatory; FFWCC = Florida Fish and Wildlife Conservation Commission; NRCS = Natural Resources Conservation Service; USGS = United States Geological Survey.
¹ ORD categories: Alfisols, Entisols, Spodosols, Ultisols, and Other (Histosols, Inceptisols, Mollisols, and Vertisols).
² DRN categories: poorly drained, somewhat poorly drained, moderately well drained, well drained, and excessively drained.
³ HYG categories: A, B, C, and D.
⁴ ASP categories: east-, north-, west-, and south-facing slopes.
⁵ LULC categories: agriculture, grasslands, pinelands, upland vegetation (including forest, scrub, coastal and exotic vegetation), urban and barren lands, and wetlands.
⁶ Derived from Landsat Enhanced Thematic Mapper Plus (ETM+) without band 6 (thermal).
Table 7-2. Descriptive statistics of soil total carbon (TC) and ln-transformed TC (LnTC) at the three study areas.

Study area | Statistic | TC (%) Whole set | TC (%) Training | TC (%) Validation | LnTC (ln%) Whole set | LnTC (ln%) Training | LnTC (ln%) Validation
FL | Observations | 1193 | 792 | 401 | 1193 | 792 | 401
FL | Mean | 1.99 | 1.98 | 2.01 | -0.6155 | -0.6413 | -0.5646
FL | Std. deviation | 6.55 | 6.66 | 6.34 | 1.1791 | 1.1752 | 1.1865
FL | Minimum | 0.03 | 0.03 | 0.04 | -3.5599 | -3.5599 | -3.1706
FL | Median | 0.42 | 0.41 | 0.45 | -0.8687 | -0.8967 | -0.8088
FL | Maximum | 54.59 | 54.59 | 53.63 | 3.9999 | 3.9999 | 3.9820
FL | Skewness | 5.26 | 5.27 | 5.25 | 1.71 | 1.75 | 1.64
SFRW | Observations | 141 | 102 | 39 | 141 | 102 | 39
SFRW | Mean | 0.88 | 0.96 | 0.65 | -0.5822 | -0.5465 | -0.6757
SFRW | Std. deviation | 1.81 | 2.09 | 0.66 | 0.7062 | 0.7386 | 0.6125
SFRW | Minimum | 0.12 | 0.12 | 0.24 | -2.0918 | -2.0918 | -1.4325
SFRW | Median | 0.50 | 0.51 | 0.49 | -0.7024 | -0.6724 | -0.7152
SFRW | Maximum | 17.03 | 17.03 | 3.52 | 2.8350 | 2.8350 | 1.2597
SFRW | Skewness | 6.64 | 5.90 | 3.19 | 2.09 | 2.17 | 1.56
BCS | Observations | 152 | 106 | 46 | 152 | 106 | 46
BCS | Mean | 1.90 | 1.90 | 1.90 | 0.5058 | 0.5097 | 0.4968
BCS | Std. deviation | 1.13 | 1.13 | 1.16 | 0.5128 | 0.5023 | 0.5419
BCS | Minimum | 0.48 | 0.49 | 0.48 | -0.7268 | -0.7231 | -0.7268
BCS | Median | 1.74 | 1.75 | 1.74 | 0.5560 | 0.5568 | 0.5540
BCS | Maximum | 8.65 | 8.65 | 7.08 | 2.1580 | 2.1580 | 1.9566
BCS | Skewness | 2.62 | 2.85 | 2.22 | 0.20 | 0.23 | 0.14
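The ln transformation behind the LnTC columns can be checked with a few lines of Python. The sketch below, with the hypothetical helper `ln_tc`, transforms the maximum TC values reported for each study area in Table 7-2 and prints them next to the reported LnTC maxima; small differences are expected because the TC values are rounded to two decimals.

```python
import math


def ln_tc(tc_percent):
    """Natural-log transform of soil total carbon given in %."""
    if tc_percent <= 0:
        raise ValueError("TC must be positive to be ln-transformed")
    return math.log(tc_percent)


# Maximum TC (%) and maximum LnTC (ln%) of the whole sets in Table 7-2.
reported_max = {
    "FL": (54.59, 3.9999),
    "SFRW": (17.03, 2.8350),
    "BCS": (8.65, 2.1580),
}

for area, (tc_max, lntc_max) in reported_max.items():
    print(area, round(ln_tc(tc_max), 4), lntc_max)
```

Because TC values below 1% map to negative LnTC, the same check applied to minima and medians below 1% yields negative numbers.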
Table 7-3. Regression coefficients and goodness-of-fit statistics of the stepwise multiple linear regression model of ln-transformed soil total carbon in ln% derived at different grains in the Santa Fe River watershed. Numbers in italic indicate the most important explanatory factor at a specific grain.

Grains: 30 m, 60 m, 120 m, 240 m, 480 m, 960 m, 1920 m
Regression coefficients (Unstd and Std per grain), listed by class and abbreviation:
Intercept | 3.03 3.32 1.89 2.47 4.43 0.45 0.98
Soil | SAND | 0.03 0.40
Soil | PH | 0.28 0.16 0.29 0.18
Hydro | AWC | 14.90 0.51 16.72 0.51 14.52 0.57 14.77 0.47 14.97 0.54
Hydro | DRNWELL | 0.51 0.20
Hydro | DRNEXC | 0.35 0.20 0.45 0.20
Topo | ELEV | 0.01 0.17
Topo | SLOPE | 0.11 0.20
Topo | ASPN | 0.24 0.14 0.40 0.21 0.29 0.17 0.50 0.26
LULC | PINELAND | 0.35 0.23
LULC | UPVEG | 0.24 0.13 0.34 0.20 0.32 0.15
LULC | WETLAND | 0.95 0.31 0.97 0.33
Refl | B6 | 0.03 0.50
Refl | TC3 | 0.04 0.61 0.02 0.24 0.02 0.25
Refl | NDVI | 0.50 0.14 1.37 0.17
R² (Train, Val): 30 m: 0.54, 0.06; 60 m: 0.51, 0.20; 120 m: 0.38, 0.08; 240 m: 0.34, 0.06; 480 m: 0.26, 0.05; 960 m: 0.25, 0.02; 1920 m: 0.15, 0.14

Abbreviations: ASPN = aspect (north-facing slopes); AWC = soil available water capacity in cm cm⁻¹; B6 = Landsat Enhanced Thematic Mapper Plus (ETM+) band 6 (mid-infrared) in digital number; DRNEXC = soil drainage class (excessively drained); DRNWELL = soil drainage class (well drained); ELEV = elevation above mean sea level in m; Hydro = hydrology; LULC = land use/land cover; PINELAND = LULC class (pineland); NDVI = normalized difference vegetation index; PH = soil pH in 1:1 water; R² = coefficient of determination; Refl = reflectance; SAND = sand content in %; SLOPE = highest slope within a 3x3 window in %; Std = standardized coefficients; TC3 = tasseled cap index 3 in digital number; Topo = topography; Train = training set (Rt²); UPVEG = LULC class (upland vegetation); Unstd = unstandardized coefficients; Val = validation set (Rv²); WETLAND = LULC class (wetland).
Table 7-4. Regression coefficients and goodness-of-fit statistics of the stepwise multiple linear regression model of ln-transformed soil total carbon derived at different extents. Numbers in italic indicate the most important explanatory factor at a specific extent.

Extents: BCS, SFRW, FL
Regression coefficients (Unstd and Std per extent), listed by class and abbreviation:
Intercept | 0.93 3.20 1.78
Soil | SAND | 0.02 0.26
Soil | SILT | 0.04 0.28
Soil | PH | 0.45 0.27
Soil | ORDALFS | 1.47 0.46
Soil | ORDENTS | 1.15 0.40
Soil | ORDODS | 1.05 0.39
Soil | ORDULTS | 1.38 0.52
Hydrology | AWC | 6.76 0.37 14.33 0.53 9.10 0.39
Hydrology | KSAT | 0.004 0.25 0.002 0.08
Hydrology | DRNPOOR | 0.38 0.16
Hydrology | DRNSPD | 0.29 0.22
Hydrology | HYGB | 0.34 0.20
Hydrology | HYGD | 0.66 0.26
Topography | ASPN | 0.22 0.13
LULC | PINELAND | 0.23 0.08
LULC | UPVEG | 0.34 0.20 0.16 0.04
Reflectance | B3 | 0.01 0.12
Reflectance | B4 | 0.01 0.15
Reflectance | B5 | 0.004 0.14
Reflectance | NDVI | 1.19 0.33
R² (Train, Val): BCS: 0.25, 0.07; SFRW: 0.61, 0.18; FL: 0.60, 0.52

Abbreviations: ASPN = aspect (north-facing slopes); AWC = soil available water capacity in cm cm⁻¹; B3-5 = Landsat Enhanced Thematic Mapper Plus (ETM+) bands 3 through 5 in digital number; BCS = University of Florida Beef Cattle Station; DRNPOOR = soil drainage class (poorly drained); DRNSPD = soil drainage class (somewhat poorly drained); FL = state of Florida; HYGB = soil hydrologic group (B); HYGD = soil hydrologic group (D); KSAT = soil saturated hydraulic conductivity in m s⁻¹; LULC = land use/land cover; NDVI = normalized difference vegetation index; ORDALFS = soil taxonomic order (Alfisols); ORDENTS = soil taxonomic order (Entisols); ORDODS = soil taxonomic order (Spodosols); ORDULTS = soil taxonomic order (Ultisols); PH = soil pH in 1:1 water; PINELAND = LULC class (pineland); R² = coefficient of determination; SAND = sand content in %; SFRW = Santa Fe River watershed; SILT = silt content in %; Std = standardized coefficients; Train = training set (Rt²); UPVEG = LULC class (upland vegetation); Unstd = unstandardized coefficients; Val = validation set (Rv²).
Table 7-5. Descriptive statistics of soil total carbon (TC) and ln-transformed TC (LnTC) by hydrologic unit (i.e., geographic region) in Florida.

TC (%)
Hydrologic unit | N | Mean | SD | Min | Median | Max | Skewness
Altamaha St. Mary's | 58 | 0.87 | 1.65 | 0.08 | 0.45 | 9.25 | 4.37
Apalachicola | 84 | 0.40 | 0.26 | 0.07 | 0.34 | 1.23 | 1.64
Choctawhatchee-Escambia | 137 | 2.32 | 7.26 | 0.04 | 0.41 | 46.06 | 4.17
East Florida Coastal | 75 | 1.42 | 3.02 | 0.04 | 0.48 | 22.87 | 5.39
Kissimmee | 64 | 3.18 | 9.82 | 0.05 | 0.33 | 49.09 | 3.83
Ochlocknee | 77 | 0.49 | 0.40 | 0.11 | 0.38 | 3.20 | 4.39
Peace-Tampa Bay | 168 | 2.24 | 7.38 | 0.03 | 0.44 | 54.59 | 5.04
Southern Florida | 168 | 3.65 | 8.74 | 0.04 | 0.61 | 43.37 | 3.29
St. John's | 194 | 2.35 | 7.91 | 0.09 | 0.42 | 53.63 | 4.83
Suwannee | 168 | 1.04 | 3.44 | 0.11 | 0.41 | 37.12 | 8.75

LnTC (ln%)
Hydrologic unit | N | Mean ¹ | SD | Min | Median | Max | Skewness
Altamaha St. Mary's | 58 | -0.7462 abc | 0.9331 | -2.4963 | -0.7990 | 2.2246 | 0.95
Apalachicola | 84 | -1.0803 c | 0.5720 | -2.6137 | -1.0780 | 0.2052 | 0.25
Choctawhatchee-Escambia | 137 | -0.6638 ab | 1.2213 | -3.1706 | -0.8845 | 3.8298 | 2.24
East Florida Coastal | 75 | -0.4882 ab | 1.1668 | -3.2383 | -0.7258 | 3.1298 | 0.69
Kissimmee | 64 | -0.7149 abc | 1.5294 | -3.0476 | -1.1103 | 3.8937 | 1.49
Ochlocknee | 77 | -0.8877 bc | 0.5755 | -2.1918 | -0.9731 | 1.1621 | 0.50
Peace-Tampa Bay | 168 | -0.5347 ab | 1.1812 | -3.5599 | -0.8195 | 3.9999 | 1.83
Southern Florida | 168 | -0.2308 a | 1.5707 | -3.2843 | -0.4971 | 3.7697 | 0.88
St. John's | 194 | -0.5866 ab | 1.2132 | -2.4396 | -0.8686 | 3.9820 | 1.92
Suwannee | 168 | -0.6919 b | 0.8513 | -2.1874 | -0.8912 | 3.6143 | 1.97

Abbreviations: Max = maximum; Min = minimum; N = number of observations; SD = standard deviation.
¹ Same letters indicate statistically equal LnTC means at the 0.05 significance level according to Dunnett's T3 test.
Figure 7-1. Sampling locations at three nested extents (i.e., study areas) in Florida: the state of Florida (FL), the Santa Fe River watershed (SFRW), and the University of Florida Beef Cattle Station (BCS). Normalized difference vegetation index (NDVI) derived from Landsat Enhanced Thematic Mapper Plus (ETM+) is shown in the background.
Figure 7-2. Separation of Florida samples into geographic regions (i.e., hydrologic units) and extent subsets. Hydrologic unit boundaries (Florida Department of Environmental Protection, 1997) are shown in the background.
Figure 7-3. Overview of the framework used to test the influence of grain, extent, and geographic region on the quality of stepwise multiple linear regression models of ln-transformed soil total carbon. Extents included the state of Florida (FL), the Santa Fe River watershed (SFRW), and the University of Florida Beef Cattle Station (BCS).
Figure 7-4. Prediction quality of the stepwise multiple linear regression models of ln-transformed soil total carbon in ln% derived at specific grains, and evaluated at the other six grains in the Santa Fe River watershed. Stars indicate validation at the same grain at which the model was derived.
Figure 7-5. Output maps of ln-transformed soil total carbon (LnTC) from the stepwise multiple linear regression models derived at seven grains, respectively, in the Santa Fe River watershed.
Figure 7-6. Prediction quality of the stepwise multiple linear regression models of ln-transformed soil total carbon derived at a specific extent, and evaluated at the other two extents. Stars indicate validation at the same extent at which the model was derived. For the state of Florida, mean Rv² of the 10 extent subsets are shown along with their standard errors. Abbreviations: BCS = University of Florida Beef Cattle Station; FL = state of Florida; Rv² = coefficient of determination of validation/evaluation; SFRW = Santa Fe River watershed.
Figure 7-7. Prediction quality of the stepwise multiple linear regression models of ln-transformed soil total carbon (LnTC) derived at the University of Florida Beef Cattle Station (BCS), Santa Fe River watershed (SFRW), and state of Florida (FL), and evaluated at 10 geographic regions (i.e., hydrologic units) in FL. Labels show mean LnTC and the coefficient of determination (Rv²) for evaluation of the BCS, SFRW, and FL models at each region, respectively. The color gradient from green to red indicates increasing LnTC.
CHAPTER 8
MULTI-SCALE BEHAVIOR OF SOIL CARBON AT NESTED REGIONS IN FLORIDA, USA

Summary

The spatial distribution of soil total carbon (TC) is controlled by a number of environmental landscape processes that evolve over a range of different spatial and temporal scales. Regional assessments of TC should account for these scaling effects to more accurately represent its spatial variability at escalating geographic domains. To provide information for such endeavors, we characterized the spatial dependence of TC at three nested scales within the state of Florida, U.S., using variogram and fractal analyses. The variability of TC increased with increasing extent, as did the unexplained short-range variation (nugget variance) as sample spacing increased. At the field scale (5.58 km²), TC showed strong spatial dependence up to 354 m, and moderate dependence at distances up to 2905 m. At the watershed scale (3585 km²), strong dependence was observed up to 12 km, and at the state scale (150,000 km²), regional TC spatial dependences appeared up to 151 km. At the three scales, fractal dimensions varied from 2.76 to 2.96, where short-range variation of TC was dominant, characterizing anti-persistence. Pooling data from the three scales helped explain the short- to long-range variability of TC across Florida, and provided a more robust variogram for TC, reflecting the patterns observed at the individual scales. Our results demonstrate that the spatial distribution of TC is self-similar over a range of scales (< 1.5 km, 1.5-31 km, and > 31 km), which indicates the regions within which the spatial dependence of TC is scale-independent. Single-scale studies of TC in Florida could use these ranges as guidelines to match TC observation and model scales. Our results also suggest the need for a multi-scale approach to model TC across Florida.
Introduction

Spatial modeling of soil properties is usually conducted at a single scale determined by the study area, within which a specific field sampling design is proposed to capture the most important soil, ecological, or landscape attributes and their variability. To optimize observation schemes, it is desirable to know a priori the underlying variability of soil-ecological properties, which is often unknown. Furthermore, soils and ecological properties are not isolated within certain areas, but rather result from (i.e., are affected by) environmental processes occurring across many scales that are larger or smaller than the scale of observation. Thus, in order to better define the extent of soil-landscape analysis, and design corresponding field sampling, it is important to identify the inherent spatial scale(s) of the soil and ecological property of interest.

Geostatistics, more specifically variogram analysis, has been used to describe the spatial dependence of many soil and ecological properties, including soil carbon (C). In variogram analysis, the semivariogram (hereafter referred to as variogram) is used to characterize the spatial dependence structure of a property by plotting the semivariance as a function of lag distance (Equation 8-1) (Chilès and Delfiner, 1999; Grunwald, 2006). The spatial dependence of a property can be characterized by specific regions of the variogram, i.e., the nugget and sill variances, and the range, as well as by its overall shape. These characteristics depend not only on the magnitude and spatial distribution of the property, but also on the sampling design, including the number of observations, landscape conditions, and internal and external stressors that generated the spatial patterns observed in the property of interest.

$$\hat{\gamma}(h) = \frac{1}{2\,m(h)} \sum_{i=1}^{m(h)} \left[ z(x_i) - z(x_i + h) \right]^2 \qquad (8\text{-}1)$$

Where: $\hat{\gamma}(h)$ = observed semivariance at lag distance $h$; $m(h)$ = number of paired comparisons at lag distance $h$; $z(x_i)$ and $z(x_i + h)$ = measurements separated by a lag distance $h$.
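The estimator in Equation 8-1 can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function name `empirical_variogram`, the lag-class binning, and the toy transect are assumptions made here for clarity.

```python
import math


def empirical_variogram(coords, values, lag_width, max_lag):
    """Empirical semivariance per lag class.

    Returns a list of (mean pair distance, semivariance, pair count) tuples,
    one per non-empty lag class ((k - 1) * lag_width, k * lag_width].
    """
    sq_diff, dist_sum, count = {}, {}, {}
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            if h == 0 or h > max_lag:
                continue  # skip co-located pairs and pairs beyond max_lag
            k = math.ceil(h / lag_width)  # lag class index, starting at 1
            sq_diff[k] = sq_diff.get(k, 0.0) + (values[i] - values[j]) ** 2
            dist_sum[k] = dist_sum.get(k, 0.0) + h
            count[k] = count.get(k, 0) + 1
    # Semivariance = sum of squared differences / (2 * number of pairs).
    return [(dist_sum[k] / count[k], sq_diff[k] / (2 * count[k]), count[k])
            for k in sorted(count)]


# Toy transect (made-up values): four points spaced one unit apart.
transect_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
transect_values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_variogram(transect_coords, transect_values,
                                lag_width=1.0, max_lag=3.0)
for lag, gamma, pairs in variogram:
    print(lag, round(gamma, 3), pairs)
```

The pair count per lag class is returned alongside the semivariance because classes with few pairs give unreliable estimates, which is one way the sampling design shapes the variogram.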
Many studies have quantified the spatial dependence of soil carbon (C) using variogram analysis. McBratney and Pringle (1999), for example, identified an average spatial autocorrelation range of 310 m among 9 investigations of soil C in agricultural fields. Mueller and Pierce (1993) found ranges for soil total C (TC) between 118 and 249 m within a 13-ha field in Michigan, USA, depending on the number of samples. More recently, Terra et al. (2004) identified ranges for soil organic C (SOC) varying from 63 to 73 m, also depending on the number of samples, in a 9-ha field in central Alabama, USA. Simbahan et al. (2006) identified ranges for SOC varying from 89 to 450 m at 3 fields of 49 to 65 ha in Nebraska, USA. Like these examples, the vast majority of investigations of the spatial dependence of soil C were conducted at the field scale. However, some investigations have identified the spatial dependence of soil C at larger extents, including van Meirvenne et al. (1996) (3164 km²) in Belgium, McGrath and Zhang (2003) (41,462 km²) and Zhang and McGrath (2004) (15,460 km²) in Ireland, and Hengl et al. (2004) (2500 km²) in Croatia. These studies identified spatial autocorrelation ranges for SOC or soil organic matter on the order of 3 to 100 km. Thus, at larger scales (e.g., from regional to continental scales), the spatial dependence of soil C is not sufficiently characterized, and is still subject to investigation.

Besides variogram analysis, another method that has been adopted to characterize the spatial dependence of soil and ecological properties is fractal analysis. Fractals were introduced in the natural sciences by the seminal works of Mandelbrot (e.g., Mandelbrot, 1967; Mandelbrot, 1983), and have been applied to describe the spatial roughness of environmental properties (Pentland, 1984) and their degree of spatial dependence (Bian and Walsh, 1993).
By definition, a fractal is a series in which the Hausdorff-Besicovitch dimension, i.e., the fractal dimension (D), exceeds the topological (i.e., Euclidean) dimension (Burrough, 1981). Thus, for fractional transects (points), D > 0; for fractional curves, D > 1; and for fractional surfaces, D > 2. In nature, fractal properties exhibit self-similarity over a range of scales, meaning that zooming in to a fractal at finer scales will resolve more structure and roughness that is similar to that observed at the coarser scale (Burrough, 1981).

Fractal analysis offers some advantages over variogram analysis. First, D is independent of scale, magnitude of the property, and direction (Eghball et al., 1999; McClean and Evans, 2000). Second, D can be used to guide interpolations, as it helps to understand the complexity of the spatial autocorrelation over a range of scales (Burrough, 1981). Third, changes of D can identify scales of significant geographic interest (Mark and Aronson, 1984), since driving processes operate at particular ranges of spatial scale within which D is constant (Burrough, 1981). Finally, the definition of D implies that the amount of resolvable detail is a function of the scale of observation (Bian and Walsh, 1993), meaning that D can be used to characterize multi-scale spatial dependence.

Fractals have been applied to describe the self-similar nature of soil properties, including particle size distribution (Tyler and Wheatcraft, 1992; Su et al., 2004), fragmentation, as reviewed by Anderson and McBratney (1995), aggregation (Perfect and Kay, 1991; Castrignanò and Stelluti, 1999), and structure (Perrier et al., 1999; Bartoli et al., 2006), with implications for soil hydrology (Bird et al., 1996). On the other hand, studies that specifically characterize the spatial dependence of soil properties across fields or landscapes using fractals are less common, but some have been done for cation exchange capacity (Bekele et al., 2005), pH (Culling, 1986), and other properties (Burrough, 1983). One of the few studies that assessed the multifractal properties of soil organic matter, phosphorus, and potassium was presented by Kravchenko et al. (1999).
However, information about the fractal properties of soil C is still very limited.
Our objective was to characterize the spatial dependence of TC at multiple scales in the state of Florida, U.S. We hypothesized that the spatial dependence (i.e., spatial autocorrelation) of TC depends on the spatial scale, which subsequently affects modeling of the spatial patterns of TC at different scales, or across multiple scales. We conducted a multi-scale assessment of the spatial dependence of TC with the specific aims to: (i) characterize the spatial dependence of TC at three nested extents; and (ii) identify the scales over which TC exhibits fractal behavior, i.e., shows self-similarity. Our results elucidate multi-scale behavior of TC across a large subtropical landscape in the southeastern U.S.

Materials and Methods

Study Area

The study was conducted within the state of Florida, which spans about 150,000 km² between latitudes 24.55° and 31.00° N, and longitudes 80.03° and 87.63° W (Figure 8-1). Mean annual precipitation is 1373 mm, and mean annual temperature is 22.3 °C (National Climatic Data Center, 2008). Florida soils are mainly Spodosols (32%), Entisols (22%), Ultisols (19%), Alfisols (13%), and Histosols (11%) (Natural Resources Conservation Service, 2006), and land uses/land covers are predominantly wetlands (28%), pinelands (18%), croplands (9%), rangelands (9%), improved pasture (8%), and urban and barren lands (15%) (Florida Fish and Wildlife Conservation Commission, 2003a). The topography is relatively flat, with elevations below 114 m, and 0 to 5% slopes in most of the state (United States Geological Survey, 1984). Geologically, limestone bedrock, which originated from marine sediments, is present throughout Florida. In the south, these sediments are overlain by sapric organic materials and/or secondary carbonates (marl), while in the north sandy and loamy sediments originating from the continental U.S. were deposited.
Field Sampling and Laboratory Analysis

Field sampling was conducted at three nested areas within the state of Florida (Figure 8-1). The broadest area encompassed the whole state (~150,000 km²), and was named the state scale (SS); the second area had an intermediate size (~3585 km²), and was delimited by the boundary of the Santa Fe River watershed, in north-central Florida, hereafter named the watershed scale (WS); and the third one was the University of Florida Beef Cattle Station (5.58 km²), named the field scale (FS), nested inside the Santa Fe River watershed. The sizes of these three areas differed by orders of magnitude, which was reflected in progressively increasing variability of TC from the FS to the SS. Detailed descriptions of these three areas, with their respective field sampling and laboratory methods, were provided in Chapter 7. In brief, composite soil samples were collected at four depths (0–30, 30–60, 60–120, and 120–180 cm) in a stratified sampling design along land use/land cover and soil taxonomic order combinations at the FS and WS. At the SS, purposive sampling was conducted in each county, where sampling sites were chosen based on tacit knowledge by soil survey crews as being representative of the major soil-landscape complexes. Soils were collected and described by horizon from 0 to 2 m or more. The SS data are part of the Florida Soil Characterization Database (2009). A total of 152, 141, and 1193 soil profiles were collected at the FS, WS, and SS, respectively (Figure 8-1).

Laboratory analysis of TC was conducted using air-dried and sieved (2 mm) samples, and included three methods. Loss on ignition was used to measure soil organic matter in all samples collected at the FS, and organic samples (i.e., soil samples from organic horizons) collected at the SS. Soil organic C was calculated by multiplying the organic matter content by the van Bemmelen factor (0.58) (Natural Resources Conservation Service, 1996).
At the WS, TC was measured by high-temperature combustion on a FlashEA 1112 Elemental Analyzer (Thermo Electron Corp., Waltham, MA). Finally, SOC in mineral horizons at the SS was measured using the Walkley-Black modified acid-dichromate method (Walkley and Black, 1934; Natural Resources Conservation Service, 1996).

In order to standardize the units of measurement, SOC measured using loss on ignition or Walkley-Black was converted to high-temperature combustion TC units using conversion factors that were derived based on 144 samples from the SS dataset. The conversion factors were obtained by regressing TC as a function of SOC measured using loss on ignition or Walkley-Black, respectively, and assigning a zero intercept. Details about the selection of samples, and construction of pedotransfer functions to estimate TC and other soil properties using the SS dataset can be found in Myers et al. (2009). For this study, only preliminary versions of the pedotransfer functions were derived for the purpose of integrating soil C measurements, and are shown in Equation 8-2 (R² = 0.94) for Walkley-Black, and Equation 8-3 (R² = 0.97) for loss on ignition measurements, respectively.

$$TC_{HTC} = 0.98 \times SOC_{WB} \tag{8-2}$$

$$TC_{HTC} = 0.90 \times SOC_{LOI} \tag{8-3}$$

Where: $TC_{HTC}$ = TC measured by high-temperature combustion in %; $SOC_{WB}$ = SOC measured by Walkley-Black in %; $SOC_{LOI}$ = SOC measured by loss on ignition in %.

Soil total C concentration (%) was calculated in the first 1 m by averaging TC concentrations at each depth interval weighted by the thickness of the depth interval; in other words, TC to 1 m was calculated as the depth-weighted average TC across all depth intervals (FS and WS) or horizons (SS) to 1 m. Total C volumetric contents (stocks) were not calculated in order to avoid interference due to the variability (i.e., uncertainty) of soil bulk density. Total C was transformed using the natural logarithm (LnTC) to approximate a normal distribution.
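The unit harmonization and depth weighting described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slopes 0.98 (Walkley-Black) and 0.90 (loss on ignition) are read from Equations 8-2 and 8-3, while the function names and the example profile are hypothetical and not part of the original analysis.

```python
import math

# Zero-intercept conversion slopes read from Equations 8-2 and 8-3 (assumed here).
CONVERSION_TO_HTC = {"HTC": 1.00, "WB": 0.98, "LOI": 0.90}


def to_htc_tc(soc_percent, method):
    """Convert a carbon measurement (%) to high-temperature-combustion TC units."""
    return CONVERSION_TO_HTC[method] * soc_percent


def depth_weighted_tc(layers, max_depth_cm=100.0):
    """Thickness-weighted mean TC (%) down to max_depth_cm.

    layers: iterable of (top_cm, bottom_cm, tc_percent) tuples.
    """
    weighted_sum = 0.0
    total_thickness = 0.0
    for top, bottom, tc in layers:
        # Clip each layer to the 0..max_depth_cm window.
        thickness = min(bottom, max_depth_cm) - max(top, 0.0)
        if thickness <= 0:
            continue
        weighted_sum += tc * thickness
        total_thickness += thickness
    if total_thickness == 0:
        raise ValueError("no layer overlaps the requested depth window")
    return weighted_sum / total_thickness


# Hypothetical profile at the fixed FS/WS depth intervals (values are made up).
profile = [(0, 30, 2.0), (30, 60, 1.0), (60, 120, 0.5), (120, 180, 0.2)]
tc_1m = depth_weighted_tc(profile)  # (2.0*30 + 1.0*30 + 0.5*40) / 100
ln_tc = math.log(tc_1m)             # natural-log transform used for LnTC
```

Only the upper 40 cm of the 60–120 cm layer contributes in this example, because the profile is truncated at 1 m before averaging.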
Characterization of the Spatial Dependence of Soil Total Carbon

We used two approaches to characterize the spatial dependence of TC at the three scales: variogram analysis, and fractal analysis. In the former, empirical variograms of TC were derived using observed TC values at the three scales, respectively. At all scales, the spatial dependence of TC observed over short, medium, and long distances was characterized by separate variograms that were derived using small, medium, and large lag sizes, respectively, totaling three variograms at each scale. Then, we merged the TC datasets from the three scales into a pooled dataset (1486 observations), and also derived variograms over short, medium, and long distances. Our aim was to identify the multi-scale dependence of TC, and confirm the trends found at individual scales. We fit either exponential (Equation 8-4) or Gaussian (Equation 8-5) models (Chilès and Delfiner, 1999) to the empirical variograms to derive variogram parameters, i.e., nugget variance ($c_0$), sill variance ($c$), and range of spatial autocorrelation ($r$).

$$\gamma(h) = c_0 + c\left[1 - e^{-3h/r}\right] \tag{8-4}$$

$$\gamma(h) = c_0 + c\left[1 - e^{-3h^2/a^2}\right] \tag{8-5}$$

Where: $\gamma(h)$ = estimated semivariance at lag distance $h$; $c_0$ = nugget variance; $c$ = partial sill variance; $e$ = natural exponential base; $h$ = lag distance; $r$ = effective range, where $\gamma(h)$ achieves 95% of the total sill ($c_0 + c$), at about $3a$; $a$ = range.

For fractal analysis, we used the variograms derived at the three scales separately, as well as using the pooled dataset, to calculate fractal dimensions (D). According to Mandelbrot (1975), the differences in observations ($Z$) between points on a fractional Brownian surface constitute a zero-mean Gaussian random function, the so-called fractional Brownian function, whose variance can be described by a power function (Equation 8-6; Eghball et al., 1999).
The fractional Brownian function is self-similar, and has similar properties to the distribution of a regionalized variable (as represented by the variogram), including zero-mean Gaussian distribution, and stationarity (Mandelbrot and van Ness, 1968; Mandelbrot, 1975); thus D can be derived from the slope of the log-log plot of the variogram (Equation 8-7; Eghball et al., 1999).

$$E\left[\left(z(x_i) - z(x_i + h)\right)^2\right] = k h^{H} \tag{8-6}$$

$$\gamma(h) = k h^{H}; \quad \log \gamma(h) = \beta_0 + H \log h; \quad D = 3 - \tfrac{1}{2} H \tag{8-7}$$

Where: $z(x_i)$, $z(x_i + h)$ = measurements separated by a lag distance $h$; $k$ = constant related to the extent of the variation; $H$ = codimension, or Hurst coefficient, where $0 < H \le 2$; $\gamma(h)$ = empirical variogram; $\beta_0$ = intercept, where $\beta_0 = \log k$; $D$ = fractal dimension.

Fractal dimensions were derived from linear sections of the log-log variograms of LnTC at each scale (FS, WS, and SS), and using the pooled dataset, respectively. We used Student's t-test to identify significant differences among the slopes of the log-log variograms at different scales, i.e., to compare H (and thus D) among scales. Student's t-test was performed by comparing the observed t (Equation 8-8) against the distribution of t using (N − 4) degrees of freedom, where N is the sum of the number of observations of the two samples being compared.

$$t = \frac{b_1 - b_2}{s_{b_1 - b_2}} \tag{8-8}$$

Where: $b_1$, $b_2$ = slopes of the first and second sample, respectively; $s_{b_1 - b_2}$ = standard error of the difference between the slopes, where $s_{b_1 - b_2} = \sqrt{s_{b_1}^2 + s_{b_2}^2}$.

Results and Discussion

Descriptive Statistics

Soil total C showed increasing variability as the scale increased (Table 8-1). At the FS, TC ranged from 0.48 to 8.65%, with a mean of 1.90%. At the WS, TC ranged from 0.12 to 17.03%, with a mean of 0.88%. At the SS, TC ranged from 0.03 to 54.59%, with a mean of 1.99%. Overall, TC in the pooled dataset had the same range as at the SS, since TC variation at the SS encompassed that of the two nested scales, and had a mean similar to those at the FS and SS. The
frequency distributions of TC at the WS, at the SS, and in the pooled dataset were more similar to one another than to that at the FS, with closer medians and skewness, even after conversion to natural log.

Spatial Dependence of Soil Total Carbon

Variogram analysis

In total, we produced 12 variograms, 3 at each scale, and 3 using the pooled dataset across scales, to characterize the spatial dependence of TC over short, medium, and long distances (Figure 8-2), and summarized the parameters of the empirical and fitted variograms in Table 8-2. As discussed in the previous section, the variability of TC increased as the scale increased. This trend was confirmed by variogram analysis, where the total sill increased from 0.30 at the FS to 1.33 at the SS, having intermediate values at the WS (0.53). The range of spatial autocorrelation also increased as a function of scale. At the FS, TC ranges varied from 354 m over short distances to 2905 m over long distances. At the WS, TC ranges varied from 1538 to 12,072 m. And at the SS, TC ranges varied from 2613 to 151,319 m. Pooled data showed spatial autocorrelation of TC with a range of 5560 m over short distances and a smaller range (119,424 m) over long distances relative to the SS. At the FS, the range of spatial autocorrelation of TC over short distances approximated the average range of 310 m identified by McBratney and Pringle (1999) for crop fields. However, over longer distances, the range at the FS reached 2905 m, which is comparable to the ranges observed at the WS and SS, both over short distances. Similar ranges were found for SOC at 0–30 cm by Wang et al. (2002b) in a forested region in northeastern Puerto Rico (110 km²), and also for soil organic matter in the topsoil by Hengl et al. (2004) in central Croatia (2500 km²), who estimated ranges of 3070 and 3061 m, respectively. At larger scales, ranges of TC autocorrelation varied from 5456 m to as high as 151,319 m.
At these scales, the spatial dependence of TC is dominated by regional patterns that are only apparent over distances of kilometers (e.g., topographic/hydrologic, or physiographic patterns), and the plot and field variability of TC plays a minimal role, maybe only contributing to explain some of the short-range variation, thus lowering the nugget variance. At the regional scale, ranges in the same order of magnitude as those observed at the SS were also identified in other areas, thus corroborating our results. For example, McGrath and Zhang (2003) and van Meirvenne et al. (1996) observed ranges of about 40,000 m in southeastern Ireland (41,462 km²), and northwestern Belgium (3164 km²), respectively. In a second study in a smaller region in southeastern Ireland (15,460 km²), Zhang and McGrath (2004) observed ranges from 58,000 to 100,000 m for SOC at 0–10 cm. The latter range was closer to that found at the SS over long distances, which in turn was more modest than the range of 632,000 m observed for SOC at 0–20 cm in a 3435-km² region in northeast China (Liu et al., 2006).

The uncertainty about the short-range variability of TC, as measured by the nugget variance, was comparable between the FS and the WS, but increased at the SS. Characterization of the short-range variability of a property depends on the number of samples taken at a close distance and their spatial arrangement (Grunwald and Reddy, 2008), which might explain the higher portion of unexplained TC variation (i.e., higher nugget variance) at the SS. The strength of the spatial dependence, as indicated by the nugget-to-sill ratio (nugget/sill), was highest at the FS over short distances (nugget/sill = 2.1%). However, there was no clear trend of the strength of spatial dependence as a function of scale. In effect, nugget/sill varied considerably when characterized over different distances at each scale.
The only apparent trend was an increase in the nugget/sill (or decrease of the strength of spatial dependence) in relation to the distance of observation (i.e., over short, medium, and long distances) within each scale (FS, WS and SS). According to the classification proposed by Cambardella et al. (1994), TC showed strong spatial dependence (nugget/sill < 25%) only in 3 out of the 12 variograms, moderate spatial dependence (25% < nugget/sill < 75%) in 8 out of the 12 variograms, and weak dependence (nugget/sill > 75%) only at the FS over medium distances.

To compare the spatial dependence of TC among scales we plotted the 12 fitted variograms together up to 10,000 m (Figure 8-3), and could observe some interesting patterns. First, the variograms derived using the pooled dataset reflected the spatial dependence of TC inherent at each scale. This is intuitive, since the pairs of points observed at specific scales were also available in the pooled dataset. For example, over short and medium distances, the empirical variogram using the pooled dataset (Figures 8-2j and 8-2k) was very similar in shape and magnitude to the empirical variogram derived at the FS over long distances (Figure 8-2c). Over long distances, TC sampled at the SS (Figure 8-2f) was responsible for the patterns observed using the pooled dataset (Figure 8-2l); however, the nugget variance was greatly reduced from 0.91 to 0.42 using the pooled dataset, as more pairs of points added from the FS and WS over short distances better explained the short-range variability of TC. When comparing the fitted variograms up to 10,000 m (Figure 8-3), similar patterns were modeled at the FS over short and medium distances, and at the WS over medium and long distances. Except for these cases, the parameters, and consequently the shapes of the variograms, were not in agreement. This indicates the presence of multiple scales of TC variation, or TC spatial dependence.

Nested spatial scales of variation were also observed for exchangeable magnesium in Australian soils by McBratney (1992). He derived a de Wijsian variogram based on observations across Australia, and identified the same pattern of increasing variability with increasing scale as observed in our study.
McBratney (1992) pointed out that, due to lack of data, this pattern of observed increasing soil variance with increasing log(lag distance) (de Wijsian variogram) should be considered only as a rule of thumb, or working hypothesis. However, in a more recent paper, McBratney (1998) explored the idea of unbounded soil variation with increasing scales, and suggested that the sill variances observed at different scales were simply a function of the geometric support and the sampling extent, with the sill increasing with lag distance, and the range of spatial autocorrelation being some fixed proportion of the extent. In our analysis, the range of spatial autocorrelation increased with increasing extent, but not with a fixed proportion. On the other hand, the sill variance was related to the minimum distance between observations at each scale, and naturally to the variance of LnTC at each scale.

Fractal analysis

In our study, D ≥ 2.76 at all scales, with corresponding H ≤ 0.47, except over short distances using the pooled dataset, where D = 2.55 (H = 0.90) (Table 8-2). This indicates an overall tendency of weak spatial dependence, and anti-persistence, in spite of some compelling indications of strong spatial dependence based on nugget/sill (e.g., at the FS and SS, both over short distances). Calculations of D are usually performed using exhaustive and dense data, such as in a grid, which was not the case in this study, since production of a TC map at any scale would be based on its variogram in the first place. Thus, we calculated D using only the sampling sites, and, because of this, the observed short-range variation of TC might not reflect what would be expected in a landscape continuum (note that at the current time there is no method available to accurately and cost-effectively measure TC on a fine grid across the whole state of Florida). In other words, the geographic separation among sampling sites may be responsible for the observed anti-persistence of TC. Thus, we advise that these fractal properties be interpreted only relative to those obtained at the other scales, or by similar assessments elsewhere.
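As an illustration of how D follows from a linear section of a log-log variogram, the sketch below fits the slope by ordinary least squares and applies D = 3 − H/2, the relation implied by the slopes and dimensions reported in Table 8-2. The function names and the synthetic power-law variogram are hypothetical; this is not the code used in the study.

```python
import math


def loglog_slope(lags, semivariances):
    """Least-squares slope, intercept, and slope standard error of
    log10(semivariance) regressed on log10(lag)."""
    xs = [math.log10(h) for h in lags]
    ys = [math.log10(g) for g in semivariances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope_se


def fractal_dimension(h_slope):
    """D for a surface, given the log-log variogram slope H (0 < H <= 2)."""
    return 3.0 - h_slope / 2.0


# Synthetic power-law variogram, gamma(h) = 0.2 * h**0.4 (made-up values).
lags = [100.0, 200.0, 400.0, 800.0, 1600.0]
gammas = [0.2 * h ** 0.4 for h in lags]
slope, intercept, slope_se = loglog_slope(lags, gammas)
d_value = fractal_dimension(slope)
```

Applied to the slopes 0.47 and 0.90 reported in Table 8-2, the same relation gives 2.765 and 2.55, which agrees with the tabulated D values to rounding.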
Fractal dimension is a measure of the roughness of the property in space; in other words, it measures whether the spatial distribution of the property is smooth or crumpled across the landscape. As such, it provides an alternative perspective to analyze the spatial dependence of the property. In the case of points distributed in space (i.e., with x,y coordinates), D can vary from 2 to 3; D ~ 2 means that the distribution of the property is smooth, without abrupt changes, and that long-range variation is important; on the other extreme, D ~ 3 means that the distribution of the property is crumpled, and controlled by short-range variation (Burrough, 1983), characterizing anti-persistence when H < 0.5 (Mandelbrot, 1983). For example, if the property has a strong, well-characterized positive spatial autocorrelation, then D is expected to be closer to 2; conversely, if the property has a weak positive spatial autocorrelation, or a negative one, then D approaches 3.

Regions of constant D (i.e., linear sections in the log-log plots of the variogram) were observed at all scales, except at the WS over short and medium distances, respectively (Figure 8-4). Spatial ranges at which D is constant are the regions of self-similarity of TC, i.e., where TC variation is scale independent, and indicate the scales at which related environmental processes (e.g., soil-forming processes) operate (Burrough, 1981). Thus, the specific distances at which D changes (i.e., where the slope in the log-log plot changes) indicate the scales where shifts in processes and factors cause different TC spatial patterns to emerge.

Using the pooled dataset, we observed two approximate regions of constant D over short and medium distances (Table 8-2; Figures 8-4j and 8-4k). Up to about 1261–1592 m (3.1–3.2 log m), H was small (0.06–0.08), indicating weaker spatial dependence than after the change of slope, where H ≥ 0.43.
A second change of D was observed over long distances (Figure 8-4l) at about 30,617 m (4.5 log m). In this case, the spatial dependence of TC at the medium range (smaller than 30,617 m) was stronger than at the long range (above 30,617 m), leading to the general trend of the strength of spatial dependence increasing from short to medium range, and then decreasing at the long range. According to Burrough (1983), small H (large D) indicates that short-range variation of TC is predominant, i.e., the variation of TC is pronounced at short distances, whereas long-range variation dominates in the case of large H (small D). Although this interpretation makes sense up to 30,617 m, where the change of D at about 1261–1592 m expectedly marked a transition from short-range to medium-range TC variation, above 30,617 m the expected dominance of long-range TC variation was not observed, since D actually increased.

The only comparable TC study we could find characterized the multifractal properties of soil organic matter in a 2.59-km² corn-soybean field in central Illinois, U.S. (Kravchenko et al., 1999). The authors calculated a D of 2.0 within a range of 146 m, using 1752 samples, indicating, in their case, a much smoother distribution than the one found in our study. However, they sampled with much higher density (6.76 samples ha⁻¹) than in our study (FS: 0.27 samples ha⁻¹). In addition, land use patterns were more homogeneous in the Kravchenko study than in our dataset, which had more diverse land use, even at the FS. Using the pooled dataset, different regions of constant D were clearly discernible, suggesting that TC also exhibits multifractal behavior even at large extents across Florida, or at least fractal behavior (i.e., self-similarity) over constrained spatial scales.

Among scales, the spatial dependence of TC based on fractal analysis showed homogeneous patterns, according to Student's t-test at the 0.05 significance level (Table 8-2). We observed statistically equal H (and thus, D) for the distribution of TC among many scales, some of which are important to highlight (Table 8-2).
First, regions of constant D for TC using the pooled dataset corresponded to the respective regions at the original scales. Specifically, at about 67–3074 m (1.8–3.5 log m), D observed at the FS over long distances (Figure 8-4c) was represented in the pooled dataset both over short and medium distances (Figures 8-4j and 8-4k), where the distance ranges matched (first linear sections in the log-log plots). Also, at 7777–18,663 m (3.9–4.3 log m), D observed at the WS (Figure 8-4e) was equal to that at 2225–30,617 m (3.3–4.9 log m) in the pooled dataset over long distances (Figure 8-4l). And at 6888–383,756 m (3.8–5.6 log m), D observed over long distances at the SS (Figure 8-4g) was equal to that observed at about the same range in the pooled dataset over long distances (30,617–356,713 m; 4.5–5.6 log m) (Figure 8-4l). Second, D observed at specific scales was also observed at other scales, which indicates that the spatial dependence of TC was captured independently by the sampling design at different scales. This was the case between the FS over short and medium distances, the WS over medium distances, and the SS over short and medium distances. Although the distance ranges did not match perfectly, they did overlap in some regions.

Our results indicate that the spatial dependence of TC depends on the scale, as different D were observed at different scales, corroborating the idea that soil properties exhibit self-similarity only within certain scale ranges (Burrough, 1981), thus contradicting the widespread notion that D is scale invariant. However, equal D was also observed at certain regions among different scales. Leduc et al. (1994) calculated D for forest cover in Quebec, Canada, and also concluded that D depended on the scale of observation, specifically on the extent, grain, and direction of observation transects. In this study, we did not test the effect of grain and direction, but the extent was implicitly introduced with the arrangement of nested scales. We acknowledge that these factors, as well as the relative region within the state of Florida, may influence D.
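The comparison of slopes in Equation 8-8 can be sketched as follows, using slopes and standard errors as listed in Table 8-2. The function names are hypothetical, and the lookup of the critical t value at (N − 4) degrees of freedom is deliberately left out, since it depends on the number of observations behind each regression, which is not restated here.

```python
import math


def slope_t_statistic(b1, se1, b2, se2):
    """Observed t for the difference between two regression slopes.

    The standard error of the difference is sqrt(se1**2 + se2**2).
    """
    se_difference = math.sqrt(se1 ** 2 + se2 ** 2)
    return (b1 - b2) / se_difference


def degrees_of_freedom(n1, n2):
    """N - 4, where N is the summed number of observations of both samples."""
    return n1 + n2 - 4


# Pooled dataset, second linear sections: short (H = 0.90, SE = 0.07)
# versus medium distances (H = 0.43, SE = 0.06).
t_pooled = slope_t_statistic(0.90, 0.07, 0.43, 0.06)

# Field scale: short (H = 0.27, SE = 0.08) versus medium distances
# (H = 0.16, SE = 0.04).
t_field = slope_t_statistic(0.27, 0.08, 0.16, 0.04)
```

The first pair carries different letters in Table 8-2 (g and f) and yields a statistic of about 5.1, whereas the second pair shares the letter d and yields about 1.2, which is in line with the grouping reported in the table.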
Conclusions

Our results indicate that the spatial dependence of TC in Florida depends on the scale (i.e., extent). However, we found ranges of self-similarity in the spatial distribution of TC, where the variation of TC is scale independent. The limits of these ranges were located at about 1.5 and 31 km, respectively, and identify regions where dominating landscape (e.g., soil-forming) processes may shift. Up to about 1.5 km, TC variation is probably dictated by plot- to field-scale variability, and will more likely resemble local landscape characteristics (e.g., land use/land cover, intensity of management, and soils). Between 1.5 km and 31 km, watershed- to regional-scale processes are responsible for TC variation, and might include topographic and hydrologic gradients, vegetation patterns, and regional soil erosional and depositional processes (e.g., the transport of sediments within the watershed). Beyond about 31 km, regional-scale environmental processes dominate, including those driven by climate, regional hydrology, geology, and even regional socio-politico-economic preferences.

The spatial dependence of TC varied from weak to strong at different scales (FS, WS and SS), according to nugget/sill. However, D was larger than 2.5 at all scales, suggesting the dominance of short-range variation in the distribution of TC. The main implication of large D for quantifying TC in Florida is that, unless a dense dataset is available, short-range TC variation may mask long-range TC variation if traditional interpolation methods are used (e.g., point kriging, inverse distance weighting, or splines). One alternative would be to use some method of bulking or block kriging, which would account for variations over longer ranges (Burrough, 1981). However, regional studies often do not have the luxury of sampling at short distances, due to the size of the study area, and thus fail to characterize fine-, medium-, and long-range patterns together. Thus, a strength of this study was that it separated different scales of TC variation (FS, WS and SS) and investigated a pooled dataset, which was only possible because of our comprehensive dataset across multiple scales.
This study elucidated the influence of scale on the range, magnitude, and strength of the spatial dependence of TC, including a thorough assessment of its variogram and fractal characteristics observed at different scales, and some discussion of their implications for modeling of TC in Florida. Pooling data across spatial scales explained short- to long-range variability of TC in Florida, and provided a more comprehensive variogram for TC, representing its multi-scale spatial dependence structure. Further research is needed to fully understand the complex interactions among TC, landscape factors, and scale.
Table 8-1. Descriptive statistics of soil total carbon (TC), and ln-transformed TC (LnTC) at the three nested scales, and for the pooled dataset across scales

Statistic            FS         WS         SS         Pooled dataset
TC (%)
  Observations       152        141        1193       1486
  Mean               1.90       0.88       1.99       1.87
  Std. deviation     1.13       1.81       6.55       5.92
  Minimum            0.48       0.12       0.03       0.03
  Median             1.74       0.50       0.42       0.49
  Maximum            8.65       17.03      54.59      54.59
  Skewness           2.62       6.64       5.26       5.81
LnTC (ln%)
  Observations       152        141        1193       1486
  Mean               0.5058     -0.5822    -0.6155    -0.4977
  Std. deviation     0.5128     0.7062     1.1791     1.1422
  Minimum            -0.7268    -2.0918    -3.5599    -3.5599
  Median             0.5560     -0.7024    -0.8687    -0.7206
  Maximum            2.1580     2.8350     3.9999     3.9999
  Skewness           0.20       2.09       1.71       1.41

Abbreviations: FS = field scale; N = number of observations; SS = state scale; WS = watershed scale.
Table 8-2. Variogram and fractal parameters of ln-transformed soil total carbon (LnTC) over short, medium, and long distances at the three nested scales, and using the pooled dataset across scales

Column groups: Lag size and M describe the empirical variogram; Model through Nugget/sill describe the fitted variogram; Distance range through D describe the fractal analysis. Nugget effect (c0), partial sill (c), and total sill are in (ln%)².

Scale    Dist    Lag size (m)   M    Model   c0     c      Total sill   Range1 (m)   Nugget/sill2 (%)   Distance range3 (m)   Slope (b) = H4   SE     R2     D
FS       Short   30             22   Exp     0.01   0.24   0.25         354          2.1                179–631               0.27 def         0.08   0.47   2.87
FS       Med     66             16   Exp     0.19   0.06   0.25         598          77.2               200–989               0.16 cd          0.04   0.63   2.92
FS       Long    390            9    Exp     0.21   0.09   0.30         2905         69.4               116–3074              0.10 abc         0.02   0.82   2.95
WS       Short   750            9    Exp     0.14   0.36   0.50         1538         27.9
WS       Med     1550           13   Exp     0.12   0.41   0.53         12,072       22.3               7777–18,633           0.47 def         0.17   0.57   2.76
WS       Long    2050           18   Exp     0.09   0.39   0.48         6849         19.1
SS       Short   240            20   Exp     0.08   0.79   0.87         2613         8.9                492–4560              0.38 def         0.10   0.46   2.81
SS       Med     425            23   Exp     0.34   0.66   1.00         5456         33.7               122–9339              0.26 de          0.03   0.78   2.87
SS       Long    24,000         17   Exp     0.91   0.42   1.33         151,319      68.4               6888–383,756          0.07 ab          0.01   0.75   2.96
Pooled   Short   210            30   Gaus    0.24   0.70   0.94         5560         25.9               67–1261               0.08 abc         0.02   0.71   2.96
                                                                                                        1261–6084             0.90 g           0.07   0.88   2.55
Pooled   Med     400            36   Gaus    0.26   0.70   0.96         6142         26.9               120–1592              0.06 a           0.01   0.88   2.97
                                                                                                        1592–14,001           0.43 f           0.06   0.64   2.79
Pooled   Long    15,500         24   Exp     0.42   0.87   1.29         119,424      32.5               2225–30,617           0.30 e           0.01   1.00   2.85
                                                                                                        30,617–356,713        0.10 bc          0.01   0.82   2.95

Abbreviations: D = fractal dimension; Dist = distance; Exp = exponential model; FS = field scale; Gaus = Gaussian model; H = codimension, i.e., the slope (b); M = number of lags; Med = medium distance; R2 = coefficient of determination of the fit in the linear section in the log-log plot of the empirical variogram; SE = standard error of the slope; SS = state scale; WS = watershed scale.
1 Range of spatial autocorrelation (a) for the Gaussian model, and effective range (r) for the exponential model, where the semivariance achieves 95% of the total sill (c0 + c), at about 3a.
2 Nugget-to-sill ratio, i.e., nugget/sill = c0 / (c0 + c).
3 Indicates the range within which D is constant, i.e., where a linear section in the log-log plot of the empirical variogram is observable. No linear sections were observed in the log-log plot at the WS over short, and medium distances, and two linear sections were observed using the pooled dataset across scales over short, medium, and long distances, respectively.
4 Equal letters indicate statistically equal H at the 0.05 significance level, according to the t-test among slopes.
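To show how the fitted parameters in Table 8-2 translate back into model curves and dependence classes, the following sketch evaluates the two variogram models and the nugget-to-sill ratio. The exponential form uses the effective range r; the Gaussian form with a factor of 3 in the exponent is an assumption about the parameterization, and all function names are hypothetical. The class limits follow the Cambardella et al. (1994) thresholds quoted in the text.

```python
import math


def exponential_variogram(h, nugget, partial_sill, effective_range):
    """Exponential model: reaches about 95% of the partial sill at h = effective_range."""
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / effective_range))


def gaussian_variogram(h, nugget, partial_sill, range_a):
    """Gaussian model (assumed parameterization with a factor of 3 in the exponent)."""
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / range_a) ** 2))


def nugget_to_sill_percent(nugget, partial_sill):
    """Nugget/sill = c0 / (c0 + c), expressed in percent."""
    return 100.0 * nugget / (nugget + partial_sill)


def dependence_class(ratio_percent):
    """Strong (< 25%), moderate (25 to 75%), or weak (> 75%) spatial dependence."""
    if ratio_percent < 25.0:
        return "strong"
    if ratio_percent <= 75.0:
        return "moderate"
    return "weak"


# State scale over long distances: c0 = 0.91, c = 0.42, r = 151,319 m (Table 8-2).
ratio_ss_long = nugget_to_sill_percent(0.91, 0.42)
class_ss_long = dependence_class(ratio_ss_long)
gamma_at_range = exponential_variogram(151319.0, 0.91, 0.42, 151319.0)
```

For this row the ratio is about 68.4%, which falls in the moderate class, and the modeled semivariance at the effective range is about 1.31, slightly below the total sill of 1.33.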
Figure 8-1. Three nested scales within the state of Florida, with their respective sample distributions of soil total carbon (TC). Latitude/longitude coordinates correspond to the scale of each map individually.
Figure 8-2. Variograms of ln-transformed soil total carbon (LnTC) over short, medium, and long distances, respectively, at the field scale (FS) (A, B, and C), watershed scale (WS) (D, E, and F), state scale (SS) (G, H, and I), and for the pooled dataset across scales (J, K, and L).
Figure 8-3. Fitted variograms of ln-transformed soil total carbon (LnTC) over short, medium, and long distances, respectively, at the field scale (FS), watershed scale (WS), and state scale (SS) up to 10,000 m.
Figure 8-4. Log-log plots of the variograms of ln-transformed soil total carbon (LnTC) over short, medium, and long distances, respectively, at the field scale (FS) (A, B, and C), watershed scale (WS) (D, E, and F), state scale (SS) (G, H, and I), and for the pooled dataset across scales (J, K, and L). Slopes (b) were calculated for the linear sections in the plots, along with the corresponding coefficient of determination (R2) of the fit.
CHAPTER 9
SYNTHESIS AND OUTLOOK

Our study indicates the potential of visible/near-infrared spectroscopy (VNIRS) to estimate soil total carbon (TC), and soil organic carbon (SOC) fractions both locally within the Santa Fe River watershed (SFRW; 3585 km²; 544 observations), and at the State level across Florida (150,000 km²; 7122 observations) (Chapters 2, 3, and 4). The spectral models explained up to 86% of the variability of TC in independent validation samples in the SFRW, up to 77% in Florida, and up to 82% of the variability of SOC fractions. The results from VNIRS analysis indicate the preference of local over statewide models, pointing towards the need to identify ideal geographic boundaries to derive these spectral models, as well as to test the transferability of the models across domains. Model improvement could also be achieved by including more organic samples in the database. This is especially important when considering that important stocks of TC in Florida are concentrated in organic soils (i.e., Histosols).

Building a soil spectral library for Florida opens many possibilities to facilitate the collection and analysis of soil samples to populate soil spatial databases. Once a VNIRS model is available for Florida, which is now the case, obtaining an estimate of TC only requires measuring the reflectance spectrum of the soil, reducing the cost, time, and labor of soil analysis. There is also the possibility to estimate multiple soil properties using this spectral library, with potential to save even more on soil analysis. Visible/near-infrared spectroscopy also has the potential for monitoring soil properties including carbon (C) to assess soil C sequestration, and support soil C auditing, marketing, and regulation as an ecosystem service, with widespread implications in the future.
In situ measurements of soil spectra could save the step of sampling the soil, however more research is needed to isolate the effects of field conditions, including moisture, uneven particle sizes, atmosph eric conditions, and shades.
Based on our findings with VNIRS, we see great potential in combining other proximal and remote sensors to estimate multiple soil properties. To this end, data from multiple sensors can simply be integrated using a multivariate regression technique, but multi-sensor platforms can also provide data to develop complex soil inference systems with applications ranging from environmental studies to precision agriculture. To name a few examples among proximal sensors, besides VNIRS: mid-infrared spectroscopy is sensitive to the spectral regions of TC and many other soil constituents; gamma radiometry is sensitive to moisture and lithologic properties; electromagnetic induction is sensitive to soil electrical properties linked to exchange capacity and salinity; and profile cone penetrometry is sensitive to soil structure, texture, and other physical properties. From a remote sensing perspective, there is potential to integrate VNIRS with fine-resolution airborne hyperspectral images to apply VNIRS models spatially, and also to derive models based solely on the hyperspectral images. Multispectral sensors, although more readily available (e.g., from satellites), may aggregate so much spectral information in a single measurement (due to the wide ranges of the spectral bands) that the important reflectance features are masked out, but this is still open for research.
Despite the many benefits of VNIRS, some limitations still hinder its widespread adoption. Because VNIRS is an indirect method, its main limitation is the need to calibrate a model for every soil population of interest within certain geographic domains; thus, it still requires measuring soil samples using conventional laboratory methods. Even though this is currently a drawback, our regional and statewide spectral libraries can be enhanced by incorporating additional signatures that more exhaustively represent soils in Florida (ongoing project). Critical to further advancing spectral libraries is identifying ideal stratification criteria to improve model accuracy (note that some well-accepted soil analytical methods are also indirect, relying on calibration curves, such as phosphorus by colorimetry or soil mineralization rates; the difference is simply the level of accuracy of the calibration curves). Another limitation of VNIRS is its upfront cost, which is still high compared to some conventional methods (e.g., chemical oxidation or ignition), although some instruments can be more costly than the spectroradiometer (e.g., elemental analyzers). Finally, VNIRS requires expertise to derive the models and interpret the results, since the most common calibration methods are convoluted, requiring either a two-step regression (e.g., partial least squares regression) or iterative fitting that is not transparent, as is the case with some black-box data mining methods that work for large soil databases (e.g., committee trees, artificial neural networks).
Upscaling of TC and SOC fractions in the SFRW (Chapters 5 and 6) showed that about 39.3 Tg (teragrams) of C are stored in the upper 1 m of soils, of which 23.0 Tg (59%) is stored in the first 30 cm. Recalcitrant C (RC) was the dominant SOC fraction, accounting for more than 75% of TC, but significant amounts of labile C were also estimated. If TC estimates were extrapolated to the state of Florida, they would amount to about 1,644 Tg of C in the upper 1 m and about 963 Tg in the first 30 cm. Given the vast presence of wetlands and associated peat soils (i.e., Histosols) in the south of the State, these estimates are very conservative, as our results also highlighted the importance of these soils for storing C in hydrology-controlled environments like Florida. We also found significant amounts of C in deeper layers down to 180 cm, which are not included in these numbers, and we expect to find major amounts of C even below 180 cm throughout the State of Florida. The amount of RC relative to TC suggests that the majority of the C currently available in the soil could potentially remain stable for decades to thousands of years.
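As a rough check on the statewide extrapolation quoted above, the sketch below assumes a simple area-proportional scaling from the SFRW to Florida. This assumption reproduces the reported figures closely, but it is only an illustration and not a description of how the estimates were actually derived. The function names are hypothetical; the input numbers are the rounded values given in the text.

```python
# Rounded figures quoted in the text (SFRW = Santa Fe River watershed).
SFRW_AREA_KM2 = 3585.0
FLORIDA_AREA_KM2 = 150_000.0
SFRW_TC_1M_TG = 39.3     # Tg of C in the upper 1 m
SFRW_TC_30CM_TG = 23.0   # Tg of C in the upper 30 cm


def scale_by_area(stock_tg, source_area_km2, target_area_km2):
    """Assumed area-proportional extrapolation of a carbon stock (Tg)."""
    return stock_tg * target_area_km2 / source_area_km2


def mean_density_kg_m2(stock_tg, area_km2):
    """Mean areal carbon density; 1 Tg = 1e9 kg and 1 km2 = 1e6 m2."""
    return stock_tg * 1e9 / (area_km2 * 1e6)


florida_1m_tg = scale_by_area(SFRW_TC_1M_TG, SFRW_AREA_KM2, FLORIDA_AREA_KM2)
florida_30cm_tg = scale_by_area(SFRW_TC_30CM_TG, SFRW_AREA_KM2, FLORIDA_AREA_KM2)
top30_share = SFRW_TC_30CM_TG / SFRW_TC_1M_TG
density_1m = mean_density_kg_m2(SFRW_TC_1M_TG, SFRW_AREA_KM2)

print(f"Florida, upper 1 m:   {florida_1m_tg:.0f} Tg C")
print(f"Florida, upper 30 cm: {florida_30cm_tg:.0f} Tg C")
print(f"Share in upper 30 cm: {top30_share:.1%}")
print(f"SFRW mean density:    {density_1m:.2f} kg C per m2 (upper 1 m)")
```

With these rounded inputs the upper 1 m estimate comes to about 1,644 Tg and the upper 30 cm estimate to about 962 Tg; the small difference from the reported 963 Tg presumably reflects rounding of the published inputs.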
We found that TC in the SFRW was controlled by land use/land cover, soil order, soil drainage class, geologic formation, and depth of measurement. In turn, the distribution of TC was reflected in that of the SOC fractions due to their strong correlation, suggesting that SOC fractions are influenced by the same environmental factors. Unexpectedly, however, no single environmental variable was strongly related to TC or SOC fractions. One consequence was that, overall, the spatial dependence (i.e., spatial autocorrelation) of TC explained its spatial variability better than the global and local trends combined (as in regression kriging). In other words, the global trend models based on soil-landscape correlations did not add significant explanatory power relative to the spatial distribution of TC alone (as in lognormal kriging). This has two important implications. First, it suggests that the spatial patterns of TC can be well characterized using only the TC data at the sampling sites. In this case, it is critical that the sampling design be representative of the main sources of variability in the region of study and account for fine-, medium-, and coarse-scale TC patterns. Second, the poor performance of virtually all global trend models indicates that some (maybe the most) important environmental driver of soil C was not captured by the models. This can happen either because an environmental variable was left out of the list of independent variables, or because it was included but had poor quality (which can be related to its original scale or to the accuracy of the data). The latter could be the case with hydrologic properties, which were overall important variables in the models but nonetheless explained only a small portion of the variability of soil C. Nevertheless, knowing the environmental factors that control TC within a spatially explicit framework allows landscapes to be managed with the aim of sequestering more C and improving the quality of soils across Florida.
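The spatial dependence discussed above is commonly summarized with an empirical variogram of the ln-transformed values. The sketch below uses the classical Matheron estimator as a minimal illustration; it is not the software or the settings used in this study, and the function name, coordinates, TC values, lag width, and maximum lag are all hypothetical.

```python
import math
from collections import defaultdict


def empirical_variogram(coords, values, lag_width, max_lag):
    """Matheron estimator: gamma(h) = sum((z_i - z_j)**2) / (2 * N(h)).

    Returns a list of (mean pair distance, semivariance, pair count),
    one entry per non-empty lag bin, ordered by distance.
    """
    sq_diff = defaultdict(float)
    dist_sum = defaultdict(float)
    pairs = defaultdict(int)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if d == 0.0 or d > max_lag:
                continue  # skip co-located pairs and pairs beyond max_lag
            k = int(d // lag_width)  # index of the lag bin
            sq_diff[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += d
            pairs[k] += 1
    return [(dist_sum[k] / pairs[k], sq_diff[k] / (2 * pairs[k]), pairs[k])
            for k in sorted(pairs)]


# Hypothetical transect: five sites 1 km apart with made-up TC values.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
tc = [1.0, 2.0, 4.0, 8.0, 16.0]
ln_tc = [math.log(v) for v in tc]  # ln transform, analogous to LnTC

variogram = empirical_variogram(coords, ln_tc, lag_width=1.0, max_lag=2.5)
for dist, gamma, n in variogram:
    print(f"h = {dist:.1f} km  gamma = {gamma:.4f}  pairs = {n}")
```

In practice, a variogram model would then be fitted to these points and used for kriging. Whether a trend component adds anything, as in regression kriging, can be judged by comparing validation errors with and without it.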
Our investigation of TC at multiple scales (Chapters 7 and 8) considerably increased our understanding of the influence of multiple scale factors on upscaling of TC and on its spatial dependence, with direct implications for future spatial assessments of TC. In summary, grain (i.e., spatial resolution), extent (i.e., size of the study area), and geographic region affected the quality of TC upscaling models. Model accuracy decreased at coarser grains and increased at larger extents, in both cases reflecting the variability of TC and environmental variables (the larger the variability, the better the model). Our study also demonstrated the sensitivity of TC assessment to geographic and attribute boundary conditions, providing guidance for future TC assessment in Florida. Among grains, models transferred reasonably well only to grains of up to 60 m. Among extents, models transferred reasonably well only between Florida (1193 observations) and the SFRW (141 observations), in both directions. The preferred direction of model transferability was from coarser to finer grains and from smaller to larger extents, in both cases toward a more detailed representation of attribute variability. Finally, geographic region influenced the amount of TC and the transferability of upscaled TC models, with a general trend of better transferability to regions with higher TC.
The limited variability encompassed within the University of Florida Beef Cattle Station (BCS; 5.58 km²; 152 observations) degraded the quality of upscaled TC models and did not represent the soil and environmental variability captured by the models derived at the other extents. This alludes to the constraints of fine-scale studies (fields or plots), which capture only a small portion of the variability found in a landscape and therefore have limited value for upscaling TC models to larger regions.
Hydrologic properties, especially soil available water capacity, were the most important variables for explaining soil C spatial patterns at virtually all scales. Other important environmental determinants of TC varied as a function of scale, and TC models selected variables in environmental classes other than hydrology (i.e., soil, topography, land use/land cover, and vegetation), but no preference for any of these classes was apparent among grains or extents. Based on these findings, one recommendation to improve regional soil C models in Florida is to more accurately characterize the hydrologic patterns that can be used as input to the models. However, we acknowledge that this may be a more difficult task than characterizing the soil C patterns in the first place. In addition, we caution that adopting coarse grains (> 240 m) to derive spatial models of TC does not fully account for its fine-scale variability and may smooth out the variation of TC, producing unreliable estimates compared to models derived at finer (< 60 m) grains. This is the case, for example, with global change models derived at coarse scales (> 1 km).
Our findings show that TC has a multi-scale and multi-fractal spatial dependence. For regional modeling of TC in Florida, this implies that the extent of the study area must be considered, e.g., to ensure second-order stationarity for kriging and to avoid abrupt changes in TC's spatial dependence. Otherwise, if a model is envisioned across the whole State, we suggest that an explicit multi-scale upscaling method (e.g., factorial kriging) be considered to account for the nested spatial variations of TC. Alternatively, our analysis using the pooled dataset across scales suggested that it is possible to characterize the multi-scale dependence of TC if enough observations are available at all scale levels. In other words, a single-scale upscaling model of TC across Florida can be derived if the short-, medium-, and long-range variations of TC are well characterized in the model. One option to explore is to identify an environmental variable that is strongly correlated with TC and has a denser spatial distribution, and to use its spatial dependence to guide or complement the spatial dependence of TC, e.g., as in co-kriging. In this study, such a variable was not available.
We observed ranges of self-similarity in the spatial dependence of TC at about < 1.5 km, 1.5 to 30 km, and > 30 km, which indicate the regions within which the spatial dependence of TC is scale-invariant. The straightforward interpretation is that within these ranges a single variogram can adequately describe the spatial autocorrelation of TC. However, this does not mean that upscaling models of TC should be constrained within these ranges. Rather, they should be considered indications of possible ranges where TC-forming processes operate. Again, given a strong explanatory factor and/or a robust variogram, it is possible to model TC at virtually any scale. As a final observation, the ranges of spatial autocorrelation of TC varied according to the extent but also reflected the lag distances used to derive the empirical variograms. Field-scale spatial autocorrelation ranges varied from 354 to 2905 m, whereas State-scale ranges went as far as 151 km, which can only be explained by some long-range gradient or process driving TC's spatial distribution (maybe physiographic regions or large-scale hydrologic patterns).
Two limitations of the investigation of scale influences on TC assessment, which also highlight research needs, are discussed next. First, fine-scale TC observations were available only at a single region (i.e., the BCS), which was not representative of the variability of TC and environmental factors (such as land use, topography, and soils) in Florida. Second, the link between observation support (i.e., composite samples) and the spatial resolution (or grain, minimum mapping unit) of upscaled maps was not fully characterized due to sample constraints.
Thus, our recommendations for future research to complement the understanding of scale influences on the spatial patterns of TC are to: (i) investigate the fine-scale spatial dependence of TC and the link between observation support and the resolution of upscaled maps (e.g., 30 m); (ii) investigate the influence of grain on TC models across Florida; (iii) further investigate the influence of geographic region in Florida on TC; and (iv) characterize the spatial dependence of environmental properties and the spatial cross-dependence between TC and environmental properties, aiming to identify the environmental factors and processes that control the spatial dependence of TC. In essence, research is needed to address the gaps (literally, gaps) that still exist between the isolated TC point measurements and a map with complete coverage across Florida.
Given current climate change trends, it is important to quantify the potential of soils to sequester C. Our study provided estimates of TC and SOC fractions in the SFRW, as well as the methodological framework, as a first step toward a statewide soil C inventory. It has important implications for guiding policy related to the conservation of soil C reserves in Florida, which should aim both to promote C accretion in areas that are suitable for storing long-lasting, stable C forms (i.e., RC) (e.g., Histosols and Spodosols) and to avoid C loss in areas where labile C fractions are most sensitive to environmental disturbances (e.g., due to land use shifts or changes in hydrologic patterns). By identifying some of the environmental determinants of TC, our study also helps interpret the effects of socio-politico-economic decisions on soil C resources, with applications in urban and regional planning, ecosystem restoration (e.g., restoration of the Everglades), and decisions related to climate change.
In summary, our study characterized the overall spectral and spatial signatures of soil total carbon and four chemically active soil organic carbon fractions within a pilot study area, the Santa Fe River watershed in north-central Florida. In addition, it elucidated the influence of scaling properties on upscaling (i.e., regionalizing) models of TC across three nested scales (i.e., study areas) in Florida with escalating extents and soil and environmental variability, as well as TC's spatial dependence structure at these three scales. It advanced the knowledge and science of soil C in Florida and provided frameworks to efficiently and accurately estimate TC and SOC fractions. Our conclusions are rooted in an understanding of soil-landscape relationships at multiple spatial scales, providing more reliable information about the amount and spatial patterns of soil C than previously available soil C maps. Thus, this research project has considerably advanced our understanding of soil C across different spatial scales. The information and methodology provided in this study have important implications for the development of C trading systems and markets, food and energy security, and regional assessment of soil ecosystem services.
256 LIST OF REFERENCES Ahn, M.Y., Zimmerman, A.R., Comerford, N.B., Sickman, J.O., Grunwald, S., 2009. Carbon mineralization and labile organic carbon pools in the sandy soils of a north Florida watershed. Ecosystems 12, 672685. Al Abbas, A.H., Swain, P.H., Baumgardner, M.F., 1972. Relating organic matter and clay content to the multispectral radiance of soils. Soil Sci. 114, 477 485. Alvarez, R., Alvarez, C.R., 2000. Soil organic matter pools and their associations with carbon mineralization kinetics Soil Sci. Soc. Am. J. 64, 184 189. Alvarez, R., Diaz, R.A., Barberob, N., Santanatoglia, O.J., Blotta, L., 1995. Soil organic carbon, microbial biomass and CO2C production from three tillage systems. Soil Till. Res. 33, 17 28. Analytical Spectral Devices Inc. (ASD), 2003. QualitySpec Pro manual. ASD Document, 600510, Rev. 1. ASD, Boulder, CO. 10 pp. Analytical Spectral Devices Inc., 2008. Product specifications: QualitySpec Pro. Available at: http://www.asdi.com/products_specifications QSP.asp Last verified: Jan. 13, 2008. Anderson, A.N., McBratney, A.B., 1995. Soil aggregates as mass fractals. Aust. J. Soil Res. 33, 757772. Bartoli, F., Philippy, R., Doirisse, M., Niquet, S., Dubuit, M., 2006. Structure and self similarity in silty and sandy soils: the fractal approach. Eur. J. Soil Sci. 42, 167185. Batjes, N.H., 2008. Mapping soil carbon stocks of Central Africa using SOTER. Geoderma 146, 5865. Bekele, A., Hudnall, W.H., Daigle, J .J., Prudente, J.A., Wolcott, M., 2005. Scale dependent variability of soil electrical conductivity by indirect measures of soil properties. J. Terramech. 42, 339351. Bellamy, P.H., Loveland, P.J., Bradley, R.I., Lark, R.M., Kirk, G.J.D., 2005. Carbon los ses from all soils across England and Wales 19782003. Nature 437, 245248. Berry, W.D., Feldman, S., 1985. Multiple regression in practice. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07 050. Sage Publications, Beverl y Hills, CA. 95 pp. 
Bian, L., Walsh, S.J., 1993. Scale dependencies of vegetation and topography in a mountainous environment of Montana. Prof. Geog. 45, 1 11. Bird, N., Bartoli, F., Dexter, A.R., 1996. Water retention models for fractal soil structures. Eur. J. Soil Sci. 47, 16.
257 Bonferroni, C.E., 1936. Teoria statistica delle classi e calcolo delle probabilit. Pubb. R. Instit. Super. Sci. Econ. Comm. Firenze 8, 362. Bor vka, L., Mldkov, L., Penek, V., Drbek, O., Vat, R., 2007. Forest soil acidi fication assessment using principal component analysis and geostatistics. Geoderma 140, 374 382. Bouchard, V., Cochran, M., 2006. Wetland and carbon sequestration. In: Lal, R. (Ed.), Encyclopedia of soil science, Vol. 2. CRC Press, Boca Raton, FL, pp. 1887 1890. Bourennane, H., Salvador Blanes, S., Cornu, S., King, D., 2003. Scale of spatial dependence between chemical properties of topsoil and subsoil over a geologically contrasted area (Massif central, France). Geoderma 112, 235251. Bowers, S.A ., Hanks, R.J., 1965. Reflection of radiant energy from soils. Soil Sci. 100, 130 138. Breiman, L., 1996. Bagging predictors. Mach. Learn. 24, 123140. Breiman, L., 1998. Arcing c lassifiers. Ann. Stat. 26, 801 824. Breiman, L., Friedman, J., Olshen, R., Stone, C., 1984. Classification and regression trees. The Wadsworth Statistics/Probability Series. Wadsworth International Group, Belmont, CA. 358 pp. Brown, D.J., Bricklemyer, R.S., Miller, P.R., 2005. Validation requirements for diffuse reflectance soil characterization models with a case study of VNIR soil C prediction in Montana. Geoderma 129, 251267. Brown, D.J., Shepherd, K.D., Walsh, M.G., Dewayne Mays, M., Reinsch, T.G., 2006. Global soil characterization with VNIR diffuse reflectance spectroscopy. Geoderma 13 2, 273 290. Brown, M.B., Forsythe, A.B., 1974. The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics 30, 719724. Brown, R.B., Stone, E.L., Carlisle, V.W., 1990. Soils. In: Myers, R.L., Ewel, J.J. (Eds.), Ecosystems of Florid a. University of Central Florida Press, Orlando, FL, pp. 3569. Bruland, G.L., Grunwald, S., Osborne, T.Z., Reddy, K.R., Newman, S., 2006. Spatial distribution of soil properties in Water Conservation Area 3 of the Everglades. Soil Sci. 
Soc. Am. J. 70, 1662 1676. Burke, I.C., Yonker, C.M., Parton, W.J., Cole, C.V., Flach, K., Schimel, D.S., 1989. Texture, climate, and cultivation effects on soil organic matter content in U.S. grassland soils. Soil Sci. Soc. Am. J. 53, 800 805. Burrough, P.A., 1981. Fractal dimensions of landscapes and other environmental data. Nature 294, 240242.
258 Burrough, P.A., 1983. Multiscale sources of spatial variation in soil. I. The application of fractal concepts to nested levels of soil variation. J. Soil Sci. 34, 577597. Cambarde lla, C.A., Moorman, T.B., Novak, J.M., Parkin, T.B., Karlen, L.D., Turco, R.F., Konopka, A.E., 1994. Field scale variability of soil properties in central Iowa soils. Soil Sci. Soc. Am. J. 58, 1501 1511. CAMO Technologies Inc., 2006. The Unscrambler appendices: method references. Available at: http://www.camo.com/TheUnscrambler/Appendices/The%20Unscrambler%20Method%20 References.pdf Last verified: Apr. 15, 2006. Carbon Dioxide Information Analysis Center (CDIAC) Oak Ridge National Laboratory, 2008. Atmospheric CO2 records from sites in the SIO air sampling network. CDIAC, Oak Ridge, TN.Available at: http://cdiac.esd.ornl.gov/trends/co2/siokeel.html Last verified: Nov.15, 2008. Castrignan, A., Stelluti, M., 1999. Fractal geometry and geostatistics for describing the field variability of soil aggregation. J. Agric. Eng. Res. 73, 13 18. C ausarano, H.J., Franzluebbers, A.J., Shaw, J.N., Reeves, D.W., Raper, R.L., Woods, C.W., 2008. Soil organic carbon fractions and aggregation in the Southern Piedmont and Coastal Plain. Soil Sci. Soc. Am. J. 72, 221 230. Chang, C., Laird, D.A., 2002. Near i nfrared reflectance spectroscopic analysis of soil C and N. Soil Sci. 167, 110 116. Chang, C., Laird, D.A., Mausbach, M.J., Hurburgh, Jr., C.R., 2001. Near infrared reflectance spectroscopy principal components regression analysis of soil properties. Soi l Sci. Soc. Am. J. 65, 480490. Chen, F., Kissel, D.E., West, L.T., Adkins, W., 2000. Fieldscale mapping of surface soil organic carbon using remotely sensed imagery. Soil Sci. Soc. Am. J. 64, 746753. Chils, J.P., Delfiner, P., 1999. Geostatistics: mode ling spatial uncertainty. John Wiley and Sons, New York, NY. 695 pp. Coleman, K., Jenkinson, D.S., 1996. RothC 26.3 A model for the turnover of carbon in soil. In: Powlson, D.S., Smith, P ., Smith, J.U. 
(Eds.), Evaluation of soil organic matter models usi ng existing, longterm datasets. NATO ASI Series I, 38. Springer Verlag, Heidelberg, Germany, pp. 237 246. Cook, R.D., 1977. Detection of influential observation in linear regression. Technometrics 19, 1518. Corstanje, R., Schulin, R., Lark, R.M., 2007. S caledependent relationships between soil organic carbon and urease activity. Eur. J. Soil Sci. 58, 10871095.
259 Creed, I.F., Trick, C.G., Band, L.E., Morrison, I.K., 2002. Characterizing the spatial pattern of soil carbon and nitrogen pools in the Turkey La kes watershed: a comparison of regression techniques. Water Air Soil Pollut.: Focus 2, 81102. Culling, W.E.H., 1986. Highly erratic spatial variability of soilpH on Iping Common, West Sussex Cate na 13, 8198. Dalal, R.C., Henry, R.J., 1986. Simultaneous determination of moisture, organic carbon, and total nitrogen by near infrared reflectance spectrophotometry. Soil Sci. Soc. Am. J. 50, 120123. Daniel, K.W., Tripathi, N.K., Honda, K., 2003. Artif icial neural network analysis of laboratory and in situ spectra for the estimation of macronutrients in soils of Lop Buri (Thailand). Aust. J. Soil Res. 41, 4759. De Coninck, F., 1980. Major mechanisms in formation of spodic horizons. Geoderma 24, 101 128. Dunn, B.W., Beecher, H.G., Batten, G.D., Ciavarella, S., 2002. The potential of near infrared reflectance spectroscopy for soil analysis a case study from the Riverine Plain of south eastern Australia. Aust. J. Exp. Agric. 42 607 614. Dunnett, C.W., 1980. Pair wise multiple comparisons in the unequal variance case. J. Am. Stat. Assoc. 75, 796800. Earth Observatory, National Aeronautics and Space Administration, 2009. Measuring vegetation (NDVI & EVI). Available at: http://earthobservatory.nasa.gov/Features/MeasuringVegetation/measuring_vegetation_2.p hp. Last verified: Feb. 14, 2009. Echeverra, M.E., Markevitz, D. Morris, L.A., Hendrick, R.L., 2004. Soil organic matter fractions under managed pine plantations of the southeastern USA. Soil Sci. Soc. Am. J. 68, 950958. Eghball, B., Hergert, G.W., Lesoing, G.W., Ferguson, R.B., 1999. Fractal analysis of spatial and temporal variability. Geoderma 88, 349362. Ernst, W.H.O. 2004. Vegetation, organic matter and soil quality. In: Doelman, P ., Eijsackers, H.J.P., Vital soil: function, value and properties. Developments in Soil Science, 29. 
Elsevier Academic Press, Amsterdam, The Netherlands, pp. 41 98. Fid ncio, P.H., Poppi, R.J., Andrade, J.C., 2002a. Determination of organic matter in soils using radial basis function networks and near infrared spectroscopy. Anal. Chim. Acta 453, 125 134. Fidncio, P.H., Poppi, R.J., Andrade, J.C., Cantarella, H., 2002b. Determination of organic matter in soil using near infrared spectroscopy and partial least squares regression. Commun. Soil Sci. Plant Anal. 33, 16071615.
260 Field, C.B., Sarmiento, J., Hales, B., 2007. The carbon cycle of North America in a global context. In: King, A.W., Dilling, L., Zimmerman, G.P., Fairman, D.M., Houghton, R.A., Marland, G., Rose, A.Z., Wilbanks, T.J. (Eds), The first state of the carbon cycle report (SOCCR): the North American carbon budget and implications for the global carbon cycle. U.S. Climate Change Science Program Synthesis and Assessment Product 2.2. National Climatic Data Center, Asheville, NC, pp. 21 28. Florida Department of E nvironmental Protection (FDEP), 1997. Hydrologic cataloging units of Florida. Vector layer. Original scale: 1:24,000. FDEP, Tallahassee, FL. Florida Department of Environmental Protection (FDEP), 1998. Statewide surficial geology coverage. Vector layer. Or iginal scale: 1:100,000. FDEP, Tallahassee, FL. Florida Division of Emergency Management, 2009. El Nio/Southern Oscillation (ENSO).Available a t: http://www.floridadisaster.org/bpr /EMTOOLS/elnino/elnino.htm Last verified: Apr. 14, 2009. Florida Fish and Wildlife Conservation Commission (FFWCC), 2003a. Digital vegetation and land cover data set for Florida derived from 2003 Landsat ETM+ imagery. Raster layer. Spatial resolution: 30 m. FFWCC, Tallahassee, FL. Florida Fish and Wildlife Conservation Commission (FFWCC), 2003b. Florida vegetation and land cover data derived from 2003 Landsat ETM+ imagery. FFWCC, Tallahassee, FL. Available at: http://myfwc.com/GIS/LandCover/methods.pdf Last verified: May 15, 2009. Florida Fish and Wildlife Conservation Commission (FFWCC), 2003c. Landsat ETM+ imagery used to derive the 2003 digital vegetation and land cover data set for Florida. 14 scenes. Raster layers. Spatial resolution: 30 m. FFWCC, Tallahassee, FL. Florida Soil Characterization Database, 2009. Florida Soil Characterization Data Retrieval System. Available at: http://flsoils.ifas.ufl.e du. Last verified: Apr. 14, 2009. Florinsky, I.V., Eilers, R.G., Manning, G.R., Fuller, L.G., 2002. 
Prediction of soil properties by digital terrain modeling. Environ. Model. Softw. 17, 295 311. Follett, R.F., Kimble, J.M., Lal, R. (Eds.), 2000. The potential of U.S. grazing lands to sequester carbon and mitigate the greenhouse effect. CRC Press, Boca Raton, FL. 442 pp. Foody, G.M., Boyd, D.S., Cutler, M.E.J., 2003. Predictive relations of tropical forest biomass from Landsat TM data and their transferabil ity between regions. Remote Sens. Environ. 85, 463474. Fox, G.A., Sabbagh, G.J., 2002. Estimation of soil organic matter from red and near infrared remotely sensed data using a soil line euclidean distance technique. Soil Sci. Soc. Am. J. 66, 19221929. F ranzluebbers, A.J., 1999. Potential C and N mineralization and microbial biomass from intact and increasingly disturbed soils of varying texture. Soil Biol. Biochem. 31, 1083 1090.
261 Gaffey, S.J., McFadden, L.A., Nash, D., Pieters, C.M., 1993. Ultraviolet, v isible, and near infrared reflectance spectroscopy: laboratory spectra of geologic materials. In: Pieters, C.M., Englert, P.E. (Eds.), Remote geochemical analysis: elemental and mineralogical composition. Topics in Remote Sensing Series, 4. Cambridge Unive rsity Press, Cambridge, United Kingdom, pp. 4377. Galbraith, J.M., Kleinman, P.J.A., Bryant, R.B., 2003. Sources of uncertainty affecting soil organic carbon estimates in northern New York. Soil Sci. Soc. Am. J. 67, 1206 1212. Galvo, L.S., Pizarro, M.A., Epiphanio, J.C.N., 2001. Variations in reflectance of tropical soils: spectral chemical composition relationships from AVIRIS data. Remote Sens. Environ. 75, 245255. Gauch, Jr., H.G., Gene Hwang, J.T., Fick, G.W., 2003. Model evaluation by comparison of model based predictions and measured values. Agron. J. 95, 14421446. Geladi, P., Kowalski, B.R., 1986. Partial least squares regression: a tutorial. Anal. Chim. Acta 185, 1 17. Gessler, P.E., Moore, I.D., McKenzie, N.J., Ryan, P.J., 1995. Soil landscape m odeling and spatial prediction of soil attributes. Int. J. Geog. Inf. Sci. 9, 421 432. Ghani, A., Dexter, M., Perrott, K.W., 2003. Hot water extractable carbon in soils: a sensitive measurement for determining impacts of fertilisation, grazing and cultivat ion. Soil Biol. Biochem. 35, 1231 1243. Goddu, R.F ., Delker, D.A., 1960. Spectra structure correlations for the near infrared region. Anal. Chem. 32, 140141. Gregorich, E.G., Beare, M.H., Stoklas, U., St Georges, P., 2003. Biodegradability of soluble orga nic matter in maize cropping soils. Geoderma 113, 237252. Grimm, R., Behrens, T., Mrker, M., Elsenbeer, H. 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island Digital soil mapping using Random Forests analysis. Geoderma 146, 1 02 113. Grunwald, S., 2006. What do we really know about the space time continuum of soil landscapes? In: Grunwald, S. 
(Ed.), Environmental soil landscape modeling: geographic information technologies and pedometrics. CRC Press, Boca Raton, FL, pp. 336. G runwald, S., 2008a. Disaggregation and scientific visualization of earthscapes considering trends and spatial dependence structures. New J. Phys. 10, 125011. 15 pp. Grunwald, S., 2008b. Role of Florida soils in carbon sequestration. In: Mulkey, S., Alavala pati, J., Hodges, A., Wilkie, A.C., Grunwald, S., Opportunities for greenhouse gas reduction through forestry and agriculture in Florida. School of Natural Resources and Environment, University of Florida, Gainesville, FL, pp. 3950.
Grunwald, S., 2009. Multicriteria characterization of recent digital soil mapping and modeling approaches. Geoderma. In review.
Grunwald, S., Reddy, K.R., 2008. Spatial behavior of phosphorus and nitrogen in a subtropical wetland. Soil Sci. Soc. Am. J. 72, 1174-1183.
Grunwald, S., Harris, W.G., Comerford, N.B., Bruland, G.L., 2007. Rapid assessment and trajectory modeling of changes in soil carbon across a southeastern landscape. Core project of the North American Carbon Program. National Research Initiative, U.S. Department of Agriculture, Washington, DC.
Guo, L.B., Gifford, R.M., 2002. Soil carbon stocks and land use change: a meta analysis. Global Change Biol. 8, 345-360.
Guo, Y., Amundson, R., Gong, P., Yu, Q., 2006a. Quantity and spatial variability of soil carbon in the conterminous United States. Soil Sci. Soc. Am. J. 70, 590-600.
Guo, Y., Gong, P., Amundson, R., Yu, Q., 2006b. Analysis of factors controlling soil carbon in the conterminous United States. Soil Sci. Soc. Am. J. 70, 601-612.
Harris, W.G., Hollien, K.A., 2000. Changes across artificial E-Bh boundaries formed under simulated fluctuating water tables. Soil Sci. Soc. Am. J. 64, 967-973.
Henderson, B.L., Bui, E.N., Moran, C.J., Simon, D.A.P., 2005. Australia-wide predictions of soil properties using decision trees. Geoderma 124, 383-398.
Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression kriging. Geoderma 120, 75-93.
Hill, J., Schütt, B., 2000. Mapping complex patterns of erosion and stability in dry Mediterranean ecosystems. Remote Sens. Environ. 75, 557-569.
Homann, P.S., Sollins, P., Fiorella, M., Thorson, T., Kern, J.S., 1998. Regional soil organic carbon storage estimates for western Oregon by multiple approaches. Soil Sci. Soc. Am. J. 62, 789-796.
Huang, C., Wylie, B., Yang, L., Homer, C., Zylstra, G., 2002. Derivation of a tasselled cap transformation based on Landsat 7 at-satellite reflectance. Int. J. Remote Sens. 23, 1741-1748.
Hunt, G.R., 1977. Spectral signatures of particulate minerals in the visible and near infrared. Geophysics 42, 501-513.
Islam, K., Singh, B., McBratney, A., 2003. Simultaneous estimation of several soil properties by ultra-violet, visible, and near-infrared reflectance spectroscopy. Aust. J. Soil Res. 41, 1101-1114.
Jacobson, M.C., Charlson, R.J., Rodhe, H., Orians, G.H. (Eds.), 2000. Earth system science: from biogeochemical cycles to global change. International Geophysics Series, 72. Elsevier Academic Press, London, United Kingdom. 527 pp.
Jenny, H., 1941. Factors of soil formation. McGraw-Hill, New York, NY. 281 pp.
Jobbágy, E.G., Jackson, R.B., 2000. The vertical distribution of soil organic carbon and its relation to climate and vegetation. Ecol. Appl. 10, 423-436.
Kay, B.D., 1998. Soil structure and organic carbon: a review. In: Lal, R., Kimble, J.M., Follett, R.F., Stewart, B.A. (Eds.), Soil processes and the carbon cycle. CRC Press, Boca Raton, FL, pp. 169-197.
Keeling, C.D., Whorf, T.P., Wahlen, W., van der Plicht, J., 1995. Interannual extremes in the rate of rise of atmospheric carbon dioxide since 1980. Nature 375, 666-670.
Kooistra, L., Wehrens, R., Leuven, R.S.E.W., Buydens, L.M.C., 2001. Possibilities of visible-near-infrared spectroscopy for the assessment of soil contamination in river floodplains. Anal. Chim. Acta 446, 97-105.
Kooistra, L., Wanders, J., Epema, G.F., Leuven, R.S.E.W., Wehrens, R., Buydens, L.M.C., 2003. The potential of field spectroscopy for the assessment of sediment properties in river floodplains. Anal. Chim. Acta 484, 189-200.
Kravchenko, A.N., Robertson, G.P., 2007. Can topographical and yield data substantially improve total soil carbon mapping by regression kriging? Agron. J. 99, 12-17.
Kravchenko, A.N., Boast, C.W., Bullock, D.G., 1999. Multifractal analysis of soil spatial variability. Agron. J. 91, 1033-1041.
Kravchenko, A.N., Robertson, G.P., Hao, X., Bullock, D.G., 2006. Management practice effects on surface total carbon: differences in spatial variability patterns. Agron. J. 98, 1559-1568.
Krishnan, P., Alexander, J.D., Butler, B.J., Hummel, J.W., 1980. Reflectance technique for predicting soil organic matter. Soil Sci. Soc. Am. J. 44, 1282-1285.
Krogh, L., Noergaard, A., Hermansen, M., Greve, M.H., Balstroem, T., Breuning-Madsen, H., 2003. Preliminary estimates of contemporary soil organic carbon stocks in Denmark using multiple datasets and four scaling-up methods. Agr. Ecosyst. Environ. 96, 19-28.
Kubelka, P., Munk, F., 1931. Ein Beitrag zur Optik der Farbanstriche. Technische Physik 12, 593-601.
Lahoche, F., Godard, C., Fourty, T., Lelandais, V., Lepoutre, D., 2003. A multi-sensor approach for generating in-field pedological variability maps. In: Robert, P.C. (Ed.), Proceedings of the 6th International Conference on Precision Agriculture and Other Precision Resources Management, Minneapolis, MN, July 14-17, 2002. American Society of Agronomy, Madison, WI, pp. 1038-1048.
Lal, R., Kimble, J., Follett, R.F., 1998. Pedospheric processes and the carbon cycle. In: Lal, R., Kimble, J.M., Follett, R.F., Stewart, B.A. (Eds.), Soil processes and the carbon cycle. CRC Press, Boca Raton, FL, pp. 1-8.
Lark, R.M., 2000. Regression analysis with spatially autocorrelated error: simulation studies and application to mapping of soil organic matter. Int. J. Geogr. Inf. Sci. 14, 247-264.
Lark, R.M., 2005. Exploring scale-dependent correlation of soil properties by nested sampling. Eur. J. Soil Sci. 56, 307-317.
Leavitt, S.W., Follett, R.F., Paul, E.A., 1996. Estimation of slow and fast cycling soil organic carbon pools from 6N HCl hydrolysis. Radiocarbon 38, 231-239.
Leduc, A., Prairie, Y.T., Bergeron, Y., 1994. Fractal dimension estimates of a fragmented landscape: sources of variability. Landscape Ecol. 9, 279-286.
Levene, H., 1960. Robust tests for equality of variances. In: Olkin, I. (Ed.), Contributions to probability and statistics: essays in honor of Harold Hotelling. Stanford University Press, Palo Alto, CA, pp. 278-292.
Liu, D., Wang, Z., Zhang, B., Song, K., Li, Z., Li, J., Li, F., Duan, H., 2006. Spatial distribution of soil organic carbon and analysis of related factors in croplands of the black soil region, northeast China. Agric. Ecosyst. Environ. 113, 73-81.
Lobell, D.B., Asner, G.P., 2002. Moisture effects on soil reflectance. Soil Sci. Soc. Am. J. 66, 722-727.
López-Granados, F., Jurado-Expósito, M., Peña-Barragán, J.M., García-Torres, L., 2005. Using geostatistical and remote sensing approaches for mapping soil properties. Eur. J. Agron. 23, 279-289.
López-Granados, F., Jurado-Expósito, M., Atenciano, S., García-Ferrer, A., de la Orden, M.S., García-Torres, L., 2002. Spatial variability of agricultural soil parameters in southern Spain. Plant Soil 246, 97-105.
Mandelbrot, B.B., 1967. How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science 156, 636-638.
Mandelbrot, B.B., 1975. Stochastic models for the Earth's relief, the shape and the fractal dimension of the coastlines, and the number-area rule for islands. Proc. Nat. Acad. Sci. USA 72, 3825-3828.
Mandelbrot, B.B., 1983. The fractal geometry of nature. W.H. Freeman, New York, NY. 468 pp.
Mandelbrot, B.B., van Ness, J.W., 1968. Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10, 422-437.
Mark, D., Aronson, P., 1984. Scale-dependent fractal dimensions of topographic surfaces: an empirical investigation, with applications in geomorphology and computer mapping. Math. Geol. 16, 671-683.
Martens, H., Næs, T., 1989. Multivariate calibration. John Wiley and Sons, Chichester, United Kingdom. 419 pp.
Martin, J.G., Bolstad, P.V., 2009. Variation of soil respiration at three spatial scales: components within measurements, intra-site variation and patterns on the landscape. Soil Biol. Biochem. 41, 530-543.
Masserschmidt, I., Cuelbas, C.J., Poppi, R.J., Andrade, J.C., Abreu, C.A., Davanzo, C.U., 1999. Determination of organic matter in soils by FTIR/diffuse reflectance and multivariate calibration. J. Chemom. 13, 265-273.
McBratney, A.B., 1992. On variation, uncertainty and informatics in environmental soil management. Aust. J. Soil Res. 30, 913-935.
McBratney, A.B., 1998. Some considerations on methods for spatially aggregating and disaggregating soil information. Nutr. Cycl. Agroecosyst. 50, 51-62.
McBratney, A.B., Pringle, M.J., 1999. Estimating average and proportional variograms of soil properties and their potential use in precision agriculture. Precis. Agric. 1, 125-152.
McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3-52.
McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S., Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97, 293-327.
McCarty, G.W., Reeves III, J.B., Reeves, V.B., Follett, R.F., Kimble, J.M., 2002. Mid-infrared and near-infrared diffuse reflectance spectroscopy for soil carbon measurement. Soil Sci. Soc. Am. J. 66, 640-646.
McClean, C.J., Evans, I.S., 2000. Apparent fractal dimensions from continental scale digital elevation models using variogram methods. Trans. GIS 4, 361-378.
McClure, W.F., 2003. 204 years of near infrared technology: 1800-2003. J. Near Infrared Spec. 11, 487-518.
McGrath, D., Zhang, C., 2003. Spatial distribution of soil organic carbon concentrations in grassland of Ireland. Appl. Geochem. 18, 1629-1639.
McKenzie, N.J., Austin, M.P., 1993. A quantitative Australian approach to medium and small scale surveys based on soil stratigraphy and environmental correlation. Geoderma 57, 329-355.
McKenzie, N.J., Ryan, P.J., 1999. Spatial prediction of soil properties using environmental correlation. Geoderma 89, 67-94.
McLauchlan, K.K., Hobbie, S.E., 2004. Comparison of labile organic matter fractionation techniques. Soil Sci. Soc. Am. J. 68, 1616-1625.
Meentemeyer, V., Box, E.O., 1987. Scale effects in landscape studies. In: Turner, M.G. (Ed.), Landscape heterogeneity and disturbance. Ecological Studies, 64. Springer-Verlag, New York, NY, pp. 15-34.
Meyer, J.H., 1989. Rapid simultaneous rating of soil texture, organic matter, total nitrogen and nitrogen mineralization potential by near infrared reflectance. S. Afr. J. Plant Soil 6, 59-63.
Minasny, B., McBratney, A.B., Mendonça Santos, M.L., Odeh, I.O.A., Guyon, B., 2006. Prediction and digital mapping of soil carbon storage in the Lower Namoi Valley. Aust. J. Soil Res. 44, 233-244.
Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attribute prediction using terrain analysis. Soil Sci. Soc. Am. J. 57, 443-452.
Mueller, T.G., Pierce, F.J., 2003. Soil carbon maps: enhancing spatial estimates with simple terrain attributes at multiple scales. Soil Sci. Soc. Am. J. 67, 258-267.
Myers, D.B., Vasques, G.M., Grunwald, S., 2009. Pedotransfer functions to derive soil organic carbon for southeastern soils. In preparation.
Myers, J.L., Well, A.D., 2003. Research design and statistical analysis. 2nd edition. Lawrence Erlbaum Associates, Mahwah, NJ. 760 pp.
Mykrä, H., Aroviita, J., Kotanen, J., Hämäläinen, H., Muotka, T., 2008. Predicting the stream macroinvertebrate fauna across regional scales: influence of geographical extent on model performance. J. N. Am. Benthol. Soc. 27, 705-716.
Næs, T., Irgens, C., Martens, H., 1986. Comparison of linear statistical methods for calibration of NIR instruments. Appl. Stat. 35, 195-206.
National Climatic Data Center (NCDC), National Oceanic and Atmospheric Administration, 2008. Monthly surface data. NCDC, Asheville, NC. Available at: http://www.ncdc.noaa.gov/oa/ncdc.html Last verified: Feb. 26, 2008.
Natural Resources Conservation Service, U.S. Department of Agriculture (USDA), 1996. Soil survey laboratory methods manual. Version 3.0. Soil Survey Investigations Report, 42. USDA, Washington, DC. 693 pp.
Natural Resources Conservation Service, U.S. Department of Agriculture (USDA), 1999. Soil taxonomy: a basic system of soil classification for making and interpreting soil surveys. 2nd edition. Agriculture Handbook, 436. USDA, Washington, DC. 754 pp.
Natural Resources Conservation Service (NRCS), U.S. Department of Agriculture, 2006. State Soil Geographic (STATSGO) database. Vector layer. Original scale: 1:250,000. NRCS, Fort Worth, TX. Available at: http://soildatamart.nrcs.usda.gov Last verified: Nov. 21, 2006.
Natural Resources Conservation Service (NRCS), U.S. Department of Agriculture, 2009. Soil Survey Geographic (SSURGO) database. Vector layer. Original scale: 1:24,000. NRCS, Fort Worth, TX. Available at: http://soildatamart.nrcs.usda.gov Last verified: Jan. 21, 2009.
Norris, K.H., Williams, P.C., 1984. Optimization of mathematical treatments of raw near infrared signal in the measurement of protein in hard Red Spring wheat, I: influence of particle size. Cereal Chem. 62, 158-165.
O'Neill, R.V., DeAngelis, D.L., Waide, J.B., Allen, G.E., 1986. A hierarchical concept of ecosystems. Monographs in Population Biology, 23. Princeton University Press, Princeton, NJ. 253 pp.
Obeysekera, J., Rutchey, K., 1997. Selection of scale for Everglades landscape models. Landscape Ecol. 12, 7-18.
Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma 63, 197-214.
Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1995. Further results on prediction of soil properties from terrain attributes: heterotropic cokriging and regression-kriging. Geoderma 67, 215-226.
Olson, J.S., Watts, J.A., Allison, L.J., 1985. Major world ecosystem complexes ranked by carbon in live vegetation: a database. Carbon Dioxide Information Analysis Center (CDIAC), NDP-017. CDIAC, Oak Ridge National Laboratory, Oak Ridge, TN. Available at: http://cdiac.ornl.gov/ndps/ndp017.html Last verified: Feb. 14, 2009.
Palacios-Orueta, A., Ustin, S.L., 1998. Remote sensing of soil properties in the Santa Monica Mountains I. Spectral analysis. Remote Sens. Environ. 65, 170-183.
Parton, W.J., Anderson, D.W., Cole, C.V., Stewart, J.W.B., 1983. Simulation of soil organic matter formation and mineralization in semiarid agroecosystems. In: Lowrance, R.R., Todd, R.L., Asmussen, L.E., Leonard, R.A. (Eds.), Nutrient cycling in agricultural ecosystems. Special Publication, 23. University of Georgia Press, Athens, GA, pp. 533-550.
Paul, E.A., Morris, S.J., Bohm, S., 2001. Determination of soil C pool sizes and turnover rates: biophysical fractionation and tracers. In: Lal, R., Kimble, J.M., Follett, R.F., Stewart, B.A. (Eds.), Assessment methods for soil carbon. CRC Press, Boca Raton, FL, pp. 193-206.
Paul, E.A., Morris, S.J., Conant, R.T., Plante, A.F., 2006. Does the acid hydrolysis-incubation method measure meaningful soil organic carbon pools? Soil Sci. Soc. Am. J. 70, 1023-1035.
Pentland, A.P., 1984. Fractal-based description of natural scenes. IEEE Trans. Pattern Anal. Mach. Intell. 6, 661-674.
Perfect, E., Kay, B.D., 1991. Fractal theory applied to soil aggregation. Soil Sci. Soc. Am. J. 55, 1552-1558.
Perrier, E., Bird, N., Rieu, M., 1999. Generalizing the fractal model of soil structure: the pore-solid fractal approach. Geoderma 88, 137-164.
Post, W.M., Kwon, K.C., 2000. Soil carbon sequestration and land-use change: processes and potential. Global Change Biol. 6, 317-327.
Post, W.M., Emanuel, W.R., Zinke, P.J., Stangenberger, A.G., 1982. Soil carbon pools and world life zones. Nature 298, 156-159.
Powers, J.S., Schlesinger, W.H., 2002. Relationships among soil carbon distributions and biophysical factors at nested spatial scales in rain forests of northeastern Costa Rica. Geoderma 109, 165-190.
Pringle, M.J., Lark, R.M., 2007. Scale- and location-dependent correlations of soil strength and the yield of wheat. Soil Till. Res. 95, 47-60.
Quideau, S.A., 2006. Organic matter accumulation. In: Lal, R. (Ed.), Encyclopedia of soil science, Vol. 2. CRC Press, Boca Raton, FL, pp. 1172-1175.
Randazzo, A.F., Jones, D.S., 1997. The geology of Florida. University Press of Florida, Gainesville, FL. 327 pp.
Reeves III, J.B., McCarty, G.W., Mimmo, T., 2002. The potential of diffuse reflectance spectroscopy for the determination of carbon inventories in soils. Environ. Pollut. 116, S277-S284.
Reeves III, J.B., McCarty, G.W., Reeves, V.B., 2001. Mid-infrared diffuse reflectance spectroscopy for the quantitative analysis of agricultural soils. J. Food Chem. 49, 766-772.
Rice, C.W., 2006. Organic matter and nutrient dynamics. In: Lal, R. (Ed.), Encyclopedia of soil science, Vol. 2. CRC Press, Boca Raton, FL, pp. 1180-1183.
Rivero, R.G., Grunwald, S., Bruland, G.L., 2007. Incorporation of spectral data into multivariate geostatistical models to map soil phosphorus variability in a Florida wetland. Geoderma 140, 428-443.
Robson, A., Phinn, S., Wright, G., Fox, G., 2004. Combining near infrared spectroscopy and infrared aerial imagery for assessment of peanut crop maturity and aflatoxin risk. In: Fischer, T., Turner, N., Angus, J., McIntyre, L., Robertson, M., Borrell, A., Lloyd, D. (Eds.), New directions for a diverse planet: proceedings of the 4th International Crop Science Congress, Brisbane, Australia, Sep. 16-Oct. 1, 2004. The Regional Institute, Gosford, Australia. Available at: http://www.cropscience.org.au/icsc2004. Last verified: Jan. 26, 2009.
Ryan, P.J., McKenzie, N.J., O'Connell, D., Loughhead, A.N., Leppert, P.M., Jacquier, D., Ashton, L., 2000. Integrating forest soils information across scales: spatial prediction of soil properties under Australian forests. Forest Ecol. Manag. 138, 139-157.
Sarkhot, D.V., Comerford, N.B., Jokela, E.J., Reeves III, J.B., 2007a. Effects of forest management intensity on soil carbon and nitrogen in different soil size fractions of a North Florida Spodosol. Plant Soil 294, 291-303.
Sarkhot, D.V., Comerford, N.B., Jokela, E.J., Reeves III, J.B., Harris, W.G., 2007b. Aggregation and aggregate carbon in a forested Southeastern Coastal Plain Spodosol. Soil Sci. Soc. Am. J. 71, 1779-1787.
Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627-1639.
Shan, J., Morris, L.A., Hendrick, R.L., 2001. The effects of management on soil and plant carbon sequestration in slash pine plantations. J. Appl. Ecol. 38, 932-941.
Shepherd, K.D., Walsh, M.G., 2002. Development of reflectance spectral libraries for characterization of soil properties. Soil Sci. Soc. Am. J. 66, 988-998.
Siesler, H.W., Ozaki, Y., Kawata, S., Heise, H.M. (Eds.), 2002. Near infrared spectroscopy: principles, instruments, applications. Wiley-VCH, Weinheim, Germany. 348 pp.
Silveira, M.L., Comerford, N.B., Reddy, K.R., Cooper, W.T., El-Rifai, H., 2008. Characterization of soil organic carbon pools by acid hydrolysis. Geoderma 144, 405-414.
Simbahan, G.C., Dobermann, A., Goovaerts, P., Ping, J., Haddix, M.L., 2006. Fine-resolution mapping of soil organic carbon based on multivariate secondary data. Geoderma 132, 471-489.
Six, J., Callewaert, P., Lenders, S., De Gryze, S., Morris, S.J., Gregorich, E.G., Paul, E.A., Paustian, K., 2002. Measuring and understanding carbon storage in afforested soils by physical fractionation. Soil Sci. Soc. Am. J. 66, 1981-1987.
Smith, J.M., Heath, L.S., 2004. Carbon stocks and projections on public forestlands in the United States, 1952-2040. Environ. Manag. 33, 433-442.
Sohi, S.P., Mahieu, N., Arah, J.R.M., Powlson, D.S., Madari, B., Gaunt, J.L., 2001. A procedure for isolating soil organic matter fractions suitable for modeling. Soil Sci. Soc. Am. J. 65, 1121-1128.
Sparling, G., Vojvodić-Vuković, M., Schipper, L.A., 1998. Hot-water-soluble C as a simple measure of labile soil organic matter: the relationship with microbial biomass C. Soil Biol. Biochem. 30, 1469-1472.
Stark, E.W., 1988. Calibration methods for NIRS analysis. In: Creaser, C.S., Davies, A.M.C. (Eds.), Analytical applications of spectroscopy. Royal Society of Chemistry, London, United Kingdom, pp. 21-34.
Steinberg, D., Colla, P., 1997. CART: tree-structured nonparametric data analysis. Salford Systems, San Diego, CA. 342 pp.
Stone, E.L., Harris, W.G., Brown, R.B., Kuehl, R.J., 1993. Carbon storage in Florida Spodosols. Soil Sci. Soc. Am. J. 57, 179-182.
Su, Y.Z., Zhao, H.L., Zhao, W.Z., Zhang, T.H., 2004. Fractal features of soil particle size distribution and the implication for indicating desertification. Geoderma 122, 43-49.
Tan, Z.X., Lal, R., Smeck, N.E., Calhoun, F.G., 2004. Relationships between surface soil organic carbon pool and site variables. Geoderma 121, 187-195.
Terra, J.A., Shaw, J.N., Reeves, D.W., Raper, R.L., van Santen, E., Mask, P.L., 2004. Soil carbon relationships with terrain attributes, electrical conductivity, and soil survey in a Coastal Plain landscape. Soil Sci. 169, 819-831.
Torrent, J., Schwertmann, U., Fechter, H., Alferez, F., 1983. Quantitative relationships between soil color and hematite content. Soil Sci. 136, 354-358.
Townsend, C.R., Dolédec, S., Norris, R., Peacock, K., Arbuckle, C., 2003. The influence of scale and geography on relationships between stream community composition and landscape variables: description and prediction. Freshwater Biol. 48, 768-785.
Tukey, J.W., 1953. The problem of multiple comparisons. Unpublished notes. Princeton University, Princeton, NJ. 396 pp.
Turner, M.G., O'Neill, R.V., Gardner, R.H., Milne, B.T., 1989. Effects of changing spatial scale on the analysis of landscape pattern. Landscape Ecol. 3, 153-162.
Tyler, S.W., Wheatcraft, S.W., 1992. Fractal scaling of soil particle size distributions: analysis and limitations. Soil Sci. Soc. Am. J. 56, 362-369.
Udelhoven, T., Emmerling, C., Jarmer, T., 2003. Quantitative analysis of soil chemical properties with diffuse reflectance spectrometry and partial least square regression: a feasibility study. Plant Soil 251, 319-329.
United States Geological Survey (USGS), 1984. Digital Elevation Model (DEM) of Florida. Raster layer. Original scale: 1:250,000. Spatial resolution: 90 m. USGS, Reston, VA.
United States Geological Survey (USGS), 1999. National Elevation Dataset (NED). Raster layer. Original scale: 1:24,000. Spatial resolution: 30 m. USGS, Sioux Falls, SD.
Vågen, T., Shepherd, K.D., Walsh, M.G., 2006. Sensing landscape-level change in soil fertility following deforestation and conversion in the highlands of Madagascar using Vis-NIR spectroscopy. Geoderma 133, 281-294.
van Meirvenne, M., Pannier, J., Hofman, G., Louwagie, G., 1996. Regional characterization of the long-term change in soil organic carbon under intensive agriculture. Soil Use Manag. 12, 86-94.
Vasques, G.M., Grunwald, S., Sickman, J.O., 2008. Comparison of multivariate methods for inferential modeling of soil carbon using visible/near-infrared spectra. Geoderma 146, 14-25.
Vasques, G.M., Grunwald, S., Sickman, J.O., 2009. Modeling of soil organic carbon fractions using visible-near-infrared spectroscopy. Soil Sci. Soc. Am. J. 73, 176-184.
Viscarra Rossel, R.A., Walvoort, D.J.J., McBratney, A.B., Janik, L.J., Skjemstad, J.O., 2006. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131, 59-75.
von Lützow, M., Kögel-Knabner, I., Ekschmitt, K., Flessa, H., Guggenberger, G., Matzner, E., Marschner, B., 2007. SOM fractionation methods: relevance to functional pools and to stabilization mechanisms. Soil Biol. Biochem. 39, 2183-2207.
Walkley, A., Black, I.A., 1934. An examination of the Degtjareff method for determining soil organic matter and a proposed modification of the chromic acid titration method. Soil Science 37, 29-38.
Wang, H., Cornell, J.D., Hall, C.A.S., Marley, D.P., 2002a. Spatial and seasonal dynamics of surface soil carbon in the Luquillo Experimental Forest, Puerto Rico. Ecol. Model. 147, 105-122.
Wang, H., Hall, C.A.S., Cornell, J.D., Hall, M.H.P., 2002b. Spatial dependence and the relationship of soil organic carbon and soil moisture in the Luquillo Experimental Forest, Puerto Rico. Landscape Ecol. 17, 671-684.
Webster, R., Oliver, M.A., 2001. Geostatistics for environmental scientists. John Wiley and Sons, Chichester, United Kingdom. 271 pp.
Welch, B.L., 1951. On the comparison of several mean values: an alternative approach. Biometrika 38, 330-336.
Williams, P.C., 1987. Variables affecting near infrared reflectance spectroscopic analysis. In: Williams, P., Norris, K. (Eds.), Near infrared technology in the agricultural and food industries. American Association of Cereal Chemists, St. Paul, MN, pp. 143-167.
Woodcock, C.E., Strahler, A.H., 1987. The factor of scale in remote sensing. Remote Sens. Environ. 21, 311-332.
Wright, A.L., Wang, Y., Reddy, K.R., 2008. Loss on ignition method to assess soil organic carbon in calcareous Everglades wetlands. Commun. Soil Sci. Plan. 39, 3074-3083.
Zhang, C., McGrath, D., 2004. Geostatistical and GIS analyses on soil organic carbon concentrations in grassland of southeastern Ireland from two different periods. Geoderma 119, 261-275.
Zimmermann, M., Leifeld, J., Schmidt, M.W.I., Smith, P., Fuhrer, J., 2007. Measured soil organic matter fractions can be related to pools in the RothC model. Eur. J. Soil Sci. 58, 658-667.
BIOGRAPHICAL SKETCH
Gustavo de Mattos Vasques was born in Niterói, Rio de Janeiro, Brazil. He received his Bachelor of Science degree with honors in forestry engineering in 2005 at the Federal University of Viçosa, Minas Gerais, Brazil. During his undergraduate program, Gustavo studied for two semesters at the University of Florida as a non-degree-seeking student in an exchange program between the two universities. During this time, he was introduced to Geographic Information Systems and soil-landscape modeling, which he immediately fell in love with. Gustavo's scientific interests include soil science, forestry, ecology, geography, and interdisciplinary environmental sciences. He enjoys contact with nature, traveling, camping, hiking, listening to music, dancing, and talking about scientific and philosophical topics and life in general. Gustavo would like to continue doing research and teaching, and hopes to contribute to making the world a better place.