IMPROVING PLANT BREEDING EFFICIENCY WITH QUANTITATIVE METHODOLOGIES

By

LIN XING

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017
© 2017 Lin Xing
To my parents for their support and love
ACKNOWLEDGMENTS

As I write this note of thanks and wrap up my dissertation, I am excited to know that I am very close to finishing my Ph.D. study. It has been a period of intense learning for me, not only in the scientific arena but also in my personal understanding of life. I would like to reflect on the people who have supported and helped me so much throughout this period.

I would first like to thank my advisors, Dr. Patricio Munoz and Dr. Kevin Kenworthy, for their strong guidance on the research directions and for creating excellent opportunities for me to conduct a very interesting dissertation. I would also like to express my deepest appreciation to my committee member Dr. Salvador Gezan, who was always helpful and set aside time for prompt communication. Without his guidance and persistent help this dissertation would not have been possible. Special thanks to Dr. Schwartz for helping me considerably improve my understanding of the biology and statistics related to turfgrass studies. In addition, a very important thank you to Dr. Md Ali Babar, who provided excellent insight regarding the applications of sensor technology in crop biology.

Nobody has been more important to me in the pursuit of this degree than the members of my family. I would like to thank my parents, whose love is always with me in whatever I pursue. Most importantly, I wish to thank my loving girlfriend, Yu, who spent countless late nights staying up with me and provided unending inspiration.

I am very grateful to all of those with whom I have had the pleasure to work during this and other related projects. Thanks to my friends and colleagues, especially Yolanda Lopez and Luis Inostroza, for their support and friendship.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER
1 INTRODUCTION
2 IMPROVED GENETIC PARAMETER ESTIMATIONS IN ZOYSIAGRASS BY IMPLEMENTING POST HOC BLOCKING
    Background
    Materials and Methods
        Experiment
        Statistical Analysis
    Results
    Discussion
3 STATISTICAL ANALYSES OF MULTIPLE RESPONSE MEASUREMENTS TO UNDERSTAND IMPACT OF DROUGHT AND GROWTH IN ZOYSIAGRASS
    Background
    Materials and Methods
        Experimental Data
        Statistical Analysis
    Results
    Discussion
4 IMPROVING PREDICTABILITY OF MULTI-SENSOR DATA WITH NONLINEAR STATISTICAL METHODOLOGIES
    Background
    Materials and Methods
        Experiment Description
        Measurements
        Feature Engineering
        Criteria of Model Performance
        Evaluation of the Prediction Methodologies
        Evaluation of Predictor Variables
    Results
    Discussion
5 CONCLUSIONS

APPENDIX
A VARIANCE-COVARIANCE MATRIX OF GXE AND THE TEMPERATURE INFORMATION AT VARIOUS LOCATIONS OF CHAPTER 3
B SUMMARY OF GXE INTERACTION AND BROAD-SENSE HERITABILITY CONSIDERING ALL LOCATIONS WITHIN SERIES
C PREDICTABILITY AND STANDARD ERROR FOR CHAPTER 4 MODELS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table
2-1 Experimental site information for the five zoysiagrass trials, average turf quality scores (TQ, scale 1 to 9), and number of measurements
2-2 Post hoc blocking designs IB and R-C, at the site level, including all repeated measurements
2-3 Calculated genetic gains (%) from selecting the overall top 10, 15, and 20% of parental genotypes
3-1 Summary of five response variables in each of the seven trials including all series
3-2 Summary of broad-sense heritability estimates for different calculated response variables in different trials for each series
A-1 The variance-covariance matrix reflects the covariance (lower diagonal), variance (diagonal), and correlation (upper diagonal) of seven sites
A-2 The average precipitation and temperature of selected sites within given months
B-1 Site-to-site Type B genetic correlations for different response variables in various series. Values in parentheses correspond to the standard error of the estimates
B-2 Summary of broad-sense heritability of calculated response variables considering all trials within series
B-3 Bivariate analysis of TQD against TQND and TQG against TQNG traits with all the data from sites within years
B-4 The correlation matrix obtained by modeling data from seven trials in series 2011 with the CORGH variance-covariance structure
C-1 Prediction performance of target harvest date in dry matter yield trait with sensor data measurements using statistical models built with all previous harvest data
C-2 Use of dataset from one harvest to predict another in dry matter yield trait
C-3 Predictability of agronomically important traits with applications of statistical methodologies, PLS regression, ridge regression, SVM, and RF
LIST OF FIGURES

Figure
2-1 Geographical location of the five experimental sites within Florida
2-2 Likelihood ratio test (LRT) comparison
2-3 Estimation of narrow-sense heritability, h², for the analysis of each of the five sites
2-4 Changes in rankings of the top 10 genotypes at different sites based on all available measurements per site
3-1 Distribution of measurements at different trials. The x-axis marks the measurement month and the y-axis shows the corresponding trials
3-2 Site-to-site Type B genetic correlations, r_B, for different calculated response variables by series. The whisker bars represent the standard errors of the mean
3-3 Summary of broad-sense heritability estimates, H², of calculated response variables considering data from all trials within a series
3-4 Trait-to-trait Type A genetic correlations, r_A, for each trial based on the bivariate analysis of TQD against TQND with data from all trials within series
3-5 Percentage of top genotype matching from series 2011 to 2014 with selection intensities of 10%, 20%, and 30%
4-1 An example of nested cross-validation with outer and inner loops both set as 5-fold
4-2 Predictability (correlation between predicted and test dataset values) of agronomically important traits
4-3 Predicted vs. observed values of dry matter yield and total digestible nutrient traits
4-4 The agriculturally important traits and their corresponding normalized root mean square error (NRMSE) with applications of statistical methodologies
4-5 The model performance in dry matter yield after removing each of the variables in sequence from the dataset
LIST OF ABBREVIATIONS

AIC     Akaike information criterion
BIC     Bayesian information criterion
IB      Incomplete block
PLS     Partial least squares
R-C     Row-column
RCB     Randomized complete block
RF      Random forest
TQ      Turf quality score
TQD     Turf quality rated under drought conditions
TQG     Turf quality rated during the growth period
TQND    Turf quality rated under non-drought conditions
TQNG    Turf quality rated during the non-growth period
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMPROVING PLANT BREEDING EFFICIENCY WITH QUANTITATIVE METHODOLOGIES

By

Lin Xing

December 2017

Chair: Kevin Kenworthy
Cochair: Patricio Munoz
Major: Agronomy

By 2050, food demand is expected to increase by at least 59% worldwide, which poses a tremendous challenge to current agricultural infrastructure and practices. Besides the use of improved pesticides and a better understanding of field management, efficient breeding of crop cultivars has become key to achieving the goal of feeding the world. Plant breeding has been described from its origin as the art and science of the selection process. Today the statement still holds true; however, as data become more complex, improved analytical methodologies have become an important component of plant breeding that can make decision making more efficient. The goal of this dissertation is to test different statistical methodologies to address analytical challenges related to plant breeding.

Because field conditions vary, environmental variation affects the performance of experimental units, which makes unbiased evaluation of genotype performance very challenging. To alleviate the environmental impact, post hoc blocking was tested; it improved genetic parameter estimates by accounting for heterogeneous environmental variation and was shown to help with decision making in plant breeding. This study also used zoysiagrass as a model crop to evaluate the impact of drought conditions, which have become more prevalent in recent years, on the estimates of broad-sense heritability and genotype by environment interaction. In addition to improved estimation and understanding of drought effects, the ability to collect phenotypic data rapidly would boost breeding efficiency. Hence, this study evaluated statistical methodologies to improve the prediction performance of a multi-sensor system. With competitive prediction accuracy, the currently implemented sensor system could be a more cost-effective alternative to traditional data harvest. In summary, this study indicates that appropriate statistical methodologies could be very helpful in enhancing plant breeding efficiency and has shown potential solutions to existing issues known to breeding programs.
CHAPTER 1
INTRODUCTION

Plant breeding has been described from its origin as the art and science of the selection process. However, with the evolution of complex data collection and improved analytical methodologies, decision making relies increasingly on scientific information. The goal of this dissertation is to test different statistical methodologies to address analytical challenges related to plant breeding.

Randomized complete block (RCB) designs are widely used in plant science because they are simple to design and analyze (Clewer and Scarisbrick, 2013) and allow comparison of treatments, including different genotypes (Anderson and McLean, 1974; Montgomery et al., 1984). For the RCB design to be effective, the environmental variation within a given block should be relatively small. However, this is usually not the case when numerous (hundreds or thousands of) genotypes are tested in a single breeding experiment (Montgomery et al., 1984). Block size increases as the number of genotypes tested increases, which reduces the ability of each block to control environmental variation. Several alternatives exist for this challenge at both the design and analysis stages. At the design stage, more efficient experimental designs that provide some control of spatial variation can be implemented, such as incomplete block (IB) designs and row-column (R-C) designs (Welham et al., 2014), which use smaller, incomplete strata to group genotypes. At the analysis stage, there are statistical alternatives, such as spatial analysis (Gilmour et al., 1997) and post hoc blocking, that can provide relevant improvements in the estimation and prediction of parameters (Gezan et al., 2006). Post hoc blocking superimposes a new experimental layout onto the original design, and the data are then analyzed assuming this new design (Patterson and Hunter, 1983). This method was initially proposed as a low-cost way to evaluate new potential experimental designs without the need to establish the experiments in the field (Patterson and Hunter, 1983; Gezan et al., 2006). For example, superimposing a resolvable IB design on top of an RCB design increases homogeneity within the smaller incomplete blocks. In other cases, if environmental variation follows gradients in two directions, an R-C design can be superimposed to increase the uniformity of the experiment through two-way spatial blocking (John and Eccleston, 1986). The greater efficiency of both IB and R-C designs over the original RCB designs in controlling experimental error has been demonstrated with experimental field data in wheat, Triticum spp. (Qiao et al., 2000), and with simulated data (Fu et al., 1998; Fu et al., 1999; Qiao et al., 2000; Gezan et al., 2006; Kravchenko et al., 2006). In breeding programs, many genotypes are often screened using RCB designs. By evaluating post hoc blocking on already established experiments, we expect to improve the accuracy of genetic effect estimates and therefore increase the reliability of the information used to make selections, which, in turn, should result in greater genetic gains. Moreover, evaluating post hoc blocking may provide evidence for more efficient experimental designs that can be implemented in the future.

Zoysiagrass (Zoysia spp.) is an important warm-season turfgrass species adapted to states in the humid, warm, and transitional zones, extending into the Midwest and Northeastern United States. Two species are important to the turfgrass industry, Z. japonica Steud. and Z. matrella (L.) Merr. (Brede and Sun, 1995). Both species are tetraploids (2n = 4x = 40). Phenotypically, they can be differentiated by leaf texture: Z. japonica typically has medium to coarse leaves, whereas Z. matrella leaves are fine textured (Forbes, 1952). Because zoysiagrass is a tetraploid and the species can hybridize, breeding programs typically observe significant phenotypic variation. Variation in zoysiagrass has been reported for turf quality, seed head density (Schwartz et al., 2009), DNA content (Schwartz et al., 2010a), shade (Morton et al., 1991), salinity (Marcum et al., 1998; Qian et al., 2000), drought (Marcum et al., 1995; White et al., 2001), temperature adaptation (Patton and Reicher, 2007), diseases (Green et al., 1994), nematodes (Busey et al., 1982; Schwartz et al., 2010b), insects (Braman et al., 2000; Reinert and Engelke, 1992), and Fusilade herbicide resistance (Leon et al., 2014). Given the extensive genetic variation within this genus and the development of several breeding programs across the US, it is important to determine the most efficient analytical tools for accurately estimating genetic parameters. Therefore, in the second chapter of this study, post hoc blocking will be tested to determine whether it improves the accuracy of genetic parameter estimates.

To further improve selection efficiency in cultivar development and provide critical information on cultivar performance across locations, establishing experiments at multiple sites to estimate heritability, genotype breeding values, and the related genotype by environment (GxE) interaction has become increasingly important. The concept of interaction between genotype and environment was first raised by Haldane (1946), who noted that the environment would have an impact on the phenotypic features of organisms. Later, Allard and Bradshaw (1964) revisited the GxE interaction concept and emphasized it in the context of plant breeding. The concept has since been reviewed many times in various disciplines, such as plant science (Eberhart and Russell, 1966; Hoeck et al., 2000), animal behavior (Bergeman and Plomin, 1989; Plomin and Hershberger, 1991; Plomin et al., 1977), and plant breeding (Allard and Bradshaw, 1964; Annicchiarico, 2002; Becker and Leon, 1988).
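Heritability and GxE are linked in the estimate itself. As a reference point, one standard textbook expression for broad-sense heritability on an entry-mean basis in a balanced multi-environment trial is given below. It assumes e environments, r replicates per environment, and random genotype and GxE effects; it is a sketch for orientation and not necessarily the exact estimator used in later chapters.

```latex
H^2 = \frac{\sigma^2_G}{\sigma^2_G + \dfrac{\sigma^2_{GE}}{e} + \dfrac{\sigma^2_{\varepsilon}}{r\,e}}
```

Here σ²_G, σ²_GE, and σ²_ε are the genotype, GxE, and residual variance components. Holding σ²_G fixed, a larger GxE variance lowers H², while testing in more environments (larger e) dilutes its effect, which is one reason multi-site testing matters for selection.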
The concept of GxE interaction plays a critical role in cultivar development, specifically in evaluating phenotypic stability in crop breeding (Kang, 1997). In plant breeding, the term genotype usually refers to a cultivar or tested genotype (i.e., genetically homogeneous or heterogeneous material), and the term environment refers to the climate, soil, and pest pressure at a given location in a given year. The GxE interaction is reflected in the performance of genotypes at various locations and can be driven by factors such as extreme temperatures and water shortage (Billings, 1987; Hoekstra et al., 2001). Specifically, some genotypes may thrive in particular regions but perform poorly in others (Finlay and Wilkinson, 1963; Lin and Binns, 1988; Otoo and Asiedu, 2006). In plant breeding, trials with the same genotypes are typically established at multiple locations during cultivar development, especially when selection targets macro-regions (Finlay and Wilkinson, 1963). Multiple-location trials to estimate the GxE interaction are common across crop species. Johnson et al. (1955) estimated genetic and environmental variability in soybeans and found significant environmental variability in one of the test populations. Al-Jibouri et al. (1958) conducted experiments on eight yield and fiber traits in upland cotton and reported that the interaction between progeny and environment was small. Similarly, Miller et al. (1958) reported small GxE interaction for all traits except yield and bolls per plant in upland cotton. GxE interaction is reflected not only in traditional phenotypic data analysis; studies have also explored it in genetic mapping. Jansen et al. (1995) illustrated MQM mapping that accounts for the GxE interaction to provide a better mapping method with Arabidopsis thaliana data.
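The crossover pattern described above, where a genotype that excels at one site performs poorly at another, can be made concrete with a two-way table of genotype means. The sketch below is a minimal illustration assuming a complete, balanced genotype-by-site table; it uses the classical additive two-way decomposition rather than the mixed-model (BLUP/REML) analyses cited in this chapter, and the function names, genotype and site labels, and scores are hypothetical.

```python
from statistics import mean


def gxe_effects(table):
    """Interaction effects from a complete genotype x site table of means.

    table maps genotype -> {site: mean phenotype}. Each effect is
    y_ij - mean_i. - mean_.j + mean_.., i.e. what remains after removing
    the genotype and site main effects of the additive two-way model.
    """
    genotypes = list(table)
    sites = list(table[genotypes[0]])
    grand = mean(table[g][s] for g in genotypes for s in sites)
    g_mean = {g: mean(table[g][s] for s in sites) for g in genotypes}
    s_mean = {s: mean(table[g][s] for g in genotypes) for s in sites}
    return {(g, s): table[g][s] - g_mean[g] - s_mean[s] + grand
            for g in genotypes for s in sites}


def rankings(table):
    """Genotypes ordered from highest to lowest mean within each site."""
    sites = list(table[next(iter(table))])
    return {s: sorted(table, key=lambda g: table[g][s], reverse=True)
            for s in sites}


if __name__ == "__main__":
    # Hypothetical turf quality means (1-9 scale) for three genotypes at two sites.
    tq = {
        "G1": {"SiteA": 7.0, "SiteB": 5.0},
        "G2": {"SiteA": 6.0, "SiteB": 6.0},
        "G3": {"SiteA": 5.0, "SiteB": 5.5},
    }
    for key, effect in gxe_effects(tq).items():
        print(key, round(effect, 3))
    print(rankings(tq))  # G1 ranks first at SiteA but last at SiteB
```

By construction, the interaction effects sum to zero across each genotype and across each site; large effects of opposite sign for the same genotype signal the rank changes a breeder must account for when choosing target regions.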
In plant breeding, mixed models with best linear unbiased prediction (BLUP) can provide good estimates of the variance-covariance structure for the GxE interaction (Piepho et al., 2008). Moreover, to obtain a more reliable and accurate understanding of genotype, environment, and their interaction, Chapman (2008) used crop models to simulate plant breeding traits.

In addition to the impact of the GxE interaction on cultivar development, the global climate also plays an important role in the genotype selection process. Many studies point to the occurrence of global drought and the likelihood of extreme heat events (Allen et al., 2010; Hayhoe et al., 2010; Luber and McGeehin, 2008); hence, breeding drought-resistant cultivars is a reasonable consideration. Moreover, since water availability in the United States may be limited during certain months of the year, especially July and August, some turfgrass breeding programs are actively breeding cultivars in response to climate change. Drought usually occurs when precipitation drops below the long-term average of several years, causing soil moisture to decrease to the point that it negatively impacts plant development. Drought resistance or drought tolerance indicates how well the turf holds quality and color during a dry period with no irrigation or rainfall. Favorable drought responses could also include retention of turf color during drought, recovery of color after irrigation, or even reduced water use to maintain healthy conditions. Huang et al. (1997a) showed that shoot dry matter production of zoysiagrass partially recovered after re-watering. A similar study of root responses found that zoysiagrass root dry weight also partially recovered after re-watering and that drought resistance may be associated with enhanced root growth, rapid water uptake from deeper soil layers, and rapid root regeneration after re-watering (Huang et al., 1997b). Additional studies indicate that rooting depth, weight, and branching at lower depths are responsible for the drought resistance mechanism in zoysiagrass (Marcum et al., 1995). Hays et al. (1991) showed that root mass was significantly correlated with turf quality at multiple levels during drought conditions, but there was no direct impact on root carbohydrate distribution. Further research in zoysiagrass also showed that turfgrass rooting and drought resistance could be markedly affected by genetic tolerance, supporting turf breeding efforts toward drought-resistant cultivars. Although there is ample physiological evidence that breeding drought-resistant cultivars is necessary and feasible, few direct breeding data are available in the literature. Hence, in the third chapter, broad-sense heritability (H²) and GxE interactions will be estimated and, subsequently, the impact of drought on these estimates will be explored.

The traditional rating system in turfgrass breeding is very common and well recognized because it is straightforward and intuitive. However, phenotyping forage crops, especially for dry matter yield and other quality traits, is very time-consuming, labor-intensive, and expensive. Bermudagrass is an excellent warm-season forage species native to southeast Africa that has been adapted to and extensively used in the southeastern United States (Sleper et al., 1989). Because of its active production from May through October, when most other forage species decrease production (Bouraoui et al., 2002), it fits a unique niche among forages. Extensive efforts have been spent on testing its nutritional responses and management (Franzluebbers et al., 2001; Overman et al., 1990). Evaluating bermudagrass biomass usually takes multiple harvests, which are traditionally considered time-consuming and labor-intensive and require drying facilities as an additional expense.
To assess the nutritional value of bermudagrass, forage nutritive value analysis has been the traditional choice, which involves wet chemistry and ignition laboratory evaluations (Kellems and Church, 2009) and near-infrared reflectance spectroscopy (NIRS) laboratory analysis (Norris et al., 1976). Even though the test results are considered accurate and reliable, the analysis process can take weeks to complete, leading to long turnaround times (Pittman et al., 2016). To overcome the drawbacks of labor- and time-consuming bermudagrass harvests and nutritive value evaluation, two main approaches, advanced machinery and sensor technology, have been proposed (Srivastava et al., 2006; Stoll and Kutzbach, 2001). Since sensor technology has significant advantages over machinery in terms of expense and flexibility, many studies have examined the application of sensors to bermudagrass. Using remote sensing to predict forage nutritive value and biomass could allow efficient adjustment of grazing management and rapid decisions on the inclusion of feed supplements (Pittman et al., 2016). Sensor combinations (ultrasonic, laser, and spectral sensors) were expected to outperform the traditional remote sensing approach, as various sources of data could provide more relevant information and complement each other.

Ultrasonic proximity sensors estimate the approximate distance to a target by measuring the time interval between sending a signal and receiving its echo. In agricultural practice, they have been extensively used to characterize canopy coverage in orchards and corn (Zea mays L.) (Aziz et al., 2004; Escolà et al., 2011), growth in wheat (Triticum aestivum L.) (Scotford and Miller, 2004), and plant height in cotton (Gossypium hirsutum L.) (Sui and Thomasson, 2006). Laser proximity sensors work through either the phase-shift method, which compares the emitted and reflected beams, or the time-of-flight method, which is based on the time taken to capture the reflected optical pulse. Because of this measurement mechanism, laser sensors have been successfully employed in multiple scenarios to characterize the height of targets.
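The ultrasonic and laser ranging principles above reduce to simple arithmetic. The sketch below assumes ideal sensors: the ultrasonic speed uses a common linear approximation for the speed of sound in dry air (about 331.3 + 0.606·T m/s, with T in °C), the laser functions ignore the refractive index of air, and the phase-shift form is only valid within one ambiguity interval (c / 2f). All function names are hypothetical, and `canopy_height` assumes a downward-facing sensor mounted at a known height, as in the canopy height applications discussed in this section.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum; air is slower by about 0.03%


def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s), linear in temperature."""
    return 331.3 + 0.606 * temp_c


def ultrasonic_distance(echo_time_s, temp_c=20.0):
    """Distance to target from the round-trip time of an ultrasonic pulse."""
    return speed_of_sound(temp_c) * echo_time_s / 2.0


def laser_tof_distance(round_trip_s):
    """Time-of-flight ranging: the optical pulse travels to the target and back."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def laser_phase_distance(phase_shift_rad, modulation_hz):
    """Phase-shift ranging: a round trip of 2d shifts the phase by 4*pi*f*d/c."""
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * modulation_hz)


def canopy_height(sensor_height_m, distance_to_canopy_m):
    """Hypothetical helper: canopy height below a downward-facing sensor."""
    return sensor_height_m - distance_to_canopy_m


if __name__ == "__main__":
    # Hypothetical reading: 5 ms echo at 20 degrees C from a sensor mounted 1.5 m high.
    d = ultrasonic_distance(0.005, temp_c=20.0)
    print(f"distance {d:.3f} m, canopy height {canopy_height(1.5, d):.3f} m")
```

The factor of two in every formula comes from the signal travelling to the target and back; the temperature term matters for ultrasonic sensors because sound speed changes by roughly 0.6 m/s per degree.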
Laser sensors have been used to predict the leaf area index (LAI), which is related to grapevine foliage and indirectly determines grape (Vitis spp.) yield and quality (Arnó et al., 2013).
Laser sensors have also been successfully utilized to quantify and characterize forest structure, ecozones, and stem measurements of standing trees (Henning et al., 2006; Hopkinson and Chasmer, 2009; van Leeuwen and Nieuwenhuis, 2010). Genc et al. (2017) used light detection and ranging (LIDAR) to determine vegetation height in wetlands. Laser sensors have also been employed to measure biomass density in oilseed rape (Brassica napus L.), rye (Secale cereale L.), and wheat (Ehlert et al., 2008); to approximate winter wheat height (Ehlert et al., 2010; Hosoi and Omasa, 2009); and to characterize properties of corn stands (Selbeck et al., 2010). An application comparing various spray volume deposition models in apple (Malus pumila Mill.) orchards also illustrated the broad applicability of laser sensors (Walklate et al., 2002).

Spectral strategies rely on the responses of plants and their ambient environments in terms of reflectance and absorption at various wavelengths. Successful applications of spectral strategies have been documented in various studies. At the leaf level, spectral values depend on chlorophyll content, and reflectance measurements partially illustrated this relationship (Cartelat et al., 2005). Diurnal changes in the photosynthetic efficiency of sunflower (Helianthus annuus L.) canopies could also be captured with a spectral index (Gamon et al., 1992). Additionally, Vogelmann et al. (1993) reported spectral properties from measurements on sugar maple (Acer saccharum Marshall) leaves. At the canopy level, spectral strategies are responsive to variation in moisture content (Bowyer and Danson, 2004), vegetation water content (Ceccato et al., 2002), and tropical pasture quality (Mutanga et al., 2005). In addition, spectral strategies could provide information regarding the photosynthetic light use efficiency of Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) forests (Middleton et al., 2009) and indirectly reflect lignin and nitrogen concentrations (Martin and Aber, 1997).
At the landscape level, spectral strategies could capture structural measurements of forest vegetation (Roberts et al., 2004) and reflect the effects of prescribed burning on pine forests (Finney et al., 2005). These studies indicate that spectral strategies are useful in different applications and could be used to derive various indices, such as the normalized difference vegetation index (NDVI) and reflected photosynthetically active radiation (RPAR), among others. However, spectral strategies could be less reliable when reflectance saturates (Gnyp et al., 2014; Hong et al., 2007). In terms of sensor usage in bermudagrass, researchers initially focused on spectral reflectance sensors. Various ranges of wavebands were explored to find good estimates of different traits; for example, multiple wavebands in the 368 to 1100 nm and 350 to 1125 nm ranges were used to estimate N concentration (Starks et al., 2008; Starks and Brown, 2010). Since some spectral devices may depend on consistent lighting and limited sampling conditions, researchers actively sought alternative approaches. Pittman et al. (2015) examined the use of combined sensors to estimate biomass and later reported crude protein estimation with active spectral and canopy height data (Pittman et al., 2016). Most of these studies use a traditional analytical methodology, partial least squares (PLS) regression, to study the capacity of different sensors as proxies to predict economically important traits. However, they ignore the contribution that different analytical methods could make in this process. Thus, this study will focus on testing different statistical methodologies to improve the estimation of economically important bermudagrass traits.
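As an illustration of deriving an index from spectral readings, NDVI is computed from near-infrared (NIR) and red reflectance as (NIR − Red) / (NIR + Red). The sketch below assumes reflectances already calibrated to the 0 to 1 range, and the band readings in the example are hypothetical. Because the ratio is bounded above by 1, equal increases in NIR reflectance over denser canopies move NDVI less and less, which is one way the saturation problem noted above appears.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).

    Inputs are assumed to be calibrated reflectances in [0, 1].
    """
    total = nir + red
    if total <= 0:
        raise ValueError("NIR + red reflectance must be positive")
    return (nir - red) / total


if __name__ == "__main__":
    # Hypothetical readings from sparse to dense canopy: NIR rises, red falls.
    readings = [(0.30, 0.10), (0.45, 0.06), (0.60, 0.04), (0.75, 0.03)]
    for nir, red in readings:
        print(f"NIR={nir:.2f} red={red:.2f} NDVI={ndvi(nir, red):.3f}")
```

Each step in the example adds the same 0.15 of NIR reflectance, yet the NDVI increments shrink as the index approaches 1.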
Partial least squares regression is a widely used methodology for regressing predictor data against target traits. It is based on the assumption that the observed data are generated by a process driven by a small number of unobserved latent variables (Rosipal and Krämer, 2006). It is a competitive prediction model because of both its light computational requirements and its superior performance when collinearity exists in the data set (Wold et al., 1984). PLS regression initially gained popularity in chemometric studies for its prediction performance (Sjöström et al., 1983; Geladi and Kowalski, 1986; Frank, 1987; Tobias, 1995), but its application has since expanded to other research fields, such as genetics and ecology (Nguyen and Rocke, 2002; Carrascal et al., 2009). In spectral sensor data analysis, PLS regression was initially used as the default statistical methodology to predict variables of interest [e.g., estimation of grass biomass and measurement of nitrogen status (Hansen and Schjoerring, 2003; Cho et al., 2007)]. However, other prediction methodologies, such as ridge regression and random forest, have shown competitive performance in other applications and could potentially outperform PLS regression. Ridge regression was initially proposed by Hoerl and Kennard (1970) to address the potential instability of least squares estimates. It does so by adding a small positive constant to the diagonal entries of the matrix $\mathbf{X}^{T}\mathbf{X}$ before taking its inverse. Even though ridge regression estimators are biased, the prediction performance of this methodology is quite competitive.
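The ridge adjustment described above can be sketched in plain Python. The sketch assumes a user-chosen penalty λ (here `lam`) and centers the predictors and response before solving. With two perfectly collinear predictors, $\mathbf{X}^{T}\mathbf{X}$ is singular, so ordinary least squares (`lam = 0`) fails, while any positive penalty yields a unique solution. The helper names and toy data are hypothetical. This illustrates the Hoerl and Kennard idea rather than the software used in this work.

```python
from statistics import fmean


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; try a positive ridge penalty")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression on centered data: solve (Xc'Xc + lam*I) beta = Xc'yc.

    Returns (intercept, coefficients)."""
    p = len(X[0])
    x_means = [fmean(row[j] for row in X) for j in range(p)]
    y_mean = fmean(y)
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    # Cross-product matrix with the penalty added to its diagonal
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve_linear(A, rhs)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, x_means))
    return intercept, beta


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[v, 2.0 * v] for v in x1]  # second predictor is an exact multiple of the first
y = [3.0 * v for v in x1]
try:
    ridge_fit(X, y, lam=0.0)
except ValueError as err:
    print("OLS fails:", err)
intercept, beta = ridge_fit(X, y, lam=1.0)
print(round(intercept, 4), [round(coef, 4) for coef in beta])
```

With `lam = 1`, the coefficients split the effect between the two collinear predictors (30/51 and 60/51, about 0.588 and 1.176) instead of failing. Larger penalties shrink them further toward zero.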
Because a small constant is added to the diagonal entries of $\mathbf{X}^{T}\mathbf{X}$, ridge regression handles multicollinearity well. This is especially helpful for sensor data, where collinearity is common (Mahajan et al., 1977; Rook et al., 1990). Support vector machine regression derives from a classification algorithm that separates data using hyperplanes. It has been adapted to regression problems through a fixed feature-space transformation (Bishop, 2007). Because kernel functions can flexibly capture nonlinear relationships, it has been widely adopted for applications such as predicting corporate financial distress, exchange rates, and wind speed, and for remote sensing (Mohandes et al., 2004; Pai et al., 2006; Hua et al.,
2007; Mountrakis et al., 2011). Random forest (RF) is an ensemble learning method that constructs multiple decision trees and obtains the regression prediction as the mean of their individual predictions (Liaw and Wiener, 2002). RF combines random selection of variables at each node split, full-length tree growth, and many trees. This combination gives RF superior performance in many problems and effectively limits overfitting. In Chapter 4 of this study, the performance of these prediction methodologies for agronomically important traits will be evaluated. Based on the dry matter yield trait, the impact of sensor variables on predictive model performance will also be studied.
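The minimal pure-Python sketch below makes concrete the three ingredients of random forest named above:

- random predictor subsets at each split,
- fully grown trees,
- averaging over many trees, each grown on a bootstrap sample.

It is not the randomForest package of Liaw and Wiener (2002). The function names, `mtry`, the number of trees, and the toy data are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(X, y, mtry, rng):
    """Grow an unpruned regression tree; each split considers only `mtry` random predictors.
    A leaf is a float (mean response); an internal node is (feature, threshold, left, right)."""
    if len(y) < 2 or len(set(y)) == 1:
        return fmean(y)
    best = None  # (within-node SSE after the split, feature, threshold)
    for f in rng.sample(range(len(X[0])), mtry):
        levels = sorted({row[f] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # the sampled predictors are constant in this node
        return fmean(y)
    _, f, thr = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= thr]
    right_idx = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow_tree([X[i] for i in left_idx], [y[i] for i in left_idx], mtry, rng),
            grow_tree([X[i] for i in right_idx], [y[i] for i in right_idx], mtry, rng))


def predict_tree(node, x):
    """Follow the splits from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def random_forest(X, y, n_trees=30, mtry=2, seed=0):
    """Fit n_trees unpruned trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)  # sampling with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, x):
    """Regression prediction = mean of the individual tree predictions."""
    return fmean(predict_tree(tree, x) for tree in forest)


# Toy data: the response depends only on the first of three predictors
data_rng = random.Random(1)
X = [[i / 39, data_rng.random(), data_rng.random()] for i in range(40)]
y = [2.0 * row[0] for row in X]
forest = random_forest(X, y, n_trees=30, mtry=2, seed=0)
print(round(predict_forest(forest, [0.5, 0.5, 0.5]), 3))
```

Because each split sees only a random subset of predictors, individual trees differ. Averaging over them smooths the prediction, which is why fully grown trees can be used without pruning.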
CHAPTER 2
IMPROVED GENETIC PARAMETER ESTIMATIONS IN ZOYSIAGRASS BY IMPLEMENTING POST HOC BLOCKING¹

Background
Randomized complete block (RCB) designs are widely used in plant science because they are simple to design and analyze and allow treatments, including different genotypes, to be compared. However, plant breeding programs usually test hundreds or thousands of genotypes in a single experiment. At that scale, the assumptions of RCB designs, in particular homogeneity within each large block, often cannot be fulfilled. To address this problem in experiments that have already been established, post hoc blocking can be introduced at the analysis stage. In this chapter, post hoc blocking is tested to determine whether it improves the accuracy of genetic parameter estimates, and its impact on the ranking of top genotypes is illustrated.

Materials and Methods
Experiment
The data used in this study were from five zoysiagrass trials established across Florida, USA (Figure 2-1), between 2011 and 2012, representing different soil types and weather conditions (Table 2-1). All five experiments were established as RCB designs with three replicates and 80 genotypes per replicate. These genotypes consisted of F1 hybrids, parental breeding lines, and two commercial cultivars used as checks. Experimental units were 1.52 × 1.52 m plots, each established by planting ten 5 × 5 cm clonal plugs. Standard turfgrass maintenance practices were used in all experiments (Brede, 2000). Mowing frequency varied by location: Citra, Duda Sod (Duda), Bethel Farms (Bethel), Jay, and RB Farms were mowed weekly, biweekly, biweekly, biweekly, and monthly, respectively.

¹ This chapter was published in Euphytica 213(8): 195.
Statistical Analysis
The performance of post hoc blocking was compared with that of the original experimental design, first at the single-measurement level across sites and later using all measurements within a site. The software ASReml v. 3 (Gilmour et al., 2009) was used to fit linear mixed models. These models provided variance component estimates, best linear unbiased predictions (BLUPs) for each genotype, and REML log-likelihood values. After these analyses, the best post hoc blocking design was selected, and its impact on genetic parameter estimation and predictability was assessed.
In the first stage, post hoc blocking was compared with the original experimental design using the data from all 100 measurements from the five sites. Only two measurements with extremely low genetic signal in the single-measurement analysis were removed. For each measurement dataset, the linear mixed model for the original RCB design was fitted (2-1). A second model superimposed an IB design with eight genotypes per incomplete block (2-2). A third superimposed an R×C design that considered the row and column position of each experimental unit within a block (2-3). These fitted models were

RCB: $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\mathbf{b} + \mathbf{Z}_1\mathbf{g} + \mathbf{e}$ (2-1)
IB: $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\mathbf{b} + \mathbf{W}\mathbf{i} + \mathbf{Z}_1\mathbf{g} + \mathbf{e}$ (2-2)
R×C: $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\mathbf{b} + \mathbf{W}_1\mathbf{r} + \mathbf{W}_2\mathbf{c} + \mathbf{Z}_1\mathbf{g} + \mathbf{e}$ (2-3)

where $\mathbf{y}$ is the vector of phenotypic values (i.e., turf quality rating scores); $\mu$ is the overall mean; $\mathbf{b}$ is the fixed vector of replicate (block) effects; $\mathbf{i}$ is the random vector of incomplete block effects nested within the original replicate, with $\mathbf{i} \sim MVN(\mathbf{0}, \sigma^2_i\mathbf{I})$; $\mathbf{r}$ is the random vector of row effects nested within the original replicate, with $\mathbf{r} \sim MVN(\mathbf{0}, \sigma^2_r\mathbf{I})$; $\mathbf{c}$ is the random vector of column
effects nested within the original replicate, with $\mathbf{c} \sim MVN(\mathbf{0}, \sigma^2_c\mathbf{I})$; $\mathbf{g}$ is the random vector of parental effects, with $\mathbf{g} \sim MVN(\mathbf{0}, \sigma^2_g\mathbf{A})$; and $\mathbf{e}$ is the random vector of errors, with $\mathbf{e} \sim MVN(\mathbf{0}, \sigma^2_e\mathbf{I})$. $\mathbf{X}$, $\mathbf{Z}_1$, $\mathbf{W}$, $\mathbf{W}_1$, and $\mathbf{W}_2$ are the incidence matrices for their respective effects. $\mathbf{A}$ is the additive relationship matrix obtained from pedigree information, and $\mathbf{I}$ is an identity matrix of appropriate size. After fitting these models, log-likelihood values were recorded. Likelihood ratio tests (LRT; Gilmour, 2009) were then performed for the IB design vs. the RCB design and for the R×C design vs. the RCB design at $\alpha = 0.05$, using

$d = 2[\mathrm{LogL}_2 - \mathrm{LogL}_1] \sim \chi^2_{df_2 - df_1}$

where $\mathrm{LogL}_2$ is the log-likelihood of the R×C or IB design, $\mathrm{LogL}_1$ is the log-likelihood of the original RCB design, and $df_2$ and $df_1$ are the corresponding degrees of freedom. In a second stage, the data from each site were pooled, and three linear mixed models were fitted (2-4, 2-5, and 2-6) to evaluate the effect of post hoc blocking when repeated measurements were considered. This was done for each site individually, and the results were then used to compare the performance of post hoc blocking. The fitted models were

RCB: $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}_1\mathbf{u} + \mathbf{X}_2\mathbf{b}(u) + \mathbf{Z}\mathbf{g}(u) + \mathbf{e}$ (2-4)
IB: $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}_1\mathbf{u} + \mathbf{X}_2\mathbf{b}(u) + \mathbf{W}\mathbf{i}(u) + \mathbf{Z}\mathbf{g}(u) + \mathbf{e}$ (2-5)
R×C: $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}_1\mathbf{u} + \mathbf{X}_2\mathbf{b}(u) + \mathbf{W}_1\mathbf{r}(u) + \mathbf{W}_2\mathbf{c}(u) + \mathbf{Z}\mathbf{g}(u) + \mathbf{e}$ (2-6)

where $\mathbf{y}$ is the vector of phenotypic values; $\mu$ is the overall mean; $\mathbf{u}$ is the fixed vector of measurement effects; $\mathbf{b}(u)$ is the fixed vector of block effects within measurement; $\mathbf{i}(u)$ is the random vector of incomplete block effects nested within block within measurement, with $\mathbf{i}(u) \sim MVN(\mathbf{0}, \mathbf{D}_{iu})$; $\mathbf{r}(u)$ is the random vector of row effects nested within block within measurement, with $\mathbf{r}(u) \sim MVN(\mathbf{0}, \mathbf{D}_{ru})$; $\mathbf{c}(u)$ is the random vector of column effects nested within block within measurement, with $\mathbf{c}(u) \sim MVN(\mathbf{0}, \mathbf{D}_{cu})$; $\mathbf{g}(u)$ is the random vector of parent
effects within each measurement, with $\mathbf{g}(u) \sim MVN(\mathbf{0}, \mathbf{A} \otimes \mathbf{G})$; and $\mathbf{e}$ is the random vector of errors, with $\mathbf{e} \sim MVN(\mathbf{0}, \mathbf{R} \otimes \mathbf{I})$. $\mathbf{D}_{iu}$, $\mathbf{D}_{ru}$, and $\mathbf{D}_{cu}$ are diagonal matrices in which each $j$th measurement has its own independent incomplete block, row, and column variance component, $\sigma^2_{iu_j}$, $\sigma^2_{ru_j}$, and $\sigma^2_{cu_j}$, respectively. $\mathbf{A}$ is the numerator relationship matrix for parents. $\mathbf{G}$ is the variance-covariance matrix between genotypes across measurements, with dimension determined by the number of measurements within the site. It is modeled with a single genetic correlation term, $r_B$, and a unique variance term, $\sigma^2_{gu_j}$, for each $j$th measurement (i.e., CORUH). $\mathbf{R}$ is the variance-covariance matrix between residuals of measurements, also with dimension determined by the number of measurements within the site. It is defined as a heterogeneous first-order autoregressive error structure, with correlation $\rho_e$ between residuals and a different residual variance, $\sigma^2_{e_j}$, for each $j$th measurement. All other matrices were previously defined. Based on the estimated variance components from each analysis, narrow-sense heritability was calculated as

$h^2 = \dfrac{\bar{\sigma}^2_{gu}\, r_B}{\bar{\sigma}^2_{gu} + \bar{\sigma}^2_{iu} + \bar{\sigma}^2_{ru} + \bar{\sigma}^2_{cu} + \bar{\sigma}^2_{e}}$ (2-7)

where the bars denote averages of the estimates across all measurements, and terms absent from a given model are zero. The log-likelihood values of the different models were then recorded and used to calculate the Bayesian information criterion (BIC; Liddle, 2007) and the Akaike information criterion (AIC). These criteria were used to assess the goodness of fit of the evaluated models. Reliability reflects the squared correlation between true and predicted breeding values and was calculated as an additional model selection criterion, using the expression:
$r^2(\hat{g}) = 1 - \dfrac{\mathrm{PEV}}{\sigma^2_g}$ (2-8)

where PEV is the prediction error variance of the genotype BLUPs. Finally, to assess the impact of post hoc blocking on the selection process, the response to parental selection (i.e., genetic gain) was evaluated for different selection intensities. For each site, the predicted values of each genotype tested in the experiment, across all measurements, were used to calculate relative genetic gains under the randomized complete block (RCB, 2-4) and row-column (R×C, 2-6) designs. Genetic gains were obtained by averaging the predicted values of the top 10, 15, and 20% of genotypes and dividing by the average phenotypic response. The ranking of genotypes determines which genotypes are selected for the next breeding cycle, so the impact of post hoc blocking on this ranking matters for selection. Hence, genotype rankings based on breeding values were plotted for each site to illustrate the effect of post hoc blocking.

Results
First, we tested whether, at the single-measurement level, models with post hoc blocking fit the data significantly better than the original RCB design (Figure 2-2). The post hoc IB design performed significantly better than the RCB design (p < 0.05) in 44% of the comparisons (Figure 2-2). The post hoc R×C design was significantly better than the original RCB design in 77% of the comparisons, with no statistically significant differences in the remaining comparisons (Figure 2-2).

When all repeated measurements within a site were modeled jointly, the LRTs showed that the post hoc blocking designs generally fit better than the original RCB designs (Table 2-2). At Jay, the IB design fit significantly better than the RCB design (p < 0.001), while there was no significant difference between the R×C and RCB designs (p = 0.999). In contrast, at Duda, the post hoc R×C design fit better than the original design (p < 0.001), but there was no significant difference between the IB and RCB designs (p = 0.199). At RB Farms and Bethel Farms, both the post hoc IB and R×C designs were significantly better than the RCB design (p < 0.05). However, given the smaller p values and lower AIC values of the R×C designs, the R×C designs were found to fit better than the IB designs. At PSREU, AIC, BIC, and reliability all indicated that the IB design was better than the R×C design (Table 2-2). Overall, when the information from all measurements at a site was combined, the R×C design performed marginally better.

From the site-level analyses, the narrow-sense heritability ($h^2$) ranged from 0.239 to 0.399 for the R×C design and from 0.248 to 0.398 for the RCB design (Figure 2-3). The $h^2$ estimates from Jay and RB Farms were the highest and lowest among all locations, respectively (Figure 2-3). The G×M interaction is expressed as the genetic correlation across measurements ($r_B$). For the R×C design, $r_B$ ranged from 0.558 at RB Farms to 0.996 at Jay. For the RCB design, it ranged from 0.566 at RB Farms to 0.991 at Jay (Figure 2-3).

Genetic gains were calculated from selecting the top 10, 15, and 20% of parental genotypes. At PSREU, RB Farms, and Bethel Farms, these gains were marginally higher for the original RCB designs than for the R×C designs at all three selection intensities (Table 2-3). At Jay, the genetic gain of the R×C design was slightly higher at 10% selection intensity, whereas the opposite was true at 15 and 20%. Interestingly, at Duda, the post hoc R×C design yielded higher genetic gains at all selection intensities.
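The likelihood ratio statistic used above is $d = 2[\mathrm{LogL}_2 - \mathrm{LogL}_1]$, compared against a chi-square distribution with $df_2 - df_1$ degrees of freedom. It can be checked directly against Table 2-2. The sketch below assumes integer degrees of freedom, which holds for every comparison in Table 2-2. It enters the log-likelihoods as negative values, consistent with the AIC values reported there. The function names are hypothetical, and the tail probability uses closed-form chi-square identities rather than any particular statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the 1-df tail, erfc(sqrt(x/2)), and add one term per extra pair of df
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(logl_reduced, logl_full, df):
    """d = 2[LogL_full - LogL_reduced], referred to a chi-square distribution on df."""
    d = 2.0 * (logl_full - logl_reduced)
    return d, chi2_sf(d, df)


# Site-level comparisons against the RCB model (LogL and df from Table 2-2)
comparisons = {
    "Jay, IB vs RCB": (-8200.23, -8175.79, 13),
    "Jay, RxC vs RCB": (-8200.23, -8195.60, 26),
    "Duda, IB vs RCB": (-2749.66, -2738.27, 18),
}
for label, (logl_rcb, logl_post_hoc, df) in comparisons.items():
    d, p = likelihood_ratio_test(logl_rcb, logl_post_hoc, df)
    print(f"{label}: d = {d:.2f}, p = {p:.3f}")
```

For the Duda IB-vs-RCB comparison this gives d = 22.78 and p ≈ 0.199. For the Jay R×C-vs-RCB comparison it gives p ≈ 0.999, in line with Table 2-2. The same tail function returns about 0.05 at the critical values 3.84 (1 df) and 5.99 (2 df) used in Figure 2-2.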
Figure 2-4 compares the original RCB and R×C designs based on the top 10 performing genotypes (~12% of all genotypes) at each of the five sites. Jay, the site with the highest $h^2$ and $r_B$ (i.e., the lowest G×M interaction), showed no change in genotype ranking. At the other sites, genotype rankings changed, and the degree of change varied from site to site. For example, at Bethel Farms, genotype UFZ11 ranked fifth under the original design but second under the post hoc R×C design. More relevant for breeding, some genotypes originally ranked among the top 10 performers dropped out of the selection list after post hoc blocking with an R×C design was applied (Figure 2-4). This would have a considerable impact on selections.

Discussion
Post hoc blocking, implemented by superimposing incomplete block (IB) and row-column (R×C) designs, was compared with the original randomized complete block (RCB) design. At the single-measurement level, for the datasets analyzed, the post hoc R×C design performed better than the original RCB and IB designs (Figure 2-2). These comparisons were also performed at the site level with multiple measurements, using a likelihood ratio test, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Again, the post hoc R×C design performed best at Duda, RB Farms, and Bethel Farms, whereas the post hoc IB design was better at Jay and PSREU. Nevertheless, compared with the original RCB designs, the R×C designs provided a better-fitting model at every site except Jay. In addition, implementing the post hoc R×C design reduced the residual variance by an average of 18.7% relative to the original RCB design. The reductions were 2, 13, 24, 27, and 27% at Jay, PSREU, Duda, Bethel, and RB Farms, respectively.
These reductions indicate strong local environmental heterogeneity at most sites, which was successfully controlled by the added row and column effects. Because the residual variance
decreased substantially at most sites, the R×C design would also be considered to perform better than the other designs tested under alternative criteria, such as relative efficiency (Simon and Maitournam, 2004). The post hoc R×C design may outperform the IB design because of dominant global gradients, which rows and columns control better, rather than small patches, which incomplete blocks control (Gezan et al., 2006). Differences in the reduction of residual variance among locations could also be influenced by the layout at each location. However, we found no trend associated with the planting layout (square versus rectangular blocks) in this study. Incomplete block size is another factor that determines the effectiveness of an IB design. Here, the number of genotypes within each post hoc incomplete block followed the criterion proposed by Williams (2002): use a value marginally smaller than the square root of the number of treatments (here, 8 < 8.9 ≈ √80). Other block sizes were not investigated. Larger blocks are expected to be less efficient, and smaller blocks might capture a portion of the genetic signal, reducing heritability (Gezan et al., 2006).

The post hoc R×C design had little effect on the estimated $h^2$ values compared with the RCB design estimates. As expected, heritability was positively associated with the measurement-by-measurement genetic correlation estimates, indicating that sites with stronger genetic signals tend to have lower levels of G×M interaction. The magnitude of G×M reported here provides useful information for turfgrass breeders who might want to optimize the frequency of data collection in the selection process. For example, at Jay, $r_B$ between measurements was 0.996, indicating very high agreement among measurements (Figure 2-3).
Therefore, at this site, and at others with similarly high genetic correlations, the frequency of data collection could be reduced without affecting genetic gains and the accuracy of
the final selections. However, at Duda and RB Farms, the correlations were around 0.6, indicating that some information might be lost if fewer measurements were taken. The consistency of measurements across sites may be affected by many factors, such as weather conditions, disease prevalence, and management practices.

The change in genotype rankings (Figure 2-4) illustrates the direct impact of implementing post hoc designs. For example, at Duda, UFZ126 and UFZ154 were estimated to be in the top 10 with the RCB design. However, with the post hoc R×C design, they dropped out of this potential selection list. This could be critical information for plant breeders, particularly when selection intensity is high and resources are limited. Post hoc blocking changed rankings more at Bethel than at most other locations. The pronounced slope at this site, compared with the other locations, could have contributed to this change. This reinforces the case for using post hoc blocking to control such intra-block environmental variation. Besides post hoc blocking, other approaches have been used to address this challenge. Cullis et al. (1989) showed that spatial models could be incorporated to improve the BLUP of genotypic values in early-generation variety trials. Stroup and Mulitze (1991) developed nearest-neighbor adjusted BLUP and found that it considerably improved estimation efficiency. Moreover, geostatistical methods such as kriging provide local predictions using observations at neighboring locations, based on spatial correlation (Schabenberger and Gotway, 2017). Selection criteria and cultivar development procedures across multiple testing sites are very well developed in turfgrass breeding (Ebdon and Gauch,
2002; Raymer and Braman, 2006; Watkins et al., 2011). Even so, for this species there is no information available on how different experimental designs may affect genotype testing and selection. Results from this study indicate that genetic parameters, genotype rankings, and genetic gain can be affected when environmental variation is better controlled. Post hoc blocking analysis will allow turfgrass breeders, and other plant breeders, to use their data more efficiently. In addition, R×C designs, together with the corresponding analysis, are recommended for future turfgrass testing efforts in Florida. The use of multiple testing sites is common. It adds valuable information regarding genotype stability (Fan et al., 2007) and provides an indication of genotype-by-environment interaction (G×E). Future studies should determine whether post hoc blocking improves the estimation of G×E when analyses from multiple data sources are combined. In conclusion, comparing the post hoc experimental designs in this study showed that R×C designs clearly outperformed RCB designs and were marginally better than IB designs. It was shown here that post hoc blocking analysis can provide important spatial control of local environmental variation.
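A small simulation illustrates the idea behind superimposing rows and columns on an existing RCB layout. This is a simplified fixed-effects sketch, not the ASReml mixed models (2-1 to 2-6) used in this chapter. Genotype effects are omitted, and the field trend, its slope, the noise level, and the 8 × 10 plot arrangement are hypothetical. Only the number of replicates (three) and plots per replicate (80) follow the trials described here. The sketch compares two residual mean squares: after removing only the block mean (RCB), and after also removing row and column effects within blocks (R×C).

```python
import random
import statistics


def simulate_block(n_rows, n_cols, slope, noise_sd, rng):
    """Hypothetical plot values for one replicate laid out as an n_rows x n_cols grid:
    a linear trend across columns (plus a weaker one across rows) and random plot noise.
    Genotype effects are deliberately omitted to isolate the spatial component."""
    return [[slope * c + 0.5 * slope * r + rng.gauss(0.0, noise_sd)
             for c in range(n_cols)]
            for r in range(n_rows)]


def rcb_residual(block):
    """Residual sum of squares and df after removing only the block mean (RCB analysis)."""
    values = [v for row in block for v in row]
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values), len(values) - 1


def rc_residual(block):
    """Residual SS and df after superimposing row and column effects within the block.
    For a complete grid, the additive least-squares fit is row mean + column mean - block mean."""
    n_rows, n_cols = len(block), len(block[0])
    grand = statistics.fmean(v for row in block for v in row)
    row_means = [statistics.fmean(row) for row in block]
    col_means = [statistics.fmean(block[r][c] for r in range(n_rows)) for c in range(n_cols)]
    ss = sum((block[r][c] - row_means[r] - col_means[c] + grand) ** 2
             for r in range(n_rows) for c in range(n_cols))
    return ss, (n_rows - 1) * (n_cols - 1)


def pooled_mean_square(results):
    """Pool residual sums of squares and degrees of freedom across replicates."""
    total_ss = sum(ss for ss, _ in results)
    total_df = sum(df for _, df in results)
    return total_ss / total_df


rng = random.Random(2017)
# Three replicates of 80 plots each, in a hypothetical 8 x 10 arrangement
blocks = [simulate_block(8, 10, slope=0.5, noise_sd=1.0, rng=rng) for _ in range(3)]
ms_rcb = pooled_mean_square([rcb_residual(b) for b in blocks])
ms_rc = pooled_mean_square([rc_residual(b) for b in blocks])
reduction = 100.0 * (1.0 - ms_rc / ms_rcb)
print(f"RCB residual MS = {ms_rcb:.3f}, RxC residual MS = {ms_rc:.3f}, reduction = {reduction:.1f}%")
```

Because the simulated trend is additive in rows and columns, the R×C residual mean square ends up close to the plot-noise variance (1.0), whereas the RCB residual also absorbs the gradient. With real data the reduction is smaller and site-dependent, as in the 2 to 27% range reported above.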
Table 2-1. Experimental site information for the five zoysiagrass trials, average turf quality scores (TQ, scale 1 to 9), and number of measurements. Standard deviations in parentheses.

Site            Jay          PSREU*       Duda           RB Farms      Bethel
Planting date   June 2011    July 2011    July 2012      May 2012      Aug. 2011
Soil type       Sandy loam   Deep sand    Flatwood soil  Muck soil     Flatwood soil
County          Santa Rosa   Alachua      Seminole       Highlands     DeSoto
Location        Jay          Citra        Cocoa Beach    Lake Placid   Arcadia
TQ average      5.93 (1.12)  3.45 (1.54)  4.85 (1.55)    5.03 (1.42)   4.78 (1.63)
Measurements    13           27           18             18            24

*Plant Science Research and Education Unit
Table 2-2. Post hoc blocking designs IB and R×C at the site level, including all repeated measurements. Incomplete block (IB) and row-column (R×C) designs were compared with the original randomized complete block (RCB) design. Data from five locations were analyzed using all measurements available per site (Eq. 2-4, 2-5, and 2-6). For the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), smaller values indicate better goodness of fit, whereas for the log-likelihood (LogL), larger values indicate a better fit. Reliability is the squared correlation between true and predicted breeding values. df is the degrees of freedom for the likelihood ratio test against the RCB design. The best-fitting model within each site is marked with †.

Site      Design  LogL      BIC       AIC       Reliability  df  p value
Jay       RCB     -8200.23  16625.27  16456.46  0.792        -   -
          IB†     -8175.79  16680.76  16433.58  0.792        13  < 0.001
          R×C     -8195.60  16824.75  16499.20  0.796        26  0.999
PSREU     RCB     -8029.23  16526.53  16170.46  0.729        -   -
          IB†     -7947.42  16588.59  16060.84  0.731        27  < 0.001
          R×C     -7949.53  16818.49  16119.06  0.725        54  < 0.001
Duda      RCB     -2749.66   5816.94   5575.32  0.698        -   -
          IB      -2738.27   5944.61   5588.54  0.699        18  0.199
          R×C†    -2576.35   5771.22   5300.70  0.708        36  < 0.001
RB Farms  RCB     -2732.13   5781.88   5540.26  0.666        -   -
          IB      -2712.68   5893.43   5537.36  0.668        18  0.003
          R×C†    -2609.44   5837.40   5366.88  0.668        36  < 0.001
Bethel    RCB     -7382.38  15196.72  14864.76  0.786        -   -
          IB      -7361.17  15361.64  14870.34  0.790        24  0.012
          R×C†    -7117.69  15082.02  14431.38  0.763        48  < 0.001

Table 2-3. Calculated genetic gains (%) from selecting the overall top 10, 15, and 20% of parental genotypes. Gains are based on all available measurements per site, fitting a randomized complete block (RCB, Eq. 2-4) and a row-column (R×C, Eq. 2-6) design. Numbers in parentheses are the numbers of genotypes selected. Note that only the parental genotypes tested in the experiment are considered here.
Design  Selection  Jay    PSREU  Duda   RB Farms  Bethel
RCB     10% (8)    12.69  37.21  21.63  18.54     28.78
        15% (12)   11.46  33.87  20.48  17.14     25.90
        20% (16)   10.60  31.89  19.55  15.62     23.54
R×C     10% (8)    12.70  36.53  22.93  17.54     26.48
        15% (12)   11.45  33.32  21.15  16.27     24.25
        20% (16)   10.57  31.38  19.81  14.74     22.52
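The relative genetic gains in Table 2-3 follow the calculation described in the Methods: average the predicted values of the top-ranked genotypes and divide by the average phenotypic response. In the minimal sketch below, the predicted values are assumed to be genotype BLUPs expressed as deviations from the mean, which the magnitudes in Table 2-3 suggest. The number of genotypes selected is rounded from the selection proportion, giving 8, 12, and 16 of 80 genotypes. The genotype labels, BLUPs, and site mean are hypothetical.

```python
from statistics import fmean


def relative_genetic_gain(predicted, phenotypic_mean, proportion):
    """Relative genetic gain (%) from selecting the top `proportion` of genotypes.

    predicted: dict mapping genotype -> predicted genetic value (BLUP deviation)
    phenotypic_mean: average phenotypic response at the site
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    n_selected = max(1, round(len(predicted) * proportion))
    top_values = sorted(predicted.values(), reverse=True)[:n_selected]
    return 100.0 * fmean(top_values) / phenotypic_mean


# Hypothetical BLUPs for ten genotypes at one site (turf quality scale)
blups = {"G01": 1.10, "G02": 0.85, "G03": 0.60, "G04": 0.35, "G05": 0.10,
         "G06": -0.05, "G07": -0.30, "G08": -0.55, "G09": -0.80, "G10": -1.30}
site_mean = 5.0
for p in (0.10, 0.15, 0.20):
    print(f"top {p:.0%}: gain = {relative_genetic_gain(blups, site_mean, p):.2f}%")
```

As in Table 2-3, the gain shrinks as the selected fraction grows, because lower-ranked genotypes are averaged in.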
Figure 2-1. Geographical location of the five experimental sites within Florida.
Figure 2-2. Likelihood ratio test (LRT) comparisons between the post hoc IB design and the RCB design (A), and between the post hoc R×C design and the RCB design (B). The lines indicate the chi-square critical values at α = 0.05 with 1 degree of freedom (3.84, in A) and with 2 degrees of freedom (5.99, in B).
Figure 2-3. Estimates of narrow-sense heritability, $h^2$ (Eq. 2-7) (A), and of the genotype-by-measurement interaction, $r_B$ (B), for each of the five sites. Each site was analyzed with all its available measurements, fitting a randomized complete block (RCB, Eq. 2-4) and a row-column (R×C, Eq. 2-6) design.
Figure 2-4. Changes in the rankings of the top 10 genotypes at different sites, based on all available measurements per site and fitting a randomized complete block (RCB, Eq. 2-4) and a row-column (R×C, Eq. 2-6) design. Points connected by a line are the same genotype, and crossing lines indicate ranking changes. Labeled points are genotypes that moved out of the top 10 after post hoc blocking with an R×C design.
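The ranking comparison in Figure 2-4 ranks genotypes by their predicted values under each design. It then checks which members of the RCB top 10 move, drop out, or are replaced under the R×C design. The sketch below uses hypothetical genotype labels and values. Ties are broken by input order, an assumption the text does not specify.

```python
def top_k(predicted, k):
    """Names of the k genotypes with the highest predicted values (ties keep input order)."""
    return sorted(predicted, key=predicted.get, reverse=True)[:k]


def compare_rankings(rcb, rc, k=10):
    """Compare the top-k lists under the RCB and RxC designs.

    Returns (dropped, moved, entered):
      dropped: genotypes in the RCB top k that leave the RxC top k
      moved:   genotype -> (RCB rank, RxC rank) for those that stay but change rank
      entered: genotypes that join the top k only under the RxC design
    """
    top_rcb, top_rc = top_k(rcb, k), top_k(rc, k)
    dropped = [g for g in top_rcb if g not in top_rc]
    moved = {g: (top_rcb.index(g) + 1, top_rc.index(g) + 1)
             for g in top_rcb
             if g in top_rc and top_rcb.index(g) != top_rc.index(g)}
    entered = [g for g in top_rc if g not in top_rcb]
    return dropped, moved, entered


# Hypothetical BLUPs for five genotypes under each design
rcb_blups = {"G1": 3.0, "G2": 2.5, "G3": 2.0, "G4": 1.0, "G5": 0.4}
rc_blups = {"G1": 2.8, "G2": 1.5, "G3": 2.6, "G4": 1.9, "G5": 0.2}
print(compare_rankings(rcb_blups, rc_blups, k=3))
```

Reporting the dropped genotypes separately mirrors the labeled points in Figure 2-4, which matter most for selection decisions.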
CHAPTER 3
STATISTICAL ANALYSES OF MULTIPLE RESPONSE MEASUREMENTS TO UNDERSTAND IMPACT OF DROUGHT AND GROWTH IN ZOYSIAGRASS

Background
G×E interaction plays a critical role in cultivar development, specifically in evaluating phenotypic stability in crop breeding. G×E interaction appears as differing performance of genotypes across locations and can be driven by environmental factors such as extreme temperatures and water shortage. In plant breeding, the same genotypes are typically tested at multiple locations during cultivar development, especially when selection targets macro-regions. Understanding the G×E of the target species is therefore an important step toward more efficient cultivar development. Drought usually occurs when precipitation falls below the long-term average of several years, decreasing soil moisture to the point that plant development is negatively affected. As the probability of extreme heat events increases globally, breeding drought-resistant cultivars is a reasonable consideration. Research on zoysiagrass has shown that rooting and drought resistance can be markedly affected by genetic tolerance, which supports turfgrass breeding for drought resistance. However, there is still limited literature describing the impact of drought conditions on genetic parameter estimates. Therefore, in this chapter, broad-sense heritability ($H^2$) and G×E interactions are estimated, and the impact of drought conditions on these estimates is explored.

Materials and Methods
Experimental Data
The data used in this study originate from zoysiagrass trials established between 2011 and 2014 in five states (seven locations) in the United States, including Florida
(1), Texas (2), Georgia (2), North Carolina (1), and Oklahoma (1), representing different soil types and climate zones. A total of four series of seven trials each (identified as series 2011, 2012, 2013, and 2014) were established as randomized complete block (RCB) designs with two replicates, and repeated measurements were taken approximately monthly. Standard turfgrass maintenance practices were used to manage all experiments (Brede, 2000), and mowing frequency varied by location as determined by the local turfgrass breeding programs. There were 164, 164, 164, and 84 experimental genotypes tested in series 2011, 2012, 2013, and 2014, respectively. Half originated from the Texas A&M University breeding program and half from the University of Florida breeding program. Four common cultivars ( ) were also planted in all four series and served as checks. Genotypes differed across series but were identical within a series; hence, each series was analyzed independently. The response variable evaluated was turf quality (RTQ), recorded at every evaluation on the 1 to 9 scale described by the National Turfgrass Evaluation Program (NTEP) (Morris and Shearman, 1998). A rating of 9 indicates outstanding or ideal turf, and 1 reflects very poor or dead turf; in general, a rating of 5 is considered the minimum acceptable turf quality. Because of different planting dates, the number of rating measurements differs by series, trial, and year. To simplify further analyses, a new variable, TQ, was obtained by averaging the repeated measurements over time within a given experimental unit. To explore turf quality ratings under various conditions, other response variables were defined as averages of repeated measurements under: drought conditions (TQD); normal, non-drought conditions (TQND); actively growing months, April/May to
October/November (TQG); and non-growing months, November/December to March/April (TQNG). Drought conditions and active growing months were defined for each trial based on field observations of conditions and of the active growth period of plants during the year. Further details of the phenotypic data are presented in Figure 3-1 and Table 3-1.

Statistical Analysis
The statistical analyses were carried out at different levels, following several data-cleaning procedures. First, to better understand the genetic signals, each individual RTQ measurement was fitted separately using the linear mixed model

$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e}$ (3-1)

where $\mathbf{y}$ is the vector of phenotypic values (i.e., turf quality rating scores); $\mu$ is the overall mean; $\boldsymbol{\beta}$ is the fixed vector of replicate (block) effects; $\mathbf{g}$ is the random vector of genetic effects, with $\mathbf{g} \sim MVN(\mathbf{0}, \sigma^2_g\mathbf{I})$; and $\mathbf{e}$ is the random vector of errors, with $\mathbf{e} \sim MVN(\mathbf{0}, \sigma^2_e\mathbf{I})$. $\mathbf{X}$ and $\mathbf{Z}$ are the incidence matrices for their respective effects, $\mathbf{1}$ is a vector of ones, and $\mathbf{I}$ is an identity matrix of appropriate size. Based on the estimated variance components from each analysis, broad-sense heritability ($H^2_y$) was calculated as

$H^2_y = \dfrac{\sigma^2_g}{\sigma^2_g + \sigma^2_e}$ (3-2)

Measurements with very low genetic signal ($H^2_y$ < 0.05) were then removed. Second, TQ, TQD, TQND, TQG, and TQNG were calculated from the remaining measurements for further analysis. Because most of the genotypes tested in the four series were different, the analyses were performed separately by series, considering all seven corresponding trials. Within each series, the levels of genetic control and genotype-by-environment (G×E) interaction were assessed by fitting the model
$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}_1\mathbf{u} + \mathbf{X}_2\mathbf{b}(u) + \mathbf{Z}\mathbf{g}(u) + \mathbf{e}$ (3-3)

where $\mathbf{y}$ is the vector of phenotypic values (TQ, TQD, TQND, TQG, or TQNG); $\mu$ is the overall mean; $\mathbf{u}$ is the fixed vector of trial effects; $\mathbf{b}(u)$ is the fixed vector of block effects within trial; $\mathbf{g}(u)$ is the random vector of genetic effects within each trial, with $\mathbf{g}(u) \sim MVN(\mathbf{0}, \mathbf{I} \otimes \mathbf{G})$; and $\mathbf{e}$ is the random vector of errors, with $\mathbf{e} \sim MVN(\mathbf{0}, \mathbf{R})$. $\mathbf{G}$ is the variance-covariance matrix between genotypes across trials, with dimension determined by the number of trials within a series. It is modeled with a single Type B genetic correlation term, $r_B$, and a unique variance term, $\sigma^2_{gu_i}$, for each $i$th trial (i.e., CORUH). $\mathbf{R}$ is a block-diagonal variance-covariance matrix, also with dimension determined by the number of trials within a series, with a different residual variance, $\sigma^2_{e_i}$, for each $i$th trial. $\mathbf{X}_1$, $\mathbf{X}_2$, and $\mathbf{Z}$ are the incidence matrices for their respective effects, $\otimes$ is the Kronecker (direct) product, and all other matrices are as previously defined. Based on the estimated variance components from each analysis, two heritabilities were calculated for each response: the broad-sense heritability at each site ($H^2_{(i)}$) and the overall broad-sense heritability of a series ($H^2$):

$H^2_{(i)} = \dfrac{\sigma^2_{gu_i}}{\sigma^2_{gu_i} + \sigma^2_{e_i}}$ (3-4)

$H^2 = \dfrac{\bar{\sigma}^2_{gu} \times r_B}{\bar{\sigma}^2_{gu} + \bar{\sigma}^2_{e}}$ (3-5)

where the bars denote averages of the variance component estimates over trials. Third, to understand the level of pleiotropy of genotype performance under different environmental conditions, bivariate analyses were performed for TQD against TQND and for TQG against TQNG. As before, these bivariate analyses were done for each series separately, as different genotypes were tested in each series. The model fitted corresponded to
where $\mathbf{y}_1$ and $\mathbf{y}_2$ are the data vectors for the two traits of interest; $\mathbf{t}$ is the fixed vector of group effects, where each group was defined as the combination of trait and trial; $\mathbf{b}(\mathbf{t})$ is the fixed vector of block effects within group; $\mathbf{g}(\mathbf{t})$ is the random vector of genetic effects within group, with $\mathbf{g}(\mathbf{t}) \sim \mathrm{MVN}(\mathbf{0}, \mathbf{I} \otimes \mathbf{G})$; and $\mathbf{e}$ is the random vector of errors, with $\mathbf{e} \sim \mathrm{MVN}(\mathbf{0}, \mathbf{I} \otimes \mathbf{R})$. The matrix $\mathbf{G}$ is a 2 × 2 variance-covariance matrix between traits, defined by a single trait-to-trait Type A genetic correlation term ($r_A$) and a unique variance term, $\sigma_{gt_j}^2$, for the $j$th trait (i.e., CORUH). The matrix $\mathbf{R}$ is a uniform heterogeneous 2 × 2 matrix of variance-covariance components between residuals of the same group. All other matrices were previously defined. Finally, even though the relationship between genotype performances under different conditions was explored with the above models, the specific performance of genotypes needed further exploration. To provide further insight, the top 10%, 20%, and 30% of genotypes under TQD and TQND were compared. All linear mixed models were fitted with the software ASReml v. 3 (Gilmour et al., 2009), which estimates variance components by REML and provides BLUP values for each genotype.

Results

The analyses carried out at the single-measurement level of RTQ provide insight into the level of genetic signal in the collected data. The total number of RTQ measurements available over all series was 206, with the highest broad sense heritability, $H_y^2$, being 0.839 and the lowest 0.001. Six measurements with $H_y^2 < 0.05$ were removed from the dataset before the aggregated response variables were calculated. The variables TQ, TQD, TQND, TQG, and TQNG summarize the data collected over multiple measurements in the different trials. The response variable TQ
had an average value by trial that ranged from 4.13 (Dallas) to 6.65 (Griffin), and the number of measurements differed among locations (Table 3-1). In total, there were 200 measurements across all combinations of trial and series, with the lowest number collected at Tifton (10) and the highest at Citra (37) (Table 3-1). The numbers of measurements for the response variables TQD and TQND were similar (92 and 108, respectively). As expected, the average value of TQ under non-drought conditions (TQND) was consistently higher than that under drought (TQD) at all trials except Dallas. On the other hand, TQ during growth (TQG) had a much higher number of measurements (187) than TQ during non-growth (TQNG), which indicates that breeders concentrated turfgrass data collection on the growing season. The average values of TQG were higher than those of TQNG at Citra and Dallas and lower at College Station and Stillwater. Overall, no clear pattern was identified between TQG and TQNG, and there were no non-growth measurements for some of the trials (Griffin, Sandhills, and Tifton). The site-to-site Type B genetic correlations ($r_B$), used indirectly to evaluate the GxE interaction, varied for TQ from a lowest value of 0.350 in series 2011 to a highest value of 0.727 in series 2012, with an average of 0.519 (Figure 3-2 and Table B-1). In series 2011, the performances of genotypes at the various locations were quite different, whereas in series 2012 genotype performances were more alike. Moreover, when comparing the responses TQND and TQD, the yearly Type B genetic correlations of TQND (0.296, 0.724, 0.598, 0.689) were consistently higher than those of TQD (0.277, 0.608, 0.308, 0.425).
Similar results were found for TQG and TQNG: the Type B genetic correlations of TQNG were consistently higher than those of TQG (Figure 3-2 and Table B-1). Comparing the averages of the correlations across the five response variables (TQ, TQD, TQND, TQG, and TQNG), series 2011 had the lowest value (0.378) and series 2012 the highest (0.734), a pattern that holds across all the traits except TQNG (Figure 3-2 and Table B-1). The broad sense heritabilities, $H_{(i)}^2$, of the calculated average response variables were analyzed by site and year to reflect site-specific and year-specific genetic information. In series 2011, TQND had the highest heritability (0.485) and TQNG the lowest (0.236), which can be explained by the active expression of genetic signals under normal non-drought conditions and the difficulty of differentiating phenotypic performance among genotypes during the non-growing season (Table 3-2). In series 2012, TQG had the highest $H_{(i)}^2$ (0.558), very close to that of TQ (0.557), whereas TQNG again had the lowest $H_{(i)}^2$ (0.337) of all the response variables. Interestingly, in series 2013, TQD had the highest $H_{(i)}^2$ (0.509), followed by TQ (0.500) and TQG (0.500), indicating that drought conditions in this series did not have an important impact on the phenotypic signal of the genotypes evaluated. In series 2014, TQND had the highest $H_{(i)}^2$ (0.561) and TQNG the lowest (0.239) (Table 3-2). Because the heritability values reported above were calculated on a by-site and by-series basis, these estimates may be inflated by other sources of variance, such as GxE interaction. To remove this bias, an analysis was carried out considering all the sites within a series. For TQ, $H^2$ ranged from 0.165 to 0.397, with an average of 0.290 (Figure 3-3 and Table B-2). The $H^2$ values of TQD were consistently lower than those of TQND. Similarly, TQG had higher heritability values than TQNG, with the exception of series 2014, in which the two estimates were very close (Figure 3-3 and Table B-2). When averaged across series, the $H^2$ values of the five calculated responses ranged from 0.181 (TQD) to 0.296 (TQND).
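Why the by-site heritabilities exceed the series-level $H^2$ follows directly from Equations 3-4 and 3-5, and a small NumPy sketch makes this concrete. It builds a CORUH genetic covariance matrix (one genetic variance per trial, one common Type B correlation $r_B$), computes the by-site heritability of Equation 3-4 (the same ratio as Equation 3-2 when applied to a single measurement), and the series-level heritability of Equation 3-5. The variance components are made-up numbers and the function names are hypothetical; in the study these quantities come from the REML fit in ASReml, not from this code.

```python
import numpy as np


def coruh_matrix(site_var_g, r_b):
    """CORUH structure: G[i, i] = var_i and G[i, j] = r_B * sd_i * sd_j for i != j."""
    sd = np.sqrt(np.asarray(site_var_g, dtype=float))
    corr = np.full((sd.size, sd.size), float(r_b))
    np.fill_diagonal(corr, 1.0)
    return corr * np.outer(sd, sd)


def site_h2(var_g_i, var_e_i):
    """Equation 3-4 (same form as Equation 3-2): heritability within one trial."""
    return var_g_i / (var_g_i + var_e_i)


def series_h2(site_var_g, site_var_e, r_b):
    """Equation 3-5: averaged genetic variance scaled by r_B over averaged total variance."""
    vg, ve = np.mean(site_var_g), np.mean(site_var_e)
    return vg * r_b / (vg + ve)


# Hypothetical variance components for two trials of one series.
var_g = [0.4, 0.6]
var_e = [0.6, 0.4]
G = coruh_matrix(var_g, r_b=0.5)
print(G)
print([site_h2(g, e) for g, e in zip(var_g, var_e)])  # by-site: 0.4 and 0.6
print(series_h2(var_g, var_e, r_b=0.5))               # series-level: 0.25
```

With $r_B = 0.5$, the series-level value (0.25) is half of the mean by-site value (0.5): the lower the cross-site genetic correlation (i.e., the stronger the GxE), the more the by-site estimates overstate heritability that is repeatable across sites.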
The bivariate analyses by series of TQD against TQND, and of TQG against TQNG, were carried out to compare the performance of genotypes under different environmental conditions (in this case, drought and non-drought, and growth and non-growth), providing useful information for assessing the effectiveness of indirect selection. Because the genotypes differed among series, the analyses were carried out separately by series. In series 2011, the trait-to-trait Type A genetic correlations ($r_A$) between TQD and TQND at the various trials ranged from 0.032 to 0.820, with most of the values between 0.703 and 0.878 (Figure 3-4 and Table B-3). Moreover, the Type A correlation between TQG and TQNG varied from 0.070 to 0.959, which indicates that within some trials TQG and TQNG were almost independent traits, while in other trials they were almost identical (Figure 3-4 and Table B-3). In series 2012, the Type A correlation between TQD and TQND within all the sites varied from 0.708 to 0.999. Similar trends were observed in series 2013 for TQD and TQND, with Type A correlations ranging from 0.622 to 0.976 and most values above 0.873 (Figure 3-4 and Table B-3). Interestingly, the correlations between TQG and TQNG in 2012-2013 at College Station, Dallas, and Stillwater were 0.948, 0.851, and 0.999, respectively, indicating that selections under growth and non-growth periods in this season were very similar (Figure 3-4 and Table B-3). Similar trends were observed in series 2014, with Type A correlations between TQG and TQNG ranging from 0.852 to 0.929. In contrast, in series 2014 the Type A correlations between TQD and TQND differed considerably among locations (ranging from 0.079 to 0.999), which indicates that the impact of drought conditions on genotype performance varied considerably among sites (Figure 3-4 and Table B-3).
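In the bivariate model, the Type A correlation $r_A$ discussed above is read off the 2 × 2 genetic covariance matrix $\mathbf{G}$, and the term $\mathbf{I} \otimes \mathbf{G}$ in the model's variance expands that matrix to all genotypes. The NumPy sketch below shows both steps under stated assumptions: the (co)variances are made-up numbers, the function names are hypothetical, and the effects are ordered trait-within-genotype (other orderings are row/column permutations of the same matrix).

```python
import numpy as np


def type_a_correlation(G):
    """r_A = cov(g1, g2) / sqrt(var(g1) * var(g2)) for a 2 x 2 genetic matrix."""
    G = np.asarray(G, dtype=float)
    return G[0, 1] / np.sqrt(G[0, 0] * G[1, 1])


def genetic_covariance(n_genotypes, G):
    """Var(g) = I_n kron G: genotypes independent, traits correlated within a genotype."""
    return np.kron(np.eye(n_genotypes), np.asarray(G, dtype=float))


# Hypothetical genetic (co)variances: trait 1 = TQD, trait 2 = TQND.
G = [[0.5, 0.3],
     [0.3, 0.8]]
print(round(type_a_correlation(G), 3))  # 0.474
V = genetic_covariance(3, G)            # 6 x 6 block-diagonal matrix
print(V.shape)                          # (6, 6)
```

An $r_A$ near 1 means the two responses rank genotypes almost identically; an $r_A$ near 0, as in some 2011 trials, means they behave as nearly independent traits.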
The BLUP values of the top genotypes were compared for the calculated response variables TQD and TQND, and the percentage of genotypes appearing on both top-performer lists (at 10, 20, and 30% selection) was obtained (Figure 3-5). The percentages of top genotypes that agreed between TQD and TQND in series 2011 and 2014 were lower than those in series 2012 and 2013 (Figure 3-5). As selection pressure decreased, the percentage of matching top genotypes increased. Overall, the percentage of top genotypes matching under drought and non-drought conditions peaked at 30% selection intensity, with a mean across series of 57.75%, indicating poor agreement between the top genotypes identified with TQD and TQND (Figure 3-5) and thus reflecting the different nature of these traits.

Discussion

The level of GxE interaction between sites was explored by calculating a single correlation value (i.e., the Type B genetic correlation) within each series, estimated by fitting a complex linear mixed model (using the CORUH structure), which gives an overall indication of genotype performance across trials. For example, in series 2011, the $r_B$ of sites was 0.350 (0.048), indicating a relatively high level of GxE across trials. However, when the data were modeled in a less parsimonious way, with a different correlation parameter for each pair of trials (i.e., CORGH), the genetic correlations between Tifton and Citra and between Tifton and Griffin were 0.728 and 0.624, respectively, indicating lower GxE interaction levels for those sites than previously estimated (Table B-4). Comparing the overall $H^2$ (Figure 3-3 and Table B-2) with the GxE effect (Figure 3-2 and Table B-1) revealed a trend in which lower GxE interaction levels are usually accompanied by higher $H^2$ values, and vice versa, across the five calculated response variables (TQ, TQD, TQND, TQG, and TQNG). However, direct comparisons of response variables between series in terms of $H^2$ and GxE are not strictly valid, as the genotypes differ from series to series. The available literature on zoysiagrass breeding is limited.
Some studies reported heritability values for relative leaf firing and for shoot and root growth (Qian et al., 2000), but little information is available on the heritability of turf quality traits. Schwartz et al. (2009) reported heritability estimates of turf quality for three leaf textures: very fine, fine, and coarse. These values ranged from 0.70 to 0.76 for the differently textured zoysiagrass species and were based on data from one location. They are considerably higher than the heritability estimates of the TQ trait presented in our study, which ranged from 0.165 to 0.397 (Figure 3-3 and Table B-2) and were estimated by considering multiple sites; the estimates of Schwartz et al. (2009) may therefore be inflated by GxE effects. The measurements in this study were mainly collected during the growth period, which is reasonable considering the biological conditions of turfgrass. However, the number of measurements varied considerably among trials and series. For example, for TQ there were 37 measurements in Citra across multiple series but only 10 in Tifton, which created unbalanced datasets for the evaluation and estimation of genetic effects. In addition, the frequency of measurements varied: some trials consistently had one or two measurements per month, which might favor the estimation of effects in some trials over others. Xing et al. (2017), who evaluated genotype-by-measurement interaction in zoysiagrass, suggested that the number of monthly ratings could potentially be reduced without significantly reducing the quality of the information. The effects of drought on zoysiagrass roots and shoots, and on recovery after drought, have been well studied (Huang et al., 1997; Marcum et al., 1995), but most of these studies focused on physiological aspects without addressing the impact of drought on breeding selections.
By comparing TQD against TQND in terms of GxE interaction and $H^2$, this study showed that drought induces differential genetic expression of turf quality. Specifically, the estimated Type B correlations and heritabilities of the response variable TQD were consistently lower than those of TQND, a phenomenon that could be caused by many factors. One possibility is that drought conditions have a masking effect, limiting genetic expression so that genotypes show more similar phenotypic responses. To better understand the effect of drought on breeding selections, bivariate analyses between TQD and TQND were performed. Most of the genetic correlations in series 2011 and 2012 (11 out of 12) were higher than 0.7, suggesting that both responses provide similar genetic information. However, because breeding selection is based on the top-performing genotypes, the overall correlation between two responses from the bivariate analyses may not completely reflect the relationship between top performers. From the matching percentages of top genotypes for TQD and TQND, it is reasonable to conclude that drought conditions produce very different sets of selected genotypes (Figure 3-5). On the other hand, the correlations from the bivariate analyses between TQG and TQNG fluctuated over a wide range, from almost independent to highly correlated, which makes the two responses unreliable as substitutes for each other. Therefore, plant breeders need to be aware of the differences in the information collected under different growth conditions and, based on their breeding goals, plan data collection accordingly. In summary, this study reported the levels of GxE interaction and heritability of zoysiagrass using data from seven locations measured over time across a range of environmental conditions, which adds reliable genetic information on this species for breeding purposes. Drought conditions were found to affect both the estimated genetic parameters and the ranking of top genotypes. For more reliable plant breeding selections, it is recommended to use data from non-drought conditions to perform rankings.
Finally, the analytical approach used in this study could potentially serve as a template for answering similar questions in other commercially relevant crops, particularly those that involve several observations over one or multiple seasons, such as phenotypic scores or harvests.
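The top-genotype agreement behind Figure 3-5, and behind the recommendation above, amounts to ranking genotypes by their BLUPs for each response, keeping the top fraction of each list, and measuring the overlap. The sketch below illustrates this under assumptions not stated in the study: higher BLUPs are taken to be better, both responses cover the same genotypes, and the list size is the selection fraction times the number of genotypes, rounded. The function name and genotype labels are hypothetical.

```python
def top_matching_percentage(blup_a, blup_b, fraction):
    """Share (%) of the top `fraction` genotypes under response A also in B's top list.

    Hypothetical helper, not the study's code. Assumes higher BLUP = better turf quality.
    """
    if set(blup_a) != set(blup_b):
        raise ValueError("both responses must cover the same genotypes")
    k = max(1, round(fraction * len(blup_a)))  # size of each top list
    top_a = set(sorted(blup_a, key=blup_a.get, reverse=True)[:k])
    top_b = set(sorted(blup_b, key=blup_b.get, reverse=True)[:k])
    return 100.0 * len(top_a & top_b) / k


# Hypothetical BLUPs for 10 genotypes under TQD; under TQND, G3 drops to the bottom.
tqd = {f"G{i}": 10.0 - i for i in range(1, 11)}
tqnd = dict(tqd, G3=-1.0)
for intensity in (0.1, 0.2, 0.3):
    print(intensity, round(top_matching_percentage(tqd, tqnd, intensity), 1))
# expected: 100.0, 100.0, 66.7
```

Even though the two hypothetical BLUP vectors are strongly correlated overall, one re-ranked genotype is enough to lower agreement at 30% selection. This is why a high $r_A$ does not guarantee that the same genotypes would be selected.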
Table 3-1. Summary of the five response variables in each of the seven trials, including all series. The first and second values represent the mean and its standard error, respectively. The number of measurements is shown in parentheses.

Response | Citra | College Station | Dallas | Griffin | Sandhills | Stillwater | Tifton
TQ | 4.15 ± 0.02 (37) | 4.83 ± 0.02 (36) | 4.13 ± 0.02 (35) | 6.65 ± 0.02 (15) | 4.83 ± 0.02 (31) | 4.42 ± 0.02 (36) | 5.03 ± 0.03 (10)
TQD | 3.80 ± 0.02 (21) | 4.30 ± 0.02 (23) | 4.56 ± 0.03 (13) | 5.68 ± 0.04 (3) | 4.12 ± 0.02 (11) | 3.60 ± 0.02 (18) | 3.86 ± 0.04 (3)
TQND | 4.59 ± 0.02 (16) | 5.68 ± 0.02 (13) | 3.90 ± 0.02 (22) | 6.91 ± 0.02 (12) | 5.22 ± 0.02 (20) | 5.41 ± 0.02 (18) | 5.53 ± 0.03 (7)
TQG | 4.18 ± 0.02 (35) | 4.77 ± 0.02 (34) | 4.31 ± 0.02 (29) | 6.65 ± 0.02 (15) | 4.83 ± 0.02 (31) | 4.35 ± 0.02 (33) | 5.03 ± 0.03 (10)
TQNG | 3.67 ± 0.07 (2) | 5.64 ± 0.06 (2) | 3.36 ± 0.03 (6) | (0) | (0) | 5.35 ± 0.05 (3) | (0)
Table 3-2. Summary of broad sense heritability estimates for the calculated response variables in the different trials of each series. Values in parentheses are standard errors of the estimates. TQ, averages of repeated measurements of turf quality; TQD, averages of repeated measurements of turf quality under drought conditions; TQND, averages of repeated measurements of turf quality under non-drought conditions; TQG, averages of repeated measurements of turf quality in growing months (April/May to October/November); TQNG, averages of repeated measurements of turf quality in non-growing months (November/December to March/April); College Stn, abbreviation of the location College Station.

Series | Trial | TQ | TQD | TQND | TQG | TQNG
2011 | Citra | 0.404 (0.064) | 0.385 (0.067) | 0.347 (0.067) | 0.405 (0.065) | 0.086 (0.052)
2011 | College Stn | 0.346 (0.069) | 0.363 (0.068) | 0.523 (0.059) | 0.323 (0.070) | 0.319 (0.071)
2011 | Dallas | 0.552 (0.054) | 0.557 (0.054) | 0.536 (0.055) | 0.538 (0.055) | 0.540 (0.055)
2011 | Griffin | 0.344 (0.068) | | 0.359 (0.068) | 0.347 (0.068) |
2011 | Sandhills | 0.382 (0.066) | 0.000 (0.000) | 0.462 (0.062) | 0.384 (0.066) |
2011 | Stillwater | 0.411 (0.067) | 0.299 (0.072) | 0.485 (0.061) | 0.438 (0.066) | 0.000 (0.000)
2011 | Tifton | 0.687 (0.041) | | 0.685 (0.041) | 0.686 (0.041) |
2011 | Average | 0.447 | 0.321 | 0.485 | 0.446 | 0.236
2012 | Citra | 0.264 (0.057) | 0.347 (0.065) | 0.234 (0.065) | 0.278 (0.058) |
2012 | College Stn | 0.571 (0.051) | 0.398 (0.062) | 0.473 (0.062) | 0.507 (0.055) | 0.325 (0.070)
2012 | Dallas | 0.872 (0.019) | 0.828 (0.024) | 0.831 (0.024) | 0.873 (0.019) | 0.676 (0.043)
2012 | Griffin | 0.412 (0.058) | 0.414 (0.062) | 0.141 (0.062) | 0.419 (0.059) |
2012 | Sandhills | 0.502 (0.055) | 0.386 (0.063) | 0.580 (0.063) | 0.510 (0.055) |
2012 | Stillwater | 0.487 (0.055) | 0.249 (0.061) | 0.601 (0.061) | 0.529 (0.055) | 0.010 (0.014)
2012 | Tifton | 0.794 (0.056) | 0.571 (0.053) | 0.803 (0.053) | 0.794 (0.029) |
2012 | Average | 0.557 | 0.456 | 0.523 | 0.558 | 0.337
Table 3-2. Continued.

Series | Trial | TQ | TQD | TQND | TQG | TQNG
2013 | Citra | 0.357 (0.067) | 0.324 (0.069) | 0.274 (0.065) | 0.357 (0.067) |
2013 | College Stn | 0.246 (0.072) | 0.116 (0.066) | 0.384 (0.067) | 0.246 (0.072) |
2013 | Dallas | 0.592 (0.050) | 0.698 (0.040) | 0.533 (0.055) | 0.592 (0.050) |
2013 | Griffin | 0.297 (0.069) | | 0.253 (0.064) | 0.297 (0.069) |
2013 | Sandhills | 0.509 (0.057) | 0.406 (0.064) | 0.503 (0.057) | 0.509 (0.057) |
2013 | Stillwater | 0.824 (0.026) | 0.837 (0.024) | 0.588 (0.051) | 0.824 (0.026) |
2013 | Tifton | 0.674 (0.044) | 0.676 (0.044) | | 0.674 (0.044) |
2013 | Average | 0.500 | 0.509 | 0.423 | 0.500 |
2014 | Citra | 0.636 (0.069) | 0.566 (0.079) | 0.533 (0.080) | 0.604 (0.074) | 0.477 (0.088)
2014 | College Stn | 0.000 (0.000) | 0.000 (0.000) | 0.359 (0.087) | 0.000 (0.000) |
2014 | Dallas | 0.634 (0.068) | 0.638 (0.066) | 0.618 (0.068) | 0.633 (0.069) |
2014 | Griffin | 0.457 (0.082) | 0.135 (0.087) | 0.539 (0.074) | 0.457 (0.082) |
2014 | Sandhills | 0.720 (0.052) | 0.443 (0.087) | 0.700 (0.056) | 0.720 (0.052) |
2014 | Stillwater | 0.689 (0.057) | 0.550 (0.075) | 0.614 (0.068) | 0.685 (0.058) | 0.000 (0.000)
2014 | Tifton | | | | |
2014 | Average | 0.523 | 0.388 | 0.561 | 0.516 | 0.239
Figure 3-1. Distribution of measurements at the different trials. The x axis marks the measurement month and the y axis shows the corresponding trials. Different shapes indicate the series to which the measurements belong. Red and green identify measurements taken under drought and normal conditions, respectively.
Figure 3-2. Site-to-site Type B genetic correlations, $r_B$, for the different calculated response variables by series. The whisker bars represent the standard errors of the mean.

Figure 3-3. Summary of broad sense heritability estimates, $H^2$, of the calculated response variables considering data from all trials within a series. The whisker bars represent the standard errors of the mean.
Figure 3-4. Trait-to-trait Type A genetic correlations, $r_A$, for each trial based on the bivariate analysis of TQD against TQND with data from all trials within a series. The whisker bars represent the standard errors of the mean.
Figure 3-5. Percentage of top genotype matching from series 2011 to 2014 at selection intensities of 10%, 20%, and 30%.
CHAPTER 4
IMPROVING PREDICTABILITY OF MULTI-SENSOR DATA WITH NONLINEAR STATISTICAL METHODOLOGIES¹

¹ Chapter submitted to Crop Science Journal in August 2017.

Background

The evaluation of forage quality, nutritive value, and biomass usually requires multiple harvests and is considered time consuming, labor intensive, and expensive. The use of sensors to evaluate different forage traits has been proposed as a way to alleviate this problem. However, most analytical techniques rely on traditional linear prediction methods, and the prediction models could still be improved with nonlinear methods. In this study, nonlinear methodologies were evaluated for their prediction accuracy in 16 agronomically important traits, in comparison with traditional approaches, and the importance of the predictor variables was explored.

Materials and Methods

Experiment Description

The experiment was carried out on a >15 bermudagrass field at the Noble Research Institute, LLC, Red River Research and Demonstration Farm near Burneyville, OK (33.88° N, 97.28° W; elevation 234 m). The soil was characterized as a Slaughterville fine sandy loam (coarse-loamy, mixed, superactive, thermic Udic Haplustolls), with nitrate-N, P, and K soil test values of <5, and 64 017 g kg⁻¹, respectively, and a pH of 6.3. The soil was amended with 178 kg K ha⁻¹ (0-0-60 muriate of potash) (Pittman et al., 2015). The trial was set up as a randomized complete block design (RCBD) with seven levels of N treatment (0, 28, 56, 84, 112, 168, and 224 kg N ha⁻¹) and four blocks (3.0 m × 6.0 m plot size). The treatment design was structured to ensure maximum variability in the responses (dry matter yield, DMY, and nutritive value), which is needed for model construction. The N treatments were initially applied on 1 May 2015 and reapplied on 15 June 2015 (Pittman et al., 2015).

Measurements

Data were collected in 2015 using both standard physical methods and sensor measurements from a multi-sensor array, including ultrasonic, laser, and spectral sensors, described by Pittman et al. (2015). Seven harvests occurred across the summer, the first on 18 May and the last on 18 August, with an average harvest interval of 14 days. The total number of measurements was 532, with 17 predictors and 16 response variables. The measurement variables included lasers (Pittman et al., 2015), ultrasonic, NDVI from GreenSeeker (Trimble Inc., Sunnyvale, CA), infrared reflectance vegetative index (IRVI), normalized difference red edge index (NDRE), NDVI from the Crop Circle ACS-430 active canopy sensor (HSNDVI) (Holland Scientific, Lincoln, NE), leaf area index proxy index (LAIVI), chlorophyll content index (CCCVI), red edge reflectance (RE), near-infrared reflectance (NIR), red reflectance (RED), vegetation temperature (VegTemp), intercepted photosynthetically active radiation (IPAR), RPAR, reflectance at 532 nm (nm532), reflectance at 550 nm (nm550), and reflectance at 700 nm (nm700). The agronomically important traits were DMY and the concentrations of crude protein (CP), Ca, P, K, Mg, acid detergent fiber (ADF), neutral detergent fiber (NDF), total digestible nutrients (TDN), lignin, in vitro true dry matter digestibility (IVTDMD), ash, 48-h digestible NDF (dNDF48), fructan, sugars, and water-soluble carbohydrates (WSC). DMY and forage nutritive value were estimated by hand-clipping two 0.11-m² quadrats per plot to a 2.5-cm stubble height.
Samples were dried in a forced-draft oven at 50 °C for five days, weighed, ground in a Wiley mill (Thomas Scientific, Swedesboro, NJ) to pass a 1-mm screen, and submitted for nutritive value analysis. All nutritive value analyses were conducted using a Foss 6500 NIRS instrument. The samples were scanned using Foss ISIScan software (Infrasoft, 2003), and nutritive values were estimated using the 2013 prediction equations for grass hay developed by the NIRS Forage and Feed Testing Consortium (Hillsboro, WI).

Missing sensor values were handled in two ways. If a single observation had missing values in more than three sensor variables, the observation was removed. If the missing values occurred in fewer than three sensor variables, they were filled by imputation from a linear model. To impute the missing values, all the sensor variables without missing values in the target rows were fitted in a linear model, and stepwise AIC selection was used to reach the final model. The predicted values from the final models were then used to fill in the missing values in the target rows. In this study, five sonar data points were imputed.

Feature Engineering

Based on the variables in the dataset, the canopy temperature depression (CTD) index and the fraction of intercepted photosynthetically active radiation (FIPAR) were calculated as

$$\mathrm{CTD} = \mathrm{airTemp} - \mathrm{VegTemp} \tag{4-1}$$

FIPAR = (4-2)

where airTemp is the air temperature; VegTemp is the vegetation temperature; IPAR is the incident photosynthetically active radiation; and RPAR is the reflected photosynthetically active radiation.

Criteria of Model Performance

To evaluate the performance of models in predicting a target trait, the root mean square error (RMSE) was proposed. However, because there are multiple traits with different
measurement units in the dataset, and comparisons of performance among traits were also of interest, the normalized root mean square error (NRMSE) was used. The RMSE and NRMSE were calculated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{4-3}$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \tag{4-4}$$

where $y_i$ denotes the observed value and $\hat{y}_i$ the predicted value of the corresponding observation, $n$ is the number of observations in the test dataset, and $\bar{y}$ is the mean of the observations in the test dataset. Although NRMSE makes model performance comparable across traits, it does not fully characterize prediction accuracy. Therefore, the predictability (the correlation between predicted and observed values) was also calculated.

Evaluation of the Prediction Methodologies

Partial least squares (PLS) regression is a widely used methodology to regress predictor data against target traits, based on the assumption that the response variables are generated by a process driven by unobserved latent variables (Rosipal and Krämer, 2006). It is a competitive prediction model owing both to its light computational requirements and to its superior performance when collinearity exists in the dataset (Wold et al., 1984). Although PLS regression initially gained popularity in chemometric studies for its prediction performance (Sjöström et al., 1983; Geladi and Kowalski, 1986; Frank, 1987; Tobias, 1995), its application has expanded to various research fields, such as genetics and ecology (Nguyen and Rocke, 2002; Carrascal et al., 2009). In spectral sensor data analysis, PLS regression was initially the default statistical methodology for predicting variables of interest [e.g., estimation of grass biomass and measurement of nitrogen status (Hansen and Schjoerring, 2003; Cho et al., 2007)]. However, other prediction methodologies, such as ridge regression and random forest, have performed well in other applications and could potentially outperform PLS regression.

Ridge regression was initially proposed by Hoerl and Kennard (1970) to address the potential instability of least squares estimates by adding a small constant to the diagonal entries of the matrix $\mathbf{X}^T\mathbf{X}$ before taking its inverse. Even though ridge regression estimators are biased, the prediction performance of this methodology is quite competitive. Because of the constant added to the diagonal entries of $\mathbf{X}^T\mathbf{X}$, ridge regression handles multicollinearity well, which is especially helpful for sensor data, in which collinearity is not uncommon (Mahajan et al., 1977; Rook et al., 1990). Support vector machine (SVM) regression derives from a classification algorithm that separates the data with hyperplanes and has been adapted to regression problems through a fixed feature space transformation (Bishop, 2007). Given the flexibility of kernel functions to capture nonlinear relationships, it has been widely adopted for uses such as the prediction of corporate financial distress, exchange rate prediction, wind speed prediction, and remote sensing (Mohandes et al., 2004; Pai et al., 2006; Hua et al., 2007; Mountrakis et al., 2011). Random forest (RF) is an ensemble learning method based on constructing multiple decision trees (Liaw and Wiener, 2002). The combination of random selection of variables at each node split, fully grown trees, and many tree copies gives RF strong performance in many problems and helps it avoid overfitting. In this study, PLS regression was used as the benchmark, and the other statistical methodologies needed to surpass its performance to be considered as alternatives.
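The evaluation criteria of Equations 4-3 and 4-4, predictability, and the ridge estimator described above can be sketched together in NumPy. This is an illustration on simulated, nearly collinear predictors, not the software or data used in the study; the penalty value `lam`, the centring of X and y before fitting, and all function names are assumptions of the sketch.

```python
import numpy as np


def rmse(obs, pred):
    """Equation 4-3: root mean square error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def nrmse(obs, pred):
    """Equation 4-4: RMSE divided by the mean of the observed test values."""
    return rmse(obs, pred) / float(np.mean(obs))


def predictability(obs, pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])


def ridge_fit(X, y, lam):
    """Ridge estimate (X'X + lam*I)^-1 X'y on centred data; intercept recovered after."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


# Simulated sensor-like predictors: x2 is almost a copy of x1 (strong collinearity).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X = np.column_stack([x1, x2])
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.1, size=200)

intercept, beta = ridge_fit(X[:150], y[:150], lam=1.0)  # train on the first 150 rows
pred = intercept + X[150:] @ beta                        # score on the last 50 rows
print(beta, nrmse(y[150:], pred), predictability(y[150:], pred))
```

With two almost identical predictors, the added constant keeps the system well conditioned and splits the effect of x1 roughly evenly between the two coefficients. Note that predictability is insensitive to the mild shrinkage ridge introduces, whereas NRMSE is not; this is one reason both criteria are reported.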
The methodologies tested included ridge regression, SVM, and RF. The tuning parameters of each model were determined in the inner loop of a nested cross-validation, and model performance was evaluated on the hold-out dataset in the outer loop (Figure 4-1). For the PLS 10), respectively. Additional tuning parameters, such as the number of trees, were determined by defining candidate values to form the tuning grids. For ridge regression and SVM, the number was set to 20, a compromise between computing time and the reliability of parameter selection. The tuning parameter of ridge regression was the weight decay, lambda, whereas SVM had two parameters to tune, sigma and cost. Once the model parameters have been determined, the models could potentially be used to predict future harvests from upcoming sensor measurements. Because estimates of model performance from cross-validation can be biased (Varma and Simon, 2006), nested cross-validation was used in this study to obtain a solid evaluation of the predictive models (Krstajic et al., 2014). In the nested cross-validation setup, the data were split into k folds; each time, one fold was retained for testing and the remaining k − 1 folds were used for training and validation (Figure 4-1). Unlike in ordinary k-fold cross-validation, the hold-out sets have no influence on parameter tuning and selection; they only provide measurements of model performance.

Evaluation of Predictor Variables

Because biomass is the most important agronomic trait in this study, it was used as the prediction target to illustrate model performance after the removal of variables. The deterioration of performance after a variable was removed was used to measure the importance of that variable, and importance was determined by step-by-step elimination. The NRMSE was recorded for the removal of each variable in turn, and after every variable had been tried, the one whose removal gave the lowest NRMSE was eliminated. In each round, only one variable
was removed, and the remaining variables were used for model training and prediction testing. The rationale for removing the variable with the lowest NRMSE is that removing an important variable should make model performance deteriorate quickly, resulting in a high NRMSE. By repeating the process until only two variables remained, the importance of each sensor variable was (Team, 2014)

Results

Predictability, that is, the correlation between predicted and test dataset values, was compared among the four methodologies (Figure 4-2 and Table C-3). RF and SVM with a radial kernel consistently had higher prediction accuracy than PLS regression and ridge regression. For all traits except DMY, SVM had higher predictability than RF. PLS regression and ridge regression performed similarly and did not differ statistically for any trait. Using a predictability threshold of 0.85 for the best-performing model, CP, DMY, dNDF48, IVTDMD, TDN, K, Mg, P, ADF, fructan, and NDF could be reliably predicted from sensor data (Figure 4-2 and Table C-3). The DMY and TDN traits were selected, based on the largest and smallest NRMSE, to represent biomass and forage quality in a more detailed investigation of prediction performance (Figure 4-3). For DMY, most of the values clustered between 0 and 2500 kg ha⁻¹, a range in which the predictions from PLS regression and ridge regression had greater variance than those from SVM and RF. For observations with high DMY, all methodologies tended to underestimate the observed value (Figure 4-3). The prediction patterns of SVM and RF were very similar, and the predictabilities of RF (0.887) and SVM (0.883) were almost the same (Figure 4-3). Conversely, TDN values were distributed relatively uniformly across their range. SVM and RF predicted TDN more accurately than PLS
regression and ridge regression, as reflected by data points lying closer to the 1:1 perfect-prediction line (Figure 4-3). No obvious bias was found in the SVM model, and its correlation of 0.934 indicates that it could be reliably used for future predictions (Figure 4-3). Although predictability reflects the overall trend of relative model performance, it does not measure the total prediction loss or the bias, that is, the magnitude of the deviations of predictions from the measured values. Therefore, the NRMSE of each model was also reported to reflect the absolute prediction error. RF and SVM with a radial kernel were clearly more accurate than PLS regression and ridge regression (Figure 4-4). Between RF and SVM, relative performance depended strongly on the trait. For ash and TDN, RF and SVM performed quite similarly. RF predicted DMY more accurately, whereas SVM was slightly better for CP, sugars, WSC, dNDF48, IVTDMD, Ca, K, Mg, P, ADF, fructans, lignin, and NDF (Figure 4-4). Comparisons among traits indicate that CP, DMY, sugars, and WSC are more difficult to predict than the other traits in terms of total prediction loss. Of all the traits, DMY is the most difficult to predict, whereas TDN, dNDF48, ADF, and NDF show little bias in their predictions and are the most easily predicted. The NRMSE and predictability results did not agree perfectly: some traits, such as DMY and CP, had decent predictability but quite high NRMSE values. After model performance was compared across traits, the importance of the sensor variables was also studied. Because DMY is the most important trait, it was used to represent the effect of removing variables on model prediction performance. In this study, the interval between harvests was approximately 14 days.
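The difference between the two criteria, predictability and NRMSE, can be illustrated with a minimal Python sketch. It uses made-up yields and a hypothetical model that pulls every prediction toward the mean, so high values are underestimated while the ranking of observations is preserved. Normalizing RMSE by the mean of the observed values is an assumption for illustration; this section does not state which normalization was used.

```python
import math


def predictability(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


def nrmse(observed, predicted):
    """RMSE divided by the mean observed value (assumed normalization)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)


# Hypothetical yields (kg/ha) and a model that pulls every prediction 40% of
# the way toward the mean: high values are underestimated, low values
# overestimated, but the ranking of the observations is preserved.
observed = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
mean_obs = sum(observed) / len(observed)
shrunk = [mean_obs + 0.6 * (o - mean_obs) for o in observed]

print(f"predictability = {predictability(observed, shrunk):.3f}")
print(f"NRMSE = {nrmse(observed, shrunk):.3f}")
```

Here the correlation is 1 while the NRMSE is about 0.14, which is the kind of gap between the two criteria that a systematic underestimation of high values produces.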
The average DMY of the seven harvests across the summer, from first to last, was 1757, 2408, 2857, 2062, 2621, 1893, and 1746 kg ha⁻¹, respectively. The greatest yield occurred at the 3rd harvest (June 22nd), with lower yields in early and late summer. Because NRMSE reflects both prediction accuracy and bias, it was used to study the impact of removing variables on model performance. Removing the first eight variables, NDRE, RE, LaiVI, FIPAR, HSNDVI, CCCVI, CTD, and IRVI, had little impact on the prediction of DMY, and the performance of all predictive methods was stable (Figure 4-5). For PLS regression and ridge regression, prediction performance dropped steeply after removal of the 9th variable, nm700, and a similar drop occurred when the 14th variable, RED, was removed, at which point NRMSE had increased by at least 40% relative to the starting point. In contrast, following the removal of NDVI and the remaining variables, RF prediction performance decreased gradually, with NRMSE increasing by around 21% compared to the original. The NRMSE of SVM started at a greater value than that of RF but remained stable throughout the variable-reduction process, increasing by only 9%. Overall, RF and SVM were more stable than PLS regression and ridge regression when variable information was missing from the dataset.
Discussion
Two criteria, predictability and NRMSE, were used to describe model performance. Some traits (e.g., CP) had very high predictability but were not as good as expected when measured by NRMSE, which could result from prediction bias. Some predictions at high values may be underestimated by the models (e.g.,
DMY); however, these predictions are still higher than the predictions for lower observed values, so the underestimation is reflected in NRMSE but not captured by the correlation (Barnston, 1992). Regardless of the criterion used to represent model performance, SVM always performed better than PLS regression and ridge regression, with higher predictability and lower NRMSE values. In other words, the
prediction accuracy of the nonlinear models is better than that of the linear models, which could be caused by nonlinear trends in the mixed sensor dataset (Fan et al., 2009). In forage crop experiments, multiple harvests within one year are not uncommon (Woodard and Prine, 1991; Robins et al., 2007; Tahir et al., 2011; Inostroza et al., 2015; Inostroza et al., 2016). When multiple harvests are involved in an experiment, determining which harvests should be included to predict a future harvest becomes very important. In this study, the results of using all previous harvests to predict a future harvest indicate that using all harvests may not be ideal (Table C-1). Harvests 4 and 5 had the best prediction performance compared with the remaining harvests. At the same time, increasing the number of harvests used for model building did not give harvests 6 and 7 superior prediction performance (Table C-1). When a single harvest was used for model building and prediction of a future harvest, some future harvests could be well predicted by previous harvests; for example, harvest 4 was predicted with harvest 3 data (NRMSE = 0.14), while others could not (Table C-2). Hence, prediction performance depends heavily on whether a previous harvest provides useful information for predicting the future harvest. In future studies where the research interest is to predict future harvests, it may be worth determining how many harvests should be included in model building to achieve optimal prediction performance, even though the harvest frequency may depend greatly on the physiological nature of the forage crop (Woodard and Prine, 1991; Sanderson et al., 1999). Compared with other applications of sensor technology in forage crops, our analysis showed high prediction performance. Zhao et al.
(2007) studied the prediction of bermudagrass forage biomass and quality parameters with canopy reflectance measurements, in which NDF, ADF, CP, and biomass had predictability of around 0.72, 0.45, 0.85, and 0.74, respectively. In comparison, by analyzing the combined sensor data with the SVM model, NDF, ADF, CP, and DMY in our study had predictability of around 0.94, 0.93, 0.94, and 0.88, respectively. Knox et al. (2011) reported that the total variance explained by regression for fiber and P was 65% and 57%, respectively, when the hyperspectral Carnegie Airborne Observatory sensor was used to predict African savanna forage quality. In our study, the predictability of fiber and P was 90% and 87%, respectively, which could be considered an improvement over previous studies. Additionally, Zhao et al. (2007) reported bermudagrass r² values for NDF, ADF, and CP of 0.23, 0.21, and 0.51, respectively, with two-band reflectance ratios (R(NIR)/R(red)). In our study, however, the same three traits had r² values > 0.87 (Figure 4-2). Similarly, Lee et al. (2005) reported r² = 0.85 when predicting P concentration with multispectral image analysis on bahiagrass (Paspalum notatum Flugge), which was lower than our r² = 0.90. Moreover, their evaluation of prediction performance used neither any form of cross-validation nor a hold-out dataset, which may lead to overoptimistic results and models prone to overfitting. Compared with the cross-validation used by Pittman et al. (2016) to evaluate model prediction, the nested cross-validation in this study provided a more reliable estimate, because the testing set is never encountered during model construction, not even indirectly (Cawley and Talbot, 2010). In summary, with 11 of the 16 traits in this study reliably predicted by the sensor data (correlation between observation and prediction greater than 0.85), the application of combined sensor systems in this type of research seems very promising. Additionally, this study provided some evidence that nonlinear models had superior performance and were more robust when limited information was available.
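The nested cross-validation scheme of Figure 4-1 can be sketched in plain Python as below. This is an illustrative sketch, not the analysis code used in this study: a one-predictor ridge regression stands in for the four methods, the helper names (kfold_indices, ridge_fit, nested_cv) are hypothetical, and the data are simulated. The property it shows is that each outer test fold is set aside before the inner loop tunes lambda, so the hold-out data never influence parameter selection.

```python
import math
import random


def kfold_indices(n, k, seed):
    """Shuffle positions 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def ridge_fit(xs, ys, lam):
    """One-predictor ridge regression; the intercept is not penalized."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / (sxx + lam)  # larger lam shrinks the slope toward zero
    return lambda x: my + beta * (x - mx)


def rmse(model, xs, ys):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))


def nested_cv(xs, ys, lambdas, k_outer=5, k_inner=5, seed=1):
    """Return one hold-out RMSE per outer fold."""
    scores = []
    for i, test in enumerate(kfold_indices(len(xs), k_outer, seed)):
        held_out = set(test)
        train = [j for j in range(len(xs)) if j not in held_out]

        def inner_error(lam):
            # Inner loop: score a candidate lambda using outer training data only.
            total = 0.0
            for val in kfold_indices(len(train), k_inner, seed + i + 1):
                val_pos = set(val)
                fit = [train[p] for p in range(len(train)) if p not in val_pos]
                hold = [train[p] for p in val]
                model = ridge_fit([xs[j] for j in fit], [ys[j] for j in fit], lam)
                total += rmse(model, [xs[j] for j in hold], [ys[j] for j in hold])
            return total / k_inner

        best_lam = min(lambdas, key=inner_error)
        # Refit on all outer training data, then score the untouched outer fold.
        model = ridge_fit([xs[j] for j in train], [ys[j] for j in train], best_lam)
        scores.append(rmse(model, [xs[j] for j in test], [ys[j] for j in test]))
    return scores


rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
scores = nested_cv(xs, ys, lambdas=[0.0, 1.0, 10.0, 100.0])
print([round(s, 2) for s in scores])
```

Each outer fold may select a different lambda; the averaged outer scores estimate the performance of the whole tuning-plus-fitting procedure rather than of one tuned model.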
Even though the removal of variables may have had some impact on prediction performance, the degree of change in prediction accuracy differed considerably among methods. It is interesting that removing the first eight variables, NDRE, RE, LaiVI, FIPAR, HSNDVI, CCCVI, CTD, and IRVI, had little impact on prediction performance, which may be useful information for engineers to consider in future product development. Moreover, SVM and RF had more robust prediction performance than PLS regression and ridge regression, showing superior prediction accuracy even with reduced information.
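The step-by-step elimination described under Evaluation of Predictor Variables can be written as a short loop. In this minimal sketch, evaluate is a hypothetical callback that trains a model on a subset of variables and returns its NRMSE; the toy scoring function, its penalties, and the variable names x1 to x5 are invented for illustration and do not reflect the sensor results of Figure 4-5.

```python
def backward_elimination(variables, evaluate, keep=2):
    """Stepwise removal ranked by NRMSE.

    evaluate(subset) is a hypothetical callback that fits a model on the
    variables in `subset` and returns its NRMSE on held-out data.
    Returns the (variable, NRMSE) removal history and the variables left.
    """
    remaining = list(variables)
    history = []
    while len(remaining) > keep:
        # Remove each remaining variable in turn and record the resulting NRMSE.
        trials = [(evaluate([v for v in remaining if v != var]), var) for var in remaining]
        # Permanently drop the variable whose absence hurts least (lowest NRMSE).
        score, dropped = min(trials)
        remaining.remove(dropped)
        history.append((dropped, score))
    return history, remaining


# Toy stand-in for model training: every missing variable adds a fixed,
# invented penalty to a baseline NRMSE of 0.30.
PENALTY = {"x1": 0.01, "x2": 0.02, "x3": 0.10, "x4": 0.20, "x5": 0.15}


def toy_nrmse(subset):
    return 0.30 + sum(p for v, p in PENALTY.items() if v not in subset)


history, kept = backward_elimination(list(PENALTY), toy_nrmse)
for var, score in history:
    print(f"removed {var}: NRMSE {score:.2f}")
print("last two variables:", kept)
```

Variables removed early are the least important; the two that survive are the ones whose removal would raise NRMSE the most.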
Figure 4-1. An example of nested cross-validation with outer and inner loops both set to 5 folds.
Figure 4-2. Predictability (correlation between predicted and test dataset values) of agronomically important traits with four statistical methodologies: partial least squares regression, ridge regression, support vector machine (with radial kernel), and random forest. The whisker bars represent the standard error of the mean.
Figure 4-3. Predicted vs. observed values of the dry matter yield and total digestible nutrient traits. The 1:1 line illustrates a perfect match between predictions and observations. PLS regression, partial least squares regression; SvmRadial, support vector machine with radial kernel; RF, random forest.
Figure 4-4. Agriculturally important traits and their corresponding normalized root mean square error (NRMSE) with the statistical methodologies. A lower NRMSE reflects better model prediction performance. The whisker bars represent the standard error of the mean.
Figure 4-5. Model performance for dry matter yield after removing each of the variables in sequence from the dataset. Each variable name denotes the normalized root mean square error (NRMSE) after removal of that variable; a lower NRMSE indicates better prediction performance. The whisker bars represent the standard error of the mean. pls, partial least squares regression; rf, random forest; ridge, ridge regression; svmRadial, support vector machine with radial kernel; NDRE, normalized difference red edge index; RE, red edge reflectance; LAIVI, leaf area index proxy index; FIPAR, fraction of intercepted photosynthetically active radiation; HSNDVI, NDVI from Crop Circle ACS-430 active canopy sensor (Holland Scientific, Lincoln, NE); CCCVI, chlorophyll content index; CTD, canopy temperature depression index; IRVI, infrared reflectance vegetative index; nm700, reflectance at 700 nm; NDVI, NDVI from GreenSeeker (Trimble Inc., Sunnyvale, CA); nm550, reflectance at 550 nm; NIR, near-infrared reflectance; soncm, ultrasonic measurement in cm; RED, red reflectance; nm532, reflectance at 532 nm.
CHAPTER 5
CONCLUSIONS
To keep pace with the rapid generation of data and to improve the efficiency of cultivar development, powerful statistical methodologies are becoming increasingly important for meeting the present challenges of plant breeding. The adoption of improved statistical methodologies should encompass advanced experimental designs, reliable analysis tools, and accurate prediction models. Testing large numbers of genotypes causes the blocks of a randomized complete block (RCB) design to lose their ability to control environmental variation; to address this problem, post hoc blocking designs were tested. In our study, the post hoc RC design was superior to both the post hoc IB designs and the original RCB designs, at the single-measurement level as well as at the site level. The narrow-sense heritability estimate, h², did not change considerably after post hoc blocking, but the ranking of the top-performing genotypes varied, which can have a considerable impact on the breeding selection process. The h² estimates with the post hoc RC design were 0.399, 0.362, 0.321, 0.239, and 0.309 for the Jay, PSREU, Duda, RB farm, and Bethel trials, respectively. The Type B correlation followed a similar trend to h², except for Bethel, which had a lower h² value than expected. Because post hoc blocking provided more accurate estimates of genetic parameters, we recommend using it when analyzing experimental trials with large numbers of genotypes; better still, a more efficient experimental design that provides better control of small-scale environmental variation should be planned at the experimental setup stage. The issues that post hoc blocking was employed to tackle are mostly related to local heterogeneous environmental variation; however, plant breeding also faces other challenges at a larger scale.
As temperature changes have become increasingly unpredictable over the past few years, drought effects have become more frequent and more severe in many plant science fields, especially in turfgrass breeding. To facilitate the selection of genotypes under drought conditions, multi-site trials were analyzed to estimate the genotype by environment (GxE) interaction and broad-sense heritability (H²) in zoysiagrass. The GxE interaction, expressed as Type B genetic correlations estimated from all the data, ranged from 0.350 to 0.727 across years, indicating wide variation in GxE interaction. When GxE interaction and H² were estimated separately from drought and non-drought data, the genetic parameters from non-drought data showed stronger signals. Hence, we recommend using data collected without drought conditions to calculate the breeding values of genotypes when selecting for general turf quality traits. In the bivariate analysis, the agreement of genotype performance under drought and non-drought conditions varied considerably among years and sites. Similar trends were found in the bivariate analysis of turf quality under growing and non-growing conditions; however, most of the time, genotype performance was highly correlated. To alleviate the impact of drought on cultivar development, turfgrass breeders could consider focusing on non-drought and growing-season data to aid breeding selections. Besides implementing statistical approaches to handle environmental variation and the impact of drought on the phenotypic evaluation of genotypes, this study also provided an alternative for rapidly assessing forage quality and biomass, which are usually time-consuming, labor-intensive, and expensive to measure. By improving prediction performance through nonlinear statistical methodologies, sensor technologies become more appealing for applications in agricultural research fields.
The non-destructive multi-sensor system, which accommodates spectral, ultrasonic, and laser data, was tested on a bermudagrass experiment, in which random forest (RF) and support vector machine with radial kernel (SVM) were found to be the most competitive. RF had the best performance in dry matter yield (DMY) prediction, with a correlation of 0.89, whereas SVM performed best in the remaining 15 traits, with correlations ranging from 0.71 to 0.95. Beyond the prediction performance of the statistical models, this study also provided some insight into the importance of variables by removing variables and re-evaluating model performance for DMY. As most of the traits in this study could be reliably predicted by SVM and RF, we expect to see more applications of sensor technology in the agronomic field in the future. Overall, the results of this study support plant breeding from the statistical methodology perspective, and they can be extended beyond turfgrass and forage breeding. While zoysiagrass and bermudagrass were used as model species here, the recommendations from this study can be extended to other plant breeding programs.
APPENDIX A
VARIANCE-COVARIANCE MATRIX OF GXE AND THE TEMPERATURE INFORMATION AT VARIOUS LOCATIONS OF CHAPTER 3
Table A-1. The variance-covariance matrix reflects the covariance (lower diagonal), variance (diagonal), and correlation (upper diagonal) of seven sites.
Site        Citra  ColSta  Dallas  Griffin  Sandhills  Stillwater  Tifton
Citra       0.415  0.340   0.511   0.406    0.380      0.019       0.728
ColSta      0.101  0.212   0.578   0.066    0.017      0.046       0.300
Dallas      0.223  0.180   0.458   0.286    0.291      0.057       0.592
Griffin     0.075  0.009   0.056   0.083    0.145      0.114       0.624
Sandhills   0.192  0.006   0.154   0.033    0.611      0.618       0.397
Stillwater  0.006  0.011   0.020   0.017    0.249      0.265       0.148
Tifton      0.448  0.132   0.383   0.172    0.297      0.073       0.914
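Because the upper-diagonal correlations in Table A-1 follow from the covariances and variances (r_ij = cov_ij / sqrt(var_i × var_j)), they can be recomputed as a consistency check. The sketch below transcribes the table values; the names SITES, VARIANCES, LOWER_COV, and correlation_matrix are illustrative. Because the inputs are rounded to three decimals, the recomputed values match the published correlations only approximately, and if a minus sign was lost in extraction, only the sign of that correlation would change.

```python
import math

# Values transcribed from Table A-1: site GxE variances (diagonal) and
# site-to-site covariances (lower diagonal, row by row).
SITES = ["Citra", "ColSta", "Dallas", "Griffin", "Sandhills", "Stillwater", "Tifton"]
VARIANCES = [0.415, 0.212, 0.458, 0.083, 0.611, 0.265, 0.914]
LOWER_COV = [
    [],
    [0.101],
    [0.223, 0.180],
    [0.075, 0.009, 0.056],
    [0.192, 0.006, 0.154, 0.033],
    [0.006, 0.011, 0.020, 0.017, 0.249],
    [0.448, 0.132, 0.383, 0.172, 0.297, 0.073],
]


def correlation_matrix(variances, lower_cov):
    """Return the full symmetric matrix r_ij = cov_ij / sqrt(var_i * var_j)."""
    n = len(variances)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            r = lower_cov[i][j] / math.sqrt(variances[i] * variances[j])
            corr[i][j] = corr[j][i] = r
    return corr


corr = correlation_matrix(VARIANCES, LOWER_COV)
for name, row in zip(SITES, corr):
    print(f"{name:<11}", " ".join(f"{r:.3f}" for r in row))
```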
Table A-2. The average precipitation and temperature of selected sites within given months. The averages were calculated in three steps: (1) manually confirm the data collection months within a file (each file spans two years); (2) retrieve the precipitation information for the confirmed months; (3) average the precipitation over those months. A period (.) marks a missing value.
File       Site             Avg. precipitation (in)  Avg. temperature (°F)
2011-2012  Citra            5.48                     73.64
           College Station  2.26                     76.98
           Dallas           2.52                     72.83
           Stillwater       1.31                     .
           Tifton           8.41                     75.85
           Average          4.00                     74.83
2012-2013  Citra            2.48                     75.58
           College Station  1.44                     80.06
           Dallas           2.16                     77.61
           Stillwater       2.86                     .
           Tifton           5.19                     .
           Average          2.83                     77.75
2013-2014  Citra            4.7                      77.32
           College Station  3.3                      82.12
           Dallas           2.11                     81.97
           Stillwater       4.05                     .
           Tifton           2.61                     80.83
           Average          3.35                     80.56
2014-2015  Citra            3.02                     72.77
           College Station  2.16                     83.55
           Dallas           3.04                     81.6
           Stillwater       4.09                     .
           Tifton           .                        .
           Average          3.08                     79.31
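The Average rows of Table A-2 appear to be means over the sites with recorded values; for example, the 2011-2012 average temperature excludes Stillwater. A minimal sketch of that across-site calculation, assuming missing entries are simply skipped (the helper mean_available is hypothetical):

```python
def mean_available(values):
    """Average only the recorded values; None marks a missing entry."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


# 2011-2012 file of Table A-2, sites in table order:
# Citra, College Station, Dallas, Stillwater, Tifton.
precip_in = [5.48, 2.26, 2.52, 1.31, 8.41]
temp_f = [73.64, 76.98, 72.83, None, 75.85]  # no Stillwater temperature reported

avg_precip = mean_available(precip_in)
avg_temp = mean_available(temp_f)
print(f"average precipitation {avg_precip:.2f} in, average temperature {avg_temp:.2f} F")
```

Under this assumption the sketch reproduces the 2011-2012 Average row to within rounding; the same rule is consistent with the other files, where the temperature average covers only the sites with a reported value.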
APPENDIX B
SUMMARY OF GXE INTERACTION AND BROAD-SENSE HERITABILITY CONSIDERING ALL LOCATIONS WITHIN SERIES
Table B-1. Site-to-site Type B genetic correlations for different response variables in various series. Values in parentheses correspond to the standard error of the estimates. A period (.) marks a missing value.
Series   TQ             TQD            TQND           TQG            TQNG
2011     0.350 (0.048)  0.277 (0.070)  0.296 (0.045)  0.333 (0.048)  0.633 (0.113)
2012     0.727 (0.035)  0.608 (0.046)  0.724 (0.035)  0.710 (0.036)  0.899 (0.094)
2013     0.378 (0.047)  0.308 (0.049)  0.598 (0.054)  0.377 (0.047)  .
2014     0.621 (0.060)  0.425 (0.079)  0.689 (0.056)  0.621 (0.061)  0.999 (.)
Average  0.519          0.405          0.577          0.510          0.844
Table B-2. Summary of broad-sense heritability of the calculated response variables considering all trials within series. Heritability was calculated with Eq. 5, and the respective variance components were estimated with Eq. 3. Values in parentheses correspond to the standard error of the estimates. TQ, averages of repeated measurements of turf quality; TQD, averages of repeated measurements of turf quality under drought conditions; TQND, averages of repeated measurements of turf quality under non-drought conditions; TQG, averages of repeated measurements of turf quality in growing months (April/May to October/November); TQNG, averages of repeated measurements of turf quality in non-growing months (November/December to March/April). A period (.) marks a missing value.
Series   TQ             TQD            TQND           TQG            TQNG
2011     0.165 (0.010)  0.071 (0.007)  0.147 (0.007)  0.156 (0.009)  0.108 (0.018)
2012     0.397 (0.021)  0.296 (0.018)  0.385 (0.021)  0.391 (0.021)  0.309 (0.043)
2013     0.214 (0.009)  0.185 (0.007)  0.246 (0.020)  0.214 (0.009)  .
2014     0.385 (0.023)  0.172 (0.018)  0.405 (0.031)  0.379 (0.038)  0.380 (0.075)
Average  0.290          0.181          0.296          0.285          0.266
Table B-3. Bivariate analysis of TQD against TQND and TQG against TQNG traits with all the data from sites within years. College Stn, abbreviation of location, College Station. Columns: Series, Trial, TQD vs. TQND, TQG vs. TQNG.
2011-2012: Citra 0.820 (0.072) 0.959 (0.276); College Stn 0.032 (0.140) 0.224 (0.170); Dallas 0.878 (0.030) 0.956 (0.016); Griffin; Sandhills 0.703 (0.087); Stillwater 0.541 (0.132) 0.070 (0.202); Tifton
2012-2013: Citra 0.905 (0.037); College Stn 0.766 (0.125) 0.948 (0.128); Dallas 0.884 (0.024) 0.851 (0.032); Griffin 0.999 (0.000); Sandhills 0.864 (0.051); Stillwater 0.708 (0.083) 0.999 (0.427); Tifton 0.713 (0.056)
2013-2014: Citra 0.873 (0.042); College Stn 0.622 (0.212); Dallas 0.886 (0.029); Griffin; Sandhills 0.976 (0.047); Stillwater 0.757 (0.049); Tifton
2014-2015: Citra 0.721 (0.106) 0.852 (0.072); College Stn 0.079 (0.293); Dallas 0.358 (0.128); Griffin 0.999 (0.000); Sandhills 0.822 (0.094); Stillwater 0.682 (0.113) 0.929 (0.036); Tifton
Table B-4. The correlation matrix obtained by modeling data from seven trials in series 2011 with the CORGH variance-covariance structure. College Stn, abbreviation of location, College Station.
Site         Citra  College Stn  Dallas  Griffin  Sandhills  Stillwater  Tifton
Citra        1      0.334        0.511   0.406    0.380      0.019       0.728
College Stn  .      1            0.578   0.066    0.017      0.046       0.300
Dallas       .      .            1       0.286    0.291      0.057       0.592
Griffin      .      .            .       1        0.145      0.114       0.624
Sandhills    .      .            .       .        1          0.618       0.397
Stillwater   .      .            .       .        .          1           0.148
Tifton       .      .            .       .        .          .           1
APPENDIX C
PREDICTABILITY AND STANDARD ERROR FOR CHAPTER 4 MODELS
Table C-1. Prediction performance for each target harvest date in the dry matter yield trait with sensor data measurements, using statistical models built with all previous harvest data. The values are the normalized root mean square error (NRMSE).
Harvest date   NRMSE
June 2nd       0.77
June 22nd      0.57
July 9th       0.23
July 21st      0.29
August 6th     0.77
August 18th    0.70
Table C-2. Use of the dataset from one harvest to predict another in the dry matter yield trait. Column names are the harvests to be predicted, and row names are the harvests used to build the prediction models. All predictions are based on one harvest, and the values are the normalized root mean square error (NRMSE).
          Harvest2  Harvest3  Harvest4  Harvest5  Harvest6  Harvest7
Harvest1  0.77      0.74      0.29      0.52      0.99      0.84
Harvest2  .         0.62      0.34      0.27      0.73      0.87
Harvest3  .         .         0.14      0.40      0.72      0.80
Harvest4  .         .         .         0.47      0.80      0.86
Harvest5  .         .         .         .         0.65      0.59
Harvest6  .         .         .         .         .         0.76
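Tables C-1 and C-2 can be compared directly by looking up, for each target harvest, the single source harvest with the lowest NRMSE. The sketch below transcribes both tables; the dictionary layout and the helper best_source are illustrative choices, not part of the original analysis.

```python
# Table C-2: SINGLE[source][target] = NRMSE when one earlier harvest predicts a later one.
SINGLE = {
    1: {2: 0.77, 3: 0.74, 4: 0.29, 5: 0.52, 6: 0.99, 7: 0.84},
    2: {3: 0.62, 4: 0.34, 5: 0.27, 6: 0.73, 7: 0.87},
    3: {4: 0.14, 5: 0.40, 6: 0.72, 7: 0.80},
    4: {5: 0.47, 6: 0.80, 7: 0.86},
    5: {6: 0.65, 7: 0.59},
    6: {7: 0.76},
}
# Table C-1: NRMSE for each target harvest when all previous harvests are used.
ALL_PREVIOUS = {2: 0.77, 3: 0.57, 4: 0.23, 5: 0.29, 6: 0.77, 7: 0.70}


def best_source(table, target):
    """Return (source harvest, NRMSE) with the lowest NRMSE for one target harvest."""
    score, source = min((row[target], src) for src, row in table.items() if target in row)
    return source, score


for target in range(2, 8):
    source, score = best_source(SINGLE, target)
    print(f"Harvest {target}: all previous {ALL_PREVIOUS[target]:.2f}, "
          f"best single (harvest {source}) {score:.2f}")
```

With these transcribed values, the best single source harvest gives a lower NRMSE than using all previous harvests for harvests 4 to 7, in line with the observation that including every earlier harvest is not necessarily ideal.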
Table C-3. Predictability of agronomically important traits with the statistical methodologies PLS regression, ridge regression, SVM, and RF. Predictability corresponds to the correlation of prediction vs. observation values, and SE is the standard error of the respective means. Each cell gives predictability (SE).
Trait         pls            rf             ridge          svmRadial
CrudeProtein  0.886 (0.004)  0.918 (0.005)  0.890 (0.006)  0.942 (0.003)
DMY           0.863 (0.011)  0.891 (0.014)  0.859 (0.008)  0.879 (0.015)
Sugars        0.544 (0.018)  0.681 (0.018)  0.513 (0.017)  0.712 (0.027)
WSC           0.578 (0.021)  0.684 (0.021)  0.555 (0.022)  0.725 (0.030)
Ca            0.694 (0.016)  0.743 (0.031)  0.689 (0.019)  0.786 (0.016)
K             0.776 (0.016)  0.823 (0.017)  0.777 (0.017)  0.856 (0.014)
Mg            0.824 (0.017)  0.880 (0.011)  0.833 (0.011)  0.900 (0.009)
P             0.904 (0.005)  0.939 (0.002)  0.906 (0.006)  0.947 (0.003)
Ash           0.700 (0.022)  0.755 (0.025)  0.703 (0.031)  0.764 (0.019)
dNDF48        0.805 (0.017)  0.844 (0.020)  0.802 (0.015)  0.862 (0.018)
IVTDMD        0.884 (0.003)  0.917 (0.008)  0.886 (0.009)  0.935 (0.007)
TDN           0.896 (0.006)  0.925 (0.007)  0.900 (0.005)  0.932 (0.008)
ADF           0.896 (0.006)  0.925 (0.007)  0.900 (0.005)  0.933 (0.008)
Fructan       0.761 (0.028)  0.846 (0.006)  0.770 (0.010)  0.857 (0.006)
Lignin        0.779 (0.015)  0.812 (0.018)  0.780 (0.030)  0.834 (0.016)
NDF           0.899 (0.004)  0.925 (0.007)  0.902 (0.008)  0.941 (0.006)
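The reliability criterion used in the Results (a best-model predictability above 0.85) can be applied mechanically to Table C-3. This sketch transcribes the mean predictabilities (SE omitted) and reports which traits pass; the tuple layout and the helper best_method are illustrative choices.

```python
METHODS = ("pls", "rf", "ridge", "svmRadial")
# Mean predictability per trait from Table C-3, in METHODS order (SE omitted).
TABLE_C3 = {
    "CrudeProtein": (0.886, 0.918, 0.890, 0.942),
    "DMY": (0.863, 0.891, 0.859, 0.879),
    "Sugars": (0.544, 0.681, 0.513, 0.712),
    "WSC": (0.578, 0.684, 0.555, 0.725),
    "Ca": (0.694, 0.743, 0.689, 0.786),
    "K": (0.776, 0.823, 0.777, 0.856),
    "Mg": (0.824, 0.880, 0.833, 0.900),
    "P": (0.904, 0.939, 0.906, 0.947),
    "Ash": (0.700, 0.755, 0.703, 0.764),
    "dNDF48": (0.805, 0.844, 0.802, 0.862),
    "IVTDMD": (0.884, 0.917, 0.886, 0.935),
    "TDN": (0.896, 0.925, 0.900, 0.932),
    "ADF": (0.896, 0.925, 0.900, 0.933),
    "Fructan": (0.761, 0.846, 0.770, 0.857),
    "Lignin": (0.779, 0.812, 0.780, 0.834),
    "NDF": (0.899, 0.925, 0.902, 0.941),
}


def best_method(values):
    """Return (method, predictability) of the best model for one trait."""
    i = max(range(len(METHODS)), key=lambda k: values[k])
    return METHODS[i], values[i]


THRESHOLD = 0.85
reliable = [t for t, v in TABLE_C3.items() if best_method(v)[1] > THRESHOLD]
print(f"{len(reliable)} of {len(TABLE_C3)} traits exceed {THRESHOLD}: {', '.join(reliable)}")
```

Applied to the transcribed values, the rule yields the 11 of 16 traits named in the Results, with RF best for DMY and SVM best for every other trait.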
LIST OF REFERENCES
Al-Jibouri, H., Miller, P. A., & Robinson, H. F. (1958). Genotypic and environmental variances and covariances in an upland cotton cross of interspecific origin. Agronomy Journal 50(10), 633-636.
Allard, R. W., & Bradshaw, A. D. (1964). Implications of genotype-environmental interactions in applied plant breeding. Crop Science 4(5), 503.
Cobb, N. (2010). A global overview of drought and heat-induced tree mortality reveals emerging climate change risks for forests. Forest Ecology and Management 259(4), 660-684.
Annicchiarico, P. (2002). Genotype x environment interactions: challenges and opportunities for plant breeding and cultivar recommendations. Food & Agriculture Org., 174.
Arnó, J., Escolà, A., Vallès, J. M., Llorens, J., Sanz, R., Masip, J., Palacín, J., & Rosell-Polo, J. R. (2013). Leaf area index estimation in vineyards using a ground-based LiDAR scanner. Precision Agriculture 14(3), 290-306.
Aziz, S. A., Steward, B. L., Birrell, S. J., Kaspar, T. C., & Shrestha, D. S. (2004). Ultrasonic Sensing for Corn Plant Canopy Characterization. Paper Number 041120, 2004 ASAE Annual Meeting, August 1-4, 2004, 1-11.
Barnston, A. G. (1992). Correspondence among the Correlation, RMSE, and Heidke Forecast Verification Measures; Refinement of the Heidke Score. Weather and Forecasting 7(4), 699-709.
Becker, H. C., & Leon, J. (1988). Stability analysis in plant breeding. Plant Breeding 101(1), 1-23.
Bergeman, C. S., & Plomin, R. (1989). Genotype-environment interaction. In M. H. Bornstein & J. S. Bruner (Eds.), Interaction in human development (pp. 157-171). Hillsdale, NJ: Erlbaum.
Billings, W. D. (1987). Constraints to plant growth, reproduction, and establishment in arctic environments. Arctic and Alpine Research 19(4), 357.
Bishop, C. (2007). Pattern Recognition and Machine Learning (Information Science and Statistics), 1st edn. 2006, corr. 2nd printing edn. Springer, New York.
Bouraoui, R., Lahmar, M., Majdoub, A., Djemali, M., & Belyea, R. (2002). The relationship of temperature-humidity index with milk production of dairy cows in a Mediterranean climate. Animal Research 51(6), 479-491.
Bowyer, P., & Danson, F. M. (2004). Sensitivity of spectral reflectance to variation in live fuel moisture content at leaf and canopy level. Remote Sensing of Environment 92(3), 297-308.
Braman, S. K., Duncan, R. R., & Engelke, M. C. (2000). Evaluation of turfgrass selections for resistance to fall armyworms (Lepidoptera: Noctuidae). HortScience 35(7), 1268-1270.
Brede, D. (2000). … sports, lawns, and golf. Ann Arbor Press.
Busey, P., Reinert, J. A., & Atilano, R. A. (1982). Genetic and environmental determinants of zoysiagrass adaptation in a subtropical region. Journal of the American Society for Horticultural Science 107(1), 79-82.
Carrascal, L. M., Galván, I., & Gordo, O. (2009). Partial least squares regression as an alternative to current regression methods used in ecology. Oikos 118(5), 681-690.
Cartelat, A., Cerovic, Z. G., Goulas, Y., Meyer, S., Lelarge, C., Prioul, J., & Moya, I. (2005). Optically assessed contents of leaf polyphenolics and chlorophyll as indicators of nitrogen deficiency in wheat (Triticum aestivum L.). Field Crops Research 91(1), 35-49.
Cawley, G. C., & Talbot, N. L. C. (2010). On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation. Journal of Machine Learning Research 11(Jul).
Ceccato, P., Gobron, N., Flasse, S., Pinty, B., & Tarantola, S. (2002). Designing a spectral index to estimate vegetation water content from remote sensing data: Part 1: Theoretical approach. Remote Sensing of Environment 82(2), 188-197.
Chapman, S. C. (2008). Use of crop models to understand genotype by environment interactions for drought in real-world and simulated plant breeding trials. Euphytica 161(1-2), 195-208.
Cho, M. A., Skidmore, A., Corsi, F., van Wieren, S. E., & Sobhan, I. (2007). Estimation of green grass/herb biomass from airborne hyperspectral imagery using spectral indices and partial least squares regression. International Journal of Applied Earth Observation and Geoinformation 9(4), 414-424.
Clewer, A. G., & Scarisbrick, D. H. (2013). Practical Statistics and Experimental Design for Plant and Crop Science. John Wiley & Sons.
Cullis, B. R., Lill, W. J., Fisher, J. A., Read, B. J., & Gleeson, A. C. (1989). A New Procedure for the Analysis of Early Generation Variety Trials. Applied Statistics 38(2), 361.
Ebdon, J. S., & Gauch, H. G. (2002). Additive Main Effect and Multiplicative Interaction Analysis of National Turfgrass Performance Trials. Crop Science 42(2), 497-506.
Eberhart, S. A., & Russell, W. A. (1966). Stability parameters for comparing varieties. Crop Science 6(1), 36.
Ehlert, D., Heisig, M., & Adamek, R. (2010). Suitability of a laser rangefinder to characterize winter wheat. Precision Agriculture 11(6), 650-663.
Ehlert, D., Horn, H. J., & Adamek, R. (2008). Measuring crop biomass density by laser triangulation. Computers and Electronics in Agriculture 61(2), 117-125.
… & Gil, E. (2011). Performance of an ultrasonic ranging sensor in apple tree canopies. Sensors 11(3), 2459-2477.
Fan, W., Hu, B., Miller, J., & Li, M. (2009). Comparative study between a new nonlinear model and common linear model for analysing laboratory simulated forest hyperspectral data. International Journal of Remote Sensing 30(11), 2951-2962.
Fan, X. M., Kang, M. S., Chen, H., Zhang, Y., Tan, J., & Xu, C. (2007). Yield Stability of Maize Hybrids Evaluated in Multi-Environment Trials in Yunnan, China. Agronomy Journal 99(1), 220.
Finlay, K., & Wilkinson, G. (1963). The analysis of adaptation in a plant breeding programme. Australian Journal of Agricultural Research 14(6), 742.
Finney, M. A., McHugh, C. W., & Grenfell, I. C. (2005). Stand and landscape level effects of prescribed burning on two Arizona wildfires. Canadian Journal of Forest Research 35(7), 1714-1722.
Forbes, I. (1952). Chromosome Numbers and Hybrids in Zoysia. Agronomy Journal 44(4), 194.
Frank, I. E. (1987). Intermediate least squares regression method. Chemometrics and Intelligent Laboratory Systems 1(3), 233-242.
Franzluebbers, A. J., Stuedemann, J. A., & Wilkinson, S. R. (2001). Bermudagrass Management in the Southern Piedmont USA. Soil Science Society of America Journal 65(3), 834.
Gamon, J. A., Peñuelas, J., & Field, C. B. (1992). A narrow waveband spectral index that tracks diurnal changes in photosynthetic efficiency. Remote Sensing of Environment 41(1), 35-44.
Geladi, P., & Kowalski, B. R. (1986). Partial least squares regression: a tutorial. Analytica Chimica Acta 185, 1-17.
Genc, L., Dewitt, B., & Smith, S. (2004). Determination of wetland vegetation height with LIDAR. Turkish Journal of Agriculture and Forestry 28(1), 63-71.
Gezan, S. A., Huber, D. A., & White, T. L. (2006). Post hoc blocking to improve heritability and precision of best linear unbiased genetic predictions. Canadian Journal of Forest Research 36(9), 2141-2147.
Gilmour, A., Cullis, B., & Verbyla, A. (1997). Accounting for Natural and Extraneous Variation in the Analysis of Field Experiments. Journal of Agricultural, Biological, and Environmental Statistics 2(3), 269-293.
Gilmour, A., Gogel, B., Cullis, B., & Thompson, R. (2009). ASReml user guide release 3.0. VSN International Ltd, Hemel Hempstead.
… Hyperspectral canopy sensing of paddy rice aboveground biomass at different growth stages. Field Crops Research 155, 42-55.
Green, D., Fry, J., Pair, J., & Tisserat, N. A. (1994). Influence of management practices on Rhizoctonia large patch disease in zoysiagrass. HortScience 29, 186-188.
Haldane, J. B. S. (1946). The interaction of nature and nurture. The Annals of Human Genetics 13(1), 197-205.
Hansen, P. M., & Schjoerring, J. K. (2003). Reflectance measurement of canopy biomass and nitrogen status in wheat crops using normalized difference vegetation indices and partial least squares regression. Remote Sensing of Environment 86(4), 542-553.
Hayhoe, K., Sheridan, S., Kalkstein, L., & Greene, S. (2010). Climate change, heat waves, and mortality projections for Chicago. Journal of Great Lakes Research 36(Suppl. 2), 65-73.
Hays, K. L., Barber, J. F., Kenna, M. P., & McCollum, T. G. (1991). Drought avoidance mechanisms of selected bermudagrass genotypes. HortScience 26(2), 180-182.
Henning, J. G., & Radtke, P. J. (2006). Detailed Stem Measurements of Standing Trees from Ground-Based Scanning Lidar. Forest Science 52(14), 67-80.
Hoeck, J. A., Fehr, W. R., Murphy, P. A., & Welke, G. A. (2000). Influence of genotype and environment on isoflavone contents of soybean. Crop Science 40(1), 48.
Hoekstra, F. A., Golovina, E. A., & Buitink, J. (2001). Mechanisms of plant desiccation tolerance. Trends in Plant Science 6(9), 431-438.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 12(1), 55-67.
Hong, S., Schepers, J. S., Francis, D. D., & Schlemmer, M. R. (2007). Comparison of Ground-Based Remote Sensors for Evaluation of Corn Biomass Affected by Nitrogen Stress. Communications in Soil Science and Plant Analysis 38(15-16), 2209-2226.
Hopkinson, C., & Chasmer, L. (2009). Testing LiDAR models of fractional cover across multiple forest ecozones. Remote Sensing of Environment 113(1), 275-288.
Hosoi, F., & Omasa, K. (2009). Estimating vertical plant area density profile and growth parameters of a wheat canopy at different growth stages using three-dimensional portable lidar imaging. ISPRS Journal of Photogrammetry and Remote Sensing 64(2), 151-158.
Hua, Z., Wang, Y., Xu, X., Zhang, B., & Liang, L. (2007). Predicting corporate financial distress based on integration of support vector machine and logistic regression. Expert Systems with Applications 33 (2), 434-440.
Huang, B., Duncan, R. R., & Carrow, R. N. (1997a). Drought resistance mechanisms of seven warm-season turfgrasses under surface soil drying: I. Shoot response. Crop Science 37 (6), 1858.
Huang, B., Duncan, R. R., & Carrow, R. N. (1997b). Drought resistance mechanisms of seven warm-season turfgrasses under surface soil drying: II. Root aspects. Crop Science 37 (6), 1863-1869.
Inostroza, L., Acuña, H., Munoz, P., Vásquez, C., Ibáñez & Aguilera, H. (2016). Using aerial images and canopy spectral reflectance for high-throughput phenotyping of white clover. Crop Science 56 (5), 2629-2637.
Inostroza, L., Acuña, H., & Méndez, J. (2015). Multi-physiological trait selection indices to identify Lotus tenuis genotypes with high dry matter production under drought conditions. Crop and Pasture Science 66 (1), 90-99.
Jansen, R. C., Van Ooijen, J. W., Stam, P., Lister, C., & Dean, C. (1995). Genotype-by-environment interaction in genetic mapping of multiple quantitative trait loci. Theoretical and Applied Genetics 91 (1), 33-37.
Johnson, H. W., Robinson, H. F., & Comstock, R. E. (1955). Estimates of genetic and environmental variability in soybeans. Agronomy Journal 47 (7), 314-318.
Kang, M. S. (1997). Using genotype-by-environment interaction for crop cultivar development. Advances in Agronomy 62, 199-252.
Kellems, R. O., & Church, D. C. (2009). Livestock Feeds and Feeding. Prentice Hall.
Knox, N. M., Skidmore, A. K., Prins, H. H. T., Asner, G. P., van der Werff, H. M. A., de Boer, & Grant, R. C. (2011). Dry season mapping of savanna forage quality, using the hyperspectral Carnegie Airborne Observatory sensor. Remote Sensing of Environment 115 (6), 1478-1488.
Krstajic, D., Buturovic, L. J., Leahy, D. E., & Thomas, S. (2014). Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics 6 (1).
Lee, W. S., Jordan, J. D., In, A., Craig, J. C., & Manager, R. S. (2005). Multispectral Image Analysis for Phosphorus Measurement in Bahia Grass. Plant Science 300 (5), 1-7.
Leon, R. G., Unruh, J. B., Brecke, B. J., & Kenworthy, K. E. (2014). Characterization of Fluazifop-P-butyl Tolerance in Zoysiagrass Cultivars. Weed Technology 28 (2), 385-394.
Liaw, A., & Wiener, M. (2002). Classification and Regression by randomForest. R News 2 (3), 18-22.
Liddle, A. R. (2007). Information criteria for astrophysical model selection. Mon. Not. R. Astron. Soc. 377, 74-78.
Lin, C. S., & Binns, M. R. (1988). A superiority measure of cultivar performance for cultivar × location data. Canadian Journal of Plant Science 68 (1), 193-198.
Luber, G., & McGeehin, M. (2008). Climate Change and Extreme Heat Events. American Journal of Preventive Medicine 35 (5), 429-435.
Mahajan, V., Jain, A. K., & Bergier, M. (1977). Parameter Estimation in Marketing Models in the Presence of Multicollinearity: An Application of Ridge Regression. Journal of Marketing Research 14 (4), 586.
Marcum, K. B., Anderson, S. J., & Engelke, M. C. (1998). Salt Gland Ion Secretion: A Salinity Tolerance Mechanism among Five Zoysiagrass Species. Crop Science 38 (3), 806.
Marcum, K. B., Engelke, M. C., Morton, S. J., & White, R. H. (1995). Rooting characteristics and associated drought resistance of zoysiagrass. Agronomy Journal 87 (3), 534-538.
Martin, M. E., & Aber, J. D. (1997). High spectral resolution remote sensing of forest canopy lignin, nitrogen, and ecosystem processes. Ecological Applications 7 (2), 431-443.
Middleton, E. M., Cheng, Y. B., Hilker, T., Black, T. A., Krishnan, P., Coops, N. C., & Huemmrich, K. F. (2009). Linking foliage spectral responses to canopy-level ecosystem photosynthetic light-use efficiency at a Douglas-fir forest in Canada. Canadian Journal of Remote Sensing 35 (2), 166-188.
Miller, P., Williams, J., Robinson, H., & Comstock, R. (1958). Estimates of genotypic and environmental variances and covariances in upland cotton and their implications in selection. Agronomy Journal 50 (3), 126-131.
Mohandes, M. A., Halawani, T. O., Rehman, S., & Hussain, A. A. (2004). Support vector machines for wind speed prediction. Renewable Energy 29 (6), 939-947.
Morris, K., & Shearman, R. (1998). NTEP turfgrass evaluation guidelines. Turfgrass Evaluation Workshop, Beltsville, MD, 17, 1-5.
Morton, S. J., Engelke, M. C., & White, R. H. (1991). Performance of four warm-season turfgrass genera cultured in dense shade. II. Stenotaphrum secundatum. PR Texas Agricultural Experiment Station (USA).
Mountrakis, G., Im, J., & Ogole, C. (2011). Support vector machines in remote sensing: A review. ISPRS Journal of Photogrammetry and Remote Sensing 66 (3), 247-259.
Mutanga, O., Skidmore, A. K., Kumar, L., & Ferwerda, J. (2005). Estimating tropical pasture quality at canopy level using band depth analysis with continuum removal in the visible domain. International Journal of Remote Sensing 26 (6), 1093-1108.
Nguyen, D. V., & Rocke, D. M. (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18 (1), 39-50.
Norris, K. H., Barnes, R. F., Moore, J. E., & Shenk, J. S. (1976). Predicting forage quality by infrared reflectance spectroscopy. Journal of Animal Science 43 (4), 889-897.
Otoo, E., & Asiedu, R. (2006). Cultivar evaluation and mega-environment investigation of Dioscorea cayenensis cultivars in Ghana based on the GGE biplot analysis. Journal of Food, Agriculture and Environment 4 (3-4), 162-166.
Overman, A. R., Neff, C. R., Wilkinson, S. R., & Martin, F. G. (1990). Water, harvest interval, and applied nitrogen effects on forage yield of bermudagrass and bahiagrass. Agronomy Journal 82 (5), 1011-1016.
Pai, P. F., Hong, W. C., Lin, C. S., & Chen, C. T. (2006). A hybrid support vector machine regression for exchange rate prediction. International Journal of Information and Management Sciences 17 (2).
Patton, A. J., & Reicher, Z. J. (2007). Zoysiagrass Species and Genotypes Differ in Their Winter Injury and Freeze Tolerance. Crop Science 47 (4), 1619.
Piepho, H. P., Möhring, J., Melchinger, A. E., & Büchse, A. (2008). BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161 (1-2), 209-228.
Pittman, J. J., Arnall, D. B., Interrante, S. M., Wang, N., Raun, W. R., & Butler, T. J. (2016). Bermudagrass, wheat, and tall fescue crude protein forage estimation using mobile platform, active spectral and canopy height data. Crop Science 56 (2), 870-881.
Pittman, J. J., Arnall, D. B., Interrante, S. M., Moffet, C. A., & Butler, T. J. (2015). Estimation of biomass and canopy height in bermudagrass, alfalfa, and wheat using ultrasonic, laser, and spectral sensors. Sensors (Basel, Switzerland) 15 (2), 2920-2943.
Plomin, R., DeFries, J. C., & Loehlin, J. C. (1977). Genotype-environment interaction and correlation in the analysis of human behavior. Psychological Bulletin 84 (2), 309-322.
Qian, Y. L., Engelke, M. C., & Foster, M. J. V. (2000). Salinity effects on zoysiagrass cultivars and experimental lines. Crop Science 40 (2), 488.
Qiao, C. G., Basford, K. E., DeLacy, I. H., & Cooper, M. (2000). Evaluation of experimental designs and spatial analyses in wheat breeding trials. TAG Theoretical and Applied Genetics 100 (1), 9-16.
Raymer, P., & Braman, K. (2006). Breeding seashore paspalum for recreational turf use. JL Nus (Ed.) 36.
Reinert, J. A., & Engelke, M. C. (1992). Resistance in zoysiagrass (Zoysia spp.) to the tropical sod webworm (Herpetogramma phaeopteralis). PR Texas Agricultural Experiment Station (USA).
Roberts, D. A., Ustin, S. L., Ogunjemiyo, S., Greenberg, J., Dobrowski, S. Z., Chen, J., & Hinckley, T. M. (2004). Spectral and Structural Measures of Northwest Forest Vegetation at Leaf to Landscape Scales. Ecosystems 7 (5), 545-562.
Robins, J. G., Bauchan, G. R., & Brummer, E. C. (2007). Genetic mapping forage yield, plant height, and regrowth at multiple harvests in tetraploid Alfalfa (Medicago sativa L.). Crop Science 47 (1), 11-18.
Rook, A. J., Dhanoa, M. S., & Gill, M. (1990). Prediction of the voluntary intake of grass silages by beef cattle. 2. Principal component and ridge regression analyses. Animal Production 50 (3), 439-454.
Rosipal, R., & Krämer, N. (2006). Overview and Recent Advances in Partial Least Squares (pp. 34-51). Springer, Berlin, Heidelberg.
Sanderson, M. A., Read, J. C., & Reed, R. L. (1999). Harvest management of switchgrass for biomass feedstock and forage production. Agronomy Journal 91 (1), 5-10.
Schabenberger, O., & Gotway, C. (2017). Statistical methods for spatial data analysis. CRC Press.
Schwartz, B. M., Kenworthy, K. E., Crow, W. T., Ferrell, J. A., Miller, G. L., & Quesenberry, K. H. (2010). Variable Responses of Zoysiagrass Genotypes to the Sting Nematode. Crop Science 50 (2), 723.
Schwartz, B. M., Kenworthy, K. E., Engelke, M. C., Genovesi, A. D., & Quesenberry, K. H. (2009). Heritability estimates for turfgrass performance and stress response in Zoysia spp. Crop Science 49 (6), 2113-2118.
Schwartz, B. M., Kenworthy, K. E., Engelke, M. C., Genovesi, A. D., Odom, R. M., & Quesenberry, K. H. (2010). Variation in 2C Nuclear DNA Content of Zoysia spp. as Determined by Flow Cytometry. Crop Science 50 (4), 1519.
Scotford, I. M., & Miller, P. C. H. (2004). Combination of Spectral Reflectance and Ultrasonic Sensing to Monitor the Growth of Winter Wheat. Biosystems Engineering 87 (1), 27-38.
Selbeck, J., Dworak, V., & Ehlert, D. (2010). Testing a vehicle-based scanning lidar sensor for crop detection. Canadian Journal of Remote Sensing 36 (1), 24-35.
Simon, R., & Maitournam, A. (2004). Evaluating the Efficiency of Targeted Designs for Randomized Clinical Trials. Clinical Cancer Research 10 (20).
Sjöström, M., Wold, S., Lindberg, W., Persson, J.-Å., & Martens, H. (1983). A multivariate calibration problem in analytical chemistry solved by partial least squares models in latent variables. Analytica Chimica Acta 150, 61-70.
Sleper, D. A., Asay, K. H., Pedersen, J. F., & Burton, G. W. (1989). Progress and Benefits to Humanity from Breeding Warm-Season Forage Grasses. In Contributions from Breeding Forage and Turf Grasses (pp. 21-29). Crop Science Society of America.
Srivastava, A. K., Goering, C. E., Rohrbach, R. P., & Buckmaster, D. R. (2006). Engineering principles of agricultural machines (No. 631.3/S774). St. Joseph, Mich.: American Society of Agricultural Engineers.
Starks, P. J., & Brown, M. A. (2010). Prediction of forage quality from remotely sensed data: Comparison of cultivar-specific and cultivar-independent equations using three methods of calibration. Crop Science 50 (5), 2159-2170.
Starks, P. J., Zhao, D., & Brown, M. A. (2008). Estimation of nitrogen concentration and in vitro dry matter digestibility of herbage of warm-season grass pastures from canopy hyperspectral reflectance measurements. Grass and Forage Science 63 (2), 168-178.
Stoll, A., & Kutzbach, H. D. (2001). Guidance of a forage harvester with GPS. Precision Agriculture (Vol. 2, pp. 281-291). Kluwer Academic Publishers.
Stroup, W. W., & Mulitze, D. K. (1991). Nearest Neighbor Adjusted Best Linear Unbiased Prediction. The American Statistician 45 (3), 194-200.
Sui, R., & Thomasson, J. (2006). Ground-based sensing system for cotton nitrogen status determination. Transactions of the ASABE 49 (6), 1983-1991.
Tahir, M. H. N., Casler, M. D., Moore, K. J., & Brummer, E. C. (2011). Biomass Yield and Quality of Reed Canarygrass under Five Harvest Management Systems for Bioenergy Production. Bioenergy Research 4 (2), 111-119.
R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Tobias, R. D. (1995). An Introduction to Partial Least Squares Regression. In Proceedings of the Twentieth Annual SAS Users Group International Conference (pp. 1250-1257). SAS Institute, Cary, NC.
van Leeuwen, M., & Nieuwenhuis, M. (2010). Retrieval of forest structural parameters using LiDAR remote sensing. European Journal of Forest Research. Springer-Verlag.
Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 7 (1), 91.
Vogelmann, J. E., Rock, B. N., & Moss, D. M. (1993). Red edge spectral measurements from sugar maple leaves. International Journal of Remote Sensing 14 (8), 1563-1575.
Walklate, P. J., Cross, J. V., Richardson, G. M., Murray, R. A., & Baker, D. E. (2002). Comparison of Different Spray Volume Deposition Models Using LIDAR Measurements of Apple Orchards. Biosystems Engineering 82 (3), 253-267.
Watkins, E., Fei, S., Gardner, D., Stier, J., Bughrar Input Turfgrass Species for the North Central United States. Ats 8 (1), 0.
Welham, S. J., Gezan, S. A., Clark, S. J., & Mead, A. (2014). Statistical methods in biology: Design and analysis of experiments and regression. CRC Press.
White, R. H., Engelke, M. C., Anderson, S. J., Ruemmele, B. A., Marcum, K. B., & Taylor, G. R. (2001). Zoysiagrass Water Relations. Crop Science 41 (1), 133.
Wold, S., Ruhe, A., Wold, H., & Dunn, W. J., III. (1984). The Collinearity Problem in Linear Regression. The Partial Least Squares (PLS) Approach to Generalized Inverses. SIAM Journal on Scientific and Statistical Computing 5 (3), 735-743.
Woodard, K. R., & Prine, G. M. (1991). Silage characteristics of Elephantgrass as affected by harvest frequency and genotype. Agronomy Journal 83 (3), 541-546.
Xing, L., Gezan, S., Kenworthy, K., Unruh, J. B., & Munoz, P. (2017). Improved genetic parameter estimations in zoysiagrass by implementing post-hoc blocking. Euphytica 213 (8), 195.
Zhao, D., Starks, P. J., Brown, M. A., Phillips, W. A., & Coleman, S. W. (2007). Assessment of forage biomass and quality parameters of bermudagrass using proximal sensing of pasture canopy reflectance. Grassland Science 53 (1), 39-49.
BIOGRAPHICAL SKETCH
Lin Xing was born into a military family in Beijing, China, and is the only child in his family. His family later moved to Weihai, a beautiful coastal city, where he spent his childhood. Inspired by the urgent need for agricultural development in China, he aspired to become an agronomist from a young age. In late 2006, he began studying plant protection at China Agricultural University and started to explore his interest in research. In 2010, he went to the University of Florida to study entomology and received a Master of Science degree in entomology and nematology, during which he focused mainly on termite molting and the mechanism of chitin synthesis inhibitors in termite control. In early 2014, he began studying plant breeding and working with forage crops at the University of Florida. In late 2014, a lab mate happened to ask him to take a statistics course together, and through that course he developed an interest in statistics. Following this passion, he began to study statistics as a dual degree alongside his Doctor of Philosophy and received a degree in statistics in 2016. To conduct data exploration and analysis more efficiently, he also began to develop programming skills. In the short term, he looks forward to helping other people make sense of their data and providing insight into their questions.