Citation |

- Permanent Link:
- https://ufdc.ufl.edu/UFE0046906/00001
## Material Information- Title:
- Improving Medium-Range and Seasonal Hydroclimate Forecasts in the Southeast USA
- Creator:
- Tian, Di
- Place of Publication:
- [Gainesville, Fla.]
Florida - Publisher:
- University of Florida
- Publication Date:
- 2014
- Language:
- english
- Physical Description:
- 1 online resource (249 p.)
## Thesis/Dissertation Information- Degree:
- Doctorate ( Ph.D.)
- Degree Grantor:
- University of Florida
- Degree Disciplines:
- Agricultural and Biological Engineering
- Committee Chair:
- MARTINEZ,CHRISTOPHER J
- Committee Co-Chair:
- GRAHAM,WENDY DIMBERO
- Committee Members:
- JONES,JAMES W
FRAISSE,CLYDE WILLIAM ANNABLE,MICHAEL D - Graduation Date:
- 8/9/2014
## Subjects- Subjects / Keywords:
- Analytical forecasting ( jstor )
Climate models ( jstor ) Climatology ( jstor ) Evapotranspiration ( jstor ) Forecasting models ( jstor ) Mathematical variables ( jstor ) Probability forecasts ( jstor ) Seasons ( jstor ) Statistical models ( jstor ) Weather ( jstor ) Agricultural and Biological Engineering -- Dissertations, Academic -- UF climate -- downscaling -- ensemble -- evapotranspiration -- forecast -- gcm -- hydrology -- nwp -- precipitation -- probability -- statistics -- temperature -- water -- weather City of Lake Alfred ( local ) - Genre:
- bibliography ( marcgt )
theses ( marcgt ) government publication (state, provincial, terriorial, dependent) ( marcgt ) born-digital ( sobekcm ) Electronic Thesis or Dissertation Agricultural and Biological Engineering thesis, Ph.D.
## Notes- Abstract:
- Accurate hydro-climate forecasts are important for decision making by water managers, agricultural producers, and other stake holders. Numerical weather prediction models and general circulation models may have potential for improving hydro-climate forecasts at different scales. In this study, forecast analogs of the Global Forecast System (GFS) and Global Ensemble Forecast System (GEFS) based on different approaches were evaluated for medium-range reference evapotranspiration (ETo), irrigation scheduling, and urban water demand forecasts in the southeast United States; the Climate Forecast System version 2 (CFSv2) and the North American national multi-model ensemble (NMME) were statistically downscaled for seasonal forecasts of ETo, precipitation (P) and 2-m temperature (T2M) at the regional level. The GFS mean temperature (Tmean), relative humidity, and wind speed (Wind) reforecasts combined with the climatology of Reanalysis 2 solar radiation (Rs) produced higher skill than using the direct GFS output only. Constructed analogs showed slightly higher skill than natural analogs for deterministic forecasts. Both irrigation scheduling driven by the GEFS-based ETo forecasts and GEFS-based ETo forecast skill were generally positive up to one week throughout the year. The GEFS improved ETo forecast skill compared to the GFS. The GEFS-based analog forecasts for the input variables of an operational urban water demand model were skillful when applied in the Tampa Bay area. The modified operational models driven by GEFS analog forecasts showed higher forecast skill than the operational model based on persistence. The results for CFSv2 seasonal forecasts showed maximum temperature (Tmax) and Rs had the greatest influence on ETo. The downscaled Tmax showed the highest predictability, followed by Tmean, Tmin, Rs, and Wind. The CFSv2 model could better predict ETo in cold seasons during El Nino Southern Oscillation (ENSO) events only when the forecast initial condition was in ENSO. Downscaled P and T2M forecasts were produced by directly downscaling the NMME P and T2M output or indirectly using the NMME forecasts of Nino3.4 sea surface temperatures to predict local-scale P and T2M. The indirect method generally showed the highest forecast skill which occurs in cold seasons. The bias-corrected NMME ensemble forecast skill did not outperform the best single model. ( en )
- General Note:
- In the series University of Florida Digital Collections.
- General Note:
- Includes vita.
- Bibliography:
- Includes bibliographical references.
- Source of Description:
- Description based on online resource; title from PDF title page.
- Source of Description:
- This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
- Thesis:
- Thesis (Ph.D.)--University of Florida, 2014.
- Local:
- Adviser: MARTINEZ,CHRISTOPHER J.
- Local:
- Co-adviser: GRAHAM,WENDY DIMBERO.
- Statement of Responsibility:
- by Di Tian.
## Record Information- Source Institution:
- UFRGP
- Rights Management:
- Copyright Tian, Di. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 969977018 ( OCLC )
- Classification:
- LD1780 2014 ( lcc )
## UFDC Membership |

Downloads |

## This item has the following downloads: |

Full Text |

PAGE 1 1 IMPROVING MEDIUM RANGE AND SEASONAL HYDROCLIMATE FORECASTS IN THE SOUTHEAST USA By DI TIAN A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2014 PAGE 2 2 Â© 2014 Di Tian PAGE 3 3 This dissertation is dedicated t o the memory of my father, Tian Changshan PAGE 4 4 ACKNOWLEDGMENTS First of all I would like to thank my advisor, Dr. Chris Martinez for his consistent patience, support, encouragement and guidance for my study. This research would not have been possible without him. I also thank my committee members Drs. James Jones, Wen dy Graham, Michael Annable, and Clyde Fraisse for their thoughtful input and help in completing this research. I would like to thank my colleagues and friends, Jerome Maleski for his every discussion not only on modeling but also on many exciting topics i n science, society, culture, and history during these four years, Dr. Robert Rooney for his explanation on climate data and analog methods when I started this research in the first year, Yogesh Khare for his generous help on calculating Agricultural Refere nce Index for Drought , Dr. Tirusew Asefa for sharing his urban water demand forecast code and data, and Drs. Syewoon Hwang and Alison Adams for their useful discussions on seasonal climate forecast and its hydrological applications. I would like to thank D r. Song Liang ( ) for his advices on my career and life during the last year of my graduate study . Finally , I thank my late father Tian Changshan ( ) , my mother Lian Zengxiu ( ) , and my sisters Tian Chen ( ) and Tian Zhao ( ) for their love, support, and belief in me . Last but most, I thank my wife Fang Wang ( ) for her inspiration , patience , and profound love . PAGE 5 5 TABLE OF CONTENTS page ACKNOWLEDGMENTS ................................ ................................ ................................ .. 4 LIST OF TABLES ................................ ................................ ................................ ............ 8 LIST OF FIGURES ................................ ................................ ................................ .......... 9 ABSTRACT ................................ ................................ ................................ ................... 14 CHAPTER 1 INTRODUCTION ................................ ................................ ................................ .... 16 Needs for Accurate Hydroclimate Forecasts ................................ ........................... 16 Bias correction and Downscaling of NWPs and GCMs Outputs ............................. 17 Research Question and Objectives ................................ ................................ ........ 19 2 FORECASTING REFERENCE EVAPOTRANSPIRATION USING RETROSPECTIVE FORECAST ANALOGS IN THE SOUTHEASTERN UNITED STATES ................................ ................................ ................................ .................. 20 Chapter Introduction ................................ ................................ ............................... 20 Data ................................ ................................ ................................ ........................ 23 Methods ................................ ................................ ................................ .................. 25 ET o Calculation Methods ................................ ................................ .................. 25 Fore cast Analog Method ................................ ................................ .................. 26 Evaluation Procedure ................................ ................................ ....................... 28 Results and Discussion ................................ ................................ ........................... 30 Evaluation of Reference Evapotranspiration Methods in Time ......................... 31 Evaluation of Reference Evapotranspiration Methods in Space ....................... 35 Evaluation of Reference Evapotranspi ration Methods by Forecast Lead Day .. 36 Summary and Concluding Remarks ................................ ................................ ....... 36 3 COMPARISON OF TWO ANALOG BASED DOWNSCALING METHODS FOR REGIONAL REFERENCE EVAPOTRANSPIRATION FORECASTS ..................... 55 Chapter Introduction ................................ ................................ ............................... 55 Study Area and Data Sources ................................ ................................ ................ 58 Methods ................................ ................................ ................................ .................. 60 ET o Estimation Methods ................................ ................................ ................... 60 The Analog Downscaling Methods ................................ ................................ ... 61 Evaluation Metrics ................................ ................................ ............................ 63 Deterministic Forecasts ................................ ................................ ............. 63 Probabilistic Forecasts ................................ ................................ ............... 64 Results and Discussion ................................ ................................ ........................... 66 PAGE 6 6 Eval uation of Deterministic Forecasts ................................ .............................. 66 Evaluation of Probabilistic Forecasts ................................ ................................ 70 Concluding Remarks ................................ ................................ ............................... 73 4 THE GEFS BASED DAILY REFERENCE EVAPOTRANSPIRATION (ETo) FORECAST AND ITS IMPLICATION FOR WATER MANAGEMENT IN THE SOUTHEASTERN UNITED STATES ................................ ................................ ..... 90 Chapter Introduction ................................ ................................ ............................... 90 Data ................................ ................................ ................................ ........................ 92 Methods ................................ ................................ ................................ .................. 94 ETo Calculation ................................ ................................ ................................ 94 The Analog M ethod ................................ ................................ .......................... 95 Forecast Verification ................................ ................................ ......................... 96 Irrigation Scheduling Model ................................ ................................ .............. 98 Evaluation of Analog Forecast of WD ................................ ............................. 100 Results ................................ ................................ ................................ .................. 101 Comparison between Observations and NLDAS 2 ................................ ........ 101 Skill of the Analog ETo Forecast ................................ ................................ .... 102 Evaluation of WD Forecast ................................ ................................ ............. 1 03 Co ncluding Remarks ................................ ................................ ............................. 104 5 IMPROVING SHORT TERM URBAN WATER DEMAND FORECASTS USING FORECAST ANALOGS OF THE GLOBAL ENSEMBLE FORECAST S YSTEM (GEFS) ................................ ................................ ................................ .................. 116 Chapter Introduction ................................ ................................ ............................. 116 Study Area and Data ................................ ................................ ............................. 118 Methods ................................ ................................ ................................ ................ 120 Results and Discussion ................................ ................................ ......................... 125 Chapter Conclusions ................................ ................................ ............................. 127 6 SEASONAL PREDICTION OF REGIONAL REFERENCE EVAPOTRANSPIRATION (ETo) BASED ON CLIMATE FORECAST SYSTEM VERSION 2 (CFSv2) ................................ ................................ ............................ 135 Chapter Introduction ................................ ................................ ............................. 135 Data ................................ ................................ ................................ ...................... 139 NCEP CFSv2 Forecasts ................................ ................................ ................. 139 Forcing Dataset of NLDAS 2 ................................ ................................ .......... 140 Methods ................................ ................................ ................................ ................ 141 ETo Estimation Methods ................................ ................................ ................ 141 Downscaling Methods ................................ ................................ .................... 144 SD method ................................ ................................ ............................... 144 SDBC method ................................ ................................ .......................... 145 Evaluation Statistics ................................ ................................ ....................... 146 Results ................................ ................................ ................................ .................. 148 PAGE 7 7 Evaluation of Forecast Skill in Different Seasons ................................ ........... 148 Evaluation of Forecast Skill over Space ................................ ......................... 150 Evaluation of Forecast Skill for Different Leads ................................ .............. 151 Evaluation of Forecast Skill during ENSO Events ................................ .......... 153 Concluding Remarks ................................ ................................ ............................. 154 7 STATISTICAL DOWNSCALING MULTI MODEL FORECASTS FOR SEASONAL PRECIPITATION AND SURFACE TEMPERATURE OVER SOUTHEASTERN USA ................................ ................................ ........................ 176 Chapter Introduction ................................ ................................ ............................. 176 Data and Methods ................................ ................................ ................................ 180 The Datasets ................................ ................................ ................................ .. 180 The MME Scheme s ................................ ................................ ........................ 182 The PP Downscaling Methods ................................ ................................ ....... 182 The MOS Downscaling Methods ................................ ................................ .... 185 Forecast Verification ................................ ................................ ....................... 187 Results ................................ ................................ ................................ .................. 189 Model Performance on Forecasting NiÃ±o3.4 SST ................................ .......... 190 Overall Mean Forecasting Skill ................................ ................................ ....... 190 Performance of Single Models ................................ ................................ ....... 191 Performance of Ensemble Forecasts ................................ ............................. 192 Discussion and Conclusions ................................ ................................ ................. 194 8 CONCLUSIONS ................................ ................................ ................................ ... 219 APPENDIX A MONTEITH EQUATION .. 223 B DERIVATION OF THE FAO PENMAN MONTEITH EQUATION FOR THE HYPOTHETICAL GRASS REFERENCE CROP ................................ .................. 227 LIST OF REFERENCES ................................ ................................ ............................. 231 BIOGRAPHICAL SKETCH ................................ ................................ .......................... 249 PAGE 8 8 LIST OF TABLES Table page 2 1 Summary of methods used to find forecast analogs ................................ ........... 39 2 2 The overall average LEPS skill score and BSS for lead day 1. LEPS skill score and BSS are respectively evaluating the overall skill and categorical skill; the five categories represent <10%, <1/3, 1/3 2/3, >2/3, >90%. ................. 40 2 3 As in Table 2 2 but for lead day 5 ................................ ................................ ....... 41 3 1 A comparison of average ETo from FAO, NARR, and GFS downscaled lead day 1 forecast by CA and NA ................................ ................................ ............. 76 3 2 The overall mean R, MSESS, and Bias for CA and NA in lead day 1 and 5. ...... 77 3 3 The LEPS skill score and BSS for CA and NA in lead day 1 and 5. LEPS skill score and BSS are respectively revaluating the overall skill and categorical skill; the five categories represent <10%, <1/3. 1/3 2/3, >2/3, >90%. ................. 78 4 1 Evaluation of ensemble and deterministic forecast of water deficits (WDs, mm) at five locations in the SEUS ................................ ................................ .... 107 5 1 Variables of original and modified models ................................ ........................ 129 5 2 Summary of forecasting performances for the original model, the analog from 9/23/2004 to 2/25/2010. ................................ ................................ ............ 130 6 1 The overall average MSESS and BSS for different downscaled CFSv2 variables in 0 month lead by SD and SDBC methods. The MSESS and BSS are evaluate the the overall skill and tercile skill, respectively. ......................... 159 6 2 As in Table 6 1 but for downscaled ETo ................................ ........................... 160 6 3 Spatial average of monthly sensitivity coefficients for Tmean, Tmax, Tmin, Rs, and Wind over the SEUS. ................................ ................................ .......... 161 7 1 Basic information of the NMME hindcast archives used in this study ............... 198 7 2 Overall mean of precipitation forecast skill for the NMME models in lead 0. The highest skill scores are highlighted in bold for each model and each category. ................................ ................................ ................................ ........... 199 7 3 Overall mean of temperature forecast skill for the NMME models in lead 0. The highest skill scores are highlighted in bold for each model and each category. ................................ ................................ ................................ ........... 200 PAGE 9 9 LIST OF FIGURES Figure page 2 1 Example of a subset of 9 grid points covering the Tampa Bay area. The small points denote where NARR data are available. Large black points denote where the GFS and Reanalysis 2 dat a are available. Small grey points denote where analog patterns will be applied, which is selected from the large black points. ................................ ................................ ............................... 42 2 2 Comparison of LEPS skill score for the 7 methods as a function of time of the year: (a) LEPS skill score at lead day 1, (b) LEPS skill score at lead day 5 ....... 43 2 3 Comparison of BSS for the 7 methods extremes forecasts as a function of time of the year: (a) BSS at lead day 1, (b) BSS at lead day 5 ........................... 44 2 4 As in Figure 2 3 but for tercile forecasts ................................ ............................. 45 2 5 Comparison of the categorical forecasts in time for PM_RH in lead day 1 and 5 ................................ ................................ ................................ ......................... 46 2 6 As in Figure 2 5 but for PM_RHRs ................................ ................................ ..... 47 2 7 ROC diagrams: (a) PM_GFS lead day 1, (b) PM_GFS lead day 5, (c) PM_RH lead day 1, (d) PM_RH lead day 5, (e) PM_RHRs lead day 1, (f) PM _RHRs lead day 5. ................................ ................................ ........................ 48 2 8 Reliability diagrams: (a) PM_GFS lead day 1, (b) PM_GFS lead day 5, (c) PM_RH lead day 1, (d) PM_RH lead day 5, (e) PM_RHRs lead day 1, (f) PM_RHRs lead day 5. ................................ ................................ ........................ 49 2 9 Comparison of the monthly and annual ET o over Oct 1979 to Sep 2009 for NARR ET o and PM_RHRs ensemble mean ET o analogs: (a) lead day 1; (b) lead day 5. The gray zones represent 10 to 90 percentile of of NARR ET o , reflecting spatial variation of the monthly and annual ET o over the five states. .. 50 2 10 The average LEPS skill score of the PM_RH and PM_RHRs across the five states ................................ ................................ ................................ .................. 51 2 11 The average BSS of the PM_RH and PM_RHRs for the 5 categories forecasts across the five states ................................ ................................ .......... 52 2 12 BSS of the PM_RH for 5 categorical forecasts as a function of time and the lead time of the forecast: (a) Lower extreme forecast, (b) Lower tercile forecast, (c) Middle tercile forecast, (d) Upper tercile forecast, (e) Upper extreme forecast, (f) Overall forecast ................................ ................................ . 53 2 13 As in Figure 2 12 but for PM_RHRs ................................ ................................ ... 54 PAGE 10 10 3 1 The location of the five states in the southeastern United States and the map of grids for each dataset. ................................ ................................ .................... 79 3 2 Comparison of the average monthly and annual ET o from Oct 1979 to Sep 2009 for the NARR ET o and ensemble mean of ET o analogs: (a) average monthly ETo for lead day 1, (b) average monthly ET o for lead day 5, (c) average annual ET o for lead day 1, (d) average annual ET o for lead day 5. The gray zones represent 10 to 90 percentile of NARR ET o , reflecting spatial variation o f the monthly and annual ET o over the southeastern United States. .. 80 3 3 Comparison of the average R, MSESS, and Bias for the CA and NA methods as a function of the months of the year: (a) lead day 1, (b) lead day 5. .............. 81 3 4 The average values of the metrics for the deterministic forecasts for CA and NA methods in lead day 1 and 5 across the southeastern United States: (a) the correlation coefficient between daily forecasted ET o and daily NARR ET o , (b) the mean square error skill score, (c) the bias. ................................ ............. 82 3 5 R and MSESS of the forecasts as a function of time and the lead time of the forecast: (a) the CA, (b) the NA. ................................ ................................ ......... 83 3 6 Comparison of the monthly mean LEPS skill score for the overall (upper row) forecasts and BSS for the tercile (middle row) and extreme (bottom row) forecasts using the CA and NA methods as a function of the months of the year: (a) lead day 1, (b) lead day 5. ................................ ................................ .... 84 3 7 ROC and reliability diagrams: (a) ROC diagrams of the CA method in lead day 1, (b) ROC diagrams of the CA method in lead day 5, (c) ROC diagrams of the NA method in lead day 1, (d) ROC diagrams of the NA method in lead day 5; (e) to (h) as in (a) to (d), but for Reliability diagrams. ............................... 85 3 8 The average LEPS skill score for CA and NA methods in lead day 1 and 5 across the southeastern United States. ................................ .............................. 86 3 9 The average BSS of five categorical forecasts for CA and NA methods in lead day 1 and 5 across the southeastern United States. ................................ .. 87 3 10 LEPS skill score and BSS of the CA method for overall and categorical forecasts as a func tion of time and the lead time of the forecast: (a) Lower extreme forecast, (b) Lower tercile forecast, (c) Middle tercile forecast, (d) Upper tercile forecast, (e) Upper extreme forecast, (f) Overall forecast. ............ 88 3 11 Same as Figure 3 10 but for the NA method. ................................ ..................... 89 4 1 Study area and an example of a subset of sixteen GEFS grid points and corresponding NLDAS 2 grid points in the Tampa Bay area ............................ 108 4 2 Locations of the observed stations and selected locations for WD forecast ..... 109 PAGE 11 11 4 3 Box and whisker plots of the NLDAS 2 e a (T min ) (red) and the NLDAS 2 e a (SH) (blue) in 12 months at Saint Leo, FL (a) and Smithfield, NC (b). .......... 110 4 4 Box and whisker plots of NLDAS 2 (red) and observations (blue) for ETo, Tmax, Tmin, Rs, and Wind in 12 months at Alfred Lake, FL (a) and Clayton, NC (b). ................................ ................................ ................................ .............. 111 4 5 BSS and LEPS skill score (%) as a function of month and lead day. The BSS reflected the extreme and tercile forecasting sk ill (a e); the LEPS skill score reflected the overall forecasting skill (f). ................................ ........................... 112 4 6 Reliability (a b) and ROC (c d) diagrams for tercile and extreme forecasts at lead days 1 and 5. ................................ ................................ ............................ 113 4 7 LEPS skill score (first row, %) and BSS (second to sixth rows) over the SEUS. The LEPS skill score reflected the overall forecasting skill (first row); the BSS reflected the tercile and extreme forecasting skill (second to sixth rows). ................................ ................................ ................................ ................ 114 4 8 The first row compares LEPS skill scores (%) for reforecast2, reforecast1a, and reforecast2b at lead days 1 and 5 throughout the year. The second and third rows compare LEPS skill scores (%) for reforecast2, re forecast1a, and reforecast2b at lead days 1 and 5 over the SEUS . ................................ ........... 115 5 1 GEFS grid points and locations of the water demand forec ast models. Each of the POCs was labeled. ................................ ................................ ................. 131 5 2 Performance of deterministic forecasts (MSESS) for weather variables in different months. (a) one week ahead WeekRain forecasts for seven models, (b) one week ahead RainDays forecasts for seven models, (c) one week ahead CosRainDays forecasts for seven models, and (d) one week ahead HotDays forecast and 7 day T forecasts for a ll models ................................ .... 132 5 3 Same in Figure 5 2, but for probabilistic forecasts (RPSS). ............................. 133 5 4 Comparison of observed and predicted (median of the ensemble forecast) weekly water demands for seven modified water demand models driven by forecast analogs during the valid ation period from 9/23/2004 to 2/25/2010. The 95PPU (95% prediction uncertainty) is the area between the 2.5% and 97.5% percentiles of the ensemble forecast. ................................ .................... 134 6 1 Example of a subset of sixteen (1.0 o Ã— 1.0 o ) grid points covering the Tampa Bay area. The small points denote where NLDAS 2 data are available. Large black squares denote where the CFSv2 reforecast data are available. ............ 16 2 6 2 Comparison of skill scores of five CFSv2 downscaled variables by the SD and SDBC methods as a function of season of the year in 0 month lead: (a) MSESS for SD, (b) below normal BSS for SD, (c) near normal BSS for SD, PAGE 12 12 (d) above normal BSS for SD, (e) MSESS for SDBC, (f) below normal BSS for SDBC, (g) near normal BSS for SDBC, and (h) above normal BSS for SDBC. ................................ ................................ ................................ .............. 163 6 3 Comparison of sk ill scores of downscaled ETo by the SD and SDBC methods as a function of season of the year in 0 month lead: (a) MSESS, (b) below normal BSS, (c) near normal BSS, (d) above normal BSS. .................... 164 6 4 The average skill scores of the downscaled CFSv2 variables by the SD method for the deterministic and tercile forecasts across the southeastern U.S. ................................ ................................ ................................ .................. 165 6 5 As in Figure 6 4 but for the SDBC method ................................ ....................... 166 6 6 The average skill scores of the downscaled ETo by the SD and SDBC methods for the deterministic and tercile forecasts across the southeastern U.S. ................................ ................................ ................................ .................. 167 6 7 Monthly average of absolute sensitivity coefficients for Tmean, Tmax, Tmin, Rs, and Wind over all seasons in the SEUS. ................................ .................... 168 6 8 The average skill scores of downscaled CFSv2 Tmean, Tmax, Tmin, Rs, and Wind (top to bottom) by the SD method as a function of seasons (JFM to DJF) and lead time (0 to 9 month). The thick contour de notes 0. ..................... 169 6 9 As in Figure 6 8 but for the SDBC method ................................ ....................... 170 6 10 The average skill scores of ETo1 and ETo2 by the SD method as a function of seasons and lead time. The thick contour denotes 0. ................................ ... 171 6 11 As in Figure 6 10 but for the SDBC method ................................ ..................... 172 6 12 The skill scores of downscaled ETo by the SDBC method as a function of season of the year in 0 month lead during ENSO events: (a) MSESS, (b) below normal BSS, (c) near normal BSS, (d) above normal BSS ..................... 173 6 13 The average skill scores of ETo1 and ETo2 by the SDBC method as a function of seasons and lead time. The forecast initial season is during ENSO events. The thick contour denotes 0. ................................ ..................... 174 6 14 The average skill scores of ETo1 and ETo2 by the SDBC method as a function of seasons and lead time. The forecast target season is during ENSO events. The thick contour denotes 0. ................................ ..................... 175 7 1 Study area and example of the NM ME and NLDAS 2 grid points. ................... 201 7 2 Bias corrected NiÃ±o3.4 SST forecast skill of six NMME models as a function of season and lead time. ................................ ................................ .................. 202 PAGE 13 13 7 4 Precipitation for DJF (left panel) and JJA (right panel) forecast skill of six NMME models downscaled by the LWPR me thod at 0 month lead over the SEUS. ................................ ................................ ................................ ............... 204 PAGE 14 14 Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy IMPROVING MEDIUM RANGE AND SEASONAL HYDRO CLIMATE FORECASTS IN THE SOUTHEAST USA By Di Tian August 2014 Chair: Christopher J. Martinez Major: Agricultural and Biological Engineering Accurate hydro climate forecasts are important for decision making by water managers, agricultural producers, and other stake holders. Numerical weather prediction models and general circulation models may have potential for improving hydro climate forecas ts at different scales. In this study, forecast analogs of the Global Forecast System (GFS) and Global Ensemble Forecast System (GEFS) based on different approaches were evaluated for medium range reference evapotranspiration (ETo), irrigation scheduling, and urban water demand forecasts in the southeast United States ; the Climate Forecast System version 2 (CFSv2) and the North American national multi model ensemble (NMME) were statist ically downscaled for seasonal forecasts of ETo, precipitation (P) and 2 m temperature ( T2M ) at the regional level . The GFS mean temperature (Tmean) , relative humidity, and wind speed (Wind) re forecast s combined with the climatology of R eanalysis 2 solar radiation (Rs) produced higher skill than using the direct GFS output only. C onstructed analog s showed slightly higher skill than natural analog s for deterministic forecasts. Both irrigation scheduling driven by the GEFS based ETo forecasts and GEFS based ETo forecast skill were generally positive up to one week throughout the year. The GEFS improved ETo PAGE 15 15 forecast skill compared to the GFS. The GEFS based analog forecasts for the input variables of an operational urban water demand model were skillful when a pplied in the Tampa Bay area . The modified operational models driven by GEFS analog forecasts show ed higher forecast skill than the operational model based on persistence . T he results for CFSv2 seasonal forecasts showed maximum temperature (Tmax) and Rs ha d the greatest influence on ETo . The downscaled Tmax showed the highest predictability, followed by Tmean, Tmin, Rs, and Wind. The CFSv2 model could better predict ETo in cold seasons during El Ni n o Southern Oscillation (ENSO) events only when the forecast initial condition was in ENSO. D ownscaled P and T2M forecasts were produced by directly downscaling the NMME P and T2M output or indirectly using the NMME forecasts of Ni n o3.4 sea surface temperatures to predict local scale P and T2M. The indirect method generally showed the highest forecast skill which occurs in cold seasons. T he bias corrected NMME ensemble forecast skill did not outperform the best single model. PAGE 16 16 CHAPTER 1 INTRODUCTION Needs for Accurate Hydroclimate Forecasts T he magnitude of the recent drought in many regions of the USA emphasized the need from different stake holders in the region to use weather and climate information in managing water resource s. Most of the operational decision makings in water resources man agement are in medium range (0 to 2 weeks) and seasonal (0 to 12 months) time scales. Accurate medium range and seasonal hydroclimate forecasts will be useful for reducing operational risks in water resources management at the regional level. In recent ye ars, many efforts have been made to incorporate weather and climate forecasts in to hydrological or hydrometeorological simulations and water resources management. Extensive research focused on using climate information to improve long lead (seasonal to int erannual) streamflow forecasts (e.g., Grantz et al., 2005; Devineni et al., 2008; Bracken et al., 2010; Wood and Lettenmaier, 2006), using weather information to improve short to medium range streamflow forecasts (e.g., Voisin et al., 2011; Dutta et al., 2012; Shrestha et al., 2013), evaluating the benefits of forecasts in water supply management (e.g., Grantz et al., 2007; Golembesky et al., 2009), or developing frameworks for incorporating climate informed seasonal streamflow forecasts into practical wat er resources management (e.g., Sankarasubramanian et al., 2009; Gong et al., 2010; Towler et al., 2013). Despite the recent progress in incorporating weather and climate information in streamflow forecasts, i t is worth noting that many studies suggest that weather and climate forecast information are still underutilized by water managers (e.g., Bolson et al., 2013; This may due to the gaps for forecasting PAGE 17 17 some important hydroclimate variables such as evapotranspir ation and the gaps for applying these hydroclimate forecast information to improve agricultural and urban water demand forecasts. Bias correction and Downscaling of NWPs and GCMs Outputs The Numerical Weather Prediction Models ( NWPs ) and Global Climate Mo dels ( GCMs ) are tools to simulate continental scale climate dynamics. Forecasts from the NWPs and GCMs are potential source s of information for water management. There are barriers for using NWPs and GCMs in water resources management, which are due to the biases and systematic errors of the forecast system, the unknown capacities of the current models for forecasts associated with water management, the scale mismatch between the model resolution and regional applications, and the lack of appropriate statis tical models to tailor these forecasts for use in water management applications. Reforecast ar chives recently made available can be used to understand forecast model bias, evaluate forecast skill, quantify model uncertainty, and build statistical models to tailor to the needs for water management applications. For example, systematic errors of a forecast system can be reduced by statistical post processing of reforecasts and by using multimodel ensemble forecasts (e.g., Hagedorn et al. 2012; Hamill 2012). T here have been a number of seasonal and medium range reforecast datasets made publically available over past few years. For the seasonal to interannual time scale, these include the North American NMME system (Kirtman et al. 2013). The NMME Phase I has arc hived monthly forecasts of precipitation (P), 2 m temperature (T), and sea surface temperature (SST) from multiple global climate models (GCMs) in s easonal scale (up to 12 months) . Among all the GCMs in the NMME system, the CFSv2 model (Saha et al. 2013) h as archived reforecast data at a 6 hourly time step up PAGE 18 18 to 9 months, with more archived variables (such as maximum and minimum temperature at two meters, solar radiation, specific humidity, surface pressure, and wind speed) that can be used in hydrological applications. For the medium range time scale, the Global Forecast System (GFS) (Hamill et al. 2006) and the Global Ensemble Forecast System (GEFS) (Hamill et al. 2013) have archived reforecasts of medium range forecasts. Many of the archived fields are po tentially useful for water management applications. There is a scale mismatch between model outputs and regional water management applications. Downscaling techniques can be employed to overcome this mismatch, either spatially or temporally. There are two categories of downscaling methods: statistical and dynamical downscaling (Fowler et al. 2007). Statistical downscaling employs statistical relationships between the output of a GCM and local observations and inherently corrects the model bias. Statistical downscaling is computationally efficient and straightforward to apply. Maraun et al. (2010) divided statistical downscaling methods into three types: perfect prognosis (PP), model output statistics (MOS), and weather generators (WGs). PP approaches calibr ate the statistical model (often regression related methods) using the observed large scale field (predictor) and observed local scale weather (predictand) and use the GCM forecast field as the predictor in the calibrated statistical model to produce the l ocal scale forecast; MOS approaches produce the downscaled field by directly calibrating the spatially disaggregated or dynamically downscaled GCM output using observations; WGs are statistical models that generate random sequences of local scale weather w ith the temporal and spatial characteristics that resemble the observed weather (Maraun et PAGE 19 19 al. 2010). For dynamical downscaling, a GCM provides the boundary and initial conditions to a regional climate model (RCM) and the RCM runs at a finer resolution to produce local scale forecasts. Since dynamical downscaling requires additional statistical bias correction and is computationally intensive, this dissertation will mainly focus on statistical downscaling. Research Question and Objectives The over all research question is: Can NWPs and GCMs provide useful information for regional water resources management? To answer this question, t he following six specific objectives were evaluated: Objective 1: To explore the application of the GFS for medium range r eference evapotranspiration ( ETo ) forecasts u sing a forecast analog approach; Objective 2 : T o compare the performance of different forecast analog approaches for downscaling GFS ETo forecasts ; Objective 3 : T o evaluate daily ETo forecasts using the GEFS ref orecasts and explore the usefulness of the forecasts for agricultural water management; Objective 4 : T o evaluate forecast analogs of relevant weather variables from reforeca sts of the GEFS and test whether short term urban water demand forec asts can be imp roved using them; Objective 5 : To explore the potential of using the CFSv2 for seasonal predictions of ETo and its relevant climate fields; Objective 6 : To assess regional precipitation and surface air temperature prediction s produced from multiple GCMs using different types of statistical downscaling techniques . PAGE 20 20 CHAPTER 2 FORECASTING REFERENCE EVAPOTRANSPIRATION USING RETROSPECTIVE FORECAST ANALOGS IN THE SOUTHEASTERN UNITED STATES Chapter Introduction The accurate estimation of evapotranspiration (E T) is needed for determining agricultural water demand, reservoir losses, and driving hydrologic simulation models. In typical hydrological and agricultural practice, evapotranspiration is calculated from reference evapotranspiration (ET o ); where ET o is th e evapotranspiration from a well watered reference surface. In an effort to provide a common, globally valid standardized method for estimating ET o , the FAO 56 Penman Monteith (PM) equation ( Allen et al. 1998 ) was adopted by the Food and Agricultural Organ ization (FAO) of the United Nations. While the physically based PM equation has been shown to accurately estimate ET o ( Chiew et al. 1995; Garcia et al. 2004; L Ã³ pez Urrea et al. 2006; Yoder et al. 2005 ) , it requires a large amount of meteorological data whi ch are often not available in many regions. Forecast output from Numerical Weather Prediction Models (NWPMs) and Global Climate Models (GCMs) are potentially useful for ET o forecasting. However, local application usually requires a finer resolution than i s currently available from most coarse scale NWPM and GCM output ( Fowler et al. 2007 ) . Downscaling techniques are able to address this problem by using dynamical or statistical methods. Dynamical downscaling focuses on nesting a regional climate model (RCM ) in a NWPM or GCM to produce spatially complete fields of climate variables, and thus preserving some spatial correlation as well as physically plausible relationships between variables ( Maurer and Hidalgo 2008 ) . However, dynamical downscaling suffers fro m biases introduced by the driving model and from high computational demand ( Abatzoglou and Brown 201 2 ; PAGE 21 21 Hwang et al. 2011; Plummer et al. 2006 ) . Statistical downscaling methods develop empirical mathematical relationships between output from NWPM/GCMs and local climate observations ( Barsugli et al. 2009 ) . The advantage of statistical downscaling is computational efficiency and the ability to be a pplied across multiple models to develop ensembles for scenario building ( Abatzoglou and Brown 201 2) . Due to these advantages, extensive research has been conducted on statistical downscaling for a variety of purposes in recent years. For example, Maurer a nd Hidalgo (2008) downscaled reanalysis precipitation data over the western U.S. using both constructed analogs (CA) and the bias correction and spatial downscaling (BCSD) method of Wood et al. (2004). Another study compared three statistical downscaling m ethods: BCSD, CA, and a hybrid of the two (bias correction and constructed analogs; BCCA) to downscale reanalysis data and used the downscaled data to drive hydrologic models ( Maurer et al. 2010 ) . Abatzoglou and Brown (2012) compared two statistical downsc aling methods: BCSD and multivariate adapted constructed analog (MACA) to downscale reanalysis data for wild fire applications in the western U.S. Several studies have been conducted on ET o forecasts in recent years. Cai et al. (2007, 2009) developed and applied ET o forecasts using weather forecast messages produced by the China Meteorological Administration. Several studies have focused on the use of artificial neural network (ANN) or other empirical models to simulate or forecast ET o ( Chattopadhyay et a l. 2009; Dai et al. 2009; Kumar et al. 2011; Landeras et al. 2009; Ozkan et al. 201 1 ; Pal and Deswal 2009; Wang et al. 2011 ) . A recent review of ANN modeling of ET o can be found in Kumar (2011). Comparatively fewer studies have been conducted to dynamical ly or statistically downscale ET o forecasts using PAGE 22 22 NWPMs or GCMs. Ishak et al. (2010) used the mesoscale modeling system 5 (MM5) to dynamically downscale ERA 40 reanalysis data in the Brue catchment in southwest England, and found ET o to be overestimated by 27 46%. However, Ishak et al. (2010) noted that there were clear patterns in downscaled weather variables that could be used to correct the bias in the results. Silva et al. (2010) found that statistically corrected MM5 estimates of ET o improved results compared to raw model output in the Maipo basin in Chile. Direct statistical downscaling methods (i.e. methods that downscale directly from coarse scale NWPM or GCM output) that specifically address the needs for ET o forecasts appear to be generally lackin g to date. Archives of NWPM reforecasts have recently been made available for diagnosing model bias and improving forecasts; including the reforecast datasets using the National Ha mill et al. 2006 c) and that from the European Centre for Medium Range Weather Forecasts (ECMWF) ( Hamill et al . 2008; Hagedorn et al . 2008 ). A series of articles have introduced the use of the GFS reforecast dataset, including week 2 forecasts ( Hamill et al. 2004; Whitaker et al. 2006 ) , short range precipitation forecasts ( Hamill and Whitaker 2006 b ; Hamill et al. 2006 c ; Hamill et al. 2008 ) , forecasts of geopotential heights and temperature ( Hagedorn et al. 2008; Hamill and Whitaker 2007; Wilks and Hamill 2007 ) and streamflow predictions ( Clark and Hay 2004; Getnet 2011; Werner et al. 2005 ) . The GFS reforecast dataset includes over 31 years of 1 15 day 15 member forecasts for multiple variables at a T62 (approximately 200 km) resolution ( Hamill et al. 2006 c ). Hamill and Whitaker (2006b) demonstrated that a forecast analog technique could produce simple, skillful probabilistic forecasts at a high spatial resolution using the GFS PAGE 23 23 reforecast archive. The analog technique has been found to perform, in general, a s well or better than other statistical downscaling methods ( Timbal and McAvaney 2001; Wilby et al. 2004; Zorita and von Storch 1999 ) . The GFS reforecast dataset includes forecasted variables that include wind speed, temperature, and relative humidity whi ch are important input data for the PM equation; and thus may be useful for generating probabilistic ET o forecasts. However, the GFS reforecast dataset does not include archives of incoming solar radiation, nor maximum and minimum temperature (forecast va riables are archived in 12 hour intervals) which are also important input data for the PM equation. In this work, we employ the GFS reforecast dataset to generate 1 15 day probabilistic daily ET o forecasts and downscale the forecasts using a forecast analo g technique over the states of Alabama, Florida, Georgia, North Carolina, and South Carolina in the southeastern United States. Sections 2 and 3 provide the data and methodology used in this work. The description of results and discussion are presented in Section 4. Concluding remarks are given in Section 5. Data The GFS reforecast archive contains model forecasts from 1979 to present issued every day at a T62 resolution of 2.5 o Ã—2.5 o (roughly 200 km) ( Hamill et al. 2006 c) . The reforecasts have 15 ensemble members and range from 1 to 15 days lead with the forecast data recorded every 12 hours. Archived variables include 10 m wind speed ( u 10 ), 2 m temperature ( T ), and 700 mb relative humidity ( RH ); all which may be potentially useful to forecast ET o . For thi s work the 15 member forecasts were averaged to the ensemble mean; 12 hour 2 m T , 10 m u 10 , and 700 mb RH data were averaged into daily values; daily maximum and minimum temperature ( T max , T min ) were estimated PAGE 24 24 by comparing the 12 hour T values on each day, where the larger T was T max and the smaller T was T min . Wind speed at 2 m height ( u 2 ) was estimated from u 10 (Allen et al. 1998): ( 2 1) where z is the measurement height (1 0 m). For this work, we used data from 1 October 1979 to 31 September 2009 ; 30 years of reforecasts . Since the GFS reforecast archive does not include solar radiation output, and includes variables at a relatively coarse temporal resolution (12 hour) this work al so employed the R2 dataset. The R2 dataset includes variables such as incoming solar radiation ( R s ) and daily maximum and minimum temperature. The R2 dataset is available from 1979 to present at a T62 resolution ( Kanamitsu et al. , 2002 ) . Daily climatological mean values of the R2 dataset were calculated using a running Â±30 day window. As the long term continuous observed meteorological data needed for the estimation of ET o is generally not available, t h e North American Regional Reanalysis (NARR) dataset ( Mesinger et al. 2006 ) was used for forecast verification. The NARR dataset contains all of the variables required for estimation of ET o (as described below) and is available at an approximately 32 km grid resolution. While the NARR dataset may contain biases (e.g. Markovic et al. 2009; Vivoni et al. 2008; Zhu and Lettenmaier 2007), the NARR data used to calculate ET o was taken as a surrogate for long term observations in this work. PAGE 25 25 Methods ET o Calculation M ethods This work used the standard ized form of the PM equation recommended by the FAO (FAO PM) ( Allen et al. 1998 ) and the Thorthwaite equation (Thornthwaite 1948) to estimate daily ET o (mm) , The FAO PM equation is written as: ( 2 2) where R n is the net radiation at the crop surface [MJ m 2 day 1 ]; G is soil heat flux density [MJ m 2 day 1 ] , and is considered to be negligible for daily calculations ; T is the mean daily air temperature at 2 m height [ o C]; u 2 is wind speed at 2 m height [m s 1 ]; e s is s aturation vapor pressure [kPa]; e a is actual vapor pressure [kPa]; is the slope of the saturation vapor pressure curve [kPa o C 1 ]; and is the psychrometric constant [kPa o C 1 ]. Further details on the calcu lation of each of the terms in E quation 2 2, as well as the standard methods used to estimate variables that are not available (i.e. solar radiation) in the GFS reforecast archive can be found in Appendix A. Despite its accuracy in estimating ET o , the PM method requires a large number of inputs includi ng solar radiation, maximum temperature, minimum temperature, wind speed and relative humidity or dewpoint temperature. Given the limited number of variables archived in the GFS reforecast dataset, it is worthwhile to compare with the Thornthwaite equation to estimate ET o ( Thornthwaite 1948 ) . The advantage of the Thornthwaite equation is that it only requires mean temperature data as input ( Vicente Serrano et al. 2010 ) : PAGE 26 26 ( 2 3) where L is the maximum number of sun hours [ h ]: ( 2 4) where s is the sunset hour angle (defined in Appendix A). N is the number of days in the month [ ]; I is a heat index calculated as the sum of 12 monthly values of i : ( 2 5) where T m is monthly mean temperature [Â°C], and m is dependent on I : ( 2 6) Forecast Analog Method The ET o forecasts were produced using a moving spatial window forecast analog approach, as described by Hamill and Whitaker (2006b) and Hamill et al. (2006c). The moving window approach uses a limited number of GFS reforecast grid points, which increases the likelihood of finding skillful natural analogs ( van den Dool 1994 ) . The moving window forecast analog procedure used the calculated value of forecasted ET o at a given lead using one of the approaches in Table 2 1 to find a subset of analog forecasts from the historical reforecast archive that were most similar (based on root mean square error) within the limited spatial region. Once the dat es of the nearest analogs were chosen, the corresponding fine resolution estimates were obtained for these dates from the 32 km resolution grid ET o values computed from the NARR. For this work forecast analogs were chosen from a subset of 9 grid points ( Fi gure 2 1). This subset of 9 points was used to determine the fine scale analogs within the interior of the PAGE 27 27 domain ( Figure 2 1 ). This process was then repeated across the region of interest. Following Hamill and Whitaker (2006b), analog forecasts were sele cted within a Â±45 day window around the date of the forecast and the best 75 analogs were chosen to construct the forecast ensemble. A cross validation procedure was employed where dates from the current year were excluded from the list of potential analo gs. Since the GFS reforecast archive does not include solar radiation output, and only includes variables at a 12 hour temporal resolution this work employed the R2 dataset in the selection of forecast analogs by substitution of long term climatological me an daily values or bias correction of GFS reforecasts ( Table 2 1 ). The rationale to the substitution of climatological mean values from the R2 dataset is that it effectively provides an appropriate estimate (though not perfect) of the parameter in questio n. In (since all of the potential analogs within the Â±45 day window will have been calculated using the same/similar values) and the analog selection becomes w eighted tow ards other terms in E quation 2 2. Bias correction of GFS reforecast variables was conducted using a cumulative distribution function (CDF) mapping approach (Hwang et al. 2011; Ines and Hansen 2006; Panofsky and Brier 1958; Wood et al. 2004). The CDF mapping approach employed the following steps : 1) CDFs were created individually for each of the grids using the R2 climatology, which were selected from a running Â±30 day window from year 1979 to 2009; 2) CDFs of GFS reforecast daily raw data were created for the same grid as R2; 3) daily GFS reforecast raw data were bias corrected at each grid point using CDF mapping that preserves the probability of exceedence of the GFS reforecast PAGE 28 28 raw data, but corrects the raw data to the value that corresponds to the same probability of exceedence from th e R2. Thus bias corrected data on day i at grid j was calculated as, ( 2 7) where F(x) and F 1 (x) denote a CDF of daily data x and its inverse, and subscripts GFS reforecast and R2 indicate the GFS reforecast raw data and R2 data, respectively. This bias correction process was able to remove both bias of under prediction and over prediction assuming d irect correspondence between GFS reforecast and R2 exceedence probabilities. Evaluation P rocedure Daily forecast skill was evaluated for each lead day and summarized by month. The modified Linear Error in Probability Space (LEPS) (Potts et al. 1996) skill score was used to evaluate the overall skill between the ensemble of forecasts and observations. T he LEPS skill score has advantages over RMSE and anomaly correlation to assess forecasts of continuous variables ( Potts et al. 1996 ) . The modified LEPS proposed by Potts et al. (1996) measures the distance between the forecast and the observation in terms of their respective cumulative distribution functions. The LEPS score ( S ) is defined as: ( 2 8) where P f and P o are the probabilities of the forecast and observation from their respective empirical cumulative distributions. Val ues of S range from 2 to 1 and correct forecasts at the extremes score higher than those in the middle of the distribution, which means i t gives relatively more penalty when forecasting events around average values, PAGE 29 29 but gives relatively higher scores and less penalty for correct forecasts of extreme events (Potts et al. 1996; Sobash et al. 2011; Zhang and Casey 2000) . The value of S can be expressed as a skill score with a range 100 to 100%, where a skill score of 100% indicates a perfect forecast and a skill of 0% indicates a skill score equivalent to climatology : ( 2 9) where the denominator is the maximum or minimum possible value of S depending on whether the numerator is positive or negative. If the numerator is positive, S m is calculated as the best possible forecast, using E quation 2 8 with P f = P o . If the numerator is negative, S m is calculated as the worst possible score using E quation 2 8, with P f = 0 for P o P f = 1 for P o < 0.5. In addition to the LEPS skill score, the Brier Skill Score (BSS) was used to evaluate the forecast skill for both tercile categories (upper, middle, and lower terciles) and extremes (10th and 90th percentiles) and were summarized by month. The BSS is calculated as (Wilks 2006): ( 2 10) where BS f is the Brier score of the forecast and BS c is the Brier score of climatology. Climatology was computed from daily values Â± 30 days of the forecast date. The BS f and BS c are calculated as: ( 2 11) and PAGE 30 30 ( 2 12) where n is the number of forecasts and observations of a dichotomous event, p i f and p i c are the forecasted probability of the even t using the forecasts and climatology, respectively, and I i o = 1 if the event occurred and I i o = 0 if the event did not occur. The BSS ranges between values of 0 indicate that the skill of th e forecast is equivalent to climatology. The resolution and reliability of categorical forecasts were evaluated using the relative operating characteristic (ROC) diagrams (e.g. Sobash 2011; Wilks 2006 ) and reliability diagrams (e.g. Wilks 2006), respectively. The ROC diagram compares hit rates to false alarm rates at different forecast probability levels and is a measure of how well the probabilistic forecast discriminates between events and non events. An ROC curve that lies along the 1: 1 line indicates no skill and a curve that is far towards the upper left corner indicates high skill. The reliability diagram indicates the degree that forecast probabilities match observed frequencies. An overall measure of the reliability of the forecast s can be assessed by the deviation of the reliability curve from the diagonal. For a perfectly reliable forecast system, the reliability curve is aligned along the diagonal. Curves below (above) the diagonal indicate over (under ) forecasting. The nearer the curve is to horizontal, the less resolution in the forecast. Results and Discussion Table 2 2 and 2 3 show the overall mean LEPS and BSS skill scores for lead days 1 and 5 for all ET o methods used to find forecast analogs. Overall, PM_RH and PM_RHRs, which used 700mb RH data, were more skillful than the other methods. For lead day 1, PM_RHRs had the highest skill for overall results (based on the LEPS skill PAGE 31 31 score), lower extremes, lower terciles, and upper terciles, while methods Thorn and PM_Rs had t he highest skill for middle terciles and upper extremes, respectively ( Table 2 2 ). Among the five methods that did not use GFS reforecast RH data, Thorn which used the Thornthwaite equation with only mean temperature had the highest skill for middle tercil es ( Table 2 2 ); PM_Rs which used the PM equation with the combination of climatological mean values of R s from the R2 dataset and temperature and wind speed from the GFS reforecast archive ( Table 2 1 ) showed the highest forecast skill for overall results, lower extremes, lower terciles, upper terciles, and upper extremes. For lead day 5, the overall skill (based on the LEPS skill score) of PM_RH, Thorn, PM_Rs, and PM_RHRs are approximately 8.0, which outperformed PM_GFS, PM_RsT, and PM_BC in terms of the o verall forecast ( Table 2 3 ). The lower extreme and middle tercile forecasts showed no skill in lead day 5 for all the seven methods, while some forecasts were skillful for the lower tercile, upper tercile, and upper extreme categories, with PM_Rs and PM_RH Rs showing the highest skill for lower tercile and upper tercile, and PM_Rs and PM_RsT showing the highest skill for the upper extreme forecast. Evaluation of Reference Evapotranspiration Methods in Time Figure 2 4 show the skill of ET o forecasts, by me thod, for lead day 1 and lead day 5 in terms of LEPS skill score ( Figure 2 2 ), BSS of forecasted extreme values ( Figure 2 3 ), and BSS of forecasted terciles ( Figure 2 4 ). Overall, ET o estimated by PM_RH and PM_RHRs were more skillful than the methods that did not use GFS reforecast 700 mb RH data. According to the LEPS skill score ( Figure 2 2 ), PM_RH and PM_RHRs lead day 1 forecast s showed similar patterns of skill and were generally greater than other methods in cold months, with PM_BC showing slightly higher skill in warmer months (May August). The LEPS skill score in lead day 5 shows the skill is higher in cold PAGE 32 32 months than in warm months for all the 7 methods. PM_RHRs still performed the best in cold months, while PM_GFS and PM_BC forecasted equally best in warm months. In terms of BSS for lower extreme forecasts ( Figure 2 3 ), PM_RHRs was the most skillful for lead day 1 and lead day 5 when the BSS was above zero. For upper extreme forecast s , PM_Rs and PM_RsT generally showed the h ighest skill during the cold months, while PM_BC was showed the greatest skill in July and August for both lead day 1 and lead day 5. In terms of BSS for tercile forecasts ( Figure 2 4 ), PM_RHRs showed the greatest skill for both lead day 1 and lead day 5 f or lower terciles forecasts , with PM_BC showing slightly higher skill in June and July; for the middle tercile forecast, the BSS of all the 7 methods over the year were within a range of 0.1 to 0.1, and no single method was the best over other methods in terms of BSS in different months; for the upper tercile forecast, PM_RHRs had the highest skill in cold months, while PM_BC and PM_Rs showed the highest in the other months. While the GFS reforecast RH data used here was at 700 mb height (and not at 2 m he ight as typically required for E quation 2 2), it nevertheless played an important role in improving the forecast skill in the humid southeast ern US . In addition, using external R2 R s climatology was found to improve skill in cooler months, while the P M_BC approach produced higher skill in warmer months . In the PM equation, and u 2 were calculated using readily available output from the GFS reforecast archive, while the values of e a , e s and R n require, or can be estimated from (in the case of R n ), T max and T min . T max and T min were estimated from the available 12 hr output from the GFS reforecast archive, and daily RH mean was also determined from 12 hr output and at 700 mb height, which arguably makes the se terms the most uncertain in E quation 2 2. PAGE 33 33 Comp arison of PM_GFS, Thorn, PM_Rs, PM_RsT, and PM_BC suggests that using the Thornthwaite equation with only mean temperature data was able to achieve similar skill to using the PM equation with T mean and u 2 data and approximate (12 hour) GFS reforecast T min and T max data (PM_GFS), indicating that the PM equation failed to improve forecast skill alone without RH mean . This result also suggests that all the methods with either the PM or Thornthwaite equations show better skill compared to climatology, albeit le ss than when GFS reforecast RH data was included . Comparison of PM_RH (which did not use R2 R s climatology ) with PM_RHRs (which used R2 R s climatology ) suggests that the addition of R s contributed only slightly to improvement in skill and that the majority of skill improvement was due to the inclusion of GFS reforecast RH . Comparison of PM_GFS, PM_Rs, PM_RsT, and PM_BC suggests that bias correction of all the GFS reforecast variables with R2 (PM_BC) can improve the forecast skill above that from replacing a ll of those variables with R2 climatology (PM_RsT). For the ET o calculated by PM_RHRs, it is important to note that this method used R2 climatology of R s , so the selection of candidate analogs were likely more weighted based on GFS reforecast T mean , RH and u 2 . As PM_GFS, PM_RH, and Thorn suggest, u 2 was likely less important than T mean and RH in the selection of analogs. In Figure 2 5 and 2 6 the BSS of PM_RH and PM_RHRs are repotted to show the relative skill between the 5 categories. For both methods, the upper and lower terciles showed the greatest skill for both lead day 1 and lead day 5, the middle tercile the least for lead day 1 and th e extreme forecasts the least for lead day 5, and all forecasts in 5 categories were skillful over all 12 months for lead day 1 except the middle tercile forecast in June, August, and September. For the lead day 5 forecasts of PAGE 34 34 the two methods, only the upp er tercile, lower tercile, and lower extreme forecasts were skillful from December to July, while there was no skill in other categories and other months. Figure 2 7 and 2 8 show the ROC and reliability diagrams for the 5 categorical forecasts for PM_GFS, PM_RH, and PM_RHRs for both lead day 1 and 5. All of the ROC diagrams in Figure 2 7 show that the lower tercile forecast had the greatest resolution for all methods and both lead days, this was followed by the upper tercile, upper extreme, lower extreme, a nd middle tercile, respectively. The ROC diagrams show that the resolution of the three methods was very close; and similar results can be found for the lead day 1 and 5 forecasts. In Figure 2 8 , there were few differences in reliability among the three m ethods and the two lead days. For all methods there was some over forecasting at low probabilities. All methods showed some over forecasting bias, especially for the upper and lower tercile and upper extreme forecasts. High probabilities for the upper ext remes were over forecasted for all the three methods. Although the LEPS and BSS showed that PM_RH and PM_RHRs were more skillful than PM_GFS, and the skill in lead day 1 was greater than lead day 5, the ROC and reliability diagrams show the resolution and reliability for the three methods and two lead days were similar. While the BSS indicated no skill in several instances , the corresponding ROC and reliability diagrams showed positive skill. One possible explanation is that the climatological ET o may have significant variability across locations and across seasons, which can produce artificially high skill as explai ned in Hamill and Juras (2006a). PAGE 35 35 Figure 2 9 shows the mean monthly and annual ET o forecasts for lead days 1 and 5 using PM_RHRs. Also shown a re the 10 th and 90 th percentiles of the forecasts and observations. The figure indicates the mean of forecasted monthly ET o matched quite well with the mean of NARR monthly ET o , with a slight under prediction in May to September and over prediction in Nov ember to January for both lead day 1 and 5. Similarly, for the 10 th percentile of the monthly ET o , there was slight over prediction in warm months and under predictions in cool months; on the contrary, 90 th percentile forecasts were found to under predict during warmer months and over predict in cooler months. Figure 2 9 also indicates that the annual variation of ET o forecasts generally followed observations for both lead day 1 and 5. Evaluation of Reference Evapotranspiration Methods in Space Based on the LEPS skill score, PM_RHRs forecasts showed greater overall skill than PM_RH for both lead day 1 and lead day 5 ( Figure 2 10 ). The forecast skill for both lead day 1 and lead day 5 were highest in the west and northeast. Forecast skill was lowest over Flor ida and near the coast and in the more mountainous region at the confluence of North Carolina, South Carolina and Georgia. This may be due to the inability of coarse scale analogs to accurately match local scale phenomena related to topographic effects an d the influence of the sea breeze over the Florida peninsula ( e.g. Marshall et al. 2004; Misra et al. 2011 ). Figure 2 11 shows the BSS of the categorical forecasts for PM_RH and PM_RHRs. For the upper extreme forecasts, PM_RHRs showed greater skill than PM_RH; the greatest skill found in three states (North Carolina, South Carolina, and Alabama) and northern Georgia, while the least skill were mostly in coastal areas. The lower extreme forecasts were less skillful than upper extreme forecasts but showed a PAGE 36 36 similar pattern in space. The upper and lower tercile forecasts showed a similar spatial pattern to the upper and lower e xtreme forecasts, with PM_RHRs showing greater skill than PM_RH in most of the area. PM_RHRs showed slightly greater skill than PM_RH for lead day 1 middle tercile forecasts and the forecast skill in space was comparably homogeneous. Evaluation of Refere nce Evapotranspiration Methods by Forecast Lead Day Figure 2 12 and 2 13 show the BSS and LEPS skill score as a function of month and lead day for PM_RH and PM_RHRs, respectively . Overall, for PM_RH, the forecasts were more skillful in cooler months when t he skill scores were above zero. The LEPS skill score showed that the overall forecasts were skillful for the first 9 lead days. For the lower extreme and lower tercile forecasts, the BSS were mostly above zero before lead day 5 and the early half of the y ear were found to be more skillful at later lead days than later half of the year. For the upper extreme forecasts, the warmer months showed more skill at later lead days than cooler months. For the upper tercile forecasts, lead day 7 was still skillful f or the months of January, December, and July; however the skill was modest. For the middle tercile forecasts, the BSS was greater than zero from lead day 1 to 3 for most months; after lead day 3 the forecasts showed no skill over the year. PM_RHRs showed similar forecast patterns, except it has more skillful lead days in some months ( Figure 2 13 ). Summary and Concluding R emarks A forecast analog technique was successfully used to downscale 1 15 day 200 km resolution ET o forecasts using the GFS reforecast archive and R 2 climatology data using seven methods over the states of Alabama, Georgia, Florida, North Carolina, and South Carolina in the southeast ern United States. The 32 km resolution ET o calculated PAGE 37 37 from the NARR data set using the PM equation was use d to evaluate the forecast analogs . The skill of both terciles and extremes (10th and 90th percentiles) were evaluated. The ET o forecast methods which included GFS reforecast 700 mb RH data in the PM equation (PM_RH and PM_RHRs) showed greater skill compar ed to the methods that did not use RH . The inclusion of R2 solar radiation data with GFS reforecast data (PM_Rs, PM_RHRs, and PM_RsT) was found to provide a modest increase of forecast skill. The forecasts using both GFS reforecast 700 mb RH data and R2 s olar radiation (PM_RHRs) were found to produce slightly greater skill that forecasts using GFS reforecast 700 mb RH data alone (PM_RH) . The bias correction of all the GFS reforecast variables with R2 (PM_BC) was found to improve the forecast skill compared to substitution of several variables with R2 climatology (PM_RsT). While the 5 categorical forecasts were skillful, the skill of upper and lower tercile forecasts was greater than those of lower and upper extreme forecasts and middle tercile forecasts. Mo st of the forecasts were skillful in the first 5 lead days. Forecasting ET o using GFS reforecasts are advantageous in many respects. Previously applied ANN models for evapotranspiration forecasts are black box models; and ANN models developed for one loca tion cannot implemented in another without local training ( Kumar et al. 2011 ) . In contrast, using ET o from NWPM/GCM output preserves the physical relationships between different variables and preserves the spatial correlation of the output. Compared to the work of Cai et al. (2007), who used daily weather forecast messages to forecast ET o deterministically at eight stations over China, the data availability and the forecast resolution for downscaling NWPM/GCM forecasts is arguably more objective and at a fi ner resolution. There is no evidence to PAGE 38 38 show that statistical downscaling of forecasts are better than dynamical downscaling forecasts ( Abatzoglou and Brown 201 2) in terms of forecast skill, but statistical methods have an advantage in requiring significan tly less computational resources. The advantage of analog selection based on calculated values of ET o rather than finding analogs for each variable individually is that the PM equation appropriately weights the input variables according to their importance . The relative importance likely changes at different times of year and is captured by the analog approach used. Physically plausible relationships between variables and correlation between variables are also preserved. The disadvantage of this approach is that analogs are found based on the magnitude of ET o and not the relative contribution of advective or radiative forcing. For example, high ET o analog days may be found where wind and relative humidity played a larger contributing role compared to inc oming solar radiation and vice versa. This work showed that a forecast analog approach using the Penman Monteith equation with several approximated terms could successfully downscale ET o forecasts. However, the need to approximate several terms in the Penman Monteith equation was likely a limiting factor in forecast skill and indicates the importance in archiving relevant variables in future, next generation reforecast datasets. Future work in evaluating ET o forecasts generated from retrospective f orecast datasets should include comparison to direct model outputs of operational models in order to clearly demonstrate the value of statistical post processing relative to direct model output. PAGE 39 39 Table 2 1 . Summary of methods used to find forecast analogs Method ET o Method GFS Data Estimated Parameters 1 External Data (R2) PM_GFS Penman Monteith Tmean (2 m) R s =f(GFS Tmax, GFS Tmin) u (10 m) e s =f(GFS Tmax, GFS Tmin) Tmin (12 hr) e a =f(GFS Tmin) Tmax (12 hr) Tdew = Tmin PM_RH Penman Monteith Tmean (2 m) R s =f(GFS Tmax, GFS Tmin) u (10 m) e s =f(GFS Tmax, GFS Tmin) Tmin (12 hr) e a =f(GFS RHmean, GFS Tmax, GFS Tmin) Tmax (12 hr) RHmean (700 mb, 12 hr) Thorn Thornthwaite Tmean (2 m) PM_Rs Penman Monteith Tmean (2 m) e s =f(GFS Tmax, GFS Tmin) R s u (10 m) e a =f(GFS Tmin) Tmin (12 hr) Tmax (12 hr) Tdew = Tmin PM_RHRs Penman Monteith Tmean (2 m) e s =f(GFS Tmax, GFS Tmin) R s u (10 m) e a =f(GFS RHmean, GFS Tmax, GFS Tmin) Tmin (12 hr) Tmax (12 hr) RHmean (700 mb, 12 hr) PM_RsT Penman Monteith Tmean (2 m) e s =f(R2 Tmax, R2 Tmin) R s u (10 m) e a =f(R2 Tmin) T min T max T dew = T min PM_BC Penman Monteith Tmean (2 m) R s =BC f(BC Tmax, BC Tmin) R s to bias correct GFS R s u (10 m) e s =f(BC Tmax, BC Tmin) Tmin to bias correct GFS Tmin Tmin (12 hr) e a =f(BC Tmin) Tmax to bias correct GFS Tmax Tmax (12 hr) u to bias correct GFS u Tdew=Tmin 1 Estimation procedures are detailed in Appendix A PAGE 40 40 Table 2 2 . The overall average LEPS skill score and BSS for lead day 1. LEPS skill score and BSS are respectively evaluating the overall skill and categorical skill; the five categories represent <10%, <1/3, 1/3 2/3, >2/3, >90%. Method LEPS Brier Skill Score Overall Lower extreme Lower tercile Middle tercile Upper tercile Upper extreme PM_GFS 12.9 0.011 0.104 0.001 0.138 0.032 PM_RH 18.2 0.147 0.195 0.023 0.196 0.066 Thorn 13.3 0.051 0.114 0.047 0.166 0.102 PM_Rs 14.8 0.068 0.142 0.030 0.202 0.150 PM_RHRs 20.7 0.204 0.232 0.035 0.219 0.067 PM_RsT 8.6 0.041 0.096 0.036 0.155 0.140 PM_BC 12.9 0.043 0.124 0.009 0.147 0.053 PAGE 41 41 Table 2 3 . As in Table 2 2 but for lead day 5 Method LEPS Brier Skill Score Overall Lower extreme Lower tercile Middle tercile Upper tercile Upper extreme PM_GFS 6.9 0.063 0.019 0.017 0.011 0.037 PM_RH 7.7 0.034 0.004 0.013 0.025 0.047 Thorn 8.2 0.019 0.004 0.004 0.037 0.017 PM_Rs 7.9 0.015 0.013 0.006 0.042 0.017 PM_RHRs 8.3 0.003 0.014 0.009 0.031 0.05 PM_RsT 4.9 0.004 0.013 0.000 0.027 0.011 PM_BC 6.8 0.026 0.003 0.013 0.014 0.044 PAGE 42 42 Figure 2 1 . Example of a subset of 9 grid points covering the Tampa Bay area . The small points denote where NARR data are available. Large black points denote where the GFS and Reanalysis 2 data are available. Small grey points denote where analog patterns will be applied, which is selected from the large black points . PAGE 43 43 Figure 2 2. Comparis on of LEPS skill score for the 7 methods as a function of time of the year: (a) LEPS skill score at lead day 1, (b) LEPS skill score at lead day 5 PAGE 44 44 Figure 2 3. Comparison of BSS for the 7 methods extremes forecasts as a function of time of the year: (a) BSS at lead day 1, (b) BSS at lead day 5 PAGE 45 45 Figure 2 4 . As in Figure 2 3 but for tercile forecasts PAGE 46 46 Figure 2 5. Comparison of the categorical forecasts in time for PM_RH in lead day 1 and 5 PAGE 47 47 Figure 2 6 . As in Figure 2 5 but for PM_RHRs PAGE 48 48 Figure 2 7 . ROC diagrams: (a) PM_GFS lead day 1, (b) PM_GFS lead day 5, (c) PM_RH lead day 1, (d) PM_RH lead day 5, (e) PM_RHRs lead day 1, (f) PM_RHRs lead day 5. PAGE 49 49 Fig ure 2 8 . Reliability diagrams: (a) PM_GFS lead day 1, (b) PM_GFS lead day 5, (c) PM_RH lead day 1, (d) PM_RH lead day 5, (e) PM_RHRs lead day 1, (f) PM_RHRs lead day 5. PAGE 50 50 Figure 2 9 . Comparison of the monthly and annual ET o over Oct 1979 to Sep 2009 for N ARR ET o and PM_RHRs ensemble mean ET o analogs: (a) lead day 1; (b) lead day 5. The gray zones represent 10 to 90 percentile of of NARR ET o , reflecting spatial variation of the monthly and annual ET o over the five states. PAGE 51 51 Figure 2 10 . The average LEPS sk ill score of the PM_RH and PM_RHRs across the five states PAGE 52 52 Figure 2 11. The average BSS of the PM_RH and PM_RHRs for the 5 categories forecasts across the five states PAGE 53 53 Figure 2 12. BSS of the PM_RH for 5 categorical forecasts as a function of time and th e lead time of the forecast: (a) Lower extreme forecast, (b) Lower tercile forecast, (c) Middle tercile forecast, (d) Upper tercile forecast, (e) Upper extreme forecast, (f) Overall forecast PAGE 54 54 Figure 2 13. As in Figure 2 12 but for PM_RHRs PAGE 55 55 CHAPTER 3 COMPARISON OF TWO ANALOG BASED DOWNSCALING METHODS FOR REGIONAL REFERENCE EVAPOTRANSPIRATION FORECASTS Chapter Introduction Forecasting ET o is critical to helping water managers develop strategies for operating water resources efficiently. Genera l Circulation Models (GCMs) and Numerical Weather Prediction Models (NWPMs) can produce forecasts of relevant variables for ET o forecasts such as temperature (mean, maximum, and minimum), solar radiation, wind speed, and relative humidity. Thus, GCM and NW PM forecasts are a potentially useful source of information for water managers. However, most current models cannot provide reliable information under approximately 200 km resolution (Maraun et al., 2010; Meehl et al., 2007) , which precludes the use of the ir output for local hydrologic applications. Therefore, downscaling techniques are necessary to produce fine scale resolution forecasts from coarse scale GCM and NWPM output for ET o and other relevant variables for many hydrologic applications. Both dynam ical and statistical approaches have been developed to transform GCM or NWPM output to finer resolution for use in hydrological applications (Fowler et al., 2007) . Dynamical downscaling nests a regional climate model (RCM) into a GCM or NWPM to simulate me teorological processes with a higher resolution within a region of interest, but it is computationally intensive. Statistical downscaling methods employ statistical relationships between the output of GCMs or NWPMs and local weather information and are com paratively computationally efficient and straightforward to apply. Ishak et al. (2010) used the mesoscale modeling system 5 (MM5) RCM to downscale ERA 40 reanalysis data to the Brue catchment in southwest England, and found downscaled ET o were overestimat ed by 27 46%. However, Ishak et al. (2010) noted PAGE 56 56 that clear patterns could be used to correct the biases, and data assimilation using multiple sources could improve the downscaled results. Silva et al. (2010) found that statistically corrected MM5 output v ariables improved the quality of ET o estimation compared to using raw model output in the Maipo basin in Chile. Using a statistical downscaling approach, Li et al. (2012) studied the change of future ET o on the Loess Plateau of China by comparing the calcu lated historical ET o during 1961 2009 with downscaled ET o during 2011 2099 from the HadCM3 (Hadley Centre Coupled Model, version 3) GCM using the Statistical DownScaling Model (SDSM). Tian and Martinez (2012) statistically downscaled 1 to 15 day ET o foreca sts in the southeastern United States using the ensemble mean of the Global Forecast System (GFS) retrospective forecasts (reforecasts) by using natural forecast analogs. While outputs of dynamical downscaling have often been subjected to additional bias correction (e.g. Hwang et al., 2011; Ishak et al., 2010; Silva et al., 2010; Wood et al., 2004) , direct statistical downscaling has been shown to be efficient in computation and capable of eliminating most biases (e.g. Abatzoglou and Brown, 2012; Salathe J r. et al., 2007; Wood et al., 2002) . Of the multiple statistical downscaling methods available, a nalog method s have been shown to perform , in general , as well as more complicated methods, to produce the right level of variability of the local variable, and to preserve the spatial correlation between local variables (Zorita and Von Storch, 1999) . This method was initially used in the field of weather forecasting (Lorenz, 1969) and has been introduced by Zorita (1995) for downscaling purposes. The analog method as a downscaling approach has been extensively applied to downscale climate change scenarios (Imbert and Benestad, 2005; Timbal et al., 2003) , precipitation (FernÃ¡ndez - PAGE 57 57 Fe rrero et al., 2010; Hamill and Whitaker, 2006b; Hamill et al., 2006a; Hidalgo et al., 2008; Ibarra Berastegi et al., 2011; Matulla et al., 2008; Maurer and Hidalgo, 2008; Maurer et al., 2010; Wetterhall et al., 2005) , air temperature (Hidalgo et al., 2008; Maurer and Hidalgo, 2008; Maurer et al., 2010; Timbal and McAvaney, 2001) , surface moisture flux (Ibarra Berastegi et al., 2011) , wildfire (Abatzoglou and Brown, 2012) , and 1 15 day ET o forecasts (Tian and Martinez, 2012) . The commonly used analog method s are mainly based on natural occurring analogs (hereafter NA) or constructed analogs (hereafter CA) . According to Hidalgo (2008), using analog methods to downscale GCM or NWPM output include two steps. For NA, the first step is to find the closest analog s for a target forecast from a historical archive of coarse resolution GCM or NWPM forecasts. Once the dates of the best forecast analogs for the target have been identified, local high resolution fields corresponding to the coarse resolution analog dates are used as the downscaled forecast. Different from NA, CA does not simply apply the analog dates. The target is first linearly regressed to the best coarse resolution analogs. Then the regression coefficients are applied to the high resolution analogs whi ch are selected by the coarse resolution analog dates. CA has been mentioned to have the potential to produce better matches with greater skill than NA (Hidalgo et al., 2008; Van den Dool, 1994) . A successful application of an analog method requires a su fficiently long archive of past forecasts in order to find reasonable analogs. The National Center for (Hamill et al., 2006a) was recently developed for diagnosing model bias and improving forecasts. The GFS reforecast dataset includes over 31 years of 15 member forecasts for 15 lead days at a PAGE 58 58 T62 resolution, which can provide sufficiently long records to apply analog methods (Hamill et al., 2006a). Medium range downscaled forecasts which can be provi ded using the GFS reforecast dataset can meet the short term planning needs of both the water and irrigation management communities. Recently, Tian and Martinez (2012) produced skillful downscaled daily ET o probabilistic forecasts in the southeastern Uni ted States using the Penman Monteith equation and the NA approach using ensemble mean variables of the 15 members of the GFS reforecast archive and the NCEP D epartment of Energy (DOE) Reanalysis 2 dataset (R2). Since CA has been noted previously to have t he potential to produce forecasts with greater skill than NA, the objective of this work was to compare the performance of NA and CA methods to produce both probabilistic and deterministic downscaled 1 to 15 lead day daily ET o forecasts using the combinati on of the 15 member GFS reforecasts and the R2 dataset across the southeastern United States (Alabama, Georgia, Florida, South Carolina, and North Carolina). Section 2 and 3 provide the data and methods used in this work. A comparison and discussion of the results is described in Section 4. Concluding remarks are presented in Section 5. Study Area and Data Sources The study includes the states of Alabama, Florida, Georgia, North Carolina, and South Carolina in the southeastern United States (Fig ure 3 1). This region has a high ratio of evapotranspiration to precipitation (50 to 80 percent) , a good proportion of forest land cover, and diverse topographic features (i.e., coastal plains, piedmonts, and hilly mountains) (Liang et al., 2002; Lu et al., 2003; Lu et al., 2005; Sun et al., 2002) . The GFS reforecast archive were generated using the 1998 version of the NCEP Medium Range Forecast (MRF) model from 1979 to present at a T62 resolution PAGE 59 59 (2.5 o Ã—2.5 o ) (Hamill et al., 2006a) . The reforecasts include 15 ensembl e members and range from 1 to 15 days lead time with data recorded every 12 hours. The potentially useful meteorological variables for ET o forecasts are 10 m wind speed ( u 10 ), 2 m temperature ( T ), and 700 mb relative humidity ( RH ). In this work, the 12 hou r 2 m T , 10 m u 10 , and 700 mb RH data were averaged into daily values. Daily maximum temperature ( T max ) and minimum temperature ( T min ) were approximated from the 12 hour 2 m T ; 2 m RH was approximated by 700 mb RH . Wind speed at 2 m height ( u 2 ) was calcula ted from u 10 (Allen et al., 1998) : ( 3 1) where z is the measurement height (10m). For this work, the GFS reforecast data were extracted over the five southeastern states from 1 October 1979 to 31 September 2009. The GFS reforecasts were obtained from the Earth System Research Library, Physical Sciences Division (ESRL, 2010). The NCEP DOE R2 data includes values of daily incoming solar radiation ( R s ), which can serve as supplementary data to the GFS reforecasts for ET o fo recasts (Tian and Martinez, 2012). The R2 dataset is available from 1979 to present at a T62 resolution (Kanamitsu et al., 2002) . This work abstracted the R2 daily R s data over the five states from 1 January 1979 to 31 December 2010. Daily climatological m ean values of R s were calculated using a running Â±30 day window. In this work, the 32kmÃ—32km resolution North American Regional Reanalysis (NARR) (Mesinger et al., 2006) daily data were used both for forecast verification and as the dataset to where the an alog methods were applied (as described in 3.2). The NARR data contains all the meteorological variables required for ET o estimation (as PAGE 60 60 described in 3.1), including daily maximum temperature, minimum temperature, mean temperature, dew point temperature, wind speed, and solar radiation. While NARR data is a hybrid product of model simulations and observations and contains known biases (e.g. Markovic et al., 2009; Vivoni et al., 2008; Zhu and Lettenmaier, 2007), the availability of the high resolution long term continuous daily ET o estimates makes it worthwhile to use ET o calculated by NARR (hereafter NARR ET o ) as a surrogate for observed data. Monthly and annual estimates of ET o from both GFS based NA and CA forecasts and from NARR ET o were compared to the 10 arc minute estimates from the Food and Agricultural Organization of the United Nations (FAO, 2004). These ET o estimates were calculated using the CRU CL 2.0 Global Climate Dataset from the Climate Research Unit of the University of East Anglia, UK. The CRU data were interpolated from station observations for the period 1961 to 1990 (New. et al., 2002). Methods ET o Estimation M ethods The standardized form of the Penman Monteith (PM) equation recommended by the Food and Agriculture Organization (FAO) (FAO PM) (Allen et al., 1998) was used to estimate daily ET o : ( 3 2) where R n is the net radiation at the crop surface [ MJ m 2 day 1 ]; G is soil heat flux density [ MJ m 2 day 1 ] , and is considered to be negligible for daily calculations ; T is the mean daily air temperature at 2 m height [ o C ]; u 2 is wind speed at 2 m height [ m s 1 ]; e s PAGE 61 61 is saturation vapo r pressure [ kPa ]; e a is actual vapor pressure [ kPa ]; is the slope of the saturation vapor pressure curve [ kPa o C 1 ]; and is the psychrometric constant [ kPa o C 1 ]. For detailed procedure s of calculating parameters in E quation 3 2, the reader is referred to Allen et al. (1998). The Analog Downscaling M ethods Following the NA approach found by Tian and Martinez (2012) to produce the highest skill, the FAO PM with the combination of GFS reforecast data (2 m T mean , T max , T min , u 2 , and 700 mb RH ) and R2 climatology ( R s ) was employed to calculate 15 member 1 to 15 lead day ET o forecasts in the coarse T62 grid ( Figure 3 1 ). The FAO PM calculated using the NARR data was used to calculate the ET o in the high resolution grid ( Figure 3 1 ). Two analog methods, the NA method and CA method, were used to downscale the forecasts from the coarse grids to the finer grids ( Figure 3 1 ). In this work, NA was applied in a similar fashion to the moving spatial window forecast analog approach describ ed by Hamill and Whitaker (2006a) and Hamill et al. (2006b). It began with selecting the analog dates at the coarse grid scale ( Figure 3 1 ) for one member and one lead day forecast within a domain of 9 grid points. The root mean square error (RMSE) was cal culated by comparing the target ET o ( Z GFS ) on day n of the year with the historical ET o within a Â±45 day window of the same day and averaged over the 9 grid points. The analog dates for Z GFS were selected as the first 30 dates with the lowest RMSE. The ana logs for the target ET o on day n at the finer grid ( P analogs ) were selected from the ET o calculated using NARR ( Figure 3 1 ) by applying the 30 analog dates selected from the coarse grid. Thus, for each fine grid and each lead day, there would be 450 analog s (30 analogs time 15 members). Both deterministic PAGE 62 62 and probabilistic forecasts of this location were generated from the ensemble of the analogs. This process was repeated for each set of 9 grid points, and the forecast results for the region of interest w as generated by tiling together the local analog forecasts. The CA method, described in detail by Hidalgo et al. (2008), differs from NA in its approach to dealing with analogs. T he best 30 analogs for the 9 coarse grid points ( Z analogs ) ( Figure 3 1 ) f or Z GFS for each member forecast was first selected . The method assumes there is a linear relationship between Z GFS and Z analogs : ( 3 3) where is the predicted value, and A analogs is a column vector of fitted least squares estimates of the regression coefficients. The dimensions of the Z GFS matrix are p coarse Ã—1 , where p coarse is the number of grid points at the coarse resolution ( p coarse =9 in this work). The dimensions of Z analogs are p coarse Ã—n, where n is the number of analogs (i.e., n =30), and the dimension of A analogs is nÃ—1. Assuming Z analogs has full rank ( n ), and using the definition of the pseudo inverse (Moore Penrose inverse), A analogs is obtained from Equation 3 4 by: ( 3 4) The coefficients A analogs from equation 3 4 are applied to the NARR grids ( Figure 3 1 ) corresponding to the same day as the coarse resolution predictors Z analogs , according to: ( 3 5) From Equation 3 4: ( 3 6) PAGE 63 63 where is a constructed high resolution analog and P analogs , as mentioned before, is the set of high resolution historical patterns corresponding to the same days as the Z analogs . The dimension o f the vector is p fine Ã—1, where p fine is the number of high resolution grids ( p fine =100 in this work). Since there were 15 members in each forecast, there would be 15 for each day and each lead day. Th e deterministic forecast can be generated by the ensemble mean of those 15 members, and the probabilistic forecast can be produced by the ensemble of the 15 members. Evaluation M etrics Daily skill for both deterministic and probabilistic forecasts were e valuated for each NARR grid point and each lead day and summarized by month. A cross validation procedure was employed where dates from the current year were excluded from the list of potential analogs. Deterministic F orecasts The deterministic forecasts w ere determined by the ensemble mean of the 450 analogs (15 members x 30 analogs each) and the ensemble mean of the 15 constructed analogs (one for each member) for NA and CA, respectively. Pearson correlation coefficient (R), Mean Square Error Skill Score (MSESS), and Bias were computed from daily values for each grid point, each month, and each lead day. R is able to reflect the correlation between the downscaled ET o with the observed ET o . MSESS is a relative skill measure that compares the analog forecast with the climatological mean forecast, which was calculated from daily values Â± 30 days of the forecast date. The mean square error of the forecasted values ( MSE f ) and mean PAGE 64 64 square error of the climatological values ( MSE c ) were calculated for each grid po int for each month. MSESS can be calculated by Equation 3 7: ( 3 7) MSESS ranges from indicating that the forecast has equivalent skill as climatology. Bias measures the mean difference between the downscaled ET o and the observed ET o through Equation 3 8: ( 3 8) where N is the number of days for this grid and this month. Probabilistic F orecasts The overall skill for probabil istic forecasts was evaluated using the modified Linear Error in Probability Space (LEPS) (Piechota et al., 2001; Potts et al., 1996). The modified LEPS measures the distance between the forecast and the observation in cumulative distribution space. The LE PS score (S) is defined as: ( 3 9) where P f and P o are the probabilities of the forecast and observation from their respective empirical cumulative distributions. Values of S range from 2 to 1 and correct forecasts at the extremes score higher than those in the middle of the distribution, which means i t gives relatively more penalty when forecasting events around average values, but gives relatively higher scores and less pen alty for correct forecasts of extreme events (Potts et al., 1996; Sobash et al., 2011; Zhang and Casey, 2000) . The value of S can be expressed as a skill score with a range 100 to 100%, where a skill score of PAGE 65 65 100% indicates a perfect forecast and a skill of 0% indicates a skill score equivalent to climatology : ( 3 10) where the denominator is the maximum or minimum possible value of S depending on whether the numerator is positive or negative. If the numerator is positive, S m is calculated as the best possible forecast, using Equation 3 9 with P f = P o . If the numerator is negative, S m is calculated as the worst possible score using equation 3 9, with P f = 0 for P o P f = 1 for P o < 0.5. The Brier Skill Score (BSS) was used to evaluate the skill of probabilistic forecasts for five categories, including terciles (upper, middle, and lower) and extremes (10th and 90th percentiles). The BSS is written as (Wilks, 2011) : ( 3 11) where BS f is the Brier score of the forecast and BS c is the Brier score of climatology, which was computed from daily values Â± 30 days of the forecast date. The BS f and BS c were calculated as: ( 3 12) and ( 3 13) where n is the number of forecasts and observations of a dichotomous event, p i f and p i c are the forecasted probability of the event using the forecasts and climatology, PAGE 66 66 respectively, and I i o = 1 if the event occurred and I i o = 0 if the event did not occur. The BSS ranges between values of 0 indicate that the skill of the forecast is equivalent to climatology. In addition to LEPS skill score and BSS, Relative Operating Characteristic (ROC) diagrams (Sobash et al., 2011; Wilks, 2011) and reliability diagrams (Wilks, 2011) were used to evaluate the resolution and reliability across the study region and period. The ROC diagram compar es hit rates to false alarm rates at different forecast probability levels and is a measure of how well the probabilistic forecast discriminates between events and non events. An ROC curve that lies along the 1:1 line indicates no skill and a curve that i s far towards the upper left corner indicates high skill. The reliability diagram indicates the degree that forecast probabilities match observed frequencies. An overall measure of the reliability of the forecasts can be assessed by the deviation of the re liability curve from the diagonal. For a perfectly reliable forecast system, the reliability curve is aligned along the diagonal. Curves below (above) the diagonal indicate over (under ) forecasting, the nearer the curve to horizontal, the less resolution of the forecast. Results and Discussion The efficacy of the CA and NA methods were evaluated for the southeastern United States both in time (summarized for the region by month) and in space. The results for both deterministic and probabilistic forecasts using NA and CA follow. Evaluation of Deterministic F orecast s Table 3 1 shows a comparison of the annual and monthly mean ET o from the FAO, NARR ET o , and downscaled ET o by CA and NA. While both of the downscaled ET o matched quite well with the NARR ET o , there were biases compared to the FAO PAGE 67 67 ET o . This bias was likely caused by the bias of the NARR data itself and not the analog downscaling methods or the GFS reforecasts. While this bias could be corrected by applying the analog downscaling methods and th e GFS reforecasts to station observations instead of NARR, such long term and spatially complete daily observations of all the meteorological variables for the PM equation are not available in this region. Figure 3 2 shows the average monthly and annua l ET o from Oct 1979 to Sep 2009 for NARR ET o and the ensemble mean of downscaled ET o analogs for CA and NA. Figure 3 2 a and 3 2b compares the monthly spatial average and extreme (10th and 90th percentile) NARR ET o to the corresponding downscaled ET o foreca sts by CA and NA in lead days 1 and 5 over the study period. Both of the downscaling methods successfully reproduced the mean and extreme monthly ET o cycle, with lead day 1 forecasts more closely matching NARR ET o compared to lead day 5 forecasts. Figure 3 2 also indicates that the upper extreme forecasts were underestimated in warm months (May to September) and overestimated in cool months (January and December) for both methods. For the lower extreme forecasts, a slight overestimation was found in warm mo nths; and the NA produced more exaggerated forecasts of high and low extreme events compared to CA. Figure 3 2 c and 3 2d show the spatially averaged annual total NARR ET o and statistically downscaled ET o forecasts using CA and NA in lead day 1 and 5 over t he study period. It indicates that the mean and extreme annual ET o over the study period were accurately downscaled by both two methods. The forecast in lead day 1 showed better correspondence than in lead day 5. The upper PAGE 68 68 extreme forecasts were slightly o verestimated by both methods with NA showing more exaggeration; and the lower extreme forecast was overestimated by NA only. Table 3 2 shows the overall mean indices (R, MSESS, and Bias) to quantify the deterministic forecast skill for CA and NA in lead d ay 1 and 5. All of the three indices indicate CA had higher skill than NA for both lead day 1 and 5 forecasts, with the exception that NA had less Bias than CA for lead day 5. The higher R value of CA demonstrates the downscaled ET o forecasts using CA wer e more closely associated to the NARR ET o than those using NA; the higher MSESS value of CA indicates that the downscaled ET o forecasts using CA were better than using NA relative to historic mean climatology; the lower absolute Bias value of CA for lead d ay 1 implies the mean difference between downscaled ET o forecasts using CA and the NARR ET o was smaller than those using NA. In Figure 3 3 , the monthly skill for CA and NA showed notable differences between lead day 1 and 5. For lead day 1 forecasts, the skill was similar between the two methods in terms of all the three metrics, with CA showing greater Bias in August October and smaller Bias in January April. Both CA and NA showed less skill in lead day 5 than lead day 1, with NA showing less skill th an CA in lead day 5 forecasts. In lead day 5, the R values of CA were slightly higher than NA in July September and significantly higher than NA in other months. MSESS values of CA had significantly higher skill than NA in January March, August, and No vember, and slightly lower or higher skill in other months. CA showed less Bias than NA over 10 of 12 months, with May and November having slightly greater Bias. The MSESS of both methods were PAGE 69 69 near or less than 0 over the 12 months in lead day 5, indicatin g those forecasts were no better than climatology. The skill of CA and NA showed a similar spatial pattern in terms of R and MSESS ( Figure 3 4a and 3 4 b ). Expectedly, both methods had higher skill in lead day 1 than lead day 5. Across the study area, both m ethods had the lowest skill in the Florida peninsula and the more mountainous region at the confluence of North Carolina, South Carolina and Georgia. This may be due to the topographic effects and the influence of the sea breeze over the Florida peninsula (e.g. Marshall et al., 2004; Misra et al., 2011) , which are not well resolved in the coarse scale forecasts. In Figure 3 4c , CA and NA showed different Bias patterns, with positive Bias for CA and negative Bias for NA across most of the region. Figure 3 5 shows the R, MSESS, and Bias as a function of month and lead day for CA and NA. Overall, the skill decreased with lead day, and the skill of CA was slightly higher than NA. For both methods in lead day 1, the skills in cool months (December March) were higher than in warm months (May November). The skill decreased dramatically from lead day 1 to 5 and remained near zero after. This is something of a consequence of the analog methods and the number of analogs selected (30 for each of 15 members); the s kill of the forecasts tend to reduce to the climatological forecast when the analog selection based on RMSE approaches a random process (at increasing lead days) from the pool of potential analogs rather than becoming notably worse than climatology. CA s howed slightly higher skill than NA in terms of all the indices for deterministic forecasts. This is likely due to the different treatment of analogs between the two PAGE 70 70 methods. CA transformed the linear relationship between the analogs and the forecast targe t at the coarse grid to the finer grid and thus the target forecast in the finer grids can be calculated by the weighted analogs. NA did not consider the relative importance of the analogs to predict the target forecast; it simply took the ensemble mean of all the analogs by assuming all the analogs had equal weights. It is important to note that the values of R between downscaled ET o forecasts using CA and the NARR ET o is expected to be lower compared to other studies which have downscaled coarse scale reanalysis products rather than actual forecasts (e.g. Abatzoglou and Brown, 2012; Hidalgo et al., 2008; Maurer and Hidalgo, 2008) . Also , the daily T max and T min and the RH data used here were approximated, and the R s data were from the R2 climatological mean. Similar to the multivariate adapted constructed analogs (MACA) developed by Abatzoglou (2012), our approach performed jointly for T max , T min , T mean , u 2 , RH and R s by using the PM equation to calculate ET o , so that the physical relationships among the variables were considered as well as their relative importance when selecting analogs. It is likely that different variables (e.g. R s vs. u 2 ) have different relative imp ortance to the estimation of ET o using the PM equation at different times of year. Evaluation of Probabilistic F orecast s The skill of probabilistic forecasts was evaluated based on the overall forecast (LEPS) and categorical forecasts (BSS, ROC, and relia bility diagrams). Table 3 3 shows the overall mean of LEPS skill score and BSS for CA and NA in lead days 1 and 5. For the lead day 1 forecast, while CA had higher LEPS skill score than NA, the BSS was lower than NA in all five categories. This result coul d be due to the fact that the two verification methods are based on different scoring objectives. BSS measures the departure of the forecasts from the actual observations based on a binary system, PAGE 71 71 assigning 0 for no events and 1 for events that occurred d epending on whether the observations fall in a given category or not. While the LEPS skill score measures the departure between forecasts and observations in cumulative probability space by giving more penalty when forecasting events near average values an d less penalty for forecasts of extreme events. In addition, the CA ensemble was notable smaller than the NA ensemble (even though the original number of natural analogs se lected was the same for both). Figure 3 6 shows the monthly mean skill scores for ev aluating the downscaled probabilistic forecasts using CA and NA methods in lead days 1 and 5. The LEPS skill score showed the forecasts had greater skill in warm months than in cool months for both methods, and the skill of CA was higher than NA across the 12 months in lead day 1, while the two skill scores were almost the same in lead day 5. On the contrary, according the BSS, NA generally showed higher skill than CA in all the 12 months. This was likely due to the relatively smaller ensemble size of CA co mpared to NA. Both methods were better than climatology in lead day 1 except the middle tercile forecasts of CA in warm months. In lead day 5, NA still performed as well as climatology while the skill of CA has dropped below 0. The lower tercile forecasts have the highest skill and then followed by upper tercile, lower extreme, upper extreme, and middle tercile forecasts. Figures 3 9 and 3 10 show the resolution and reliability of the five categorical forecasts of CA and NA in lead days 1 and 5. The ROC d iagrams in Figure 3 9a to 3 9d reveal the downscaled forecasts using CA had slightly less resolution than NA in both lead day 1 and 5 ; and the resolution for both CA and NA was higher in lead days 1 than in lead day 5 . The reliability diagrams in Figure 3 9e to 3 9f indicated the reliability (bias) PAGE 72 72 of NA was slightly higher (lower) than CA, notably over forecasting at middle and high probabilities for CA in lead day 1. While the reliability for NA only showed a slight difference between lead day 1 and lea d day 5 , CA showed much higher reliability (lower bias) in lead day 1 and lead day 5 . The difference in resolution and reliability of CA and NA was likely due to the smaller ensemble size of CA. The temporally averaged LEPS skill score and BSS are shown in space across the study area in Figure 3 8 and 3 9 respectively. Overall, CA had larger areas than NA with high LEPS skill score and with low BSS. Similar to Figure 3 4, 3 8 and 3 9 showed the skill was the lowest in southern Florida and at the conflue nce region of North Carolina, South Carolina, and Georgia. Again, the skill of lead day 1 forecasts was higher than lead day 5 forecasts, and the lower tercile forecasts had the largest area with high BSS followed by upper tercile, lower extreme, upper ext reme, and middle tercile forecasts. Figure 3 10 and 3 11 s how the LEPS skill score and BSS as a function of month and lead day for CA and NA, respectively. Similar to Figure 3 5 , both Fig ure 3 10 and 3 11 show the skill decreased with the increase in lead day. In Fig ure 3 10, the BSS for CA indicates four of the five categorical forecasts were skillful in lead day 1 and dropped to almost no skill in lead day 3 with the middle tercile forecast showing no skill in all lead days. However, the overall forecast s showed some skill in lead day 9 in terms of the LEPS skill score. In Fig ure 3 11, the LEPS skill score for NA is notably lower than CA in lead day 1, while the level of skill was similar for NA and CA in subsequent lead days. The five categorical forecas ts for NA had at least 5 skillful lead days, and the forecasts for lower extreme, lower tercile, and upper tercile were skillful up to lead day 7. PAGE 73 73 Based on these results, the skill of NA was higher than CA in terms of probabilistic forecasts. As mentioned before, CA transformed the weights of analogs in coarse resolution to fine resolution so that the deterministic forecast is the weighted average of the analogs in fine grids, and the probabilistic forecast was generated with the deterministic forecast for each of 15 members. NA directly used all the analogs to the produce probabilistic forecast without combining them into deterministic forecasts. This indicates that combining the analogs as done for CA did not improve the skill of probabilistic forecasts. While the deterministic forecast of each of the 15 members may be an improvement over the overall mean of the analogs (the deterministic forecast from NA), the resulting decrease in the size of the ensemble decreased the probabilistic skill. Compared with the result of Tian and Martinez (2012) who, following the approach of Hamill and Whitaker (2006a) and Hamill et al. (2006b), used 75 natural analogs from the ensemble mean of the 15 GFS members, the ensemble of 30 analogs from each of the 15 members shows slightly higher skill. While the LEPS skill score and BSS conflicted in indicating whether CA or NA produced superior probabilistic forecasts, using more than one skill assessment scheme is necessary and is also of practical value in the evaluation of the model forecasts in different aspects (Zhang and Casey, 2000) . Concluding R emarks The use of two analog based methods, CA and NA, was shown to perform well in downscaling ET o forecasts to a fine resolution suitable for local hydrological applications. The skill of CA was considered greater than NA in terms of deterministic forecasts, while NA showed higher skill than CA in probabilistic forecasts in terms of most measures. CA is advantageous for deterministic forecasts due to the combining of PAGE 74 74 analogs based on their weights transferred from the coarse resolution. But this procedure was not found to improve the probabilistic forecast skill, likely due to the reduction in ensemble size. However, users can employ either of these two methods based on their applic ation purposes. Differing from the methodology of Hamill and Whitaker (2006a) and Hamill et al. (2006b) and adopted by Tian and Martinez (2012), our NA method was applied to each of the 15 members of the GFS reforecasts instead of the ensemble mean; by doi ng so, a greater number of closest matching analogs could be used to generate the ensemble forecasts even if fewer analogs were selected for each member. The skill of probabilistic forecasts using this approach was found to be slightly higher than the ense mble mean NA forecasts in the southeastern USA of Tian and Martinez (2012). Similar to MACA developed by Abatzoglou and Brown (2012), the advantages of our methods in using analogs is that they avoid interpolation based methods and perform jointly for T ma x , T min , T mean , u 2 , RH and R s by using the PM equation to calculate ET o , so that the physical relationships among variables and their relative importance were considered when selecting analogs. Compared with the ET o forecasts of Tian and Martinez (2012) us ing NA to downscale the ensemble mean of 15 members of GFS reforecasts, this work introduced the CA method to downscale both deterministic and probabilistic forecasts using each of the 15 members of the GFS reforecasts. Although these two analog methods ac hieved what can be considered to be good ET o forecasts, the skill was still influenced by features that were not well resolved in the coarse scale forecasts. PAGE 75 75 Fine resolution ET o forecasts that have been bias corrected using a suitably long archive of hist orical forecasts, such as those presented in this work, are important for short term planning of water and irrigation management. Such downscaled ET o forecasts can be readily incorporated into short term hydrological models, irrigation scheduling models, a nd municipal supply demand models by the water and irrigation management communities and can thus improve the reliability of decisions and reduce risk. In addition, this approach used in this work can be used with coarse scale reanalysis data in regions wh ere observational meteorological networks are sparse and/or incomplete in order to provide high spatial resolution and longer term estimates of daily ET o . PAGE 76 76 Table 3 1 . A comparison of average ETo from FAO, NARR, and GFS downscaled lead day 1 forecast by CA and NA Data Average ETo (mm) Annual Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec FAO 1313 55 65 101 130 150 157 160 148 122 100 70 56 NARR 1100 44 52 80 105 133 137 137 127 105 82 56 43 CA 1099 45 53 80 106 131 135 136 123 103 82 58 46 NA 1109 46 54 81 107 131 135 136 125 105 83 59 46 PAGE 77 77 Table 3 2 . The overall mean R, MSESS, and Bias for CA and NA in lead day 1 and 5. Lead day Method Metrics for deterministic forecast R MSESS Bias 1 CA 0.63 0.35 0.005 NA 0.62 0.33 0.033 5 CA 0.27 0.07 0.040 NA 0.18 0.11 0.011 PAGE 78 78 Table 3 3 . The LEPS skill score and BSS for CA and NA in lead day 1 and 5. LEPS skill score and BSS are respectively revaluating the overall skill and categorical skill; the five categories represent <10%, < 1/3. 1/3 2/3, >2/3, >90%. Lead day Method Metrics for probabilistic forecast LEPS BSS Overall Lower extreme Lower tercile Middle tercile Upper tercile Upper extreme 1 CA 30.8 0.207 0.215 0.097 0.185 0.053 NA 20.8 0.221 0.250 0.046 0.238 0.090 5 CA 6.4 0.102 0.129 0.135 0.102 0.107 NA 7.3 0.020 0.046 0.002 0.067 0.006 PAGE 79 79 Figure 3 1. The location of the five states in the southeastern United States and the map of grids for each dataset. PAGE 80 80 Fig ure 3 2. Comparison of the average monthly and annual ET o from Oct 1979 to Sep 2009 for the NARR ET o and ensemble mean of ET o analogs: (a) average monthly ETo for lead day 1 , (b) average monthly ET o for lead day 5 , (c) average annual ET o for lead day 1 , (d) average annual ET o for lead day 5 . The gray zones represent 10 to 90 percentile of NARR ET o , reflecting spatial variation of the monthly and annual ET o over the southeastern United States. PAGE 81 81 Fig ure 3 3. Comparison of the average R, MSESS, and Bias for the CA and NA methods as a function o f the months of the year: (a) lead day 1, (b) lead day 5 . PAGE 82 82 Fig ure 3 4. The average values of the metrics for the deterministic forecasts for CA and NA methods in lead day 1 and 5 across the southeastern United States: (a) the correlation coefficient between daily forecasted ET o and daily NARR ET o , (b) the mean square error skill score, (c) the bias. PAGE 83 83 Fig ure 3 5 . R and MSESS of the forecasts as a function of time and the lead time of the forecast: (a) the CA, (b) the NA . PAGE 84 84 Fig ure 3 6 . Comparison of the monthly mean LEPS skill score for the overall (upper row) forecasts and BSS for the tercile (middle row) and extreme (bottom row) forecasts using the CA and NA methods as a function of the months of the year: (a) lead day 1, ( b) lead day 5. PAGE 85 85 Fig ure 3 7 . ROC and reliability diagrams: (a) ROC diagrams of t he CA method in lead day 1 , (b) ROC diagrams of t he CA method in lead day 5 , (c) ROC diagrams of t he NA method in lead day 1 , (d) ROC diagrams of t he NA method in lead day 5 ; (e) to (h) as in (a) to (d), but for Reliability diagrams. PAGE 86 86 Fig ure 3 8 . The average LEPS skill score for CA and NA methods in lead day 1 and 5 across the southeastern United States . PAGE 87 87 Fig ure 3 9 . The average BSS of five categorical forecasts for CA and NA methods in lead day 1 and 5 across the southeastern United States . PAGE 88 88 Fig ure 3 10 . LEPS skill score and BSS of the CA method for overall and categorical forecasts as a function of time and the lead time of the forecast: (a) Lo wer extreme forecast, (b) Lower tercile forecast, (c) Middle tercile forecast, (d) Upper tercile forecast, (e) Upper extreme forecast, (f) Overall forecast . PAGE 89 89 Fig ure 3 11 . Same as Figure 3 10 but for the NA method . PAGE 90 90 CHAPTER 4 THE GEFS BASED DAILY REFERENCE EVAPOTRANSPIRATION (ET o ) FORECAST AND ITS IMPLICATION FOR WATER MANAGEMENT IN THE SOUTHEAST ERN UNITED STATES Chapter Introduction Medium range (1 to 2 weeks ahead) reference evapotranspiration (ETo) forecasts can provide valuable information for water resource and irrigation management. Such ETo forecasts can be incorporated into hydrological models, water demand models, and irrigation scheduling models to reduce risk and improve reliability of decision making (e.g. Srivastava et al. in press). Wh en adequate meteorological data input are provided, ETo can be estimated using the FAO 56 Penman Monteith (PM) equation, which is considered to be the standard method to compute ETo (Allen et al. 1998) . Daily medium range ETo forecasts can be made using s tatistical models or Numerical Weather Prediction models (NWP). A typical statistical model may use prior ETo values calculated from ground based observations as the forecast ( e.g. Trajkovic et al. 2003) . The main drawbacks of such statistical models are t hat they are black box models and cannot be implemented in other locations without local training ( e.g. Kumar et al. 2011) , and the ground based observations are often not available ( e.g. Srivastava et al. 2013) . In recent years several works have been co nducted to predict daily ETo using weather forecast data as inputs to the PM equation. The operational forecast products include the Experimental Forecast Reference Crop Evapotranspiration (FRET) produced by the National Weather Service using output of the NWPs (e.g. http://www.wrh.noaa.gov/forecast/evap/FRET/FRET.php?wfo=sto ). Many studies mainly included public weather forecast messages (Cai et al. 2007; Cai et al. 2009) and PAGE 91 91 forecasts from mesoscale NWPs (Ishak et al. 2010; Silva et al. 2010; Srivastava e t al. 2013) to predict ETo . However, the availability of the complete public weather forecast messages for the PM equation could not always be guaranteed over a large area at fine resolution (Cai et al. 2007), and while mesoscale NWPs could produce sufficient meteorological data for the PM equation over a large region at a high resolution, it has been noted to be very time consuming and the output suffered from systematic errors (Ishak et al. 2010; Silva et al. 2010) . As an alternative to the above methods, a retrospective forecast (reforecast) archive of a global NWP could be used to forecast daily ETo and correct systematic errors. Archives of the first generation reforecast of the Global Forecast System (GFS ) (hereafter reforecast1) (Hamill et al. 2006) with a forecast analog technique (Hamill and Whitaker 2006) have been applied to produce skillful ensemble forecasts of daily ETo at high resolution (Tian and Martinez 2012a, 2012b) . However, since the reforec ast1 did not include sufficient meteorological variables for the PM equation, the forecasting skill was largely impaired due to the need to approximate several variables. The second generation reforecast dataset (hereafter reforecast2) was recently develop ed through archiving the forecast outputs of the operational 2012 Global Ensemble Forecast System (GEFS) (Hamill et al. 2013) . Compared to the reforecast1, the reforecast2 used a newer version of the weather forecast model and archived a larger set of outp ut data at a higher spatial and temporal resolution (Hamill et al. 2013) . Thus, using the forecast analog technique with the reforecast2 could potentially improve the daily ETo forecast skill. PAGE 92 92 In this study, we downscaled and bias corrected medium range p robabilistic forecasts of daily ETo at high spatial resolution using the forecast analog approach with the reforecast2 dataset in the southeastern United States (SEUS) (Alabama, Georgia, Florida, North Carolina, and South Carolina). The ETo was c alculated using the PM equation and t he probabilistic forecasts of daily ETo were evaluated compared to the climatology of the forcing dataset of the National Land Data Assimilation System phase 2 (NLDAS 2) for all months and leads over the SEUS. A water deficit fo recast driven by the ETo forecasts was evaluated to explore its usefulness for water management. Data More than 28 years (Dec 1984 to Jul 2013) of reforecast2 data are currently archived (Hamill et al. 2013) and five variables are particularly useful for ETo estimation. This work used the ensemble mean of the 11 member forecasts of 2 m mean temperature (Tmean), maximum temperature (Tmax), minimum temperature (Tmin), in coming solar radiation (Rs), and 10 m wind speed (u 10 or Wind) for all 16 lead days at 1 .0 o Ã—1.0 o (~100 km) over the SEUS (90Â°W 74Â°W, 24Â°N 28Â° N ) ( http://www.esrl.noaa.gov/psd/forecasts/reforecast2/download.html ) (Fig ure 4 1) . The forecasts are archived every three hours for the first 8 lead days and six hours from lead day 9 to lead day 16. From lead day 1 to 16, Tmean and Rs were averaged into daily values; daily Tmax and Tmin were obtained by comparing three or six hour Tma x and Tmin on each day; u 10 was estimated from the magnitude of the orthogonal U and V components of 10 m wind speed and then averaged into daily values. Wind speed at 2 m height (u 2 ) was estimated from u 10 (Allen et al. 1998) : , ( 4 1) PAGE 93 93 where z is the measurement height (10 m). For this work, we used reforecast2 data from 1 October 1985 to 31 September 2012. Since long term continuous daily observed data for ETo estimation were generally not available over the SEUS, we used the forcing dataset of the NLDAS 2 (Xia et al. 2012a; Xia et al. 2012b) as a surrogate for observations. The forcing data of NLDAS 2 has a resolution of 0.125 o Ã— 0.125 o (~12 km) (Fig ure 4 1) over North America. We obtained relevant variables at an hourly time step (temperature, Rs, and u 10 ) over the SEUS (89Â°W 75Â°W, 25Â°N 37Â°N). Daily Tmax and Tmin were obtained by comparing 1 hour values on each day; daily Tmean was the average of daily Tmax and Tmin; daily Rs a nd u 10 were converted by averaging hourly values into daily values. The u 10 was converted to u 2 using E quation 4 1. Observations from two weather stations in the SEUS (Fig ure 4 2) were selected to compare with the NLDAS 2 ETo, Tmax, Tmin, Rs, and Wind ex tracted at the same locations. These two stations included Lake Alfred, FL (28.10 o N, 81.71 o W), and Raleigh, NC (38.88 o N, 78.79 o W). The Lake Alfred station data was from the Florida Automated Weather Network (FAWN) , and the Clayton station (ID: CLAY) data was from the North Carolina Climate Retrieval and Observations Network of the Southeast Database ( NC CRONOS ). Both stations are daily observations covering the period from 1 January 1998 to 31 December 2012. Both of the Lake Alfred and Clayton stations i ncluded Tmax, Tmin, Tdew, Rs, Wind, and ETo, where ETo was calculated by the FAO 56 PM equation. PAGE 94 94 Methods ETo C alculation The FAO 56 PM equation (Allen et al. 1998) was used to estimate daily ETo (mm/day): ( 4 2) where R n is the net radiation at the crop surface [MJ m 2 day 1 ]; G is soil heat flux density [MJ m 2 day 1 ], and was considered to be negligible for the calculations; T is the mean air temperature at 2 m height [ o C]; u 2 is wind speed at 2 m height [m s 1 ] ; e s is saturation vapor pressure [kPa]; e a is actual vapor pressure [kPa]; is the slope of the saturation vapor pressure curve [kPa o C 1 ]; and is the psychrometric constant [kPa o C 1 ]. Details of the calcu lation of each of the terms in E quation 4 2 can be found at Allen et al. (1998). Allen et al. (1998) recommended using the following equation to approximated T dew using T min : ( 4 3) where K o = 0Â°C for humid and subhumid climates and K o = 2Â°C for arid and semiarid climates. Since the reforecast 2 and forcing data of NLDAS 2 did not include dew point temperature (T dew ) which are required to calculate e a , we approximate T dew using E quation 4 3 with K o = 0Â°C . It is worthwhile to note that by assuming the mixing ratio is approximately equal to the specific humidity (SH, kg/kg), e a can also be estimated with SH and surface pressure (P, kPa) fields available in the reforecast2 and forcing data of NLDAS 2 using Equation 4 4 : PAGE 95 95 ( 4 4) To check the approximation of e a using Tmin , we compared the daily e a ( SH ) ( e a calculated using NLDAS 2 daily SH and P ) with the daily e a (T min ) ( e a approximated using NLDAS 2 daily T min ) from 1985 to 2012 at two locations, Saint Leo, FL and Smithfield, NC in 12 months (Fig ure 4 2 ) . Figure 4 3 shows t h at e a was slightly over estimated by Tmin in June, July, and August , but e a (T min ) matched quite well with e a ( SH ) in other months, which indicated that the approximation using Tmin was generally valid. The Analog M ethod The fine resolution ETo forecasts were produced using a moving spatial window forecast analog approach (Hamill and Whitaker 2006; Hamill et al. 2006) . This method is briefl y described below. The first step was to find the closest analog dates for one lead day forecast within a domain of sixteen grid points of GEFS (Fig ure 4 1). That is, the root mean square error (RMSE) over the 16 grid points of GEFS was computed by compari window of the same day of the year at the same lead. For example, a 15 February 2012 forecast at lead day 5 over central Florida (the 16 GEFS grid points) was compared against the reforecasts from 1 January to 1 April 1985 2011 (i.e. 2457 forecasts) at the same lead day. The RMSE between the current forecast and each reforecast was computed and averaged over the 16 grid points in Figure 4 1. The analog dates for the orecast were selected as the 75 dates with the lowest RMSE. In the second step, the analogs for the forecast of the day were selected on the dates of the 75 analogs at each of the grid points of the NLDAS 2 dataset in Figure 4 1. Thus, 75 analogs were chos en for this forecast to generate an ensemble forecast for each grid PAGE 96 96 point of the NLDAS 2. This process was repeated for the other forecast lead days and the other locations across the SEUS, and the forecast over the entire SEUS were generated by tiling tog ether the local analog forecasts. Leave one out cross validation was conducted by excluding the current year from the list of potential analogs. Forecast V erification The overall skill for the probabilistic forecasts was evaluated using the modified Linea r Error in Probability Space (LEPS) skill score (Potts et al. 1996) . The modified LEPS measures the distance between the forecast and the observation in the cumulative distribution space and t he LEPS score ( S ) is defined as: ( 4 5) where P f and P o are the probabilities of the forecast and observation from their respective empirical cumulative distributions. The value of S can be expressed as a skill score with a range 100 to 100%, where a skill score of 100% indicates a perfect fo recast and a skill of 0% indicates a skill score equivalent to climatology (Piechota et al. 2001; Potts et al. 1996) , where the climatology was an ensemble of a running Â±30 day window. The LEPS skill score was evaluated at each grid point for each month an d lead day. The Brier Skill Score (BSS) was used to evaluate the skill of probabilistic forecasts for five categories, including terciles (upper, middle, and lower) and extremes (10 th and 90 th percentiles). The BSS is written as (Wilks 2011) : ( 4 6) PAGE 97 97 where BS f is the Brier score of the forecast and BS c is the Brier score of climatology, which was computed from daily values Â± 30 days of the forecast date. The BSS ranges between a perfect forecast and values of 0 indicate that the skill of the forecast is equivalent to climatology. The BSS was evaluated at each grid point for each month and lead day. R eliability diagrams (Sobash et al. 2011; Wilks 2011) and Relative Operating Charac teristic (ROC) diagrams (Wilks 2011) were used to evaluate the resolution and reliability across the study area and period. The reliability diagram indicates the degree that forecast probabilities match observed frequencies. An overall measure of the relia bility of the forecasts can be assessed by the deviation of the reliability curve from the diagonal. For a perfectly reliable forecast system, the reliability curve is aligned along the diagonal. Curves below (above) the diagonal indicate over (under ) f orecasting, the nearer the curve to horizontal, the less reliab le the forecast. The ROC diagram compares hit rates ( H ) to false alarm rates ( F ) at different forecast probability levels and is a measure of how well the probabilistic forecast discriminates b etween events and non events. The H and F are defined as follows: ( 4 7) ( 4 8) where CE is the correctly forecast events, FE is the falsely forecast events, FN is the falsely forecast non events, and CN is the correctly forecast non events. An ROC curve that lies along the 1:1 line indicates no resolution and a curve that is far towards the PAGE 98 98 upper left corner indicates high resolution. The ROC and reliability dia grams were evaluated by pooling all grid points and months for each lead day. Irrigation Scheduling M odel A simple model based on the Agricultural Reference Index for Drought (ARID) (Woli et al., 2012, 2013; Khare et al., 2013) was used for irrigation scheduling. Th e ARID is a generic model that uses a budget based simple soil water balance for a reference grass having a 400 mm soil layer with evenly distributed roots. The irrigation scheduling was made in terms of the water deficit ( WD , mm ) component o f the ARID model, which is defined as: ( 4 9) where ET is actual evapotranspiration (mm) and ETo is the reference evapotranspiration (mm). The water balance components include precipitation, surface runoff, deep drainage, and evapotranspiration. The soil water content on the i th day is calculated from outputs: ( 4 10) where Wi is soil water (mm) on the i th day, W i 1 is soil water (mm) on the i 1 th day, P i is precipitation (mm) on the i th day, D i is deep dr ainage (mm) on the i th day, and R i is surface runoff (mm) on the i th day. The ETo is estimated using the PM equation. The ET is estimated based on the water uptake model developed by Meinke et al. (1993): ( 4 11) where is the volumetric soil moisture content (m 3 m 3 ), wp is the volumetric soil moisture content at the wilting point (m 3 m 3 ) , MUF is the maximum uptake factor (d 1 ), PAGE 99 99 and Z is the root zone depth. The surface runoff was calculated using the United States Department of Agriculture Natural Resources Conservation Service (USDA NRCS) curve number method (NRCS, 1986): ( 4 12) where CN is the runoff curve number and is dependent on soil type and land cover, I is the initial abstraction (mm), and S is the potential maximum retention (mm). In ARID, deep drainage is calculated as: ( 4 13) where fc is the v olumetric soil content at field capacity (m 3 m 3 ) and DC is a drainage coefficient (m 3 m 3 d 1 ). Since ARID assumes a reference grass having a 400 mm soil layer with evenly distributed roots , it only requires site specifi c weather data as a site specifi c input. Following Woli (2012), default values for the other parameters were used in this study ( DC = 0.55 m 3 m 3 d 1 , Z = 400 mm , CN = 65, MUF = 0.096 d 1 , wp = 0.06 m 3 m 3 , fc = 0.12 m 3 m 3 ). The 27 years of daily WDs (from 1 October 1985 to 31 Septembe r 2012) at five locations in the SEUS (Fig ure 4 2), including Smithfield, NC (35.52 o N, 78.35 o W), Gainesville, GA (34.30 o N, 83.86 o W), Tifton, GA (31.45 o N, 83.48 o W), Saint Leo, FL (28.34 o N, 82.26 o W), and Perrine, FL (25.58 o N, 80.44 o W), were calculated using the ARID model with the input from the daily NLDAS 2 precipitation (aggregated from hourly precipitation, mm/day) and ETo (mm/day). PAGE 100 100 Evaluation of Analog F orecast of WD The ETo forecast analogs and NLDAS 2 precipitation were used as inputs into the ARID model to generate the WD analog forecast. As a comparison, the WD climatology was generated using the NLDAS 2 ETo climatology (ensemble of a running Â±30 day window) and NLDAS 2 precipitation. The calculated NLDAS 2 WDs were used to verify the WD forecasts. The ensemble forecast of both analog and climatology were evaluated using the p factor and r factor (Yang et al. 2008). The p factor is the percent of observations within t he 2.5% and 97.5% percentiles of the ensemble forecast . Th e r factor is the relative width of the 2.5% and 97.5% percentiles of the ensemble forecast . It is calculated as: ( 4 14) where n is the number of days, and represent the 2.5% and 97.5% percentiles of the ensemble forecast at day t i , and obs is the standard deviation of the observations . The goodness of the ensemble forecast is judged on the basis of the closeness of the p factor to 100% (i.e., all observations bracketed b y the ensemble forecast) and the r factor to 0 (i.e., achievement of rather small ensemble forecast range). The deterministic WD forecast s were defined as the mean of the analog forecast or climatology ensemble , respectively . The Nash Sutcliff e coefficient (E) and root mean square error (RMSE) were employed to evaluate both the deterministic climatology forecast and the deterministic analog forecast. The Nash Sutcliff e coefficient (E) is written as (Nash and Sutcliffe 1970): PAGE 101 101 ( 4 15) where n is the number of days, and represent the observation and forecast at day t i , respectively, and is the average value of the observations over the entire period. The E ranges from to 1, with the value of 1 corresponding to a perfect match of the forecast to the observations and a value of less than 0 represents the forecast is worse than the mean o f observations. Results Comparison between O bservations and NLDAS 2 Figure 4 4 shows a comparison of the observed and NLDAS 2 daily ETo , Tmax, Tmin, Rs, and Wind for 12 months at Lake Alfred, FL and Clayton, NC in the SEUS . While the NLDAS 2 ETo showed greater positive bias at the Lake Alfred station than the Clayton station throughout the year, the Tmax and Tmin matched quite well with the observations at both stations. The Rs showed higher bias at the Lake Alfred station t han the Clayton station throughout the year. The wind also showed slightly smaller positive bias at the Clayton station than the Lake Alfred station across all months. Since ETo was estimated by the FAO 56 PM equation with the inputs of Tmax, Tmin, Rs, and Wind, any bias of ETo was due to one or more of these variables. At the Lake Alfred station in Florida, the high bias of ETo was likely caused by the high bias of both Rs and Wind. The ETo bias was relatively lower in the Clayton station than the at the L ake Alfred station, which was likely due to only Wind showing positive bias at this station. PAGE 102 102 Skill of the Analog ETo F orecast Figure 4 5 shows the skill scores as a function of month and lead day. For lead day 1, the lower and upper tercile forecast skill (BSS) were above 0.4 in cool months and above 0.2 in warm months (Fig ure 4 5b and 4 5 d); the lower and upper extreme forecast skill (BSS) were between 0.2 and 0.4 throughout the year with the skill higher in cool m onths than in warm months (Figure 4 5a an d 4 5 e); the middle tercile forecast skill (BSS) were the lowest among the five categories with the skill scores above 0 and below 0.2 (Fig ure 4 5c); the overall forecast skill (LEPS skill score) were above 40% in cool months and above 20% in warm months. The forecast skill for tercile, extreme, and overall forecasts were generally positive up to lead day 7 with the skill scores higher in cool months than in warm months and the skill decreased with the increase in lead day. Figure 4 6 shows the reliability and ROC diagrams for tercile and extreme forecasts for lead days 1 and 5. While the reliability curves showed slight over forecasting bias, the curves were along the diagonal for all five categorical forec asts at lead days 1 and 5 (Figure 4 6a and 4 6 b), which indicated the forecast probability matched very well with the observed relative frequency. All ROC curves with the exception of the middle tercile forecasts were far toward the upper left corner f or both lead days 1 and 5 (Figure 4 6c and 4 6 d), ind icating the upper and lower tercile and extreme forecasts had very good resolution and could discriminate between events and nonevents very well. While the reliability and ROC curves all showed high skill at lead day 5, the BSS only showed modest skill. Ou r explanation is that the reliability and ROC curves were created by pooling the forecasts and observations of all grid points and months instead of evaluating at each grid point and month before averaging, so that the PAGE 103 103 variability of ETo across locations a nd seasons produced artificially high skill (Hamill and Juras 2006) . The skill scores for each grid point w ere calculated by averaging over all months for different lead days. Figure 4 7 shows the overall, tercile and extreme forecast skill for each grid point over the SEUS for four lead days. The skill decreased with the increase of lead days. All forecasts indicated the Florida Peninsula showed the lowest skill and the northern portion of the study area generally showed higher skill than the southern. E valuation of WD F orecast Table 4 1 shows the evaluation of the analog forecasts and climatology of WDs (mm) at lead day 1 and 5 at five locations in the SEUS. The analog forecast showed better E, RMSE, and r factor than climatology at all the five location s with the stations in Georgia and North Carolina showing better performance than the two stations in Florida for both lead days 1 and 5. In terms of the E and RMSE, which evaluates the deterministic forecast, the analog based WD forecast is more accurate than the climatology based WD forecast. There is a tradeoff between the p factor and r factor, i.e. the percent of observations falling within the upper and lower boundaries of the ensemble and the width of the boundaries of the ensemble. While the climato logy ensemble only shows a slightly higher p factor than the analog ensemble forecast, the analog ensemble forecast shows much lower r factor than the climatology ensemble. This indicates the analog ensemble forecast has a much narrower boundary than the c limatology ensemble but includes a similar percent of observations as the climatology ensemble, i.e. the analog ensemble forecast achieved a better tradeoff than the climatology ensemble. PAGE 104 104 Concluding R emarks In this work, daily medium range probabilistic E To forecasts at high resolution were produced using the analog approach with the reforecast2 dataset. The daily ETo probabilistic forecast showed skill up to lead day 7. The WD forecasts were generated using the daily ETo analog forecast at five locations in the SEUS. The analog based WD forecast showed high accuracy and small uncertainty at lead days 1 and 5. Since the WD is considered to be the amount of water needed for irrigation, this WD forecast based on the ETo analog forecast will be useful for irr igation scheduling. The daily ETo forecast showed higher skill in cool months than in warm months. The lower skill in warm months than in cool months was likely due to convective heating. Convective heating is more frequent in summer time than in winter time. Such convection could cause different weather conditions such as cloudiness and rainfall at small scales, which may not have been well captured by the GEFS model resolution. Over space , the skill was higher in the north than in the south of the SEUS with the Florida Peninsula showing the lowest skill. This is probably due to the influence of the sea breeze over the Florida peninsula (Marshall et al. 2004, Misra et al. 2011). Compared t o the reforecast1 ETo using either natural analogs of the ensemble mean forecast (reforecast1a, Tian and Martinez 2012a) or constructed analogs of all forecast members (reforecast1b, Tian and Martinez 2012b) , the skill of reforecast2 were greatly improved throughout the year (Fig ure 4 8). For example, the overall forecast skill (LEPS skill score) for lead day 1 in January increased from 28% (reforecast1a) and 37% (reforecast1b) to 43%. Moreover, the ETo forecasts using reforecast1a could only produce skillf ul forecasts up to lead day 5. In contrast to the reforecast1a and reforecast1b ETo, the reforecast2 ETo did not show the lowest skill in the mountainous PAGE 105 105 regions at the confluence of North Carolina, South Carolina, and Georgia and the skill of the reforeca st2 was higher than the reforecast1 at all locations over the SEUS (Fig ure 4 8). In moving from the reforecast1 data used in Tian and Martinez (2012a) to the reforecast2 data here, the improvement of ETo forecast skill is due to various reasons. First, the reforecast2 dataset were generated by the operational version GEFS (Hamill et al. 2011) , which has more advanced configurations with higher horizontal and vertical resolution and more advanced methods for data assimilation and ensemble initialization than the 1998 version GFS used by the reforecast1. Second, the reforecast2 provided more of the required variables for the PM equation including Tmax, Tmin, and Rs so that only one approximation (using Tmin to replace Tdew) was made in this study. By contrast, the reforecast1 had a number of variables (Tmax, Tmin, Rs, Tdew, and RH) that needed to be approximated in order to use the PM equation (Tian and Martinez, 2012a). Third, the reforecast2 (~100 km) has much higher spatial resolution than the reforecast1 (~ 250 km). Therefore, the analogs could be selected from a smaller spatial domain (reforecast2), which was more likely to find appropriate analogs compared to a larger spatial domain (reforecast1). For the forecast verification, the use of the forcing datas et of NLDAS 2 instead of the North American Regional Reanalysis (NARR) as used by Tian and Martinez (2012a, 2012b) could also affect the forecast. T he NLDAS 2 forcing data was based on the interpolation of the NARR fields. Rs was bias corrected from the in terpolated NARR using satellite derived Rs (Xia et al. 2012b) over each grid cell using the ratio of their monthly average diurnal cycle . The details of the method for bias correction can be found in Berg et al. (2003) and Pinker et al. (2003). The NLDAS 2 2 m temperature was PAGE 106 106 directly interpolated from NARR with adjustments to account for the vertical difference between the NARR and NLDAS 2 fields of terrain height (Xia et al. 2012a). Nevertheless, t he downscaled and bias corrected ETo forecasts produced by this work are still affected by the biases in the NLDAS 2 fields . By comparing the NLDAS 2 fields with station based observations, we can see biases for ETo, Rs, and Wind in Florida. Since the ETo was calculated by the PM equation, the biases of Rs and Wi nd would be the major contribution to the biases of ETo. The ETo has smaller bias in North Carolina than in Florida, which was due to the smaller bias of Rs and Wind. While the forcing dataset of NLDAS 2 showed bias, we should note that the skill of the fo recast is not related to the bias of the NLDAS 2 data, since the NLDAS 2 data was used for both validation and the selection of forecast analogs. It is worth noting that further research could be conducted to find the optimum number of analogs and size of the search window to improve the ETo forecast skill (e.g. Hamill et al. 2006). Also, ETo forecasts could be verified using ground based observations rather than the forcing data of NLDAS 2. However, such ground based observations would need to be of a si milar length of the reforecast2 dataset in order for forecast analog methods to be effective. In this work Tmin was used to approximate Tdew which was found to be generally valid for the humid southeastern US. This approximation should be validated befor e being applied to other, less humid locations (Allen et al. 1998). PAGE 107 107 Table 4 1 . Evaluation of ensemble and deterministic forecast of water deficits (WDs, mm) at five locations in the SEUS Locations Metrics Analog forecast Climatology Lead 1 Lead 5 Smithfield, NC E 0.87 0.78 0.70 RMSE 0.66 0.86 1.01 p factor 0.94 0.94 0.98 r factor 1.14 1.51 2.09 Gainesville, GA E 0.89 0.82 0.76 RMSE 0.63 0.80 0.92 p factor 0.94 0.95 0.99 r factor 1.05 1.36 1.86 Tifton, GA E 0.85 0.76 0.70 RMSE 0.73 0.91 1.02 p factor 0.93 0.94 0.98 r factor 1.32 1.67 2.15 Saint Leo, FL E 0.79 0.71 0.67 RMSE 0.77 0.90 0.97 p factor 0.94 0.94 0.98 r factor 1.67 1.94 2.36 Perrine, FL E 0.74 0.67 0.64 RMSE 0.74 0.84 0.88 p factor 0.94 0.94 0.97 r factor 1.86 2.05 2.40 PAGE 108 108 Figure 4 1. Study area and an example of a subset of sixteen GEFS grid points and corresponding NLDAS 2 grid points in the Tampa Bay area PAGE 109 109 Figure 4 2. Locations of the observed stations and selected locations for WD forecast PAGE 110 110 Figure 4 3. Box and whisker plots of the NLDAS 2 e a (T min ) (red) and the NLDAS 2 e a ( SH ) (blue) in 12 months at Saint Leo, FL (a) and Smithfield, NC (b). PAGE 111 111 Figure 4 4. Box and whisker plots of NLDAS 2 (red) and observations (blue) for ETo, Tmax, Tmin, Rs, and Wind in 12 months at Alfred Lake, FL (a) and Clayton, NC (b). PAGE 112 112 Figure 4 5. BSS and LEPS skill score (%) as a function of month and lead day. The BSS reflected the extreme and tercile forecasting ski ll (a e); the LEPS skill score reflected the overall forecasting skill (f). PAGE 113 113 Figure 4 6. Reliability (a b) and ROC (c d) diagrams for tercile and extreme forecasts at lead days 1 and 5. PAGE 114 114 Figure 4 7. LEPS skill score (first row, %) and BSS ( second to sixth rows) over the SEUS. The LEPS skill score reflected the overall forecasting skill (first row); the BSS reflected the tercile and extreme forecasting skill (second to sixth rows). PAGE 115 115 Figure 4 8. The first row compares LEPS skill scores (%) for reforecast2, reforecast1a, and reforecast2b at lead days 1 and 5 throughout the year. The second and third rows compare LEPS skill scores (%) for reforecast2, reforecast1a, and reforecast2b at lead days 1 and 5 over the SEUS . PAGE 116 116 CHAPTER 5 IMP ROVING SHORT TERM URBAN WATER DEMAND FORECASTS USING FORECAST ANALOGS OF THE GLOBAL ENSEMBLE FORECAST SYSTEM (GEFS) Chapter Introduction Short term urban water demand forecasts play an important role in decision making of water supply management. Accurate short term water demand forecasts are important for optimizing pumping well, reservoir, and mains operations, balancing water allocations, developing short term water supply strategies, and aiding water supply decision making for an urban area [ e.g. Adamow ski et al. , 2012; Donkor et al. , 2014; Herrera et al. , 2010; Tiwari and Adamowski , 2013] . While longer term urban water demands are determined by various human and natural factors [ House Peters and Chang , 2011] , short term urban water demands are mainly influenced by weather related variables [ Donkor et al. , 2014] . Consequently, precipitation and temperature derivatives are often used in short term urban water demand forecast modeling. Linear regression and time series models such as multiple linear regression (MLR) and autoregressive integrated moving average (ARIMA) type methods have been commonly used for short term urban water demand forecasting [e.g. Adamowski and Chan , 2011; Adamowski et al. , 2012; Tiwari and Adamowski , 2013] . In recent years, many nonlinear approaches, such as multiple nonlinear regression (NMLR), artificial neural networks (ANNs), support vector machines (SVMs), and nonstationary hybrid models have also been adopted to forecast short term water demand. Bougadis et al. [2005] applied ANN to address nonlinearity in short term urban water demand forecasting and found that ANN had higher skill than the linear approaches. Adamowski et al. [2012] compared ANNs, hybrid models of wavelet analysis and ANN (WA ANN), NMLR, and linear models for short term urban water PAGE 117 117 demand forecasts and found WA ANN had the highest forecast skill due to the way WA ANN deals with nonstationarity. The bootstrap technique was combined with forecasting models to assess t he uncertainty associated with water demand models [ Tiwari and Adamowski , 2013] . However, these methods were still based on persistence of weather observations, i.e. using lagged relationships between the weather variables and urban water demand. Weather f orecast information has not been used for forecasting short term urban water demand to date. Outputs of numerical weather prediction models (NWPs) provides weather forecast information over the globe and can be employed for short term urban water demand f orecasts for a given a location. However, a downscaling procedure is required due to the relatively coarse scale output of NWPs. There are two categories of downscaling methods, dynamical downscaling and statistical downscaling [ Fowler et al. , 2007] . Dynam ical downscaling uses outputs from a global scale model as initial and boundary conditions to run a regional scale model and produces forecast information at a finer resolution. However, dynamical downscaling is computationally expensive and suffers from b iases introduced by the driving model [ Abatzoglou and Brown , 2012; Hwang et al. , 2011; Plummer et al. , 2006] . Statistical downscaling develops empirical relationships between outputs from a global scale model and local observations to make local forecasts. Statistical downscaling is computationally efficient and straightforward to apply but requires long term records of both forecasts and observations to develop the empirical relationships. Of many statistical downscaling methods available, analog methods h ave been shown to perform, in general, as wel l as more complicated methods [ Zorita and von Storch , 1999] . Retrospective forecasts (reforecasts) of a NWP, the PAGE 118 118 Global Ensemble Forecast System (GEFS), have recently been developed through archiving nearly 30 y ears of sub daily forecast outputs of the GEFS [ Hamill et al. , 2013] . The GEFS reforecasts can be used to understand forecast model bias, evaluate forecast skill, quantify model uncertainty, and conduct statistical downscaling . Tian and Martinez [2012a, 20 12b, and 2014] have produced skillful short term local reference evapotranspiration forecasts by downscaling the GEFS and its precedent: the Global Forecast System (GFS) using analog approaches. This weather forecast information has potential to improve sh ort term urban water demand forecasts but has not yet been evaluated for water demand forecasts to date. The objectives of this study are: (1) to evaluated forecast analogs of water demand related weather variables forecasts from reforecasts of the GEFS using station based obser vations, and (2) to test whether short term water demand forecasts can be improved using these forecast analogs of water demand related weather variables. Study Area and Data Tampa Bay Water, the largest water supplier in Florida, provides wholesale water for over two million customers in Hillsborough, Pasco, Pinellas Counties, and the cities of St. Petersburg, New Port Richey and Tampa. Short term water demands are forecasted at specific locations known as point of connections (POC s, Figure 5 1) where Tampa Bay Water delivers finished product. Tampa Bay Water has archived weekly water demand data from 1991 to 2010 for each POC. There are over 20 rainfall gages and several temperature stations in this region. The rainfall data for ea ch POC was the average of the nearest rainfall gages. Since temperature is relatively homogenous in this region, temperature data from one station close the center of the region was used for all POCs. We selected temperature and rainfall data covering the PAGE 119 119 same period as the water demand data. Tampa Bay Water has developed one week ahead weekly water demand forecast models based on the auto regressive integrated moving average models with exogenous variables ( ARIMAX ) using preceding water demand data and ra infall and temperature data over the Tampa Bay region [ Asefa and Adams , 2007] . Seven ARIMAX models have been developed for POCs or combinations of POCs over the Tampa Bay region (Figure 5 1). The basic structure of the ARIMAX model is written as [ Box et al ., 2013 ]: ( 5 1) and, ( 5 2) ( 5 3) ( 5 4) where y(t) is output at time t ; u is the exogenous variable; t he parameters n a , n b , and n c ar e the orders of the ARIMAX model; n k is the delay; q is the delay operator. Time series of weekly average daily water demand (WD, million gallons per day, mgd) were differenced to ensure stationary. The identified exogenous input variables are weekly total rainfall (WeekRain, inch), number of rainy days in a week (RainDays), number of consecutive rainy days in a week (CosRainDays), and number of hot days in a week (HotDays). WeekRain was aggregated from daily rainfall. The RainDays, CosRainDays, and HotDays were calculated by counting days with rainfall or temperature exceeding a given threshold (rainfall>0.01 inch and temperature>85 o F). Different ARIMAX structures were tested in order to identify the best model for each POCs or combinations of POCs. The mode l parameters were estimated during the training period from December 5th 1991 to September 16th 2004 and validated during September 23rd 2004 to February 25th 2010. Table 5 1 shows the input variables of the seven optimized ARIMAX models. PAGE 120 120 The WeekRain, Rai nDays, and CosRainDays for those models with combination of POCs were the average of those POCs. Since the temperature data was from one station, the HotDays were the same for all models. The GEFS reforecasts [ Hamill et al. , 2013] have archived nearly 30 y ear data from December 1984 to present. The 2 m temperature (T) and precipitation fields are useful for short term urban water demand forecast. The GEFS reforecasts are ensemble forecasts with 11 forecast members at 1.0 o Ã—1.0 o (~100 km) spatial resolution. The forecasts are archived every three hours from lead day 1 to lead day 8 and six hours from lead day 9 to lead day 16. We calculated the ensemble mean of the 11 forecast members and extracted 16 grid points covering the Tampa Bay region (81 o W 84 o W, 26 o N 29 o N) (Figure 5 1). From lead day 1 to 7, T was averaged into daily values; daily precipitation was obtained by aggregating three hour precipitation on each day. The GEFS weekly forecasts for WeekRain, RainDays, CosRainDays, and HotDays were obtained fro m lead days 1 to 7 using the same method as converting the observed daily values. The GEFS one week ahead weekly forecasts for the four variables were produced daily from December 5th 1991 to February 25th 2010. Methods In order to apply the forecast anal og approach and evaluate the GEFS analog forecasts, moving weekly values of the observations of the weather variables from December 5th 1991 to February 25th 2010 were calculated with an increment of a daily time step. The 1 week ahead forecasts for the fo ur weather variables used in the water demand forecasts as well as the daily T from lead days 1 to 7 were produced using a moving spatial window forecast analog approach [ Hamill and Whitaker , 2006; Hamill et al. , 2006] . This method is briefly described below. The first step was to find the closest PAGE 121 121 analog dates for a given forecast within a domain of 16 grid points of GEFS (Figure 5 1). That is, the root mean square error (RMSE) over the 16 grid points forecast with the historical forecasts within a Â±45 day window of the same day of the year at the same lead. For example, a one week ahead forecast for 15 February 2009 over central Florida (the 1 6 GEFS grid points) was compared against available one week ahead reforecasts within a window from 1 January to 1 April 1991 2010. The RMSE between the current forecast and each reforecast was computed and averaged over the 16 grid points in Figure 5 1. Th e analog dates for that dates with the lowest RMSE. In the second step, the observed analogs for the forecast of th at day (or week) were selected on the dates of the 75 analogs at each of the seven POCs or combination s of POCs in Figure 5 1. Thus, 75 analogs were chosen for this forecast to generate an ensemble forecast for each of the seven POCs or combinations of POCs . Leave one out cross validation was conducted by excluding the current year from the list of potenti al analogs. The deterministic forecasts for each of the weather variables were determined by the ensemble mean of the forecast analogs. The mean squared error skill score (MSESS) was used to evaluate the deterministic forecasts for each POC or combination of POCs and each month. The MSESS is a relative skill that compares the forecasts with the climatology, which was the ensemble mean of all observed values within a window of Â± 30 days of the forecast ing date . The mean squared error of the forecasted value s ( MSE f ) and the mean squared error of the climatology ( MSE c ) were calculated for each POC or combination of POCs and each month. MSESS is calculated by: PAGE 122 122 ( 5 5) MSESS ranges from skill as climatology, negative values indicating the forecast has less skill than climatology, and 1.0 indicating a perfect forecast. The probabilistic forecasts for each of the weather variables were determined by the ensemble of the forecast analogs. The rank probability skill score (RPSS) was employed to evaluate probabilistic forecasts for each POC or combinations of POC and each month. The RPSS measures the cumulative squared error between the categorical forecast probabilities and the observed category relative to climatology [ Wilks , 2011] . The RPSS is defined as: ( 5 6) where RPS f is the ranked probability score of the forecast and RP S c is the ranked probability score of climatology, which was the ensemble of all observed values within the window of Â± 30 days of the forecast ing date . The RPS f and RPS c were calculated as: ( 5 7) and ( 5 8) where J is the number of categories, n is the number of forecasts and observations of a dichotomous event, p i f and p i c are the forecasted probability of the event using the PAGE 123 123 forecasts and climatology, respectively, and I i o = 1 if the event occurred and I i o = 0 if the event did not occur. In this study, we classified the variables into three categories using the terciles of the climatology, which are below normal (< 33 .3 th percentile), near normal (between 33.3th and 66.7th p ercentile), and above normal (> 6 6.7 th percentile) categories . The climatology forecast is 33.3% for each of the categories, by definition. The probability of the forecast for each category was calculated based on the percentage of forecast analogs in each of the categories. The RPSS ranges between to 1.0, and values of the RPSS of 1 indicate perfect skill and values of 0 indicate that the skill of the forecast is equivalent to climatology. The current Tampa Bay short term water demand model is based on p ersistence, which uses weekly water demand and weather variables in the current and previous weeks to predict water demand for the coming week. We need to modify the model in order to apply forecast analogs to the water demand forecast. As Table 5 1 shows, we advanced each of the weather variables except HotDays by one week, since HotDays did not show any skill (we showed this in Section 3). By doing this, the weather variables of the current week (t) in the original model become the next week (t+1) in the modified model. With this modification the forecast analogs can be used in the modified model, since the forecast analogs are the forecast for the next week (t+1). After applying 75 forecast analogs to the modified model, we can get an ensemble of water de mand forecasts with 75 forecast members. The ensemble water demand forecasts were evaluated using the p factor and r factor [ Yang et al. , 2008] . The p factor is the percent of observations within t he 2.5% and 97.5% percentiles of the ensemble PAGE 124 124 forecast . The r factor is the relative width of the 2.5% and 97.5% percentiles of the ensemble forecast . It is calculated as: ( 5 9) where n is the number of days, and represent the 2.5% and 97.5% percentiles of the ensemble forecast at day t i , and obs is the standard deviation of the observations . The goodness of the ensemble forecast is judged on the basis of the p factor and the r factor. The p factor close to 1 (i.e., all obs ervations bracketed by the ensemble forecast) represents a perfect forecast; and r factor close to 1 represents the same uncertainty as the standard deviation . The deterministic water demand forecast s were defined as the median of the 75 member ensemble forecast. The coefficient of determination (R 2 ), Nash Sutcliff e coefficient (E), root mean square error (RMSE), and mean absolute error (MAE) were employed to evaluate the deterministic forecasts. The Nash Sutcliff e coefficient (E) is written as [ Nash and Sutcliffe , 1970] : ( 5 10) where n is the number of days, and represent the observation and forecast at week t i , respectively, and is the average value of the observations over the entire period. The E ranges from to 1, with the value of 1 corresponding to a perfect match of the forecast to the observations and a value of less than 0 represents the forecast is worse than the mean o f observations . PAGE 125 125 Results and Discussion Performances of one week ahead deterministic forecasts and probabilistic forecasts for weather variables are shown in Figures 5 2 and 5 3, respectively. The WeekRain, RainDays, and CosRainDays for each of the seven m odels all showed positive skill for both deterministic and probabilistic forecasts throughout the year with the skill in cool months higher than in warm months (Figures 5 2 and 5 3). While the HotDays showed no skill for deterministic and probabilistic for ecasts in all months, T forecasts from lead days 1 to 7 showed positive skills for all months with the skills higher in winter than in summer. Because HotDays were mostly 0 during most months of a year except summer months, the calculated RMSEs between the forecasts and all potential analogs would be 0. Since the closest analogs w ere selected based on the RMSE , all potential analogs will have equal chance to be selected. As a result, the 75 closest analogs were actually randomly selected from all potential analogs, which may not be as good as using climatology. As noted by Tian and Martinez (2014), the lower skill in cool months than in warm months is likely due to the stronger convective heating in summer in the Tampa Bay region. Such convection could cause different weather conditions such as cloudiness and rainfall at different locations, which may not be well addressed by the GEFS model running at relatively coarse resolution. Figure 5 4 shows performances of predicted water demands for seven modified water demand forecast models driven by forecast analogs during the validation period. The ensemble forecast for each model was represented by the 95PPU (95% prediction uncertainty) band. T he p factor and r factor reflected the percent of observations covered by the 95PPU and relative width of the 95PPU to the standard deviation, respectively. It can be seen that the performances of ensemble forecasts are different PAGE 126 126 among different models, b ut in general, the ensemble forecasts well captured the observations. Since the wider 95PPU band will likely cover more observations, the p factor is high when the r factor is high and vice versa. The p factor and r factor basically achieved very good bala nce for different models. In Figure 5 4, we can see the deterministic forecast visually matched quite well with the observation for each of the models. The quantitative evaluation results for deterministic forecasts are given in Table 5 2. The modified mod els driven by forecast analogs showed higher skill than the model is the modified driven by the observations, i.e. using the observations for week t+1 instead of the forec asts for week t+1 to drive modified model. Since the observation under the same model structure. For model 3, the skill of the original model or the analog driven mod explain why the model 3 could not be further improved by using forecast analogs. It has been shown that the use of forecast analogs of the GEFS has added value to the short term urban water demand forecasts. The modified analog driven water demand forecast models likely performed better than the original water demand models because of the use of the skillful weather forecast information instead of the persistence of weather observations. The ensemble water demand forecast was made by using the forecast analogs accounting for uncertainty of weather forecasts, which allowed for uncertaint y testing of water demand forecasts. While in this study the forecast analogs of the GEFS was used to improve the ARIMAX based water demand model, it can also be employed to improve other types of demand models such as ANN, SVM, or hybrid PAGE 127 127 models (e.g. WA A NN). We expect the use of forecast analogs would contribute additional skills to any forecast demand models. The GEFS informed water demand forecast method employed in this study would not only be beneficial to urban water management and planning in the Ta mpa Bay region but also to other regions of the world. Water demand forecasts can be made at different lead times, namely daily, weekly, and monthly forecasts for different planning purposes. In this study, it has been shown that weekly forecasts could be improved using the GEFS forecast analogs. Since the GEFS forecast produced sub daily forecasts, daily water demand forecasts could also be improved by using this information. The GEFS did not make forecasts for more than 16 days. For monthly forecasts, t he Global Climate Models (GCMs) for long lead forecasts could be considered to improve urban water demand forecasts. For example, the Climate Forecast System version 2 (CFSv2) [ Saha et al. , 2013] have showed skills for forecasting different climate fields such as reference evapotranspiration and its relevant variables [ Tian et al. , 2014] . The forecasts from the CFSv2 could be employed to improve the monthly water demand forecasts, which could be considered as a future work. Chapter Conclusions In this study , we investigated the potential of using the GEFS forecast analogs to improve the short term urban water demand forecasts in the Tampa Bay region. Firstly, we evaluated the GEFS forecast analogs of weather variables related to urban water demand forecasts. The GEFS analogs generally showed high skill for forecasting WeekRain, RainDays, CosRainDays, and T but no skill for forecasting HotDays. The forecast analogs of WeekRain, RainDays, and CosRainDays were then used to drive PAGE 128 128 the modified water demand model b ased on the ARIMAX to produce an ensemble water demand forecasts. The analog driven water demand forecasts mostly showed higher skill than the original forecasts. The ensemble water demand forecasts accounted for uncertainty of weather forecasts, which all owed for assessing demand forecast uncertainty. The weather informed water demand forecast approach employed in this study is not restricted to the ARIMAX based demand forecast model used. It could also work with other well performed demand models such as ANN, SVM, or hybrid models (e.g. WA ANN) to further improve the water demand forecasts. The water demand forecast informed by the GEFS forecast would not only be beneficial to urban water management and planning in the Tampa Bay region but also can be appl ied to other regions of the world. Since the longer lead water demand forecasts are also important for water resources planning, a future work could be using the GCMs to improve long lead urban water demand forecasts. PAGE 129 129 Table 5 1. Variables of original and modified models Model POCs a Variables of Original Model b Variables of Modified Model 1 Little Road, US41, Odessa WD(t), WD(t 1), CosRainDays(t), CosRainDays(t 1), WeekRain(t), HotDays(t) WD(t), WD(t 1), CosRainDays(t+1), CosRainDays(t), WeekRain(t+1), HotDays(t) 2 Cosme WD(t), CosRainDays(t), CosRainDays(t 1) WD(t), CosRainDays(t+1), CosRainDays(t) 3 Lake Bridge WD(t), CosRainDays(t), CosRainDays(t 1), WeekRain(t) WD(t), CosRainDays(t+1), CosRainDays(t ), WeekRain(t+1) 4 Lithia WD(t), CosRainDays(t), RainDays(t), HotDays(t) WD(t), CosRainDays(t+1), RainDays(t+1), HotDays(t) 5 Maytum WD(t), WD(t 1), RainDays(t) WD(t), WD(t 1), RainDays(t+1) 6 Pinellas, Keller WD(t), CosRainDays(t), CosRainDays(t 1), WeekRain(t), HotDays(t) WD(t), CosRainDays(t+1), CosRainDays(t), WeekRain(t+1), HotDays(t) 7 NWH, Lake Park WD(t), CosRainDays(t), CosRainDays(t 1), WeekRain(t), HotDays(t) WD(t), CosRainDays(t+1), CosRainDays(t), WeekRain(t+1), HotDays(t) a, b WD(t), water demand (mgd) in week t; WeekRain(t), weekly total rainfall (inch) in week t; RainDays(t): number of rainy days (>0.01 inch) in week t; CosRainDays(t): consecutive rainy days (>0.01 inch) in week t; HotDays(t): number of hot days (>85F) in week t. PAGE 130 130 Ta ble 5 2. Summary of forecasting performances for the original model, the analog driven (median) model, and the Model Original model Analog driven model (median) "Perfect" model R 2 E RMSE MAE R 2 E RMSE MAE R 2 E RMSE MAE 1 0.66 0.65 1.58 1.24 0.70 0.68 1.48 1.14 0.77 0.76 1.28 0.98 2 0.74 0.73 1.13 0.86 0.75 0.74 1.12 0.85 0.77 0.76 1.07 0.83 3 0.88 0.88 0.42 0.33 0.88 0.87 0.42 0.32 0.89 0.89 0.39 0.29 4 0.67 0.64 2.74 2.07 0.71 0.68 2.58 1.97 0.78 0.75 2.29 1.72 5 0.67 0.65 0.20 0.15 0.70 0.67 0.20 0.15 0.74 0.70 0.19 0.15 6 0.74 0.74 2.84 2.28 0.79 0.77 2.70 2.08 0.82 0.80 2.52 1.92 7 0.65 0.63 1.41 1.10 0.68 0.65 1.36 1.03 0.74 0.71 1.24 0.95 modified model driven by observations. R 2 : coefficient of determination; E: coefficient of efficiency; RMSE: root mean square error; MAE: mean absolute error. PAGE 131 131 Figure 5 1. GEFS grid points and locations of the water demand forecast models. Each of the POCs was labeled. PAGE 132 132 Figure 5 2. Performance of deterministic forecasts (MSESS) for weather variables in different m onths. (a) one week ahead WeekRain forecasts for seven models, (b) one week ahead RainDays forecasts for seven models, (c) one week ahead CosRainDays forecasts for seven models, and (d) one week ahead HotDays forecast and 7 day T forecasts for all models PAGE 133 133 Figure 5 3. Same in Figure 5 2, but for probabilistic forecasts (RPSS). PAGE 134 134 Figure 5 4. Comparison of observed and predicted (median of the ensemble forecast) weekly water demands for seven modified water demand models driven by forecast analogs d uring the validation period from 9/23/2004 to 2/25/2010 . The 95PPU ( 95% prediction uncertainty ) is the area between the 2.5% and 97.5% percentiles of the ensemble forecast . PAGE 135 135 CHAPTER 6 SEASONAL PREDICTION OF REGIONAL REFERENCE EVAPOTRANSPIRATION (ET o ) BASE D ON CLIMATE FORECAST SYSTEM VERSION 2 (CFS v 2) Chapter Introduction Reference evapotranspiration (ETo) is defined as the evapotranspiration from a hypothetical reference crop in an adequately watered condition (Allen et al. 1998). ETo is one of the most im portant hydroclimatic variables for scheduling irrigation, driving hydrologic and crop models, and estimating actual evapotranspiration for a region (Gong et al. 2006) . If ETo can be predicted a few months in advance, it would be beneficial for the water m anagement and irrigation communities for making long term planning decisions. There are many methods to estimate ETo. The FAO 56 Penman Monteith (FAO PM) equation (Allen et al. 1998) is considered a globally valid standardized method to estimate ETo and wa s adopted by Food and Agricultural Organization (FAO) of the United Nations. A major limitation of this method is that it requires a large amount of climatic data input at the land surface level including air temperature, wind speed, solar radiation, and d ew point temperature or relative humidity, which are often not available in many regions. Coupled ocean land atmosphere general circulation models (CGCMs) combine models for the ocean, atmosphere, land surface, and sea ice, and run from several months to one year ahead to produce seasonal forecasts (Troccoli 2010) . CGCMs have been operationally implemented at major weather and climate forecast centers around the world (Palmer et al. 2004; Saha et al. 2006; Yuan et al. 2011) . Recently, the National Centers for Environmental Prediction (NCEP) of the National Oceanic and Atmospheric Administration (NOAA) has improved the physics and resolution of its operational CGCM and updated the forecast system to the second generation, Climate PAGE 136 136 Forecast System version 2 ( CFSv2) (Saha et al. 2010; Saha et al. accepted; Yuan et al. 2011) . There are 29 years (1982 to 2010) of retrospective forecasts (reforecasts or hindcasts) of CFSv2 that are archived by the National Climatic Data Center (NCDC) . Recent studies have used the archived CFSv2 reforecasts for different applications such as evaluating the seasonal forecast skill of soil moisture (Mo et al. 2012) , the South American monsoon (Jones et al. 2012) , meteorological drought (Yoon et al. 2012) , streamflow (Yuan and Wood 2012a; Yuan et al. in press) , east Asian winter monsoon (Jiang et al. 2013) , summer heat waves (Luo and Zhang 2012) , and tornado occurrence (Tippett et al. 2012) . While the CFSv2 has shown potential to improve seasonal forecast sk ill for many applications, studies using CFSv2 to predict seasonal ETo have not been conducted to date. Seasonal predictions of CFSv2 land surface variables can provide valuable information for ETo forecasts. The archived reforecasts of CFSv2 land surface variables (e.g. temperature, wind speed, and solar radiation) provide all of the variables necessary for the FAO PM equation to assess the predictability for ETo seasonal forecasts. Due to the upgraded physics and resolution, the CFSv2 has been shown to h ave the ability to predict near surface air temperature and precipitation for hydrological forecasting at long leads (Yuan and Wood 2012b; Yuan et al. 2011) . Besides near surface air temperature, other land surface variables such as solar radiation and win d speed are important to forecast ETo. However, the predictability of these land surface variables for ETo forecasts has not yet been assessed. ETo seasonal predictions are often needed at the local scale. Because CFSv2 has the horizontal resolution of T1 26 (equivalent to nearly 100 km), it is too coarse to PAGE 137 137 meet local forecasting needs. To provide local predictions of seasonal ETo, CFSv2 forecasts need to be spatially downscaled. In general, there are two categories of downscaling methods: statistical and dynamical (Fowler et al. 2007) . Statistical downscaling employs statistical relationships between the output of a CGCM and local observations and is computationally efficient and straightforward to apply. The statistical downscaling step can also be used to correct systematic bias between CGCM output and observations. One limitation to statistical downscaling methods is that they need long term continuous forecast archives and observations to establish the statistical relationships. For dynamical downscali ng, a CGCM provides the boundary and initial conditions to a regional climate model (RCM) and the RCM runs at a finer resolution to produce local scale forecasts. The errors in the CGCM are typically propagated to the RCM and influence predictions (Hwang e t al. 2011; Yoon et al. 2012). RCMs also have their own errors. Thus, dynamical downscaling require s additional statistical bias correction and is computationally intensive. There are multiple methods for statistical downscaling and bias correction. The b ias correction and spatial disaggregation (BCSD) method is a n interpolation based downscaling technique which has been extensively applied in hydrologic prediction studies (e.g. Christensen et al., 2004; Salathe e t al., 2007; Maurer and Hidalgo 2008 ; Wood et al. 2002; Wood et al. 2004; Yoon et al. 2012 ). The BCSD method consists of bias correction using quantile mapping and then spatial disaggregation, which works effectively to correct both mean and variance of the forecasts using observations. Abatzoglou and Brown (2012) developed the spatial disaggregation with bias correction (SDBC) method by reversing the order of the BCSD procedure s . This modification PAGE 138 138 improved downscaling skill for reproducing local scale temporal statistics of precipitation (Hwang and Graham 2013). The spatial disaggregation (SD) from BCSD or SDBC method could be adapted to an independent method by spatially interpolat ing the anomalies of forecasts to a fine r resolution, and then produce the downscaled forecasts by adding the observed climatology to the interpolated forecast anomalies. Besides the interpolation based downscaling methods, there are parametric approaches (Schaake et al. 2007; Wood and Schaake 2008) and Bayesian merging techniques (Coelho et al. 2004; Luo and Wood 2008; Luo et al. 2007) that have also been used in downscaling seasonal climate forecasts. While the natural analog and constructed analog downscaling methods have shown good performance (Abatzoglou and Brown 2012; Hidalgo et al. 2008; Maurer and Hidalgo 2008; Maurer et al. 2010; Tian and Martinez 2012a, 2012b) , the seasonal reforecast datasets are generally not long enough to perform those analog based downscaling methods since there is a limited number of potential historical ana logs. The objectives of this study were 1) to evaluate the ability of the CFSv2 to produce downscaled ETo seasonal predictions from 0 to 9 month lead, 2) to assess the predictability of the relevant CFSv2 land surface and reference height variables for ET o forecasts, and 3) to compare the skill from two interpolation based statistical downscaling methods to downscale CFSv2 forecasts. As members of the Southeast Climate Consortium (SECC), our goal was t o develop an improved understanding of seasonal climate variability and climate predictability at local to regional scales across the s outheastern United States (SEUS) . The refore, the study region includes Alabama, Georgia, and Florida in the SEUS. Sections 2 and 3 describe the data and methods PAGE 139 139 used in this wo rk. The results are presented in Section 4. Conclusion and discussion are given in Section 5. Data The availability of the long term archived CFSv2 reforecasts makes it feasible to conduct statistical downscaling. The forcing dataset of the North American Land Data Assimilation System Project Phase 2 (NLDAS 2) were used as observations to verify and correct errors of the downscaled forecasts. NCEP CFSv2 Forecasts The CFSv2 is a fully coupled land ocean atmosphere dynamical seasonal prediction system (Saha et al. accepted) . The CFSv2 reforecast archive consists of the NCEP Global Forecast System, the Geophysical Fluid Dynamics Laboratory Modular Ocean Model version 5.0 coupled with a two layer sea ice model, and the four layer Noah land surface model ( Yuan et al. 2011) . The reforecast includes model forecasts from January 1982 to December 2010 with 9 month leads at a T126 resolution (roughly 100 km). The reforecast of monthly means have 28 members in November and 24 members in other months with initial conditions of the 0, 6, 12 and 18Z ( Coordinated Universal Time or UTC) cycles for every 5 day s , starting from the last month after the 7th . For example, the 24 ensemble members for January are the four cycles for each of December 12th, December 17th, December 22th, December 27th, January 1st, and January 6th. The near surface variables of CFSv2 reforecast including 2 m maximum temperature ( Tm ax , K ), minimum temperature ( Tmin , K), mean temperature ( Tmean , K), surface solar radiation ( Rs , W/m 2 ), and 10 m wind speed ( u 10 , m/s) are potentially useful to forecast ETo. Wind speed at 2 m height ( u 2 or referred hereafter as Wind) was estimated from u 1 0 (Allen et al. 1998) : PAGE 140 140 ( 6 1) where z is the measurement height (10 m). The monthly means of those variables were converted to seasonal means by taking the moving average of the three months from January 1982 to December 2010. For the convenience of manipulation, all of those variables were regrided to 1.0 o Ã—1.0 o resolution (Fig ure 6 1). Forcing D ataset of NLDAS 2 The 0.125 o Ã—0.125 o resolution (approximately 12 km) NLDAS 2 forcing data set (Xia et al. 2012b; Xia et al. 2012c) ( Fig ure 6 1) were taken as a surrogate for long term observations to use both for forecast verification and bias correction (as described in section 3) . NLDAS 2 integrates a large quantity of observation based and model reanalysis data to drive land surface models, and executes at 0.125 o Ã— 0.125 o grid spacing over North America with an hourly time step. The NLDAS 2 data provides the same land surface variables as CFSv2 reforecasts required for ETo estimation, including 2 m Tmax, Tmin, Tmean, u 10 , and Rs . The u 10 was converted to u 2 using Equation 6 1. For this work, we used 30 years of data from January 1982 to December 2011 because CFSv2 has a 9 month lead. The NLDAS 2 hourly data were aggregated into daily data and then averaged to monthly means. The mon thly means were converted to seasonal means as was done for CFSv2 reforecasts. The NLDAS 2 fields used in this study were based on the interpolation of the 3 hour time step North American Regional Reanalysis (NARR) (0.3 o Ã—0.3 o ) (Mesinger et al., 2006; Xia et al. 2012b). The NLDAS 2 Rs was bias corrected using satellite derived Rs by Pinker et al. ( 2003) over each grid cell using the ratio of their monthly average diurnal cycle (Xia et al. 2012b); all other NLDAS 2 fields used in this study were directly PAGE 141 141 int erpolated from NARR with or without adjustments to account for the vertical difference between the NARR and NLDAS 2 fields of terrain height (Xia et al. 2012a). The methods of the spatial and temporal interpolation and vertical adjustment were adopted from Cosgrove et al. (2003). Since the NARR data is a hybrid p roduct of model simulations and observ ations, it contains know n biases for either output fields (e.g. Markovic et al., 2009; Vivoni et al., 2008; Zhu and Lette nmaier, 2007) or estimated ETo (Tian an d Martinez 2012b). Thus, biases from the NARR would be propagated to the NLDAS 2 fields and consequently affect ETo estimation in this study. High biases of Rs and precipitation were found in the prior generation of forcing of the NLDAS 2 (Luo et al. 2003) . The validation for the NLDAS 2 fields is still ongoing by the research community. Methods ETo E stimation M ethod s The original PM approach (Jensen et al., 1990; Monteith, 1964; Shuttleworth, 1993) was ation whereby a single surface resistance and a single aerodynamic resistance term represent the transport properties of the vegetated surface , where t he vegetated surface assumed to be complete and uniform . Since the PM equation approximates the stomatal resistance of the entire plant canopy as a single surface resistance, it takes an inherently macro scale view of the evapotranspiration process which is appropriate for application at large scales such as that of CGCMs (e.g. Kingston et al. 2009; Sperna Weiland 2012). The original PM equation is written as: ( 6 2) PAGE 142 142 where evapotranspiration ( ET 2 , which can be converted into ET ( mm ) by dividing by the latent heat of vaporization of water, 1 ], 1 ) is the slope of the plot of saturated vapor pressu re against air temperature, R n is the net radiation 2 ), a 3 ), c p is the specific heat of air at constant 1 K 1 ), D is the vapour pressure deficit (Pa) and is the psychometric 1 ). r a and r s are the aerodynamic and surface 1 ), respectively. The FAO developed the concept of reference evapotranspiration and the standardized FAO PM equatio n by defining the grass reference crop as a hypothetical crop with an assumed height of 0.12 m and standardized height for wind speed, temperature and humidity measurements at 2 m under well watered conditions. ET rates of various vegetation covers are rel ated to the ETo by means of crop coefficients. By introducing this grass reference crop, the FAO PM equation was derived from Equation 6 2 (derivation procedure is shown in Appendix B ) and has the form: ( 6 3) where E To is the reference evapotranspiration (mm/day), R n is the net radiation at the crop surface [MJ m 2 day 1 ] and is determined from the difference between net shortwave radiation (assuming a constant albedo of 0.23) and net outgoing longwave radiation calculated following Allen et al. (1998) ; G is soil heat flux density [MJ m 2 day 1 ], and was considered to be n egligible for the calculations; T is the mean air temperature at 2 m height [ o C]; u 2 is wind speed at 2 m height [m s 1 ]; e s is saturation vapor pressure [kPa]; e a is actual vapor pressure [kPa]; is the slope of the saturation PAGE 143 143 vapor pressure curve [kPa o C 1 ]; and is the psychrometric constant [kPa o C 1 ]. The only required inputs for the FAO PM equation are climatic variables. For details on the calculation of e ach of the terms in Equation 6 3 , the reader is referred to Allen et al . (1998). Since the CFS v2 reforecast and forcing data of NLDAS 2 did not include dew point temperature ( T dew ) or relative humidity ( RH ) that are required to calculate e a , we approximated T dew using Tmin which has been found to be suitable for humid regions (Allen et al. 1998). Sensitivity coefficients (e.g. Gong et al. 2006) were used to evaluate variables that are most important for influencing ETo at different seasons over the SEUS. The forcing data of the NLDAS 2 were used in the estimation the sensitivity coefficient. The s ensitivity coefficient has a non dimensional form written as: ( 6 4) where S V ijk is sensitivity coefficient for V ij k , V i jk is the climatology of i th variable in the j th month at grid point k , V i jk is the difference between the maximum and minimum values of the i th variable in the j th month at grid point k throughout all years, ET o jk is the reference evapotranspiration estimated by Equation 6 3 with inputs of the climatology for each of the variabl e s for the j th month at grid point k , ET o jk is the difference between the estimated ET o by Equation 6 3 with inputs of the maximum and minimum values of each of the variable s in the j th month at grid point k throughout all years. A greater sensitivity co efficient reflects a greater influence of a given variable on ETo in a given month, either negatively or positively. PAGE 144 144 Downscaling M ethods The downscaled ETo seasonal forecasts were produced in two ways. The first was to calculate the coarse scale ETo forec asts using the FAO PM equation with the input of the CFSv2 forecasts of Tmean , Tmax , Tmin, Rs, and W ind and then downscale and bias correct the coarse scale ETo (hereafter ETo1); the second was to downscale and bias correct the CFSv2 forecasts of Tmean , Tm ax , Tmin, Rs, and W ind and then input those variables into the FAO PM equation to calculate the downscaled ETo forecast (hereafter ETo2). Similarly, Yuan and Wood (2012a) conducted downscaling of the CFSv2 streamflow forecasts in two ways: bias correcting streamflow predicted by the integrated land surface model in CFSv2; and downscal ing the meteorological seasonal forecast and using it as input to a well calibrated hydrologic model . Two methods, SD and SDBC were used to downscale each of th ose variables and ETo. SD m ethod The SD method is an interpolation based downscaling method that includes two steps. The first step is to spatially interpolate the anomalies of CFSv2 forecasts to a finer resolution (0.125 o Ã— 0.125 o ) using the inverse distance weighted (I D W) technique. The anomalies were defined as the departure of the reforecasts from the model climatology. The IDW technique estimates values at a point by weighting the influence of nearby data with the most distant data weighted the le ast. The IDW technique consists of two steps : ( 1) compute distances to all the points in the dataset, and ( 2) compute the weight of each point. The w eighting function is th e inverse power of the distance: ( 6 5) PAGE 145 145 and ( 6 6) where is the estimated value, Z is the value of the control points, d i is the distance between the estimated value and the control points, p is the power to which the distance is raised (a powe r of 2 was used in this work) , and ( X,Y ) represents the coordinate. The second step of the SD method is to produce the forecasts by adding the downscaled anomalies to the climatology of NLDAS 2 forcing data. The leave one out cross validation procedure was conducted by leaving the target season (the season being forecasted) out when calculating the NLDAS 2 climatology. SDBC m ethod Compared to the SD method, the SDBC method (Abatzoglou and Brown 2012) has three steps with an additional quantile mapping bias correction procedure. In the first step, the anomalies of the CFSv2 forecasts were interpolated to moderate resolution (0.125 o Ã— 0.125 o ) using the IDW technique. In the second step, the interpolated anomalies of the CFSv2 were bias corrected using the anomalies of the NLDAS 2 forcing data using the quantile mapping technique. In comparison to the SD method, the SDBC method corrects both mean and variance of the forecasts in probability space. This technique has three procedures : ( 1) creating cumulative density functions ( CDFs ) of the anomalies of NLDAS 2 forcing data for each grid point , ( 2) creating CDFs of the CFSv2 anomalies for each grid point and lead time , ( 3) using the quantile mapping that preserves the probability of exceedence of the CFSv2 dat a, but corrects the CFSv2 anomalies to the value that corresponds to the same probability of PAGE 146 146 exceedence from the anomalies of NLDAS 2 . Thus bias corrected data on time i at grid j was calculated as, ( 6 7) w here F(x) and F 1 (x) denote a CDF of the data and its inverse, and subscripts sim and obs indicate downscaled CFSv2 anomalies and NLDAS 2 anomalies , respectively. The final step is to produce the forecasts by adding the bias corrected downscaled anomalies to the climatolog y of NLDAS 2 forcing data . In steps 2 and 3, t he cross validation procedure w as conducted by leaving the target season out when creating the CDFs of the NLDAS 2 anomalies and calculating the NLDAS 2 climatology. Evaluation S tatistics Seasonal skill for bot h deterministic and probabilistic forecasts were evaluated for each NLDAS 2 grid point from lead 0 to 9 and summarized by season. The deterministic forecasts were determined by the ensemble mean of the 24 members. The mean squared error skill score (MSESS) was computed for each grid point, each season, and each lead. MSESS is a relative skill measure that compares the downscaled forecasts with the climatology forecast, which was calculated from NLDAS 2 seasonal values at the same time as the forecast season . The mean squared error of the forecasted values ( MSE f ) and the mean squared error of the climatology ( MSE c ) were calculated for each grid point for each season and lead. MSESS is calculated by: ( 6 8) PAGE 147 147 MSESS ranges from skill as climatology, negative values indicating the forecast has less skill than climatology, and 1.0 indicating a perfect forecast. The ensemble forecast of CFSv2 can b e used as a probabilistic forecast by counting the number of ensemble members falling into three equal categories or terciles. The Brier skill score (BSS) was employed to evaluate the skill of probabilistic forecasts in terciles (above normal, near normal, and below normal) for each grid point, each season, and each lead. The terciles of the NLDAS 2 data were used to divide the forecast into three categories. The climatological forecast was, by definition, 33.3% for all three categories. The BSS is written as (Wilks 2011) : ( 6 9) where BS f is the Brier score of the forecast and BS c is the Brier score of climatology, which was computed from NLDAS 2 seasonal values at the same time as the forecast season. The BS f and BS c were calculated as: ( 6 10) and ( 6 11) where n is the number of forecasts and observations of a dichotomous event, p i f and p i c are the forecasted probability of the event using the forecasts and climatology, respectively, and I i o = 1 if the event occurred and I i o = 0 if the event did not occur. The PAGE 148 148 BSS ranges between to 1.0, and values of the BSS of 1 indicate perfect skill and values of 0 indicate that the skill of the forecast is equivalent to climatology. Results Tables 6 1 and 6 2 show the overall mean MSESS and BSS for downscaled CFSv2 variables and ETo by the SD and SDBC methods in 0 month lead for all seasons. In Table 6 1, in terms of the overall mean skill scores for the CFSv2 variables, Tmax had the highest skill for both deterministic and probabilistic forecasts and was followed by Tmean, Tmin, Rs and Wind. The skill scores for the SDBC method were slightly higher than the SD method. The probabilistic forecasts in the above normal and below normal categories had positive skill, while the forecast skill in the near normal category was all negative. F ailure of the near normal forecast has been found in many previous studies (e.g. van den Dool and Toth 1991 , Barnston et al. 2003). This failure is related to two facts: (1) n arrowing of the forecast probability distribution increases the probability in the near nor mal category and decreases the probability in the outer two categories, but the change is usually not sizeable; and (2) overall shifts in the forecast probability distribution can reduce the probability in the near normal category, but changes the probabil ities in below and above normal categories far more substantially (van den Dool and Toth 1991). Table 6 2 shows, in general, for the overall mean predictive skill of the two ETo methods, ETo2 (downscaling individual variables before calculating ETo) showed slightly higher skill than ETo1 (calculating coarse scale ETo before downscaling), and the SDBC showed slightly higher skill than the SD method. Evaluation of Forecast Skill in Different S easons This section compared the mean forecast skill in differen t seasons averaged over the entire region in 0 month lead. Figure 6 2 shows the forecast skill of the deterministic PAGE 149 149 and tercile probabilistic forecasts in different seasons for the five downscaled CFSv2 variables for the SD and SDBC methods in 0 month lead . In terms of the MSESS and BSS in below and above normal categories, Tmean and Tmax had the highest skill over all seasons; Tmin and Rs were skillful only during the cold seasons; Wind showed minor skill in warm seasons. Figure 6 3 shows a comparison of s kill scores of downscaled ETo for the SD and SDBC methods in different seasons at 0 month lead. Both ETo1 and ETo2 showed skill in cold seasons while the skill dropped below 0 during warm seasons; The two methods for downscaling ETo did not show much diffe rence in terms of the forecast skill, particularly in cool seasons when the skill was positive. Both Figures 6 2 and 6 3 show there is no sizeable difference between the SD and SDBC methods. Since ETo estimation is governed by Tmean, Tmax, Tmin, Rs, and W ind, it would be useful to look at the influence of each variable on ETo. In terms of the sensitivity coefficients in Table 6 3, Tmax and Rs had the greatest influence on ETo, followed by Tmin and Tmean, and Wind showed only slight influence, particularly in warmer seasons. All variables had positive influence on ETo except Tmin that had negative influence. While the influence of Tmax is relatively constant for all seasons, Tmean, Tmin, and Rs have greater influence in warmer seasons than in cooler seasons. During warm seasons, due to the influence of Tmin and Rs on ETo, the poor forecasts of these two variables by CFSv2 would cause the negative skill of the ETo forecast in these seasons. While during cold seasons when Tmax, Tmean, Tmin, and Rs forecasts had relatively good performance, ETo showed positive forecast skill. PAGE 150 150 Evaluation of Forecast S kill over S pace The forecast skill in this section was the skill scores averaged over all seasons in different grid points in 0 month lead. Figures 6 4 and 6 5 show the spatial distribution of the deterministic and probabilistic forecast skill of the downscaled CFSv2 variables (Tmean, Tmax, Tmin, Rs, and Wind) for the SD and SDBC methods for all seasons in 0 month lead, respectively. For both the SD and SDBC methods, Tmean and Tmax had the highest skill over the region followed by Tmin, Rs, and Wind (Fig ure s 6 4 and 6 5). Figure 6 4 shows, for the SD method, both deterministic and probabilistic forecasts for Tmean and Tmax were skillful in most of the region. The MSES S and BSS for Tmin were highest in south Florida, north Georgia, north Alabama, and coastal areas for the below and above normal categories. For deterministic forecasts and above normal forecasts of Rs, there was skill in north Florida, while for the below and near normal forecasts there were no skill in most of the area. The forecasts of Wind did not show skill anywhere in the region with west Florida and Alabama showing the most negative skill. Compared to the other variables, Wind is the most influenced variable by land surface conditions. The failure of the Wind forecasts was likely due to several reasons. First, the boundary layer changes diurnally and seasonally, which makes it difficult to model; second, the turbulent fluxes in the boundary layer (dra g force due to land surface friction) in CFSv2 may not be resolved likely due to the coarse spatial resolution of the model configuration. Figure 6 5 shows that for the SDBC method, although the skill showed a similar spatial pattern with the SD method, th ere were improvements of skill over the whole area. Figure 6 6 show s the average skill scores of the deterministic and probabilistic forecasts of the downscaled ETo1 and ETo2 for the SD and SDBC methods for 0 month PAGE 151 151 lead . For SD method, both ETo1 and ETo2 showed skillful deterministic forecasts for most of the area except south Florida. The forecast skill for ETo1 and ETo2 was very similar. For the SDBC method, in terms of the MSESS and above normal BSS, there were larger areas showing higher skill for ETo2 than for ETo1. There were greater areas showing high skill for the SDBC method than the SD method. The greatest improvement of skill occurred in the near normal forecasts, though the skill was still negative in most of the area. The skill improvement for the SDBC was due to the additional procedure to bias correct the overall shape of the forecast distribution. In terms of t he monthly average of the absolute sensitivity coefficients for each variable in Figure 6 7, Tmax and Rs had the greatest influence on ETo, followed by Tmin, Tmean, and Wind. The a bsolute sensitivity coefficient for each variable was relatively homogeneous over space . Given the influence of Rs on ETo, the low skill of ETo in south Florida is likely caused by the low skill of Rs in the ar ea. Evaluation of Forecast Skill for Different L eads The skill scores in this section were the spatial average over all the grid points for different months and leads. All the contours in Figures 6 7 to 6 10 were smoothed for display purposes. Figures 6 7 and 6 8 demonstrate the predictive skill of the deterministic and probabilistic forecasts of the CFSv2 variables (Tmean, Tmax, Tmin, Rs, and Wind) for all seasons and leads over the entire area for the SD and SDBC method, respectively. In general, both Fig ures indicate Tmean and Tmax had longer skillful leads than Tmin, Rs, and Wind, with Tmin and Rs having skillful leads out to approximately 3 months during the cold seasons. Figure 6 8 shows the MSESS of Tmean and Tmax were skillful at long leads for most seasons of the year. The BSS for Tmean and Tmax indicates that the below and above normal forecasts were skillful at PAGE 152 152 near leads. Tmin only showed skill at near leads for cold seasons and no skill for warm seasons in terms of MSESS and BSS in the outer two categories. The deterministic forecasts for Rs were skillful at long leads for cool seasons, while the probablistic forecasts were skillful at near leads during the cold seasons. The deterministic forecasts for Wind showed some skill out to month 3 lead fo r warm seasons, while there was no skill for other seasons and tercile forecasts. Figure 6 9 shows the SDBC method improved skill for longer leads for below normal and above normal forecasts of Tmax and Rs, and deterministic forecasts of Wind. The greatest improvement occurred for near normal forecasts of Tmax even though the skill was still negative. There were no obvious improvements for the other forecasts. Figures 6 10 and 6 11 show the predictive skill of ETo1 and ETo2 as a function of seasons and lead s for the SD and SDBC methods, respectively. In general, ETo1 showed similar skill to ETo2; and skillful forecasts at long leads occurred in cold seasons (Fig ures 6 10 and 6 11). In Figure 6 10, for the SD method, the predictive skill of the deterministic forecasts of ETo1 and ETo2 had skill at all leads for cold seasons, while there was no skill for warm seasons. Both ETo1 and ETo2 had skill for below normal forecasts at long leads for cold seasons. For above normal forecasts, both ETo1 and ETo2 show skill at long leads out to month 5 in cold seasons. Figure 6 11 shows the SDBC method indicates a slight improvement of skill at longer forecast lead in terms of the above and below normal BSS, and a great improvement of skill at longer leads for the near norma l BSS even though the skill was still negative. The student t test was conducted to compare the skill of SDBC and SD. For ETo1, the improvements were significant (p<0.05) for the below normal BSS from month 6 to month 9 leads, for the PAGE 153 153 near normal BSS at al l leads, and for the above normal BSS from month 6 to month 9 leads; for ETo2, only the near normal BSS from month 6 to month 9 leads and the above normal BSS from month 6 to month 8 leads showed significant improvement (p<0.05). There were no improvements for any other forecasts. Evaluation of Forecast Skill during ENSO E vents Since CFSv2 has been shown to accurately predict the phase of ENSO (Kim et al. 2012), ETo might be better predicted during ENSO events. To evaluate this, we summarized the skill sco res of ETo forecasts where the forecast initial seasons (Figures 6 12 and 6 13) and target seasons (Figure 6 14) were classified as either El NiÃ±o ( Oceanic NiÃ±o Index , ONI exceeds +0.5 o C for at least 5 consecutive over lapping seasons ) or La NiÃ±a events (O NI is below 0.5 o C for at least 5 consecutive over lapping seasons Prediction Center. Figure 6 12 shows the predictive skill of downscaled ETo by the SDBC method in different seasons at 0 month lead during ENSO events. While the ETo forecast skill was still negative in warm seasons, it was positive in cold seasons and higher than the forecast skill for the entire period (Figure 6 3). Figure 6 13 shows the predictive skill of the downscaled ETo by the SDBC method as a function of seasons and leads where forecast initial seasons were during ENSO events. It shows the forecast skill was higher than durin g the entire period at long leads (Figure 6 11) . Figure 6 14 shows the predictive skill of the downscaled ETo by the SDBC method as a function of seasons and leads where the forecast target seasons were during ENSO events. It shows the skill was negative a t long leads even though it showed high skill at 0 month lead . These results indicate that the CFSv2 model can predict ETo with good skill only when the forecast initial seasons were in either the El NiÃ±o or La NiÃ±a phase o f PAGE 154 154 ENSO . The ETo forecasts were no t skillful when the forecast target months were during an ENSO event but the forecast initial months were not. Concluding R emarks ETo is an important hydroclimatic factor for regional water resources planning and management. Based on the seasonal forecast s of Tmean, Tmax, Tmin, Rs, and Wind from the NCEP CFSv2, this study evaluated the deterministic and probabilistic forecasts of seasonal ETo and the predictability of the five relevant variables over the southeastern United States. The ETo was estimated by two methods. The first method (ETo1) calculated the coarse scale ETo using the FAO PM equation with the input from the seasonal forecasts of Tmean, Tmax, Tmin, Rs, and Wind from the CFSv2, and then downscaled the coarse scale ETo to a regional 12 km grid. The second method (ETo2) downscaled each of the five CFSv2 variables to the 12 km grid, and then calculated ETo using the FAO PM equation with those downscaled variables. Two methods of statistical downscaling were tested for all seasons and all leads, sp atial disaggregation (SD) and spatial disaggregation with bias correction (SDBC). The CFSv2 showed potential to make deterministic and probabilistic forecasts of seasonal ETo and the five relevant variables in the southeastern United States. The skill for forecasting seasonal ETo varied with different seasons and across three states of the southeastern U.S. . Overall, the deterministic and probabilistic forecasts for both ETo methods were skillful at longer leads during the cold seasons, but showed no skill at any leads during the warm seasons. The ETo2 had slightly higher skill than ETo1 over space; however, there was little difference in terms of the forecast skill in different seasons or at different leads. In terms of the computational time, the SD metho d is more efficient than the SDBC method since it does not include the quantile mapping bias PAGE 155 155 correction procedure. However the SDBC method slightly improved the probabilistic forecast skill for most of the area except part of south Florida; the skill for p robabilistic forecasts were significantly enhanced over all seasons with longer skillful leads during the cold seasons but the skill for all leads were still negative during the warm seasons. For downscaling of the forecasts of the five variables, the SDB C method improved the skill mostly in warm seasons when the SD method showed minor or negative skill; the improvement of the skill was over most of the study area especially over the areas showing minor or negative skill; the enhancements of the skillful f orecast leads were mostly in warm seasons. Considering the similarity of forecast skill and the efficiency of computational time, ETo1 is the preferred method to ETo2 because it downscaled only one variable (ETo) instead of five. However, ETo2 is helpful to identify the skillful or unskillful variables and accordingly to determine the ETo calculation method. Based on this information, other approximate ETo calculation methods such as the Hargreaves method (Hargreaves and Samani 1985), Turc method (Turc 196 1), or Priestley Taylor method (Priestley and Taylor 1972) that require fewer variables might be as skillful as the PM method to produce ETo forecasts in the region ( Droogers and Allen 2002 ; Martinez and Thepadia 2010; Sperna Weiland et al. 2012; Thepadia and Martinez 2012; Todorovic and Karic 2013). In addition, the replacement of unskillful variables with the climatology from reanalysis datasets has been shown to improve forecast skill (Tian and Martinez 2012a, 2012b). The improvement of skill for the SDB C method over the SD method implies the additional quantile mapping procedure is effective to correct the PAGE 156 156 systematic errors of the CFSv2 forecast since the SDBC method corrects for bias in the entire shape of the distribution . ETo was found to be better predicted by the CFSv2 model when the forecast initial seasons were in either the El NiÃ±o or La NiÃ±a phase o f ENSO . However, the ETo forecasts were not skillful when the forecast target months were during an ENSO event but the forecast initial months were not. This result is consistent with the findings from the CFSv1 model, which was found to be able to capture the i mpact of ENSO on precipitation when the initial conditions already contained the ENSO signal (Yoon et al. 2012). The variability of the ensemble of the forecasted variables was not evaluated in this work. Such variability of the ensemble is most likely d ifferent among different variables. However, downscaling and bias correction procedures could reduce the forecast uncertainty (e.g. Wilks and Hamill 2007). There are different methods to evaluate the uncertainty of ensemble forecasts (e.g. Wang in press). Such a study was beyond the scope of this paper. Future work could be conducted to evaluate the uncertainties associated with each of the forecasted variables and how much the downscaling and bias correction could reduce such uncertainty. The CFSv2 based seasonal prediction of ETo showed moderate skill in cold seasons but no skill in warm seasons. The low performance of ETo prediction in summer was caused by the skill drops of Rs and Tmin. Our explanation is that more convective heating occurs in summer th an in winter. Such convection could generate different weather conditions (e.g. clouds) at a small scale which are not captured by the CFSv2 due to its coarse resolution. When the forecast target is in summer, most of the PAGE 157 157 forecasts except Tmean and Tmax s howed no skill. These variables, including ETo, Tmin, Rs, and Wind, may not be ready for applications in planning and decision making in summer in the SEUS. Since Rs had no skill during summer and was found to be one of the most important influential varia bles for ETo estimation in the SEUS, efforts such as running the CFSv2 at a high grid spacing could be done in order to improve the Rs forecasts in the summer. When the forecast target is in winter, using the forecast product could potentially bring benefi ts for planning or decision making for different sectors even though the skill is moderate. For the application of ETo forecasts in agricultural water management, summer is not the only growing season for many crops in the SEUS due to the warmer climate, t he forecast information in other seasons could be useful for farmers. The evaluation of the economic value for using the seasonal forecasts could be conducted in future work. The ETo forecasts produced in this work were bias corrected using the NLDAS 2 fi elds which were based on interpolation of the NARR. While the NLDAS 2 data was used as a surrogate for observations, it may contain biases compared to station based observations. Therefore, the downscaled and bias corrected CFSv2 predictions produced by th is work are still affected by the biases in the NLDAS 2 data. Nevertheless, the methods for ETo estimation and forecasts used in this work are ready to be extended to other regions. The evaluation of the forecast skill can provide valuable information for users who want to use the seasonal forecast product. The downscaled seasonal ETo forecast product could potentially be used to drive hydrological models, urban water supply and demand models, and crop models, inform irrigation schedules, guide water resour ces planning, assess the risk of climate variability, etc., and thus PAGE 158 158 improve the reliability of decision making and reduce risk in different societal sectors such as water management, resource management, and agricultural production management . PAGE 159 159 Tabl e 6 1. The overall average MSESS and BSS for different downscaled CFSv2 variables in 0 month lead by SD and SDBC methods . The MSESS and BSS ar e evaluate the the overall skill and tercile skill , respectively . Variables SD SDBC MSESS BSS MSESS BSS Below Near Above Below Near Above Tmean 0.367 0.196 0.083 0.158 0.379 0.216 0.048 0.173 Tmax 0.410 0.177 0.165 0.218 0.443 0.254 0.053 0.261 Tmin 0.148 0.012 0.111 0.038 0.145 0.020 0.086 0.025 Rs 0.027 0.092 0.120 0.071 0.020 0.083 0.094 0.053 Wind 0.172 0.100 0.081 0.115 0.055 0.095 0.081 0.111 PAGE 160 160 Table 6 2. As in Table 6 1 but for downscaled ETo Variables SD SDBC MSESS BSS MSESS BSS Below Near Above Below Near Above ETo1 * 0.156 0.070 0.353 0.033 0.052 0.002 0.089 0.034 ETo2 ** 0.188 0.089 0.334 0.043 0.216 0.006 0.212 0.099 * ETo1 was calculated using the CFSv2 variables before downscaling ** ETo2 was calculated using the downscaled CFSv2 variables PAGE 161 161 Table 6 3. Spatial average of monthly sensitivity coefficients for Tmean , Tmax , Tmin, Rs, and Wind over the SEUS. Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec S Tmean 0.07 0.10 0.13 0.17 0.21 0.22 0.23 0.22 0.17 0.12 0.08 0.06 S Tmax 0.76 0.78 0.76 0.78 0.77 0.82 0.84 0.81 0.97 0.94 1.04 0.81 S Tmin 0.16 0.18 0.22 0.25 0.30 0.35 0.40 0.40 0.42 0.35 0.33 0.21 S Rs 0.57 0.61 0.66 0.71 0.78 0.82 0.84 0.84 0.77 0.69 0.60 0.56 S Wind 0.14 0.11 0.08 0.06 0.04 0.04 0.03 0.03 0.06 0.10 0.15 0.16 PAGE 162 162 Figure 6 1 . Example of a subset of sixteen (1.0 o Ã— 1.0 o ) grid points covering the Tampa Bay area. The small points denote where NLDAS 2 data are available. Large black squares denote where the CFSv2 reforecast data are available. PAGE 163 163 Figure 6 2. Comparison of skill scores of five CFSv2 downscaled variables by the SD and SDBC methods as a function of season of the year in 0 month lead: (a) MSESS for SD, (b) below normal BSS for SD, (c) near normal BSS for SD, (d) above normal BSS for SD, (e) MSESS for SDBC, (f) below normal BSS for SDBC, (g) near normal BSS for SDBC, and (h) above normal BSS for SDBC. PAGE 164 164 Figure 6 3. Comparison of skill scores of downscaled ETo by the SD and SDBC methods as a function of season of the year in 0 month lead: (a) MSESS, (b) below normal BSS, (c) near normal BSS, (d) above normal BSS. PAGE 165 165 Figure 6 4. The average skill scores of the downscaled CFSv2 variables by the SD method for the deterministic and tercile forecasts across the southeastern U.S. PAGE 166 166 Figure 6 5. As in F igure 6 4 but for the SDBC method PAGE 167 167 Figure 6 6. The average skill scores of the downscaled ETo by the SD and SDBC methods for the deterministic and tercile forecasts across the southeastern U.S. PAGE 168 168 Figure 6 7. Monthly average of absolute sensitivity coefficients for Tmean , Tmax , Tmin, Rs, and Wind over all seasons in the SEUS. PAGE 169 169 Figure 6 8. The average skill scores of downscaled CFSv2 Tmean, Tmax, Tmin, Rs, and Wind (top to bottom) by the SD method as a function of seasons (JFM to DJF) an d lead time (0 to 9 month) . The thick contour denotes 0. PAGE 170 170 Figure 6 9. As in Figure 6 8 but for the SDBC method PAGE 171 171 Figure 6 10. The average skill scores of ETo1 and ETo2 by the SD method as a function of seasons and lead time. The thick contour denotes 0. PAGE 172 172 Figure 6 11. As in Figure 6 10 but for the SDBC method PAGE 173 173 Figure 6 12 . The skill scores of downscaled ETo by the SDBC method as a function of season of the year in 0 month lead during ENSO events: (a) MSESS, (b) below normal BSS, (c) near normal BSS, (d) a bove normal BSS PAGE 174 174 Figure 6 1 3 . The average skill scores of ETo1 and ETo2 by the SDBC method as a function of seasons and lead time . The forecast initial season is during ENSO events . The thick contour denotes 0. PAGE 175 175 Figure 6 1 4 . The average skill scores of ETo1 and ETo2 by the SDBC method as a function of seasons and lead time . The forecast target season is during ENSO events . The thick contour denotes 0. PAGE 176 176 CHAPTER 7 STATISTICAL DOWNSCALING MULTI MODEL FORECASTS FOR SEASONAL PRECIPITATION AND SURFACE TE MPERATURE OVER SOUTHEASTERN USA Chapter Introduction Seasonal climate variability has great impact on many sectors such as agriculture, water resources, human health, ecosystems, transportation and infrastructure. If seasonal climate conditions can be pred icted a few months in advance, there is potential to reduce the damage caused by climate variability or take advantage of the benefits brought by it. There are two aspects that make the seasonal climate predictable: parts of geophysical system such as the oceans and the land are evolving much slower than the atmosphere, and the useful information of long lead seasonal forecasts can be extracted from large averaging in time and space and does not have to be the same quality as a weather forecast (Troccoli 20 10) . Due to the predictability of seasonal climate variability, dynamical models and statistical models have been developed for seasonal forecasting. Weather and climate forecast centers such as the National Centers for Environmental Prediction (NCEP) and the European Center for Medium Range Weather Forecasts (ECMWF) make dynamical predictions using fully coupled ocean atmosphere general circulation models (GCMs). These models have demonstrated skill in forecasting SSTs at global or continental scales (Tro ccoli 2008) , but they cannot accurately predict land surface variables such as precipitation rate (P) and surface air temperature (T2M) beyond 1 month lead consistently in all regions even though these forecasts provide direct information for agricultural management, water resources and irrigation planning, flood and drought forecasting, etc. (e.g. Lavers et al. 2009; Saha et al. 2012; Weisheimer et al. 2009; Yuan et al. 2011) . In addition to this, GCMs are typically configured at effective PAGE 177 177 resolutions of 1 00 to 300 km which are too coarse to represent local features. As an alternative approach to dynamical models, statistical models have often used the observed antecedent SSTs in tropical oceans as predictors of local P and T2M (Troccoli 2008) . While such s tatistical models are black box models and their applications are restricted by locations, they can give comparable and even higher skill for some regions and are much less expensive and more straightforward than dynamical models. While GCM P and T2M for ecasts have coarse resolution and suffer from systematic errors, there are several techniques that can be used to compensate for these limitations. Statistical or dynamical downscaling procedures can to be employed to produce local scale P and T2M forecast s from a GCM (Fowler et al. 2007) . Statistical downscaling employs the statistical relationship between the GCM output and the local observed weather, while dynamical downscaling uses GCM output to provide initial and boundary conditions for a regional cli mate model (RCM) to produce local scale forecasts. Dynamical downscaling is computationally intensive and the errors from both the GCM and the RCM are propagated to the downscaled predictions and typically need an additional statistical post processing ste p to correct for systematic error (Hwang et al. 2011; Yoon et al. 2011) . Compared to dynamical downscaling, statistical downscaling is computationally efficient and straightforward to apply; and the statistical downscaling methods inherently correct the mo del bias. Statistical downscaling methods can be classified into three categories: perfect prognosis (PP), model output statistics (MOS), and weather generators (WGs) (Maraun et al. 2010) . PP approaches calibrate the statistical model (often regression re lated methods) using the observed large scale field (predictor) and observed local scale PAGE 178 178 weather (predictand) and use the GCM forecast field as the predictor in the calibrated statistical model to produce the local scale forecast (Maraun et al. 2010) . The commonly used statistical models for PP are: linear models, generalized linear and additive models, vector generalized linear models, weather type based downscaling, nonlinear regression, and analog methods (e.g. Haylock et al. 2006; Juneng et al. 2010; Ka ng et al. 2007; Tian and Martinez 2012a, 2012b; Yates et al. 2003) . MOS approaches produce the downscaled field by directly calibrating the spatially disaggregated or dynamically downscaled GCM output using observations. Many MOS approaches have been appli ed to downscale GCMs ( e.g. Abatzoglou and Brown 2012; Wood et al. 2004; Yoon et al. 2011), or to compare atmospheric models with fully coupled models (Landman et al. 2012; Ndiaye et al. 2011) . WGs are statistical models that generate random sequences of lo cal scale weather with the temporal and spatial characteristics that resemble the observed weather. For downscaling purposes, WGs need to be used in conjunction with MOS or PP methods ( e.g. Feddersen and Andersen 2005) . Some other statistical downscaling t echniques such as the Bayesian merging technique have also been applied to seasonal forecasts ( e.g. Luo and Wood 2008; Luo et al. 2007; Yuan et al. 2013) . The major limitation of statistical downscaling is it requires long series of retrospective forecasts (reforecasts or hindcasts) and observations to derive the statistical relationship. Another technique to improve GCM seasonal forecast skill is throug h multi model ensembles (MMEs). The MME forecasts combine multiple GCMs and often have higher skill than any individual model ( e.g. Juneng et al. 2010; Kar et al. 2012a; Krishnamurti et al. 2000; Lavers et al. 2009; Yun et al. 2003) . The benefits of MMEs a re due to the PAGE 179 179 increased ensemble size that generates the full spectrum of possible forecasts by including models with different physics, numerics, and initial conditions (Hagedorn et al. 2005) . Several algorithms have been developed to combine MME forecast s to produce probabilistic or deterministic forecasts. These methods vary from the simplest combination of unweighted ensembles or the ensemble mean of different models ( e.g. Hagedorn et al. 2005; Landman and Beraki 2012; Mason and Mimmack 2002; Peng et al . 2002; Tippett and Barnston 2008) , to more complex weighted ensembles using multiple linear regression, maximum likelihood, or Bayesian techniques ( e.g. Bohn et al. 2010; Doblas Reyes et al. 2005; Duan et al. 2007; Kar et al. 2012b; Kharin and Zwiers 2002 ; Raftery et al. 2005) . A number of studies have shown that the latter approaches do not always have higher skill than the simple unweighted ensemble approach especially when the sample size is relatively small ( e.g. Doblas Reyes et al. 2005; Landman and B eraki 2012) . There are strong connections between large scale oceanic atmospheric oscillations and local land surface variables in many regions of the globe. A number of studies have employed such semi empirical relationships to predict long lead land sur face variables such as precipitation and streamflow at several locations using different data driven statistical models ( e.g. Grantz et al. 2005; Juneng et al. 2010; Kalra and Ahmad 2012; Regonda et al. 2006) . The most commonly used oceanic atmospheric osc illations are based on SST anomalies in the tropical Pacific Ocean associated with the El NiÃ±o South Oscillation (ENSO). ENSO has been shown to have significant influence on precipitation and streamflow in the SEUS (Johnson et al. 2013; Schmidt et al. 2001 ; Sun and Furbish 1997) . Many GCMs have shown high skill for PAGE 180 180 predicting tropical SST at long leads ( e.g. Lavers et al. 2009; Saha et al. 2012; Weisheimer et al. 2009) , and such information is attractive for use with the PP downscaling approaches to predict local land surface variables. This raises a question whether seasonal P and T2M at local scale are predictable ction against directly downscaling these land surface fields from GCMs for different locations, seasons, leads, and MME schemes? Thus, it is necessary to conduct a comprehensive assessment of the local scale P and T2M prediction from multiple climate model s using prognosis from SST prediction) for all lead times and seasons over a region. The investigation of these local forecasts can provide valuable information for decision m aking in water and natural resources, agriculture, infrastructure planning, and other sectors at the regional scale. This research was conducted over three states (Alabama, Georgia, and Florida) in the SEUS. The paper is organized as follows: Section 2 de scribes the dataset and methodology used in this study. Section 3 presents results. Discussion and conclusions are given at Section 4. Data and M ethods The D atasets The newly developed hindcast dataset from the NMME system was used in this work. The NMME i s a forecasting system that currently includes GCMs from the Fluid Dynamics Laboratory (GFDL), the International Research Institute for Climate and PAGE 181 181 Society (IRI), the National Ce nter for Atmospheric Research (NCAR), and the National Aeronautics and Space Administration (NASA). The hindcast dataset of the NMME is archived at the IRI. The P and T2M from nine NMME models and SST from seven NMME models have been used in this study. Ta ble 7 1 shows the basic information of the hindcast archives from nine models of the NMME. All model outputs are monthly forecasts at 1.0 o Ã—1.0 o resolution (approximately 100 km) (Fig ure 7 1) with at least 7 month lead time and covering a 29 years period (1 982 to 2010 or 1981 to 2009). All the monthly forecasts of P, T2M, and SST were converted to seasonal means by calculating three month moving averages. Since the hindcasts from GFDL aer04, GMAO, and GMAO 062012 had missing data, this paper mainly focused o n comparing the remaining six models for all seasons and leads over the SEUS. Observed monthly SST in the NiÃ±o3.4 region (Jan 1982 to Dec 2010) (170Â°W 120Â°W, 5Â°S ( http://www.cpc.ncep.noaa.gov/data/indices/ ). Monthly values were converted to three month seasonal means. P and T2M from the National Land Data Assimilation System phase 2 (NLDAS 2) forcing dataset (Xia et al. 2012a; Xia et al. 2012 b) were used as a surrogate for long term observations for forecast verification and downscaling (as described in section 2.b and 2.c). The NLDAS 2 is available at a 0.125 o Ã— 0.125 o (approximately 12 km) (Fig ure 7 1) resolution over North America with an h ourly time step from 1979 to present. The NLDAS 2 forcing dataset was developed for driving land surface models and was mostly derived from the North American Regional Reanalysis (Mesinger et al. 2006) and observations. Seasonal values of P and T2M from Ja nuary 1982 to December 2010 were used in this study. To calculate seasonal P, PAGE 182 182 hourly values were aggregated into daily values, and then averaged into monthly values and converted to three month moving averages. T2M was calculated from daily maximum and min imum temperature (Tmax and Tmin) which were extracted from hourly values, averaged into monthly values, and converted to three month moving averages. The MME S chemes There are a number of algorithms to combine multiple GCMs. This work used two simple MME schemes to combine models of the NMME system to generate both deterministic and probabilistic forecasts. One approach combined all forecast members all member forecasts. Fo r the SuperEns method, the probabilistic forecast was generated using all the forecast members and the deterministic forecast were calculated by averaging all members. The other approach assigned equal weights to each model nEns); the MeanEns assigns equal weights to each models' ensemble means. The probabilistic forecast and deterministic forecast were generated using the ensemble of model ensemble means and by averaging the ensemble means of all models, respectively. The PP Downscaling M ethods The PP downscaling method establishes statistical models using large scale and local scale observed data and applies these models to the GCMs output. Two regression based PP downscaling methods were conducted for each member of the NMM E model. The first PP method was a traditional linear regression (LR) model and the other was a nonparametric locally weighted polynomials regression (LWPR) model. For both regression models, the predictor was the spatial average of NiÃ±o3.4 SSTs and PAGE 183 183 the pr edictand was P or T2M. The regression models were first trained using the observed NiÃ±o3.4 SST and the P or T2M of the NLDAS 2 in the SEUS for each season and each grid point over the SEUS, and then these trained models were applied to the quantile mapping (see Section 2d) bias corrected SST from each of the NMME forecast members to predict P or T2M for all seasons and leads over all the grid points over the SEUS. Before fitting the model, the P data was transformed by taking a square root to remove some of the skewness ( e.g. Dettinger et al. 2004; Hidalgo et al. 2008) . The LR and LWPR models and downscaling procedures are briefly described below. The LR model is widely used in hydroclimate forecasting around the world ( e.g. McCabe and Dettinger 2002; Piecho ta et al. 2001; Singhrattna et al. 2005) . The LR fits a linear function between the predictor (SST) and response variable (P or T2M): t N ( 7 1) where the coefficients a o and a 1 are estimated from the data by minimizing the sum of squares of the errors; t is the residual, which is assumed to follow a normal distribution with a mean of zero and a standard deviation estimated from the data; N is the number of observations. Once th e model is fit, model residuals are then determined and the fitted model can be used to predict the mean value of the response variable ( Y new ) with new values of the predictor (SST from the NMME). To include the uncertainties of the regression model and ge nerate an ensemble forecast for each member, ten deviates were stochastically generated from the normal distribution of t . The ensemble for each forecast member was then generated by adding these ten random deviates to Y new . The main drawbacks of the LR m odel are the assumption of the Gaussian distribution of the data and the fitting of a global linear relationship between variables. PAGE 184 184 If the relationship between variables is nonlinear, higher order functions (quadratic, cubic, etc.) should be considered. However, a robust model is difficult to achieve without datasets of adequate length in order for the local nonlinear features to be well captured. The LWPR model (Loader 1999) is a nonparametric local regression method that can estimate the function locall y and capture any linear or nonlinear feature of the data. While the LWPR model has been successfully used for several hydroclimate applications ( e.g. Grantz et al. 2005; Lall et al. 2006; Singhrattna et al. 2005) knowledge, to use it for downscaling purposes. The basic form of the model is: t N ( 7 2) where the function f can be a linear or nonlinear, t is the residual, which is assumed to follow a normal distribution with zero mean and standard deviation estimated from the data; N is the number of observations. The LWPR method fits the function f locally to estimate Y . In this procedure, the value of th e function f at any point x t is obtained by fitting a polynomial of order p to a small number of neighbors ( K ) to x t using a weighted least squares approach, and then the model residuals are determined. The mean value of a response variable ( Y new ) is estim ated by the fitted polynomial function with a new value of the predictor. This study adopted a similar approach to the LR model to generate an ensemble forecast for each member. This approach has shown good results with a small sample size (Singhrattna et al. 2005) . Similar to the LR, ten deviates were stochastically generated from the normal distribution of t and these random deviates were added to Y new to generate the ensemble for each member of the NMME. In addition to the stochastic generator, the nonp arametric K nearest neighbor (KNN) PAGE 185 185 approach could be used to resample the residuals and generate an ensemble forecast. The benefit of the KNN approach is that there is no normality assumption for the residuals, but this approach only works well with a larg e sample size (Grantz et al. 2005; Prairie et al. 2005; Singhrattna et al. 2005) and thus was not tested in this study . The package LOCFIT (Loader 1999) was used in the fitting process; the optimal values of the neighborhood size ( K ) and the polynomial ord er ( p ) (usually 1 or 2) were obtained by minimizing the generalized cross validation (GCV) score function: ( 7 3) where e i is the error (the difference between the model estimate and observed), N is the number of data points, m is the number of parameters. The MOS Downscaling M ethods The MOS method calibrates the model output against the observations. In this study, two interpolation based MOS downscaling methods, spatial disaggregation (SD) and spatial disaggregation with bias correction (SDBC), were conducted for each member and each lead forecast of the NMME models. The SD and SDBC methods were modified from the widely used bias correction and spatial disaggregation (BCSD) method ( e.g. Christen sen et al. 2004; Maurer et al. 2010; Salathe et al. 2007; Wood et al. 2002; Wood et al. 2004; Yoon et al. 2011) . These two methods are briefly described below. The SD method consists of two steps. The first step is to spatially interpolate the anomalies o f the NMME P and T2M forecasts to the resolution of the NLDAS 2 (0.125 o Ã— 0.125 o ) using the inverse distance weighting (IDW) technique. The anomalies were PAGE 186 186 the difference between the model forecasts and the climatology of the model forecast for each season. The second step is to produce the forecasts by adding the climatology of the NLDAS 2 to the interpolated anomalies of the model forecast. The systematic errors were corrected in this step since the climatology of the NLDAS 2 replaced the climatology of th e model forecasts. The IDW technique first computes distances from the interpolating target grid points to all other grid points in the dataset and then calculates weights based on the distances. The P or T2M anomalies of the interpolating target grid poi nts are the weighted average of the P or T2M anomalies of the NMME grid points. The weighting function is: ( 7 4) and ( 7 5) where is the estimated anomalies, Z is the anomalies at the NMME grid point, d i is the distance between the interpolating target grid point and the NMME grid point, p is the power to which the distance is raised (a power of 2 was used in this work), and ( X,Y ) represents the coordinate. The SDBC method was developed by Abatzoglou and Brown (2012) by reversing the order of the BCSD method, which has been found to improve skill for reproducing local scale temporal statistics (Hwang and Graham 2013) . Compared to the SD method, the SDBC method h as one additional step of bias correction using the quantile mapping PAGE 187 187 technique (Panofsky and Brier, 1958; Wood et al., 2002). The first step is to spatially interpolate the anomalies of the NMME P and T2M forecasts to the NLDAS 2 resolution (0.125 o Ã— 0.125 o ) using the IDW technique. The next step is to use the quantile mapping technique with the anomalies of the NLDAS 2 to bias correct the anomalies of the interpolated model output. The quantile mapping technique includes three steps: (1) creating cumulativ e density functions (CDFs) of the anomalies of NLDAS 2 forcing data for each grid point, (2) creating CDFs of the NMME anomalies for each grid point and lead time, (3) using the quantile mapping that preserves the probability of exceedence of the NMME anom alies, but corrects the NMME anomalies to the value that corresponds to the same probability of exceedence from the anomalies of NLDAS 2. Thus, bias corrected data on time i at grid j was calculated as, ( 7 6) where F(x) and F 1 (x) denote a CDF of the data and its inverse, and subscripts sim and obs indicate downscaled NMME anomalies and NLDAS 2 anomalies, respectively. Thus both the mean and variance of the forecast anomalies were corrected in probability space. The fina l step is to produce the forecast by adding the climatology of the NLDAS 2 to the bias corrected interpolated anomalies of the NMME. Forecast V erification Seasonal skill for both deterministic and probabilistic forecasts for each NMME model and the two MM E schemes were evaluated for each grid point from 0 to 7 month lead and summarized by season. The deterministic forecasts were calculated by the ensemble mean of the forecast members. The mean squared error skill score (MSESS) was computed for each grid p oint, each season, and each lead. The MSESS is a PAGE 188 188 relative skill measure that compares the downscaled forecasts with the climatology forecast, which was the mean of the NLDAS 2 seasonal values in the forecast target season. The mean squared error of the for ecasted values ( MSE f ) and the mean squared error of the climatology ( MSE c ) were calculated for each grid point for each season and lead. The MSESS is then calculated by: ( 7 7) MSESS ranges from as climatology, negative values indicating the forecast has worse skill than climatology, and 1.0 indicating a perfect forecast. The Brier skill score (BSS) was emplo yed to evaluate the skill of probabilistic forecasts in terciles (above normal, near normal, and below normal) for each grid point, each season, and each lead. The terciles were defined from the data of NLDAS 2 in the forecast target season. The equation f or the BSS is (Wilks 2011) : ( 7 8) where BS f is the Brier score of the forecast and BS c is the Brier score of climatology (the NLDAS 2 seasonal values in the forecast target season). The BS f and BS c were calculated as: ( 7 9) and ( 7 10) PAGE 189 189 where n is the number of forecasts and observations of a dichotomous event, p i f is the forecasted probability of the event using the forecasts, and p i c is the forecasted probability of the event using climatology (which is always 33.3%); I i o = 1 if the event occurred and I i o = 0 if the event did not occur. The BSS ranges between values of the BSS of 1 indicate perfect skill and val ues of 0 indicate that the skill of the forecast is equivalent to climatology. Since the forecasts need to be tested on a sample of independent data, a cross validation procedure was conducted for the SD and SDBC methods. We tested both leave one out and l eave three out cross validation procedure for bias correction of SST. For leave one out (leave three out) cross validation, we left the target season (target season plus two neighbor seasons) out when creating the CDFs of the anomalies and calculating the NLDAS squared error of the NMME ensemble forecasts from these two procedures, and found there was no significant difference between them (p value>0.4 for all seasons and leads). In addition, th e inter annual autocorrelation in the data was small and insignificant. Therefore, the leave one out cross validation was justified and was conducted for all bias corrections in SD and SDBC methods. For the LR and LWPR methods, the cross validation procedu re was not required, since the regression models were trained using the NLDAS 2 and tested using the GCM forecasts. Results PAGE 190 190 all near normal probabilistic forecasts showed no skill as many previous studies found (e.g. Van Den Dool and Toth 1991). Mode l P er formance on F orecasting NiÃ±o3.4 SST Since ENSO has significant impact on the seasonal climate variability in the SEUS, we first evaluated the skill of the quantile mapping bias corrected spatial average SST in the NiÃ±o3.4 region for each of the NMME models. Figure 7 2 shows the NiÃ±o3.4 SST f orecast skill for six NMME models as a function of season and lead time. All the models showed high skill in predicting the SST for most leads. While the CFSv1 is the predecessor of the CFSv2, the CFSv1 was more skillful than CFSv2 in fall and winter seaso ns but showed lower skill at far leads when the target seasons were in spring. This result is consistent with Barnston and Tippett (2012) who explained the low skill was mainly caused by a discontinuity in the initial condition climatology of the CFSv2 and this discontinuity turns out to be most prominent in the tropical Pacific region (Kumar et al. 2012; Xue et al. 2011). Overall Mean Forecasting S kill The overall mean forecasting skill was calculated by averaging the skill scores over space and seasons. Tables 7 2 and 7 3 summarize the P and T2M overall mean skill scores for each downscaling method and for each model and MME scheme for the first forecast season (0 month lead), respectively. In general, both Tables 7 2 and 7 3 showed the SDBC and the LWPR methods had higher skill than the SD and the LR methods. This is because the additional quantile mapping bias correction procedure reduced the bias, and the local regression captured the nonlinear feature between variables. For the P forecast, all the mode ls (except the CFSv2) with the PP downscaling methods (LR and LWPR) showed higher skill than the MOS downscaling PAGE 191 191 methods (SD and SDBC); for the T2M forecast, the CFSv2, MeanEns, and SuperEns with the MOS downscaling methods showed higher skill than the PP downscaling methods. Performance of Single M odels In this section, we showed results for deterministic forecasts and above and below normal probabilistic forecasts for single NMME model forecasts. Figures 7 3 to 7 6 show the deterministic and probabilist ic forecast skill for the P and T2M forecasts for six NMME models downscaled by SDBC and LWPR at 0 month lead for winter (DJF) and summer (JJA) over space. For the P forecasts downscaled by the SDBC method in winter, all six models showed noticeable skill or high skill over a great area of the SEUS. But in summer, only CFSv1 and CFSv2 showed limited skill and the other four models showed no skill (Fig ure 7 3). The LWPR method showed noticeable skill or high skill in DJF and limited skill in JJA (Fig ure 7 4 ). For the T2M forecasts downscaled by the SDBC method in winter, only the CFSv2 and the GFDL models showed limited to high skill and the other four models showed no skill or limited skill over the region; in summer, only CFSv1 and CFSv2 showed noticeable or high skill for most of the region and other models mostly showed no skill (Fig ure 7 5). In contrast with the SDBC method, while the LWPR method for the T2M forecasts only showed limited skill (Fig ure 7 6). The P and T2M forecast skill as a function of season and lead were calculated by averaging the skill scores over space. Figures 7 7 to 7 10 show the deterministic and probabilistic forecast skill for P and T2M for six NMME models downscaled by SDBC and LWPR as a function of target season and lead tim e. In Figure 7 7, we compared the PAGE 192 192 forecast target was in summer seasons; while for deterministic forecasts five models showed limited or noticeable skill for all leads for w inter seasons except the CCSM3 showing no skill, for probabilistic forecasts only CFSv1 and CFSv2 show limited skill for winter seasons. Figure 7 8 shows the skill of the P forecasts downscaled by the LWPR method as a function of target season and lead tim e. Figures 7 9 and 7 10 provide the T2M forecast skill as a function of target season and lead time for the SDBC method and the LWPR method, respectively. For the SDBC method, only the CFSv1 and the CFSv2 models showed more than limited skill at near lead over all seasons; the other four models mostly showed no skill at all leads throughout the year (Fig ure 7 9). The LWPR method showed no skill or limited skill for warm seasons even at near lead (Fig ure 7 10). To compare the skill between the LWPR and SDBC methods for each season and lead time, we calculated the mean difference between the skill scores of the LWPR and SDBC over all grid points (the LWPR skill scores minus the SDBC skill scores). Figures 7 11 and 7 12 show, for all leads and seasons, the LWPR method improved Since these comparisons were based on a very large sample size (3385 grid points), the 95% confidence interval were extremely small (around Â±0.005). The se improvements were considered as statistically significant. Performance of Ensemble F orecasts In this section, we report the deterministic and probabilistic forecast skill for the two MME forecasts (MeanEns and SuperEns) and compare these results with th e single models. PAGE 193 193 The P and T2M forecast skill over space were shown for winter (DJF) and summer (JJA) at 0 month lead. For P forecasts, the SDBC SuperEns, and the LWPR SuperEns and MeanEns showed noticeable skill or high skill in winter but limited skill in summer; the SDBC MeanEns showed less skill than the other three methods in both seasons (Fig ure 7 13). For the T2M forecast, the SDBC SuperEns showed high skill or noticeable skill in DJF and JJA over the region; the SDBC MeanEns showed less skill than the SDBC SuperEns but higher skill than the LWPR SuperEns and MeanEns (Fig ure 7 14). The P and T2M forecasting skill at all leads were calculated by averaging the skill scores over space. For the P forecasts, the SDBC SuperEns and the LWPR SuperEns and Mea nEns limited skill or noticeable skill in cool seasons even at far leads but less than limited skill in warm seasons, but the SDBC MeanEns mainly showed no skill for all leads at all seasons (Fig ure 7 15). For the T2M forecast, the SDBC MeanEns showed no s kill after lead 1; the LWPR SuperEns and MeanEns showed noticeable skill in cool seasons and had farther skillful leads than the SDBC SuperEns in cool seasons but no skill in warm seasons; the skill of the SDBC and LWPR methods was complementary for T2M fo recasts (Fig ure 7 16). For both P and T2M forecasts, the SDBC SuperEns always had better performance than the SDBC MeanEns. This was because the models with more forecast members such as CFSv2 showed higher skill and MeanEns assigned less weight than Super Ens to these skillful models. To compare SuperEns with single model forecast for each season and lead time, we calculated the mean difference between the skill scores of SuperEns and every single model over all grid points (SuperEns skill scores minus sing le model skill scores). Due to the limited space, we only compared the forecasts downscaled by the SDBC PAGE 194 194 method. Figures 7 17 and 7 18 show the SuperEns P and T had similar skill to the CFSv2 P and T but higher skill than the other models. As explained in S ection 3.c, these improvements were considered as statistically significant. Discussion and C onclusions We used two MOS approaches (SD and SDBC) and two PP approaches (LR and LWPR) to statistically downscale the NMME models and two MME schemes (SuperEns an d MeanEns) for P and T2M forecasts. The downscaled probabilistic and deterministic forecast skill were assessed over the SEUS for each of the NMME models and model ensembles for all leads and seasons. The SDBC and LWPR showed higher skill than the SD and t he LR method due to the additional quantile mapping bias correction and the local regression that captures the nonlinear features between variables. For the SDBC method, the P forecast of all NMME models and ensembles showed skill for winter seasons but no skill for summer seasons, whereas only the T2M forecast of the CFSv2 and SuperEns showed skill throughout the year over the SEUS. For the LWPR method, both P and T2M forecast skill were similar for all single models and the model ensembles with the winter seasons showing positive skill and summer seasons showing no skill or limited skill. The LWPR is a nonlinear nonparametric local regression method this work was the fir st to apply LWPR for statistical downscaling. The skill for the LWPR method was generally higher than the SDBC method. This result implies the local P is likely to be better predicted indirectly using SSTs than directly downscaling and bias correct the mod el P field, whereas the T2M field could be better predicted by directly downscaling and bias correcting the model output. The PP downscaling methods showed significantly higher skill than the MOS downscaling methods in most cases. PAGE 195 195 While the SuperEns showed higher skill than the MeanEns and showed similar skill to the best single models for the SDBC method, the SuperEns and MeanEns showed similar skill to each other and to the best single models for the LWPR method. In summary, the forecast skill varies amon g different models, ensemble schemes, variables, seasons, and locations. The skillful downscaled forecasts could be useful in agriculture, water resources, human health, ecosystems, transportation and infrastructure; and the downscaling framework could be applied in the other parts of the world. The PP methods used in this study are based on the relationship between the NiÃ±o3.4 SST and the local P and T2M in the SEUS, which generally showed higher skill than the MOS methods that are based on the bias correc tion of the spatially disaggregated raw model output. This indicates the GCMs were able to better simulate large scale ocean variables than the local scale P and T2M, as explained by Juneng et al. (2010). The PP methods successfully enhanced the forecast s kill by statistically relating the large scale climate signal of ENSO with the local P and T2M in the SEUS. For the PP method, the similarity of the skill scores between P and T2M and among different locations suggests the NiÃ±o3.4 SST equally affect the P and T2M at almost every location over the region of interest. To improve the P forecast using the MOS procedure, some predictable low level circulation fields such as geopotential height fields, moisture field at standard pressure levels, thickness fields, etc. can be considered as predictors of the P field (e.g. Landman et al. 2012). Since the NMME has not archived low level circulation fields, we did not consider these fields in this study but it can be explored in the future studies. PAGE 196 196 The use of the PP me thods in this study was justified by two reasons. First, the P and T2M in the SEUS have strong connections with the ENSO signal between October and March, with El NiÃ±o (La NiÃ±a) years tending to be cooler (warmer) and wetter (dryer) during these months (Ro pelewski and Halpert 1986). Second, the NiÃ±o3.4 SST as a primary indicator of ENSO was highly predictable by the NMME models particularly in the winter seasons. However, none of the four statistical downscaled forecasts showed even modest skill in summer s easons. The poor performance of the MOS was likely due to the inability of the GCMs to simulate the strong local convective climate in summer seasons in the SEUS. Since ENSO is only significantly correlated to the local P and T2M in the SEUS in cool season s, it is expected that the PP method would not do well in summer seasons. Other than ENSO, there are other large scale SST based variables associated with the climate in the SEUS including the Pacific Decadal Oscillation and Atlantic Multi Decadal Oscillat ion (e.g. Johnson et al. 2013; Martinez and Jones 2011; Martinez et al. 2009). The forecast skill can be further improved with the PP method in summer seasons if the historical forecasts for the NMME system were long enough to represent these longer term p henomena. The MME forecasts in this study were based on combining all forecast members (SuperEns) and assigning equal weights to each model (MeanEns). For the SDBC method the SuperEns outperformed the MeanEns, whereas the differences in the ensemble metho ds were very small when the downscaling method used is LWPR. This is because for the SDBC method the models with higher skill typically contained a higher number of forecast members, but for the LWPR method the models showed similar skill. Although the ski ll of the SuperEns is generally higher than most of the PAGE 197 197 single models, it did not outperform the best single model. This is likely due to the relatively poor performance of some of the NMME models. If the historical forecasts were long enough to provide a sufficient sample to robustly assign the truly bad forecast members with less weight and truly good forecast members with more weight, some weighted MME schemes (e.g. multiple linear regression, maximum likelihood, and Bayesian techniques) may be considere d to further improve the forecast skill. In addition, it is notable that the skill of the SDBC and LWPR methods was complementary in some cases. The forecasting skill can be improved using a combination of the forecasts downscaled by the SDBC and LWPR met hods. Due to space limitations, this research only evaluated the probabilistic forecast using the BSS measuring the overall probabilistic forecast accuracy without conducting a complete evaluation for forecast systems' ability for discrimination, re liability, sharpness, etc. Such evaluation could provide useful forecast information for end users and could be considered as a future work. PAGE 198 198 Table 7 1. Basic information of the NMME hindcast archives used in this study Model Name Abbreviation Members Perio d Lead Month Resolution Missing Data Reference NCEP CFSv1 CFSv1 15 1981 2009 0 8 1.0 o Ã—1.0 o No Saha et al. 2006 NCEP CFSv2 CFSv2 24 1982 2010 0 9 1.0 o Ã—1.0 o No Saha et al. 2012 NCAR CCSM3.0 CCSM3 6 1982 2010 0 11 1.0 o Ã—1.0 o No Kirtman and Min 2009 IRI ECHAM4p5 AnomalyCoupled ECHAM Anom 12 1982 2010 0 7 1.0 o Ã—1.0 o No DeWitt 2005 IRI ECHAM4p5 DirectCoupled ECHAM Dir 12 1982 2010 0 7 1.0 o Ã—1.0 o No DeWitt 2005 GFDL CM2p1 GFDL 10 1982 2010 0 11 1.0 o Ã—1.0 o No Delworth et al. 2006 GFDL CM2p1 aer04 a GFDL aer04 10 1982 2010 0 11 1.0 o Ã—1.0 o Sep to Dec Delworth et al. 2006 NASA GMAO GMAO 10 1982 2010 0 8 1.0 o Ã—1.0 o Jun and Jul Rienecker et al. 2008 NASA GMAO 062012 a GMAO 062012 12 1982 2010 0 8 1.0 o Ã—1.0 o Jan to May Rienecker et al. 2008 a SST data is not available PAGE 199 199 Table 7 2. Overall mean of precipitation forecast skill for the NMME models in lead 0. The highest skill scores are highlighted in bold for each model and each category. SS Methods Models CFSv1 CFSv2 CCSM3 ECHAM Anom ECHAM Dir GFDL GFDL aer04 a,b GMAO a GMAO 062012 a,b MeanEns SuperEns MSESS SD 0.110 0.164 0.003 0.020 0.018 0.051 0.139 0.163 0.086 0.147 0.161 SDBC 0.107 0.167 0.194 0.069 0.076 0.046 0.260 0.120 0.035 0.042 0.166 LR 0.148 0.131 0.149 0.151 0.152 0.143 0.143 0.153 LWPR 0.168 0.140 0.171 0.176 0.176 0.166 0.166 0.176 BSS Below SD 0.049 0.021 0.349 0.198 0.205 0.148 0.259 0.086 0.106 0.293 0.013 SDBC 0.031 0.042 0.243 0.156 0.185 0.106 0.263 0.034 0.066 0.105 0.036 LR 0.035 0.019 0.031 0.034 0.035 0.026 0.025 0.037 LWPR 0.045 0.021 0.044 0.048 0.048 0.037 0.037 0.048 BSS Above SD 0.011 0.027 0.291 0.152 0.159 0.132 0.282 0.026 0.070 0.168 0.025 SDBC 0.009 0.044 0.252 0.124 0.138 0.106 0.269 0.007 0.041 0.066 0.054 LR 0.040 0.025 0.034 0.039 0.039 0.031 0.030 0.043 LWPR 0.055 0.031 0.049 0.055 0.055 0.048 0.047 0.058 a Incomplete data (see Table 7 1) b SST data was not available PAGE 200 200 Table 7 3. Overall mean of temperature forecast skill for the NMME models in lead 0. The highest skill scores are highlighted in bold for each model and each category. SS Methods Models CFSv1 CFSv2 CCSM3 ECHAM Anom ECHAM Dir GFDL GFDL aer04 a,b GMAO a GMAO 062012 a,b MeanEns SuperEns MSESS SD 0.143 0.367 0.198 0.384 0.421 0.663 1.441 0.313 0.253 0.167 0.195 SDBC 0.150 0.379 0.436 0.133 0.154 0.161 0.683 0.350 0.190 0.190 0.258 LR 0.174 0.088 0.100 0.100 0.099 0.100 0.094 0.106 LWPR 0.235 0.124 0.175 0.187 0.186 0.189 0.184 0.189 BSS Below SD 0.012 0.196 0.407 0.186 0.187 0.188 0.457 0.159 0.125 0.042 0.103 SDBC 0.043 0.216 0.360 0.133 0.150 0.158 0.440 0.143 0.144 0.019 0.135 LR 0.010 0.002 0.003 0.007 0.006 0.003 0.001 0.012 LWPR 0.049 0.036 0.055 0.065 0.065 0.064 0.061 0.071 BSS Above SD 0.007 0.158 0.285 0.214 0.230 0.307 0.485 0.164 0.083 0.098 0.076 SDBC 0.018 0.173 0.269 0.216 0.241 0.295 0.439 0.152 0.099 0.033 0.096 LR 0.002 0.013 0.009 0.008 0.008 0.007 0.013 0.000 LWPR 0.047 0.006 0.039 0.049 0.049 0.065 0.047 0.055 a Incomplete data (see Table 7 1) PAGE 201 201 Figure 7 1. Study area and example of the NMME and NLDAS 2 grid points. PAGE 202 202 Figure 7 2. Bias corrected NiÃ±o3.4 SST forecast skill of six NMME models as a function of season and lead time. PAGE 203 203 Figure 7 3. Precipitation for DJF (left panel) and JJA (right panel) forec ast skill of six NMME models downscaled by the SDBC method at 0 month lead over the SEUS. PAGE 204 204 Figure 7 4. Precipitation for DJF (left panel) and JJA (right panel) forecast skill of six NMME models downscaled by the LWPR method at 0 month lead over the SEUS . PAGE 205 205 Figure 7 5. As in Figure 7 3, but for temperature. PAGE 206 206 Figure 7 6. As in Figure 7 4, but for temperature. PAGE 207 207 Figure 7 7. Precipitation forecast skill for six NMME models downscaled by the SDBC method as a function of season and lead time. PAGE 208 208 Figure 7 8. Pr ecipitation forecast skill for six NMME models downscaled by the LWPR method as a function of season and lead time. PAGE 209 209 Figure 7 9. As in Figure 7 7, but for temperature. PAGE 210 210 Figure 7 10. As in Figure 7 8, but for temperature. PAGE 211 211 Figure 7 11. Mean difference between the forecast skill of the LWPR and the SDBC precipitation over space for each model and lead in different seasons PAGE 212 212 Figure 7 12. Mean difference between the forecast skill of the LWPR and the SDBC temperature over space for eac h model and lead in different seasons PAGE 213 213 Figure 7 13. Precipitation for DJF (left panel) and JJA (right panel) forecast skill of SuperEns and MeanEns downscaled by the SDBC and LWPR methods at 0 month lead over the SEUS. PAGE 214 214 Figure 7 14. As in Figure 7 13, b ut for temperature. PAGE 215 215 Figure 7 15. Precipitation forecast skill for SuperEns and MeanEns downscaled by the SDBC and LWPR methods as a function of season and lead time. PAGE 216 216 Figure 7 16. As in Figure 7 15, but for temperature. PAGE 217 217 Figure 7 17. Mean difference b etween the forecast skill of the SDBC SuperEns and the SDBC individual model precipitation over space for each model and lead in different seasons PAGE 218 218 Figure 7 18. Mean difference between the forecast skill of the SDBC SuperEns and the SDBC individual model temperature over space for each model and lead in different seasons PAGE 219 219 CHAPTER 8 CONCLUSIONS This dissertation was devoted to bridg ing the gap between the weather/ climate forecasts and the regional water resources management. The NWPs and GCMs were evaluated for improving hydroclimate forecasts at different spatial and temporal scales . The weather and climate forecast information were successfully incorporated into decision making in water resources management in the southeast USA. Different hydrocli mate forecasts with their applications ( ETo and its relevant climate fields , P, T2M, irrigation scheduling, and urban water demand ) in medium range and seasonal time scales were produced at the regional level by downscaling NWPs or GCMs outputs. T he specific objectives of this study were assessed in full details and organized in six sections entitled: 1) Forecasting reference evapotranspiration using retrospective forecast analogs in the southeastern United States ; 2) Comparison of two analog based do wnscaling methods for regional reference evapotranspiration forecasts ; 3) The GEFS based daily ETo forecast and its implication for water management in the southeastern United States ; 4) Improving short term urban water demand forecasts using forecast anal ogs of the GEFS; 5) Seasonal prediction of regional ETo based on CFSv2; 6) Statistical downscaling multi model forecasts for seasonal precipitation and surface temperature over southeastern USA . Major conclusions of this dissertation are highlighted by sec tion as follows: Evaluation of the GFS ETo forecasts. T he ETo forecast approaches that combined R2 solar radiation with T , RH and Wind from GFS the reforecasts produced higher skill than methods that estimated parameters using GFS the reforecasts data PAGE 220 220 only . The primary increase in skill was due to the use of relative humidity from the GFS reforecasts and long term climatological mean values of solar radiation from the R2 dataset, indicating its importance in forecasting ETo in the region. While the five c ategorical forecasts were skillful, the skill of upper and lower tercile forecasts was greater than that of lower and upper extreme forecasts and middle tercile forecasts. Most of the forecasts were skillful in the first 5 lead days. Comparison of two ana log based methods. CA showed slightly higher skill than NA in terms of the metrics for deterministic forecasts, while for probabilistic forecasts NA showed higher skill than CA regarding the BSS in five categories (terciles, and 10th and 90th percentiles) and lower skill than CA regarding the LEPS skill score. Both CA and NA produced skillful deterministic results in the first 3 lead days, while the skill was higher for CA than for NA. Probabilistic NA forecasts exhibited higher resolution and reliability t han CA, likely due to a larger ensemble size. Forecasts by both methods showed the lowest skill in the Florida peninsula and in mountainous areas, likely due to the fact that these features were not well resolved in the model forecast. GEFS based ETo and agricultural water deficit forecasts. All forecast skill was generally positive up to lead day 7 throughout the year with higher skill in cooler months compared to warmer months. The GEFS reforecast improved ETo forecast skill for all lead days over the so utheast USA compared to the first generation reforecast. The water deficit forecasts driven by the ETo forecasts showed higher accuracy and less uncertainty than the forecasts driven by climatology, indicating their usefulness for irrigation scheduling, hy drological forecasting, and water demand forecasting in the southeast USA . PAGE 221 221 Improving urban water demand forecasts using the GEFS. The analog approach generally showed high skill for forecasting WeekRain, RainDays, CosRainDays, and T but no skill for fore casting HotDays. The analog driven ARIMAX water demand forecasts mostly showed higher skill than the original ARIMAX forecasts implemented in the region. These results indicated that the GEFS showed promising features for advancing short term urban water d emand forecasts. Seasonal prediction of regional ETo. The downscaled ETo from the coarse scale ETo showed similar skill to those by first downscaling individual variables and then calculating ETo. The sensitivity coefficients showed Tmax and Rs had the greatest influence on ETo, followed by Tmin and Tmean, an d Wind. The downscaled Tmax showed highest predictability, followed by Tmean, Tmin, Rs, and Wind. SDBC had slightly better performance than SD for both probabilistic and deterministic forecasts. The skill was locally and seasonally dependent. The CFSv2 bas ed ETo forecasts showed higher predictability in cold seasons than in warm seasons. The CFSv2 model could better predict ETo in cold seasons during ENSO events only when the forecast initial condition was in either the El NiÃ±o or La NiÃ±a phase of ENSO. Sta tistical downscaling multi model seasonal P and T2M forecasts . The SDBC and the LWPR methods showed higher skill than the SD and the LR methods. Both SDBC and LWPR downscaled P showed skill in winter but no skill or limited skill in summer at all leads for all NMME models. The SDBC downscaled T2M were skillful only for the CFSv2 model even at far leads, whereas the LWPR downscaled T2M showed skill for all NMME models. The LWPR method generally showed significantly higher skill than the SDBC. After bias corr ection, the SuperEns mostly showed higher skill than the PAGE 222 222 MeanEns and most of the single models, but its skill did not outperform the best single model. In summary , the NWPs and GCMs showed promising features for improving regional hydroclimate forecasts. While this study has showed that the NWPs and GCMs can provide useful information for regional water resources management, more efforts could be made to evaluate the benefits of the forecasts in water management, to assess the forecasts at other time scales (e.g. subseasonal time scale), and to develop risk management frameworks for incorporating climate forecast information into practical water management. PAGE 223 223 APPENDIX A MONTEITH EQUATION The terms of the Penma n Monteith equation are calculated or estimated following the recommendations of Allen et al. (1998). is calculated using equation ( A 1 ): (A 1) where: T air temperature [ o C ]. is calculated using equation (A 2) and (A 3) : (A 2) (A 3) where: P is atmospheric pressure [ kPa ]; z is elevation above sea level [m]. e s is calculated using equation (A 4) : ( A 4 ) where: (A 5) T air temperature ( o C). e a can be calculated using equation (A5) with T dew , and thus: PAGE 224 224 where T dew is dew point temperature ( o C). e a can also be calculated using equation (A 6) with RH mean (A 6) where RH mean is the mean relative humidity. As the magnitude of the daily soil heat flux beneath the reference surface is relatively small, it may be ignored and thus: G 0 R ns can be calculated from equation (A 7) and (A 8): R ns =(1 a ) R s (A 7) where a is the albedo which is 0.23 for the hypothetical grass reference crop. Rs [ MJ m 2 day 1 ] is incoming shortwave solar radiation, which can be derived from air temperature differences: (A 8) Where: Tmax maximum air temperature [ o C ], Tmin minimum air temperature [ o C ], k Rs adjustment coefficient (0.175 [ o C 0.5 ]) R a is daily extraterrestrial radiation (Equation A 9). (A 9) where: PAGE 225 225 G sc solar constant = 0.0820 MJ m 2 min 1 , d r inverse relative distance Earth Sun (A 10) w s sunset hour angle [ rad ] (A 11), j latitude [ rad ], d solar decimation [ rad ] (A 12) (A 13) (A 14) (A 1 5) J is Julian date R nl can be calculated from equation (A 16) (A 17) where: s Stefan Boltzmann constant [4.903x10 9 M J K 4 m 2 day 1 ], T max,k maximum absolute temperature during the 24 hour period [ K= o C+273.16 ], T min,k minimum absolute temperature during the 24 hour period [ K= o C+273.16 ], e a actual vapor pressure [ kPa ], R so calculated clear sky radiation [ MJm 2 day 1 ] (A18). R s solar radiation see (A8) (A 18) PAGE 226 226 where: z station elevation above sea level [m]. R a see (A 9). R n can be calculated from equation (A 19): (A 19) u 2 can be calculated from equation (A 20): (A 20) where: z height of measurement above ground surface [ m ]. When z =10m, u 2 =0.75 u z PAGE 227 227 APPENDIX B DERIVATION OF THE FAO PENMAN MONTEITH EQUATION FOR THE HYPOTHETICAL GRASS REFERENCE CROP For the original PM equation, the aerodynamic res istance is estimated using Equation B 1: ( B 1) where: r a is aerodynamic resistance [s m 1 ], z m is height of wind measurements [m], z h is height of humidity measurements [m], d i s zero plane displacement height [m], z om is roughness length governing momentum transfer [m], z oh is roughness length governing transfer of heat and vapor [m], k ], u z is wind speed at height z [m s 1 ]. For a wide range of crops the zero plane displacement height, d [m], and the roughness length governing momentum transfer, z om [m], can be estimated from the crop height h [m] by the following equations: The roughness length governing transfer of heat and vapor, z oh [m], can be approximated by: PAGE 228 228 Assuming a grass reference crop surface with constant crop height of 0.12 m and standardized height for wind speed, temperature and humidity measurements at 2 m ( z m =z h = 2 m), Equation A1 becomes: ( B 2) For the original PM equation, the surface resistan ce is estimated using Equation A3: ( B 3) where: r s is surface resistance [s m 1 ], r 1 is bulk stomatal resistance of the well illuminated leaf [s m 1], LAI active is active (sunlit) leaf area index [m 2 (leaf area) m 2 (soil surface)] Assuming the same grass reference crop surface, a general equation for LAI active is: which takes into consideration the fact that generally only the upper half of dense clipped grass is actively contributing the surface heat and vapor transfer. A general equation for LAI is: where h is the crop height [m]. The stomatal resistance, n , of a single leaf has a value of about 100 s m 1 under well watered conditions. Thus, Equation B 3 becomes: PAGE 229 229 [s m 1 ] ( B 4) Then, Equation B 5 is derived based on Equations B 2 and B 4: ( B 5) In Equation B 2, R n and G is energy available per unit area and expressed in MJ m 2 day 1 . The conversion from energy values to equivalent depths of water or vice versa is given by: Radiation [mm d ay 1 ] = 0.408 Radiation [MJ m 2 day 1 ] ( B 6) The specific heat at constant pressure is given by: And considering the ideal gas law for a : where T kv , the virtual temperature, can be substituted by Results in: [MJ m 2 o C 1 s 1 ] where: c p is specific heat at constant pressure [MJ kg 1 o C 1 ], a is mean air density at constant pressure [kg m 3 ], r a is aerodynamic resistance [s m 1 ], is psychrometric constant [kPa o C 1 ], PAGE 230 230 is ratio molecular weight of water vapor/dry air = 0.622, is latent heat of vaporization [MJ kg 1 ], R is specific gas constant = 0.287 kJ kg 1 K 1 , T is air temperature [ o C], P is atmospheric pressure [kPa], [MJ m 2 o C 1 day 1 ] ( B 7) when divided by ( =2.45), Equation A7 becomes: [mm o C 1 day 1 ] ( B 8) The FAO PM equation (Equation B 3) is then derived by plugging Equations B 5, B 6, and B 8 into Equation B 2. PAGE 231 231 LIST OF REFERENCES Abatzoglou, J. T., and T. J. Brown, 2012: A comparison of statistical downscaling methods suited for wildfire applications. Int. J. Climatol., 32, 772 780. Adamowski, J., and H. F. Chan (2011), A wavelet neural network conjunction model for groundwater level forecasting, Journal of Hydrology, 407(1 4), 28 40. Adamowski, J., H. Fung Chan, S. O. Prasher, B. Ozga Zielinski, and A. Sliusarieva (2012), Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Mon treal, Canada, Water Resources Research, 48(1), W01528. Allen, R. G., L. S. Pereira, D. Raes, and M. Smith, 1998: Crop evapotranspiration Guidelines for computing crop water requirements FAO Irrigation and drainage paper 56. FAO, Rome, Italy. [Available on line at http://www.fao.org/docrep/X0490E/X0490E00.htm .] Asefa, T., and A. Adams (2007), Short Term Urban Water Demand Forecast Models in Action: Challenges from Model Development to Implementati on to Real Time Operations, in World Environmental and Water Resources Congress 2007, edited, pp. 1 7. Barnston, S. J. Mason, L. Goddard, D. G. DeWitt, and S. E. Zebiak, 2003: Multimodel ensembling in seasonal climate forecasting at IRI. Bull. Amer. Meteor . Soc., 84, 1783 1796. Barnston, A. G., and M. K. Tippett, 2012: A Comparison of Skill of CFSv1 and CFSv2 Hindcasts of NiÃ±o3. 4 SST. Science and Technology Infusion Climate Bulletin, pp. 18 28, NOAA, Silver Spring, Md. Barsugli, J., C. Anderson, J. Smith, and J. Vogel, 2009: Options for improving climate modeling to assist water utility planning for climate change. Water Utility Climate Alliance White Paper, 146 pp. Berg, A.A., J.S. Famiglietti, J.P. Walker, and P.R. Houser, 2003: Impact of bias correction to reanalysis products on simulations of North American soil moisture and hydrological fluxes, J. Geophys. Res., 108, 4490. Bohn, T. J., M. Y. Sonessa, and D. P. Lettenmaier, 2010: Seasonal Hydrologic Forecasting: Do Multimodel Ensemble Averages Always Yie ld Improvements in Forecast Skill? J. Hydrometeor., 11, 1358 1372. Bolson, J., Martinez, C., Breuer, N., Srivastava, P., and Knox, P. (2013). Climate information use among southeast US water managers: beyond barriers and toward opportunities. Regional Envi ronmental Change, 1 11. PAGE 232 232 Bougadis, J., K. Adamowski, and R. Diduch (2005), Short term municipal water demand forecasting, Hydrological Processes, 19(1), 137 148. Box, G. E., G. M. Jenkins, and G. C. Reinsel (2013), Time series analysis: forecasting and cont rol, 4th edition. John Wiley & Sons, New York, United States, 746 pp. Bracken, C., Rajagopalan, B., and Prairie, J. (2010). A multisite seasonal ensemble streamflow forecasting technique. Water Resources Research, 46(3). Cai, J., Y. Liu, T. Lei, and L. S. Pereira, 2007: Estimating reference evapotranspiration with the FAO Penman Monteith equation using daily weather forecast messages. Agr. Forest Meteor., 145, 22 35. Cai, J., Y. Liu, D. Xu, P. Paredes, and L. Pereira, 2009: Simulation of the soil water bala nce of wheat using daily weather forecast messages to estimate the reference evapotranspiration. Hydrol. Earth Syst. Sc., 13, 1045 1059. Chattopadhyay, S., R. Jain, and G. Chattopadhyay, 2009: Estimating potential evapotranspiration from limited weather da ta over Gangetic West Bengal, India: a neurocomputing approach. Meteor. Appl., 16, 403 411. Chiew, F. H. S., N. N. Kamaladasa, H. M. Malano, and T. A. McMahon, 1995: Penman Monteith, FAO 24 reference crop evapotranspiration and class A pan data in Australi a. Agr. Water Manage., 28, 9 21. Christensen, N. S., Wood, A. W., Voisin, N., Lettenmaier, D. P., and Palmer, R. N.: 2004: The effects of climate change on the hydrology and water resources of the Colorado River basin, Climatic Change, 62, 33 363. Coelho, C., S. Pezzulli, M. Balmaseda, F. Doblas Reyes, and D. Stephenson, 2004: Forecast calibration and combination: A simple Bayesian approach for ENSO. J. Climate, 17, 1504 1516. Cosgrove, B. A., and Coauthors, 2003: Real time and retrospective forcing in the North American Land Data Assimilation System (NLDAS) project, J. Geophys. Res., 108, 8842, doi:10.1029/2002JD003118. Clark, M. P., and L. E. Hay, 2004: Use of medium range numerical weather prediction model output to produce forecasts of streamflow. J. Hyd rometeor., 5, 15 32. Dai, X., H. Shi, Y. Li, Z. Ouyang, and Z. Huo, 2009: Artificial neural network models for estimating regional reference evapotranspiration based on climate factors. Hydrol. Process., 23, 442 450. Dettinger, M., D. Cayan, M. Meyer, and A. Jeton, 2004: Simulated Hydrologic Responses to Climate Variations and Change in the Merced, Carson, and American River Basins, Sierra Nevada, California, 1900 2099. Climatic Change, 62, 283 317. PAGE 233 2 33 Devineni, N., Sankarasubramanian, A., and Ghosh, S. (2008) . Multimodel ensembles of streamflow forecasts: Role of predictor state in developing optimal combinations. Water resources research, 44(9). DeWitt, D. G., 2005: Retrospective Forecasts of Interannual Sea Surface Temperature Anomalies from 1982 to Present Using a Directly Coupled Atmosphere Ocean General Circulation Model. Mon. Weather Rev., 133, 2972 2995. Doblas Reyes, F. J., R. Hagedorn, and T. N. Palmer, 2005: The rationale behind the success of multi model ensembles in seasonal forecasting II. Calibr ation and combination. Tellus A, 57, 234 252. Donkor, E., T. Mazzuchi, R. Soyer, and J. Alan Roberson (2014), Urban Water Demand Forecasting: Review of Methods and Models, Journal of Water Resources Planning and Management, 140(2), 146 159. Droogers, P. an d R. Allen (2002). "Estimating Reference Evapotranspiration Under Inaccurate Data Conditions." Irrigation and Drainage Systems 16(1): 33 45.Fowler, H. J., S. Blenkinsop, and C. Tebaldi, 2007: Linking climate change modelling to impacts studies: recent adva nces in downscaling techniques for hydrological modelling. Int. J. Climatol., 27, 1547 1578. Duan, Q., N. K. Ajami, X. Gao, and S. Sorooshian, 2007: Multi model ensemble hydrologic prediction using Bayesian model averaging. Adv. Water Resour., 30, 1371 138 6. Dutta, D., Welsh, W. D., Vaze, J., Kim, S. S., and Nicholls, D. (2012). A comparative evaluation of short term streamflow forecasting using time series analysis and rainfall runoff models in eWater source. Water Resources Management, 26 (15), 4397 4415. Earth System Research Library Physical Sciences Division (ESRL). 2010. ESRL/PSD Reforecast Products. Retrieved January 5, 2010: http://www.esrl.noaa.gov/psd/forecasts/reforecast/ Feddersen, H., and U. Andersen, 2005: A method for statistical downscaling o f seasonal ensemble predictions. Tellus A, 57, 398 408. FernÃ¡ndez Ferrero, A., SÃ¡enz, J., Ibarra Berastegi, G., 2010. Comparison of the Performance of Different Analog Based Bayesian Probabilistic Precipitation Forecasts over Bilbao, Spain. Mon. Weather Re v. 138(8), 3107 3119. Food and Agricultural Organization of the United Nations (FAO). 2004. Global map of monthly reference evapotranspiration 10 arc minutes, downloadable from: http://www.fao.org/geonetwork/srv/en/metadata.show?id=7416 PAGE 234 234 Fowler, H. J., S. Blenkinsop, and C. Tebaldi. (2007). Linking climate change modelling to impacts studies: recent advances in downscaling techniques for hydrological modelling. International Journal of Climatology, 27(12): 1547 1578. Garcia, M., D. Raes, R. Allen, and C. H erbas, 2004: Dynamics of reference evapotranspiration in the Bolivian highlands (Altiplano). Agr. Forest Meteor., 125, 67 82. Getnet Y, M., 2011: Implications of medium range numerical weather model output in hydrologic applications: Assessment of skill an d economic value. J. Hydrol., 400, 448 464. Global climate projections, in climate change 2007: The Physical Science Basis -Contribution of Working Group I to the Fourth Assessment Report of the Inter governmental Panel on Climate Change. Cambridge Univ. P ress, Cambridge, U.K., pp. 747 845. Golembesky, K., Sankarasubramanian, A., and Devineni, N. (2009). Improved drought management of Falls Lake Reservoir: Role of multimodel streamflow forecasts in setting up restrictions. Journal of Water Resources Plannin g and Management, 135(3), 188 197. Gong, L., C. y. Xu, D. Chen, S. Halldin, and Y. D. Chen, 2006: Sensitivity of the Penman Monteith reference evapotranspiration to key climatic variables in the Changjiang (Yangtze River) basin. J. Hydrol., 329, 620 629. G ong, G., Wang, L., Condon, L., Shearman, A., and Lall, U. (2010). A simple framework for incorporating seasonal streamflow forecasts into existing water resource management practices. Journal of the American Water Resources Association, 46(3), 574 585. Gra ntz, K., Rajagopalan, B., Clark, M., and Zagona, E. (2005). A technique for incorporating large scale climate information in basin scale ensemble streamflow forecasts. Water Resources Research, 41(10), W10410. Grantz, K., Rajagopalan, B., Zagona, E., and Clark, M. (2007). Water management applications of climate based hydrologic forecasts: Case study of the Truckee Carson River Basin. Journal of Water Resources Planning and Management, 133(4), 339 350. Hagedorn, R., F. J. Doblas Reyes, and T. N. Palmer, 20 05: The rationale behind the success of multi model ensembles in seasonal forecasting I. Basic concept. Tellus A, 57, 219 233. Hagedorn, R., T. M. Hamill, and J. S. Whitaker, 2008: Probabilistic Forecast Calibration Using ECMWF and GFS Ensemble Reforecasts. Part I: Two Meter Temperatures. Mon. Wea. Rev., 136, 2608 2619. PAGE 235 235 Hagedorn, R., Buizza, R., Hamill, T. M., Leutbecher, M., and Palmer, T. N. (2012). Comparing TIGGE multimodel forecasts with reforecast calibrated ECMWF ensemble forecasts. Quarte rly Journal of the Royal Meteorological Society, 138(668), 1814 1827. Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434 1447. Hamill, T.M. , Whitaker, J.S., and S.L. Mullen. (2006). Reforecasts: An important dataset for improving weather predictions. Bulletin of the American Meteorological Society 87(1): 33 46. Hamill, T. M., and J. Juras, 2006a: Measuring forecast skill: is it real skill or is it the varying climatology? Q. J. Roy. Meteor. Soc., 132, 2905 2923. Hamill, T. M., and J. S. Whitaker, 2006b: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209 3229. Hamill , T. M., and J. S. Whitaker, 2007: Ensemble calibration of 500 hPa geopotential height and 850 hPa and 2 m temperatures using reforecasts. Mon. Wea. Rev., 136, 2608 2619. Hamill, T. M., R. Hagedorn, and J. S. Whitaker, 2008: Probabilistic forecast calibrat ion using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620 2632. Hamill, T. M., J. S. Whitaker, M. Fiorino, and S. G. Benjamin, 2011: Global Ensemble Kalman Filter. Mon. Weath. Rev., 139, 668 688. Hamill, T. M. (2012). Verification of TIGGE Multimodel and ECMWF Reforecast Calibrated Probabilistic Precipitation Forecasts over the Contiguous United States. Monthly Weather Review, 140(7), 2232 2252. Hamill , T. M., Gary T. B., J. S. Whitaker, D. R. Murray, M. Fiorino, T. J. Galarneau, Y. Zhu, and W. Lapenta. (2013). NOAA's Second Generation Global Medium Range Ensemble Reforecast Dataset. Bulletin of the American Meteorological Hidalgo, H.G., Dettinger, M.D. , Cayan, D.R., 2008. Downscaling with constructed analogues: Daily precipitation and temperature fields over the United States. California Energy Commission PIER Final Project Report CEC 500 2007 123. Hargreaves, G. H., and Samani, Z. A. _1985. Reference c rop evapotranspiration from temperature. Trans. ASABE, 1, 96 99. PAGE 236 236 Haylock, M. R., G. C. Cawley, C. Harpham, R. L. Wilby, and C. M. Goodess, 2006: Downscaling heavy precipitation over the United Kingdom: a comparison of dynamical and statistical methods and their future scenarios. Int. J. Climatol., 26, 1397 1415. Herrera, M., L. Torgo, J. Izquierdo, and R. PÃ©rez GarcÃa (2010), Predictive models for forecasting hourly urban water demand, Journal of Hydrology, 387(1 2), 141 150. House Peters, L. A., and H. Ch ang (2011), Urban water demand modeling: Review of concepts, methods, and organizing principles, Water Resources Research, 47(5), W05401. Hwang, S., W. Graham, J. L. HernÃ¡ndez, C. Martinez, J. W. Jones, and A. Adams, 2011: Quantitative Spatiotemporal Evalu ation of Dynamically Downscaled MM5 Precipitation Predictions over the Tampa Bay Region, Florida. J. Hydrometeor., 12, 1447 1464. Hwang, S., and W. D. Graham, 2013: Development and comparative evaluation of a stochastic analog method to downscale daily GCM precipitation. Hydrol. Earth Syst. Sci., 17, 4481 4502, 2013. Ibarra Berastegi, G., SÃ¡enz, J., Ezcurra, A., ElÃas, A., DÃaz de ArgandoÃ±a, J., and Errasti, I., 2011. Downscaling of surface moisture flux and precipitation in the Ebro Valley (Spain) using analogues and analogues followed by random forests and multiple linear regression. Hydrol. Earth. Syst. Sc. 15(6), 1895 1907. Imbert, A., Benestad, R., 2005. An improvement of analog model strategy for more reliable local climate change scenarios. Theor. A ppl. Climatol. 82(3), 245 255. Ines, A. V. M., and J. W. Hansen, 2006: Bias correction of daily GCM rainfall for crop simulation studies. Agr. Forest Meteor., 138, 44 53. Ishak, A. M., M. Bray, R. Remesan, and D. Han, 2010: Estimating reference evapotrans piration using numerical weather modelling. Hydrol. Process. 24, 3490 3509. Jensen, M. E., Burman, R. D., and Allen, R. G. 1990: Evapotranspiration and irrigation 70, American Societ y of Civil Engineers, New York. Jiang, X., S. Yang, Y. Li, A. Kumar, W. Wang, and Z. Gao, 2013: Dynamical Prediction of the East Asian Winter Monsoon by the NCEP Climate Forecast System. J. Geophys. Res.: Atmos., 118, doi:10.1002/jgrd.50193. Johnson, N. T ., C. J. Martinez, G. A. Kiker, and S. Leitman, 2013: Pacific and Atlantic sea surface temperature influences on streamflow in the Apalachicola Chattahoochee Flint river basin. J. Hydrol., 489, 160 179. PAGE 237 237 Jones, C., L. M. V. Carvalho, and B. Liebmann, 2012: Forecast Skill of the South American Monsoon System. J. Climate, 25, 1883 1889. Juneng, L., F. T. Tangang, H. Kang, W. J. Lee, and Y. K. Seng, 2010: Statistical Downscaling Forecasts for Winter Monsoon Precipitation in Malaysia Using Multimodel Output Vari ables. J. Climate, 23, 17 27. Kalra, A., and S. Ahmad, 2012: Estimating annual precipitation for the Colorado River Basin using oceanic atmospheric oscillations. Water Resour. Res., 48, W06527, doi:10.1029/2011WR010667. Kang, H., K. H. An, C. K. Park, A. L . S. Solis, and K. Stitthichivapak, 2007: Multimodel output statistical downscaling prediction of precipitation in the Philippines and Thailand. J. Geophys. Res., 34, L15710, DOI: 10.1029/2007GL030730. Kar, S. C., N. Acharya, U. Mohanty, and M. A. Kulkarni , 2012a: Skill of monthly rainfall forecasts over India using multi model ensemble schemes. Int. J. Climatol., 32, 1271 1286. Kar, S. C., N. Acharya, U. C. Mohanty, and M. A. Kulkarni, 2012b: Skill of monthly rainfall forecasts over India using multi model ensemble schemes. Int. J. Climatol., 32, 1271 1286. Kanamitsu, M., W. Ebisuzaki, J. Woollen, S. K. Yang, J. Hnilo, M. Fiorino, and G. Potter, 2002: NCEP DOE AMIP II Reanalysis (R 2). Bull. Amer. Meteor. Soc., 83, 1631 1644. Kharin, V. V., and F. W. Zwiers , 2002: Climate Predictions with Multimodel Ensembles. J. Climate, 15, 793 799. Kirtman, B. P., and D. Min, 2009: Multimodel Ensemble ENSO Prediction with CCSM and CFS. Mon. Weather Rev., 137, 2908 2930. Krishnamurti, T. N., and Coauthors, 2000: Multimodel Ensemble Forecasts for Weather and Seasonal Climate. J. Climate, 13, 4196 4216. Khare, Y. P., C. J. Martinez, and MuÃ±oz Carpena, 2013: Parameter variability and drought models: a study using the Agricultural Reference Index for Drought (ARID). Agron. J., 105: 1417 1432. Kirtman, B.P., and Coauthors. 2013. The North American Multi Model Ensemble (NMME): Phase 1 seasonal to interannual prediction, Phase 2 toward developing intra seasonal prediction. Bulletin of the American Meteorological Society, In Press. Kim, H. M., P. Webster, and J. A. Curry, 2012. Seasonal prediction skill of ECMWF System 4 and NCEP CFSv2 retrospective forecast for the Northern Hemisphere Winter. Clim. Dynam., 39: 2957 2973. PAGE 238 238 Kingston, D.G., M.C. Todd, R.G. Taylor, J.R. Thompson, and N.W . Arnell, 2009: Uncertainty in the estimation of potential evapotranspiration under climate change. Geophysical Research Letters 36, L20403, doi:10.1029/2009GL040267, 2009. Kumar, M., N. Raghuwanshi, and R. Singh, 2011: Artificial neural networks approach in evapotranspiration modeling: a review. Irrigation Sci., 29, 11 25. Kumar, A., and Coauthors, 2012: An Analysis of the Nonstationarity in the Bias of Sea Surface Temperature Forecasts for the NCEP Climate Forecast System (CFS) Version 2. Mon. Weather Rev., 140, 3003 3016. Landeras, G., A. Ortiz Barredo, and J. J. LÃ³pez, 2009: Forecasting weekly evapotranspiration with ARIMA and artificial neural network models. J. Irrig. Drain. E. ASCE, 135, 323 334. Li, Z., Zheng, F. L., Liu, W. Z., 2012. Spatiotempor al characteristics of reference evapotranspiration during 1961 2009 and its projected changes during 2011 2099 on the Loess Plateau of China. Agr. Forest. Meteorol. 154 155(2012), 147 155. Liang, Y., Durrans, S., Lightsey, T., 2002. A revised version of Pn ET II to simulate the hydrologic cycle in southeastern forested areas. J. Am. Water Resour. As. 38(1), 79 89. LÃ³pez Urrea, R., F. MartÃn de Santa Olalla, C. Fabeiro, and A. Moratalla, 2006: Testing evapotranspiration equations using lysimeter observations in a semiarid climate. Agr. Water Manage., 85, 15 26. Lorenz, E.N., 1969. Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci. 26(4), 636 646. Lall, U., Y. I. Moon, H. H. Kwon, and K. Bosworth, 2006: Locally weighted poly nomial regression: Parameter choice and application to forecasts of the Great Salt Lake. Water Resour. Res., 42, W05422, doi:10.1029/2004WR003782 Landman, W. A., and A. Beraki, 2012: Multi model forecast skill for mid summer rainfall over southern Africa. Int. J. Climatol., 32, 303 314. Landman, W. A., D. DeWitt, D. Lee, A. Beraki, and D. LÃ¶tter, 2012: Seasonal Rainfall Prediction Skill over South Africa: One versus Two Tiered Forecasting Systems. Wea. Forecasting, 27, 489 501. Lavers, D., L. Luo, and E. F . Wood, 2009: A multiple model assessment of seasonal climate forecast skill for applications. J. Geophys. Res., 36, L23711, DOI: 10.1029/2009GL041365. PAGE 239 239 Loader, C., 1999: Statistics and Computing: Local regression and likelihood. Springer, New York, 308 pp. Lu, J., Sun, G., McNulty, S.G., Amatya, D.M., 2003. Modeling actural evapotranspiration from forested watersheds across the southeastern United States. JAWRA J. Am. Water Resour. As. 39(4), 886 896. Lu, J., Sun, G., McNulty, S.G., Amatya, D.M., 2005. A co mparison of six potential evapotranspiration methods for regional use in the southeastern United States. JAWRA J. Am. Water Resour. As. 41(3), 621 633. Luo, L., and Y. Zhang, 2012: Did we see the 2011 summer heat wave coming? Geophys. Res. Lett., 39, L0970 8, doi:10.1029/2012GL051383 Luo, L., E. F. Wood, and M. Pan, 2007: Bayesian merging of multiple climate model forecasts for seasonal hydrological predictions. J. Geophys. Res., 112, D10102, doi:10.1029/2006JD007655 Luo, L., and E. F. Wood, 2008: Use of B ayesian merging techniques in a multimodel seasonal hydrologic ensemble prediction system for the eastern United States. J. Hydrometeor., 9, 866 884. Luo, L., and Coauthors, 2003: Validation of the North American Land Data Assimilation System (NLDAS) retro spective forcing over the southern Great Plains, J. Geophys. Res., 108, 8843, doi:10.1029/2002JD003246. Mason, S. J., and G. M. Mimmack, 2002: Comparison of Some Statistical Methods of Probabilistic Forecasting of ENSO. J. Climate, 15, 8 29. Markovic, M., C. G. Jones , K. Winger , and D. Paquin, 2009. The surface radiation budget over North America: Gridded data assessment and evaluation of regional climate models. Int. J. Climatol., 29, 2226 2240. Maraun, D., and Coauthors. (2010). Precipitation downscalin g under climate change: Recent developments to bridge the gap between dynamical models and the end user. Reviews of Geophysics, 48(3). Marshall, C. H., R. A. Pielke, L. T. Steyaert, and D. A. Willard, 2004: The Impact of Anthropogenic Land Cover Change on the Florida Peninsula Sea Breezes and Warm Season Sensible Weather. Mon. Wea. Rev., 132, 28 52. Martinez, C. J., and J. W. Jones, 2011: Atlantic and Pacific sea surface temperatures and corn yields in the southeastern USA: lagged relationships and forecast model development. Int. J. Climatol., 31, 592 604. Martinez, C. J., G. A. Baigorria, and J. W. Jones, 2009: Use of climate indices to predict corn yields in southeast USA. Int. J. Climatol., 29, 1680 1691. PAGE 240 240 Martinez, C. and M. Thepadia (2010). "Estimating Reference Evapotranspiration with Minimum Data in Florida." J. Irrig. Drain. Eng., 136: 494 501. Maurer, E., and H. Hidalgo, 2008: Utility of daily vs. monthly large scale climate data: An intercomparison of two statistical downscaling methods. Hydrol. Ear th Syst. Sc., 12, 551 563. Maurer, E., H. Hidalgo, T. Das, M. Dettinger, and D. Cayan, 2010: The utility of daily large scale climate data in the assessment of climate change impacts on daily streamflow in California. Hydrol. Earth Syst. Sc., 14, 1125 1138 . Matulla, C., Zhang, X., Wang, X.L., Wang, J., Zorita, E., Wagner, S., and von Storch, H., 2008. Influence of similarity measures on the performance of the analog method for downscaling daily precipitation. Clim. Dynam. 30(2), 133 144. McCabe, G. J., and M. D. Dettinger, 2002: Primary Modes and Predictability of Year to Year Snowpack Variations in the Western United States from Teleconnections with Pacific Ocean Climate. J. Hydrometeor., 3, 13 25. Meinke, H., G.L. Hammer, and P. Want, 1993. Potential soil water extraction by sunflower on a range of soils. Field Crops Res. 32, 59 81. Mesinger, F., and Coauthors, 2006: North American regional reanalysis. Bull. Amer. Meteor. Soc., 87, 343 360. Misra, V., L. Moeller, L. Stefanova, S. Chan, J. J. O'Brien, T. J. Smith, III, and N. Plant, 2011: The influence of the Atlantic Warm Pool on the Florida panhandle sea breeze. J. Geophys. Res., 116, doi:10.1029/2010JD015367. Mo, K. C., S. Shukla, D. P. Lettenmaier, and L. C. Chen, 2012: Do Climate Forecast System (CFSv2) forecasts improve seasonal soil moisture prediction? Geophys. Res. Lett., 39, L23703, doi:10.1029/2012GL053598. Monteith, J. L. Evaporation and environment. Symp. Soc. Exp. Biol., 19, 205 234 (1964) Nash, J. E. and J. V. Sutcliffe, 1970: River flow foreca sting through conceptual models part I A discussion of principles, J. Hydrol., 10, 282 290. Ndiaye, O., M. N. Ward, and W. M. Thiaw, 2011: Predictability of Seasonal Sahel Rainfall Using GCMs and Lead Time Improvements Through the Use of a Coupled Model. J. Climate, 24, 1931 1949. New, M., Lister, D., Hulme, M., Makin, I., 2002. A high resolution data set of surface climate over global land areas. Clim. Res. 21, 1 25. NRCS. 1986. Urban hydrology for small watersheds. 210 VI TR 55, 2nd ed. U.S. Gov. Print. Office, Washington, DC. PAGE 241 241 O'Connor, R. E., Yarnal, B., Dow, K., Jocoy, C. L., and Carbone, G. J. (2005). Feeling at risk matters: water managers and the decision to use forecasts. Risk Analysis, 25(5), 1265 1275. Ozkan, C., O. Kisi, and B. Akay, 2011: Neura l networks with artificial bee colony algorithm for modeling daily reference evapotranspiration. Irrigation Sci., 29, 431 441. Pal, M., and S. Deswal, 2009: M5 model tree based modelling of reference evapotranspiration. Hydrol. Process., 23, 1437 1443. Pal mer, T., and Coauthors, 2004: Development of a European multi model ensemble system for seasonal to inter annual prediction (DEMETER). Bull. Am. Meteor. Soc., 85, 853 872. Panofsky, H. A., and G. W. Brier, 1958: Some applications of statistics to meteorolo gy. The Pennsylvania State University, 224pp. Peng, P., A. Kumar, H. van den Dool, and A. G. Barnston, 2002: An analysis of multimodel ensemble predictions for seasonal climate anomalies. J. Geophys. Res.: Atmos., 107, 4710, doi:10.1029/2002JD002712. Plum mer, D. A., D. Caya, A. Frigon, H. CÃ´tÃ©, M. GiguÃ¨re, D. Paquin, S. Biner, R. Harvey, and R. de Elia (2006), Climate and Climate Change over North America as Simulated by the Canadian RCM, Journal of Climate, 19(13), 3112 3132. Piechota, T., F. Chiew, J. Dracup, and T. McMahon, 2001: Development of Exceedance Probability Streamflow Forecast. J. Hydrol. Eng., 6, 20 28. Pinker, R.T., and Coauthors. (2003), Surface radiation budgets in support of the GEWEX Continental Scale International Project (GCIP) and th e GEWEX Americas Prediction Project (GAPP), including the North American Land Data Assimilation System (NLDAS) project, J. Geophys. Res., 108, 8844. Prairie, J., B. Rajagopalan, T. Fulp, and E. Zagona, 2005: Statistical Nonparametric Model for Natural Salt Estimation. J. Environ. Eng., 131, 130 138. Priestley, C.H.B. and R.J. Taylor. 1972. On the assessment of surface heat flux and evaporation using large scale parameters. Mon. Weath. Rev., 100, 81 92. Potts, J., C. Folland, I. Jolliffe, and D. Sexton, 1996 : Revised" LEPS" scores for assessing climate model simulations and long range forecasts. J. Climate, 9, 34 53. Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian Model Averaging to Calibrate Forecast Ensembles. Mon. Weathe r Rev., 133, 1155 1174. PAGE 242 242 Rayner, S., Lach, D., and Ingram, H. (2005). Weather forecasts are for wimps: why water resource managers do not use climate forecasts. Climatic Change, 69(2 3), 197 227. Regonda, S. K., B. Rajagopalan, M. Clark, and E. Zagona, 2006 : A multimodel ensemble forecast framework: Application to spring seasonal flows in the Gunnison River Basin. Water Resour. Res., 42, W09404, DOI: 10.1029/2005WR004653. Rienecker, M. M., and Coauthors, 2008: The GEOS 5 Data Assimilation System Documentatio n of versions 5.0. 1, 5.1. 0, and 5.2. 0. NASA Tech. Rep. TM 2008 104606, 27, 118. Ropelewski, C. F., and M. S. Halpert, 1986: North American Precipitation and Temperature Patterns Associated with the El NiÃ±o/Southern Oscillation (ENSO). Mon. Weather Rev., 114, 2352 2362. Saha, S., and Coauthors, 2006: The NCEP climate forecast system. J. Climate, 19, 3483 3517. Saha, S., and Coauthors, 2010: The NCEP climate forecast system reanalysis. Bull. Am. Meteor. Soc., 91, 1015 1057. Saha, S., and Coauthors. (2013). The NCEP Climate Forecast System Version 2. Journal of Climate, In Press. Salathe Jr, E.P., Mote, P.W., Wiley, M.W., 2007. Review of scenario selection and downscaling methods for the assessment of climate change impacts on hydrology in the United States pacific northwest. International Journal of Climatology, 27(12): 1611 1621. Sankarasubramanian, A., Lall, U., Souza Filho, F. A., and Sharma, A. (2009a). Improved water allocation utilizing probabilistic climate forecasts: Short term water contracts in a r isk management framework. Water Resources Research, 45(11), W11409. Schaake, J., and Coauthors, 2007: Precipitation and temperature ensemble forecasts from single value forecasts. Hydrol. Earth Syst. Sci. Discuss., 4, 655 717. Schmidt, N., E. K. Lipp, J. B . Rose, and M. E. Luther, 2001: ENSO Influences on Seasonal Rainfall and River Discharge in Florida. J. Climate, 14, 615 628. Shrestha, D. L., Robertson, D. E., Wang, Q. J., Pagano, T. C., and Hapuarachchi, H. A. P. (2013). Evaluation of numerical weather prediction model precipitation forecasts for short term streamflow forecasting purpose. Hydrology and Earth System Sciences, 17(5), 1913 1931. PAGE 243 243 Shuttleworth, W.J. 1993: Chapter 4 Evaporation. In: Handbook of Hydrology. Maidment, D.R. (ed). McGraw Hill. Sin ghrattna, N., B. Rajagopalan, M. Clark, and K. Krishna Kumar, 2005: Seasonal forecasting of Thailand summer monsoon rainfall. Int. J. Climatol., 25, 649 664. Srivastava, P. K., D. Han, M. A. R. Ramirez, and T. Islam, 2013: Comparative assessment of evapotranspiration derived from NCEP and ECMWF global datasets through Weather Research and Forecasting model. Atmos. Sci. Lett. , 14, 118 125. Srivastava, P. K., D. Han, M. A. R. Ramirez, and T. Islam, in press: Sensitivity and uncertainty analysis of Me soscale model downscaled hydro meteorological variables for discharge prediction. Hydrol. Process. , doi: 10.1002/hyp.9946. Silva, D., F. J. Meza, and E. Varas, 2010: Estimating reference evapotranspiration (ETo) using numerical weather forecast data in c entral Chile. J. Hydrol., 382, 64 71. Sobash, R. A., J. S. Kain, D. R. Bright, A. R. Dean, M. C. Coniglio, and S. J. Weiss, 2011: Probabilistic forecast guidance for severe thunderstorms based on the identification of extreme phenomena in convection allowi ng model forecasts. Wea. Forecasting, 26, 714 728. Sperna Weiland, F. C., Tisseuil, C., DÃ¼rr, H. H., Vrac, M., and van Beek, L. P. H., 2012: Selecting the optimal method to calculate daily global reference potential evaporation from CFSR reanalysis data fo r application in a hydrological model study, Hydrol. Earth Syst. Sci., 16, 983 1000. Sun, H., and D. J. Furbish, 1997: Annual precipitation and river discharges in Florida in response to El NiÃ±o and La NiÃ±a sea surface temperature anomalies. J. Hydrol., 1 99, 74 87. Sun, G., McNulty, S.G., Amatya, D.M., Skaggs, R.W., Swift, L.W., Jr., Shepard, J.P., Riekerk, H., 2002. A comparison of the watershed hydrology of coastal forested wetlands and the mountainous uplands in the Southern US. J. Hydrol. 263(1), 92 1 04. Thepadia, M., and C.J. Martinez, 2012. Regional calibration of solar radiation and reference evapotranspiration estimates with minimal data in Florida. J. Irrig. Drain. Eng., 138: 111 119. Tian, D., and C. J. Martinez (2012a), Comparison of two analog based downscaling methods for regional reference evapotranspiration forecasts, Journal of Hydrology, 475(0), 350 364. PAGE 244 244 Tian, D., and C. J. Martinez (2012b), Forecasting Reference Evapotranspiration Using Retrospective Forecast Analogs in the Southeastern Un ited States, Journal of Hydrometeorology, 13(6), 1874 1892. Tian, D., and C. J. Martinez (2014), The GEFS based daily reference evapotranspiration (ETo) forecast and its implication for water management in the southeawstern United States. Journal of Hydrom eteorology, doi: http://dx.doi.org/10.1175/JHM D 13 0119.1, in press. Tian, D., C. J. Martinez, and W. D. Graham (2014), Seasonal prediction of regional reference evapotranspiration (ETo) based on Climate Forecast System version 2 (CFSv2), Journal of Hydro meteorology, doi: http://dx.doi.org/10.1175/JHM D 13 087.1, in press. Timbal, B., and B. McAvaney, 2001: An analogue based method to downscale surface air temperature: application for Australia. Clim. Dynam., 17, 947 963. Timbal, B., Dufour, A., McAvaney, B., 2003. An estimate of future climate change for western France using a statistical downscaling technique. Clim. Dynam. 20(7), 807 823. Tippett, M. K., and A. G. Barnston, 2008: Skill of Multimodel ENSO Probability Forecasts. Mon. Weather Rev., 136, 3933 3946. Tippett, M. K., A. H. Sobel, and S. J. Camargo, 2012: Association of US tornado occurrence with monthly environmental parameters. Geophys. Res. Lett., 39, L02801. Tiwari, M. K., and J. Adamowski (2013), Urban water demand forecasting and uncertainty assessment using ensemble wavelet bootstrap neural network models, Water Resources Research, 49(10), 6486 6507. Todorovic, M., B. Karic, et al. (2013). "Reference evapotranspiration estimate with limited weather data across a range of Mediterranean climat es." J. Hydrol., 481: 166 176. Troccoli, A., 2008: Seasonal Climate: Forecasting and Managing Risk. Vol. 82, Springer Verlag. Troccoli, A., 2010: Seasonal climate forecasting. Meteorol. App., 17, 251 268. rigation, evapotranspiration potentielle, formule climatique simplifice et mise a jour. Ann. Agron., 12, 13 49. Towler, E., Roberts, M., Rajagopalan, B., and Sojda, R. S. (2013). Incorporating probabilistic seasonal climate forecasts into river management using a risk Thornthwaite, C. W., 1948: An approach toward a rational classification of climate. Geogr. Rev., 38, 55 94. PAGE 245 245 Trajkovic, S., B. Todorovic, and M. Stankovic, 2003: Forecasting of Reference Evapotranspiration by Artificial Neural Networks. J. Irr ig. Drain. Eng., 129, 454 457. van den Dool, H. M., 1994: Searching for analogues, how long must we wait? Tellus A, 46, 314 324. Vicente Serrano, S. M., S. Begueria, and J. I. Lopez Moreno, 2010: A multiscalar drought index sensitive to global warming: The Standardized Precipitation Evapotranspiration Index. J. Climate, 23, 1696 1718. Vivoni, E.R., H. A. Moreno, G. Mascaro, J. C. Ridriguez, C. J. Watts, J. Garatuza Payan, R.L. Scott, 2008: Observed relation between evapotranspiration and soil moisture in t he North American monsoon season. Gephys. Res. Lett., 35, L22403, doi:10.1029/2008GL036001. Voisin, N., Pappenberger, F., Lettenmaier, D. P., Buizza, R., and Schaake, J. C. (2011). Application of a medium range global hydrologic probabilistic forecast sche me to the Ohio River basin. Weather and Forecasting, 26(4), 425 446. Wang, Y. M., S. Traore, T. Kerh, and J. M. Leu, 2011: Modelling reference evapotranspiration using feed forward backpropagation algorithm in arid regions of Africa. Irrig. Drain., 60, 404 417. Wang, H., in press. Evaluation of monthly precipitation forecasting skill of the National Multi model Ensemble in the summer season. Hydrol. Process., doi: 10.1002/hyp.9957 Werner, K., D. Brandon, M. Clark, and S. Gangopadhyay, 2005: Incorporating me dium range numerical weather model output into the ensemble streamflow prediction system of the National Weather Service. J. Hydrometeor., 6, 101 114. Weisheimer, A., and Coauthors, 2009: ENSEMBLES: A new multi model ensemble for seasonal to annual predict ions Skill and progress beyond DEMETER in forecasting tropical Pacific SSTs. J. Geophys. Res., 36, L21711, DOI: 10.1029/2009GL040896. Wetterhall, F., Halldin, S., Xu, C., 2005. Statistical precipitation downscaling in central Sweden with the analogue metho d. J. Hydrol. 306(1), 174 190. Whitaker, J. S., X. Wei, and F. Vitart, 2006: Improving week 2 forecasts with multimodel reforecast ensembles. Mon. Wea. Rev., 134, 2279 2284. Wilby, R., S. Charles, E. Zorita, B. Timbal, P. Whetton, and L. Mearns, 2004: Gui delines for use of climate scenarios developed from statistical downscaling methods. Intergovernmental Panel on Climate Change, 27 pp. [Available online at http://www.ipccdata.org/guidelines/.] PAGE 246 246 Wilks, D. S., 2006: Statistical methods in the atmospheric sc iences. 2nd ed., Academic Press, 467 pp. Wilks, D.S., 2011. Statistical methods in the atmospheric sciences, 3rd edition. Academic press, Amsterdam, Netherland, pp. 476. Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble MOS methods using GFS ref orecasts. Mon. Wea. Rev., 135, 2379 2390. Woli, P., J. W. Jones, and K. T. Ingram. 2013: Assessing the Agricultural Reference Index (ARID) using uncertainty and sensitivity analyses. Agron. J., 105, 150 160. Woli, P., J. W. Jones, K. T. Ingram, and C. W. F raisse. 2012: An agricultural reference index for drought. Agron. J., 104, 287 300. Wood, A.W., Maurer, E.P., Kumar, A., Lettenmaier, D.P., 2002. Long range experimental hydrologic forecasting for the eastern United States. J. Geophys. Res, 107(D20): 4429. Wood, A. W., L. R. Leung, V. Sridhar, and D. P. Lettenmaier, 2004: Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. Climatic Change, 62, 189 216. Wood, A. W., and Lettenmaier, D. P. (2006). A test bed f or new seasonal hydrologic forecasting approaches in the western United States. Bulletin of the American Meteorological Society, 87(12), 1699 1712. Wood, A. W., and J. C. Schaake, 2008: Correcting errors in streamflow forecast ensemble mean and spread. J. Hydrometeor., 9, 132 148. Xia, Y., M. Ek, H. Wei, and J. Ming, 2012a: Comparative analysis of relationships between NLDAS 2 forcings and model outputs. Hydrol. Processes, 26, 467 474. Xia, Y., and Coauthors, 2012b: Continental scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS 2): 1. Intercomparison and application of model products. J. Geophys. Res.: Atmos., 117, D03109, doi: 10.1029/2011JD016048 Xia, Y., and Coauthors, 2012c: Co ntinental scale water and energy flux analysis and validation for North American Land Data Assimilation System project phase 2 (NLDAS 2): 2. Validation of model simulated streamflow. J. Geophys. Res.: Atmos., 117, D03110, doi:10.1029/2011JD016051 Xue, Y., B. Huang, Z. Z. Hu, A. Kumar, C. Wen, D. Behringer, and S. Nadiga, 2011: An assessment of oceanic variability in the NCEP climate forecast system reanalysis. Clim Dyn, 37, 2511 2539. PAGE 247 247 Yates, D., S. Gangopadhyay, B. Rajagopalan, and K. Strzepek, 2003: A tec hnique for generating regional climate scenarios using a nearest neighbor algorithm. Water Resour. Res., 39, 1199, doi:10.1029/2002WR001769. Yun, W. T., L. Stefanova, and T. N. Krishnamurti, 2003: Improvement of the Multimodel Superensemble Technique for Seasonal Forecasts. J. Climate, 16, 3834 3840. Yang, J., P. Reichert, K.C. Abbaspour, J. Xia, H. Yang, 2008: Comparing uncertainty analysis techniques for a SWAT application to the Chaohe Basin in China, J. Hydrol., 358, 1 23. Yoder, R. E., L. O. Odhiambo, and W. C. Wright, 2005: Evaluation of methods for estimating daily reference crop evapotranspiration at a site in the humid Southeast United States. Appl. Eng. Agric., 21, 197 202. Yoon, J. H., K. Mo, and E. F. Wood, 2011: Dynamic Model Based Seasonal Pre diction of Meteorological Drought over the Contiguous United States. J. Hydrometeor., 13, 463 482. Yoon, J. H., K. Mo, and E. F. Wood, 2012: Dynamic Model Based Seasonal Prediction of Meteorological Drought over the Contiguous United States. J. Hydrometeor ., 13, 463 482. Yuan, X., E. F. Wood, L. Luo, and M. Pan, 2011: A first look at Climate Forecast System version 2 (CFSv2) for hydrological seasonal prediction. J. Geophys. Res., 38, L13402, DOI: 10.1029/2011GL047792. Yuan, X., and E. F. Wood, 2012a: Downsc aling precipitation or bias correcting streamflow? Some implications for coupled general circulation model (CGCM) based ensemble seasonal hydrologic forecast. Water Resour. Res., 48, W12519, doi:10.1029/2012WR012256 Yuan, X., and E. F. Wood, 2012b: On the clustering of climate models in ensemble seasonal forecasting. Geophys. Res. Lett., 39, L18701, doi:10.1029/2012GL052735. Yuan, X., E. F. Wood, L. Luo, and M. Pan, 2011: A first look at Climate Forecast System version 2 (CFSv2) for hydrological seasonal prediction. Geophys. Res. Lett., 38, L13402, doi:10.1029/2011GL047792. Yuan, X., E. F. Wood, J. K. Roundy, and M. Pan, in press: CFSv2 based seasonal hydroclimatic forecasts over conterminous United States. J. Climate. doi:10.1175/JCLI D 12 00683.1. Zhang, H., Casey, T., 2000. Verification of categorical probability forecasts. Wea. Forecasting , 15, 80 89. PAGE 248 248 Zhu, C., and D.P. Lettenmaier, 2007. Long term climate and derived hydrology and energy flux data for Mexico: 1925 2004. J. Climate, 20, 1936 1946. Zorit a, E., Hughes, J.P., Lettemaier, D.P., Von Storch, H., 1995. Stochastic characterization of regional circulation patterns for climate model diagnosis and estimation of local precipitation. J. Climate 8(5), 1023 1042. Zorita, E., and H. von Storch, 1999: Th e analog method as a simple statistical downscaling technique: Comparison with more complicated methods. J. Climate, 12, 2474 2489. PAGE 249 249 BIOGRAPHICAL SKETCH Di Tian ( ) was originally from Xi an , the ancient capital of China . H e received his Bachelor of Engineering degree in Land Resources Management from China University of Geosciences, Beijing in 2005 . After working in industry for a short time, in 2007, he entered graduate school also at China University of Geoscience s, Beijing and received his Master of Management degree in Land Resources Management in 2010. In August 20 10 , he came to the United States and started working towards a Doctor of Philosophy degree in the Department of Agricultural and Biological Engineering at the University of Florida, with a concentration in Hydrologic Sciences. In the summer of 2014, he received his Doctor of Philosophy degree at the University of Florida . |