
- Permanent Link:
- http://ufdc.ufl.edu/AA00003661/00001
## Material Information
- Title:
- Optimal mating designs and optimal techniques for analysis of quantitative traits in forest genetics
- Creator:
- Huber, Dudley Arvle, 1948-
- Publication Date:
- 1993
- Language:
- English
- Physical Description:
- ix, 151 leaves : ill. ; 29 cm.
## Subjects
- Subjects / Keywords:
- Architectural design ( jstor )
- Covariance ( jstor )
- Design efficiency ( jstor )
- Estimation bias ( jstor )
- Estimation methods ( jstor )
- Linear models ( jstor )
- Matrices ( jstor )
- Maximum likelihood estimations ( jstor )
- Random variables ( jstor )
- Statistical discrepancies ( jstor )
- Dissertations, Academic -- Forest Resources and Conservation -- UF
- Forest Resources and Conservation thesis Ph. D
- Genre:
- bibliography ( marcgt )
- non-fiction ( marcgt )
## Notes
- Thesis:
- Thesis (Ph. D.)--University of Florida, 1993.
- Bibliography:
- Includes bibliographical references (leaves 145-150).
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by Dudley Arvle Huber.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 030180493 ( ALEPH )
- 30335713 ( OCLC )
- AJZ0671 ( NOTIS )

OPTIMAL MATING DESIGNS AND OPTIMAL TECHNIQUES FOR ANALYSIS OF QUANTITATIVE TRAITS IN FOREST GENETICS

By DUDLEY ARVLE HUBER

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1993

## ACKNOWLEDGEMENTS

I express my gratitude to Drs. T. L. White, G. R. Hodge, R. C. Littell, M. A. DeLorenzo and D. L. Rockwood for their time and effort in the pursuit of this work. Their guidance and wisdom proved invaluable to the completion of this project. I further acknowledge Dr. Bruce Bongarten for his encouragement to continue my academic career.

I am grateful to Dr. T. L. White and the School of Forest Resources and Conservation at the University of Florida for funding this work. I extend special thanks to George Bryan and Dr. M. A. DeLorenzo of the Dairy Science Department and Greg Powell of the School of Forest Resources and Conservation for the use of computing facilities, programming help and aid in running the simulations required.

Most importantly, I thank my family, Nancy, John and Heather, for their understanding and encouragement in this endeavor.

## TABLE OF CONTENTS

- ACKNOWLEDGEMENTS
- LIST OF TABLES
- LIST OF FIGURES
- ABSTRACT
- CHAPTER 1 INTRODUCTION
- CHAPTER 2 THE EFFICIENCY OF HALF-SIB, HALF-DIALLEL AND CIRCULAR MATING DESIGNS IN THE ESTIMATION OF GENETIC PARAMETERS WITH VARIABLE NUMBERS OF PARENTS AND LOCATIONS
  - Introduction
  - Methods
    - Assumptions Concerning Block Size
    - The Use of Efficiency (i)
    - General Methodology
    - Levels of Genetic Determination
    - Covariance Matrix for Variance Components
    - Covariance Matrix for Linear Combinations of Variance Components and Variance of a Ratio
    - Comparison Among Estimates of Variances of Ratios
  - Results
    - Heritability
    - Type B Correlation
    - Dominance to Additive Variance Ratio
  - Discussion
    - Comparison of Mating Designs
    - A General Approach to the Estimation Problem
    - Use of the Variance of a Ratio Approximation
  - Conclusions
- CHAPTER 3 ORDINARY LEAST SQUARES ESTIMATION OF GENERAL AND SPECIFIC COMBINING ABILITIES FROM HALF-DIALLEL MATING DESIGNS
  - Introduction
  - Methods
    - Linear Model
    - Ordinary Least Squares Solutions
    - Sum-to-Zero Restrictions
    - Components of the Matrix Equation
    - Estimation of Fixed Effects
  - Numerical Examples
    - Balanced Data (Plot-mean Basis)
    - Missing Plot
    - Missing Cross
    - Several Missing Crosses
  - Discussion
    - Uniqueness of Estimates
    - Weighting of Plot Means and Cross Means in Estimating Parameters
    - Diallel Mean
    - Variance and Covariance of Plot Means
    - Comparison of Prediction and Estimation Methodologies
  - Conclusions
- CHAPTER 4 VARIANCE COMPONENT ESTIMATION TECHNIQUES COMPARED FOR TWO MATING DESIGNS WITH FOREST GENETIC ARCHITECTURE THROUGH COMPUTER SIMULATION
  - Introduction
  - Methods
    - Experimental Approach
    - Experimental Design for Simulated Data
    - Full-Sib Linear Model
    - Half-sib Linear Model
    - Data Generation and Deletion
    - Variance Component Estimation Techniques
    - Comparison Among Estimation Techniques
  - Results and Discussion
    - Variance Components
    - Ratios of Variance Components
  - General Discussion
    - Observational Unit
    - Negative Estimates
    - Estimation Technique
    - Recommendation
- CHAPTER 5 GAREML: A COMPUTER ALGORITHM FOR ESTIMATING VARIANCE COMPONENTS AND PREDICTING GENETIC VALUES
  - Introduction
  - Algorithm
  - Operating GAREML
  - Interpreting GAREML Output
    - Variance Component Estimates
    - Predictions of Random Variables
    - Asymptotic Covariance Matrix of Variance Components
    - Fixed Effect Estimates
    - Error Covariance Matrices
  - Example
    - Data
    - Analysis
    - Output
  - Conclusions
- CHAPTER 6 CONCLUSIONS
- APPENDIX FORTRAN SOURCE CODE FOR GAREML
- REFERENCE LIST
- BIOGRAPHICAL SKETCH

## LIST OF TABLES

- Table 2-1. Parametric variance components
- Table 3-1. Data set for numerical examples
- Table 3-2. Numerical results for examples
- Table 4-1. Abbreviation for and description of variance component estimation methods
- Table 4-2. Sets of true variance components
- Table 4-3. Sampling variance for the estimates
- Table 4-4. Bias for the estimates
- Table 4-5. Probability of nearness
- Table 5-1. Data for example

## LIST OF FIGURES

- Figure 2-1. Efficiency (i) for $h^2$
- Figure 2-2. Efficiency (i) for $r_B$
- Figure 2-3. Efficiency (i) for $\gamma$
- Figure 3-1. The overparameterized linear model
- Figure 3-2. The linear model for a four-parent half-diallel
- Figure 3-3. Intermediate result in SCA submatrix generation
- Figure 3-4. Weights on overall cross means
- Figure 4-1. Distribution of 1000 MIVQUE estimates

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OPTIMAL MATING DESIGNS AND OPTIMAL TECHNIQUES FOR ANALYSIS OF QUANTITATIVE TRAITS IN FOREST GENETICS

By Dudley Arvle Huber

May 1993

Chairperson: Timothy L. White

Major Department: School of Forest Resources and Conservation

First, the asymptotic covariance matrix of the variance component estimates is used to compare three common mating designs for efficiency (maximizing the variance reducing property of each observation) for genetic parameters across numbers of parents and locations and varying genetic architectures. It is determined that the circular mating design is always superior in efficiency to the half-diallel design. For single-tree heritability, the half-sib design is most efficient. For estimating type B correlation, maximum efficiency is achieved by either the half-sib or circular mating design and that change in rank for efficiency is determined by the underlying genetic architecture.

Another intent of this work is comparing analysis methodologies for determining parental worth. The first of these investigations is ordinary least squares assumptions in the estimation of parental worth for the half-diallel mating design with balanced and unbalanced data. The conclusion from comparison of ordinary least squares to alternative analysis methodologies is that best linear unbiased prediction and best linear prediction are more appropriate to the problem of determining parental worth.

The next analysis investigation contrasts variance component estimation techniques across levels of imbalance for the half-diallel and half-sib mating designs for the estimation of genetic parameters with plot means and individuals used as the unit of observation. The criteria for discrimination are variance of the estimates, mean square error, bias and probability of nearness. For all estimation techniques individuals as the unit of observation produced estimates with the most desirable properties. Of the estimation techniques examined, restricted maximum likelihood is the most robust to imbalance.
The computer program used to produce restricted maximum likelihood estimates of variance components was modified to form a user friendly analysis package. Both the algorithm and the outputs of the program are documented. Outputs available from the program include variance component estimates, generalized least squares estimates of fixed effects, asymptotic covariance matrix for variance components, best linear unbiased predictions for general and specific combining abilities and the error covariance matrix for predictions and estimates.

## CHAPTER 1 INTRODUCTION

Analysis of quantitative traits in forest genetic experiments has traditionally been approached as a two-part problem. Parental worth would be estimated as fixed effects and later considered as random effects for the determination of genetic architecture. While traditional, this approach is most probably sub-optimal given the proliferation of alternative analysis approaches with enhanced theoretical properties (White and Hodge 1989).

In this dissertation emphasis is placed on the half-diallel mating design because of its omnipresence and the uniqueness of the analysis problem this mating design presents. The half-diallel mating design has been and continues to be used in plant sciences (Sprague and Tatum 1942, Gilbert 1958, Matzinger et al. 1959, Burley et al. 1966, Squillace 1973, Weir and Zobel 1975, Wilcox et al. 1975, Snyder and Namkoong 1978, Hallauer and Miranda 1981, Singh and Singh 1984, Greenwood et al. 1986, and Weir and Goddard 1986). The unique feature of the half-diallel mating system which hinders analysis with many statistical packages is that a single observation contains two levels of the same main effect.
Optimality of mating design for the estimation of commonly needed genetic parameters (single-tree heritability, type B correlation and dominance to additive variance ratio) is examined utilizing the asymptotic covariance of the variance components (Kendall and Stuart 1963, Giesbrecht 1983 and McCutchan et al. 1989). Since genetic field experiments are composed of both a mating design and a field design, the central consideration in this investigation is which mating design with what field design (how many parents and across what number of locations within a randomized complete block design) is most efficient. The criterion for discernment among designs is the efficiency of the individual observation in reducing the variance of the estimate (Pederson 1972). This question is considered under a range of genetic architectures which spans that reported for coniferous growth traits (Campbell 1972, Stonecypher et al. 1973, Snyder and Namkoong 1978, Foster 1986, Foster and Bridgwater 1986, Hodge and White [in press]).

The investigation into optimal analysis proceeds by considering the ordinary least squares (OLS) treatment of estimating parental worth for the half-diallel mating design. OLS assumptions are examined in detail through the use of matrix algebra for both balanced and unbalanced data. The use of matrix algebra illustrates both the uniqueness of the problem and the interpretation of the OLS assumptions. Comparisons among OLS, generalized least squares (GLS), best linear unbiased prediction (BLUP) and best linear prediction (BLP) are made on a theoretical basis.

Although consideration of field and mating design of future experiments is essential, the problem of optimal analysis of current data remains. In response to this need, simulated data with differing levels of imbalance, genetic architecture and mating design is utilized as a basis for discriminating among variance component estimation techniques in the determination of genetic architecture.
The levels of imbalance simulated represent those commonly seen in forest genetic data as less than 100% survival, missing crosses for full-sib mating designs and only subsets of parents in common across location for half-sib mating designs. The two mating designs are half-sib and half-diallel with a subset of the previously used genetic architectures. The field design is a randomized complete block with fifteen families per block and six trees per family per block. The four criteria used to discriminate among variance component estimation techniques are probability of nearness (Pittman 1937), bias, variance of the estimates and mean square error (Hogg and Craig 1978).

The techniques compared for variance component estimation are minimum variance quadratic unbiased estimation (Rao 1971b), minimum norm quadratic unbiased estimation (Rao 1971a), restricted maximum likelihood (Patterson and Thompson 1971), maximum likelihood (Hartley and Rao 1967) and Henderson's method 3 (Henderson 1953). These techniques are compared using the individual and plot means as the unit of observation. Further, three alternatives are explored for dealing with negative variance component estimates: accept and live with negative estimates, set negative estimates to zero, and re-solve the system setting negative components to zero.

The algorithm used for the method which provided estimates with optimal properties across experimental levels was converted to a user friendly program. This program, which provides restricted maximum likelihood variance component estimates, uses Giesbrecht's algorithm (1983). Documentation of the algorithm and explanation of the program's output are provided along with the Fortran source code (appendix).
## CHAPTER 2 THE EFFICIENCY OF HALF-SIB, HALF-DIALLEL AND CIRCULAR MATING DESIGNS IN THE ESTIMATION OF GENETIC PARAMETERS WITH VARIABLE NUMBERS OF PARENTS AND LOCATIONS

### Introduction

In forest tree improvement, genetic tests are established for four primary purposes: 1) ranking parents, 2) selecting families or individuals, 3) estimating genetic parameters, and 4) demonstrating genetic gain (Zobel and Talbert 1984). While the four purposes are not mutually exclusive, a test design optimal for one purpose is most probably not optimal for all (Burdon and Shelbourne 1971, White 1987). A breeder then must prioritize the purposes for which a given test is established and choose a design based on these priorities.

Within a genetic test design there are two primary components: mating design and field design. There have been several investigations of optimal designs for these two components either separately or simultaneously under various criteria. These criteria have included the efficient and/or precise estimation of heritability (Pederson 1972, Namkoong and Roberds 1974, Pepper and Namkoong 1978, McCutchan et al. 1985, McCutchan et al. 1989), precise estimation of variance components (Braaten 1965, Pepper 1983), and efficient selection of progeny (van Buijtenen 1972, White and Hodge 1987, van Buijtenen and Burdon 1990, Loo-Dinkins et al. 1990).

Incorporated within this body of research has been a wide range of genetic and environmental variance parameters and field and mating designs. However, the models in previous investigations have been primarily constrained to consideration of testing in a single environment with a corresponding limited number of factors in the model, i.e., genotype by environment interaction and/or dominance variance are usually not considered.
This chapter focuses on optimal mating designs through consideration of three common mating designs (half-sib, half-diallel, and circular with four crosses per parent) for estimation of genetic parameters with a field design extending across multiple locations. In this chapter the approach to the optimal design problem is to maintain the basic field design within locations as randomized complete block with four blocks and a six-tree row-plot representing each genetic entry within a block (noted as one of the most common field designs by Loo-Dinkins et al. 1990). The number of families in a block, number of locations, mating design and number of parents within a mating design are allowed to change.

Since optimality, besides being a function of the field and mating designs, is also a function of the underlying genetic parameters, all designs are examined across a range of levels of genetic determination (as varying levels of heritability, genotype by environment interaction and dominance) reflecting estimates for many economically important traits in conifers (Campbell 1972, Stonecypher et al. 1973, Snyder and Namkoong 1978, Foster 1986, Foster and Bridgwater 1986, Hodge and White (in press)).

For each design and level of genetic determination, a Minimum Variance Quadratic Unbiased Estimation (MIVQUE) technique and an approximation of the variance of a ratio (Kendall and Stuart 1963, Giesbrecht 1983 and McCutchan et al. 1989) are applied to estimate the variance of estimates of heritability, additive to additive plus additive by environment variance ratio, and dominance to additive variance ratio. These techniques use the true covariance matrix of the variance component estimates (utilizing only the known parameters and the test design and precluding the need for simulated or real data) and a Taylor series approximation of the variance of a ratio.
The relative efficiencies of different test designs are compared on the basis of i (the efficiency of an individual observation in reducing the variance of an estimate, Pederson 1972). Thus this research explores which mating design, number of parents and number of locations is most efficient per unit of observation in estimating heritability, additive to additive plus additive by environment variance ratio, and dominance to additive variance ratio for several variance structures representative of coniferous growth traits.

### Methods

#### Assumptions Concerning Block Size

As opposed to McCutchan et al. (1985), where block sizes were held constant and including more families resulted in fewer observations per family per block, in this chapter the blocks are allowed to expand to accommodate increasing numbers of families. This expansion is allowed without increasing either the variance among blocks or the variance within blocks. For the three mating designs which are discussed, the addition of one parent to the half-sib design increases block size by 6 trees (one plot for a half-sib family), the addition of a parent to the circular design increases block size by 12 trees (two plots for full-sib families), and the addition of a parent to the half-diallel design increases block size by 6p trees (where p is the number of parents before the addition, i.e., there are p new full-sib families per block). Therefore, block size is determined by the mating design and the number of parents.

All comparisons among mating designs and numbers of locations are for equal block sizes, i.e., equal numbers of observations per location. This results in comparing mating designs with unequal numbers of parents in the designs and comparing two location experiments against five location experiments with equal numbers of observations per location but unequal total numbers of observations.
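The block-size bookkeeping above can be checked with a short sketch. The function names are illustrative and are not part of the dissertation; the inputs taken from the text are the six-tree plot, four blocks per location, four crosses per parent in the circular design, and the assumption that a half-diallel crosses every pair of parents once.

```python
TREES_PER_PLOT = 6      # six-tree row-plot per genetic entry per block (from the text)
BLOCKS_PER_LOCATION = 4 # four blocks per location (from the text)

def genetic_entries(design, parents):
    """Number of families per block for a mating design (hypothetical helper)."""
    if design == "half-sib":
        return parents                        # one half-sib family per parent
    if design == "circular":
        return 2 * parents                    # four crosses per parent, each shared by two parents
    if design == "half-diallel":
        return parents * (parents - 1) // 2   # every pair of parents crossed once
    raise ValueError(f"unknown design: {design}")

def block_size(design, parents):
    """Trees per block."""
    return TREES_PER_PLOT * genetic_entries(design, parents)

def added_trees(design, parents):
    """Increase in block size when one parent is added to `parents` parents."""
    return block_size(design, parents + 1) - block_size(design, parents)

def total_observations(design, parents, locations):
    """Total trees in the experiment: 24 x locations x genetic entries."""
    return BLOCKS_PER_LOCATION * locations * block_size(design, parents)

for design in ("half-sib", "circular", "half-diallel"):
    print(design, added_trees(design, 10))
```

With ten parents, the increments are 6, 12 and 6p = 60 trees, matching the three cases described in the text.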
#### The Use of Efficiency (i)

Efficiency is the tool by which comparisons are made and is the efficacy of the individual observations in an experiment in lowering the variance of parameter estimates. An increasing efficiency indicates that for increasing experimental size the additional observations have enhanced the variance reducing property of all observations. Efficiency is calculated as $i = 1 / (N \cdot \mathrm{Var}(x))$ where $N$ is the total number of observations and $\mathrm{Var}(x)$ is the variance of a generic parameter estimate.

Increasing $N$ always results in a reduction of the variance of estimation, all other things being equal. Yet the change in efficiency with increasing $N$ is dependent on whether the reduction in variance is adequate to offset the increase in $N$ which caused the reduction. Comparing a previous efficiency with that obtained by increasing $N$, i.e., increasing the number of parents in a mating design or increasing the number of locations in which an experiment is planted: since

$$i_o = \frac{1}{N\,\mathrm{Var}(x)}, \qquad (2\text{-}1)$$

then $N\,\mathrm{Var}(x) = 1 / i_o$ and $(N + \Delta N)(\mathrm{Var}(x) + \Delta \mathrm{Var}(x)) = 1 / i_n$;

- if $i_o$ (the old efficiency) $= i_n$ (the new efficiency), then $\Delta \mathrm{Var}(x) / \mathrm{Var}(x) = -\Delta N / (N + \Delta N)$;
- if $i_o < i_n$, then $\Delta \mathrm{Var}(x) / \mathrm{Var}(x) < -\Delta N / (N + \Delta N)$; and
- if $i_o > i_n$, then $\Delta \mathrm{Var}(x) / \mathrm{Var}(x) > -\Delta N / (N + \Delta N)$;

where $\Delta$ denotes the change in magnitude, so that $\Delta \mathrm{Var}(x)$ is negative when the variance falls. Viewing equation 2-1, if $N$ is held constant and one design has a higher efficiency ($i$), the design must also produce parameter estimates which have a lower variance.

#### General Methodology

Sets of true variance components are calculated in accordance with a stated level of genetic control and the design matrix is generated in correspondence with the field and mating design. Knowing the design matrix and a set of true variance components, a true covariance matrix of variance component estimates is generated.
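Returning to the efficiency definition in Eq. 2-1, a minimal sketch of the break-even comparison follows. The function names and the numbers are hypothetical and serve only to illustrate the inequality between the relative change in variance and the relative change in experiment size.

```python
def efficiency(n_obs, variance):
    """Efficiency i = 1 / (N * Var(x)), as in Eq. 2-1."""
    return 1.0 / (n_obs * variance)

def efficiency_rises(n_obs, variance, delta_n, delta_var):
    """True when the relative variance change is below the break-even
    value -dN / (N + dN), i.e. the enlarged experiment is more efficient."""
    return delta_var / variance < -delta_n / (n_obs + delta_n)

# Hypothetical experiment: doubling N from 1000 to 2000 observations.
old_i = efficiency(1000, 0.010)
new_i_good = efficiency(2000, 0.004)   # variance fell by 60 percent
new_i_poor = efficiency(2000, 0.006)   # variance fell by only 40 percent

print(old_i, new_i_good, new_i_poor)
print(efficiency_rises(1000, 0.010, 1000, -0.006))
print(efficiency_rises(1000, 0.010, 1000, -0.004))
```

Doubling N sets the break-even relative change at one half, so the first enlargement raises efficiency and the second lowers it even though both reduce the variance.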
Once the covariance matrix of the variance components is in hand, the variance of and covariances between any linear combinations of the variance component estimates are calculated. From the covariance matrix for linear combinations, the variance of genetic ratios as ratios of linear combinations of variance components are approximated by a Taylor series expansion. Since definition of a set of variance components and formation of the design matrix are dependent on the linear model employed, discussion of specific methodology begins with linear models.

#### Linear Models

##### Half-diallel and circular designs

The scalar linear model employed for half-diallel and circular mating designs is

$$y_{ijklm} = \mu + t_i + b_{ij} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \qquad (2\text{-}2)$$

where

- $y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test;
- $\mu$ is the population mean;
- $t_i$ is the random variable test environment ~ NID$(0, \sigma^2_t)$;
- $b_{ij}$ is the random variable block ~ NID$(0, \sigma^2_b)$;
- $g_k$ is the random variable female general combining ability (gca) ~ NID$(0, \sigma^2_{gca})$;
- $g_l$ is the random variable male gca ~ NID$(0, \sigma^2_{gca})$;
- $s_{kl}$ is the random variable specific combining ability (sca) ~ NID$(0, \sigma^2_{sca})$;
- $tg_{ik}$ is the random variable test by female gca interaction ~ NID$(0, \sigma^2_{tg})$;
- $tg_{il}$ is the random variable test by male gca interaction ~ NID$(0, \sigma^2_{tg})$;
- $ts_{ikl}$ is the random variable test by sca interaction ~ NID$(0, \sigma^2_{ts})$;
- $p_{ijkl}$ is the random variable plot ~ NID$(0, \sigma^2_p)$;
- $w_{ijklm}$ is the random variable within plot ~ NID$(0, \sigma^2_w)$; and
- there is no covariance between random variables in the model.
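The scalar model above carries two gca terms, $g_k$ and $g_l$, for every observation, so each row of the gca portion of the design matrix has two non-zero entries. The sketch below builds one such row per cross for a small half-diallel; the function names are illustrative, and the design is assumed to have no selfs and no reciprocals.

```python
from itertools import combinations

def half_diallel_crosses(parents):
    """All crosses (k, l) with k < l among `parents` parents."""
    return list(combinations(range(parents), 2))

def gca_design_rows(parents):
    """One row per cross; the two parents of the cross each get a 1."""
    rows = []
    for female, male in half_diallel_crosses(parents):
        row = [0] * parents
        row[female] = 1   # loading on g_k
        row[male] = 1     # loading on g_l
        rows.append(row)
    return rows

rows = gca_design_rows(4)
for cross, row in zip(half_diallel_crosses(4), rows):
    print(cross, row)
```

For four parents this gives six crosses, every row sums to two, and every parent appears in three crosses, which is the structure that many packages expecting one level per factor per observation cannot represent directly.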
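This linear model in matrix notation is (dimensions below model component):

$$\underset{n\times 1}{\mathbf{y}} = \mu\underset{n\times 1}{\mathbf{1}} + \underset{n\times t}{\mathbf{Z}_T}\,\underset{t\times 1}{\mathbf{e}_T} + \underset{n\times b}{\mathbf{Z}_B}\,\underset{b\times 1}{\mathbf{e}_B} + \underset{n\times g}{\mathbf{Z}_G}\,\underset{g\times 1}{\mathbf{e}_G} + \underset{n\times s}{\mathbf{Z}_S}\,\underset{s\times 1}{\mathbf{e}_S} + \underset{n\times tg}{\mathbf{Z}_{TG}}\,\underset{tg\times 1}{\mathbf{e}_{TG}} + \underset{n\times ts}{\mathbf{Z}_{TS}}\,\underset{ts\times 1}{\mathbf{e}_{TS}} + \underset{n\times p}{\mathbf{Z}_P}\,\underset{p\times 1}{\mathbf{e}_P} + \underset{n\times 1}{\mathbf{e}_W} \qquad (2\text{-}3)$$

where $\mathbf{y}$ is the observation vector; $\mathbf{Z}_i$ is the portion of the design matrix for the $i$th random variable; $\mathbf{e}_i$ is the vector of unobservable random effects for the $i$th random variable; $\mathbf{1}$ is a vector of 1's; and $n$, $t$, $b$, $g$, $s$, $tg$, $ts$, and $p$ are the number of observations, tests, blocks, gca's, sca's, test by gca interactions, test by sca interactions and plots, respectively.

Utilizing customary assumptions in half-diallel mating designs (Method 4, Griffing 1956), the variance of an individual observation is

$$\mathrm{Var}(y_{ijklm}) = \sigma^2_t + \sigma^2_b + 2\sigma^2_{gca} + \sigma^2_{sca} + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w; \qquad (2\text{-}4)$$

and in matrix notation the covariance matrix for the observations is

$$\mathrm{Var}(\mathbf{y}) = \mathbf{Z}_T\mathbf{Z}_T'\sigma^2_t + \mathbf{Z}_B\mathbf{Z}_B'\sigma^2_b + \mathbf{Z}_G\mathbf{Z}_G'\sigma^2_{gca} + \mathbf{Z}_S\mathbf{Z}_S'\sigma^2_{sca} + \mathbf{Z}_{TG}\mathbf{Z}_{TG}'\sigma^2_{tg} + \mathbf{Z}_{TS}\mathbf{Z}_{TS}'\sigma^2_{ts} + \mathbf{Z}_P\mathbf{Z}_P'\sigma^2_p + \mathbf{I}_n\sigma^2_w \qquad (2\text{-}5)$$

where $'$ indicates the transpose operator, all matrices of the form $\mathbf{Z}_i\mathbf{Z}_i'$ are $n \times n$, and $\mathbf{I}_n$ is an $n \times n$ identity matrix.

##### Half-sib design

The scalar linear model for half-sib mating designs is

$$y_{ijkm} = \mu + t_i + b_{ij} + g_k + tg_{ik} + p^*_{ijk} + w^*_{ijkm} \qquad (2\text{-}6)$$

where $y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test; $\mu$, $t_i$, $b_{ij}$, $g_k$, and $tg_{ik}$ retain the definition in Eq. 2-2; $p^*_{ijk}$ is the random variable plot containing different genotype by environment components than Eq. 2-2 ~ NID$(0, \sigma^2_{p^*})$; $w^*_{ijkm}$ is the random variable within plot containing different levels of genotypic and genotype by environment components than Eq. 2-2 ~ NID$(0, \sigma^2_{w^*})$; and there is no covariance between random variables in the model.

The matrix notation model is

$$\underset{n\times 1}{\mathbf{y}} = \mu\underset{n\times 1}{\mathbf{1}} + \underset{n\times t}{\mathbf{Z}_T}\,\underset{t\times 1}{\mathbf{e}_T} + \underset{n\times b}{\mathbf{Z}_B}\,\underset{b\times 1}{\mathbf{e}_B} + \underset{n\times g}{\mathbf{Z}_G}\,\underset{g\times 1}{\mathbf{e}_G} + \underset{n\times tg}{\mathbf{Z}_{TG}}\,\underset{tg\times 1}{\mathbf{e}_{TG}} + \underset{n\times p}{\mathbf{Z}_P}\,\underset{p\times 1}{\mathbf{e}_P} + \underset{n\times 1}{\mathbf{e}_W} \qquad (2\text{-}7)$$

The variance of an individual observation in half-sib designs is

$$\mathrm{Var}(y_{ijkm}) = \sigma^2_t + \sigma^2_b + \sigma^2_{gca} + \sigma^2_{tg} + \sigma^2_{p^*} + \sigma^2_{w^*}; \qquad (2\text{-}8)$$

and

$$\mathrm{Var}(\mathbf{y}) = \mathbf{Z}_T\mathbf{Z}_T'\sigma^2_t + \mathbf{Z}_B\mathbf{Z}_B'\sigma^2_b + \mathbf{Z}_G\mathbf{Z}_G'\sigma^2_{gca} + \mathbf{Z}_{TG}\mathbf{Z}_{TG}'\sigma^2_{tg} + \mathbf{Z}_P\mathbf{Z}_P'\sigma^2_{p^*} + \mathbf{I}_n\sigma^2_{w^*}. \qquad (2\text{-}9)$$

#### Levels of Genetic Determination

Eight levels of genetic determination are derived from a factorial combination of two levels of each of three genetic ratios:

- heritability ($h^2 = 4\sigma^2_{gca} / (2\sigma^2_{gca} + \sigma^2_{sca} + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w)$ for full-sib models and $h^2 = 4\sigma^2_{gca} / (\sigma^2_{gca} + \sigma^2_{tg} + \sigma^2_{p^*} + \sigma^2_{w^*})$ for half-sib models);
- additive to additive plus additive by environment variance ratio ($r_B = \sigma^2_{gca} / (\sigma^2_{gca} + \sigma^2_{tg})$, Type B correlation of Burdon 1977); and
- dominance to additive variance ratio ($\gamma = \sigma^2_{sca} / \sigma^2_{gca}$).

The levels employed for each ratio are $h^2$ = 0.1 and 0.25; $r_B$ = 0.5 and 0.8; and $\gamma$ = 0.25 and 1.0.

To generate sets of true variance components (Table 2-1) for half-diallel and circular mating designs from the factorial combinations of genetic parameters, the denominator of $h^2$ is set to 10 (arbitrarily, but without loss of generality) which, given the level of $h^2$, leads to the solution for $\sigma^2_{gca}$. Solving for $\sigma^2_{gca}$ and knowing $\gamma$ yields the value for $\sigma^2_{sca}$. Knowing the level of $r_B$ and $\sigma^2_{gca}$ allows the equation for $r_B$ to be solved for $\sigma^2_{tg}$. An assumption that the ratio of $\gamma$ equals the ratio of $\sigma^2_{ts} / \sigma^2_{tg}$ permits a solution for $\sigma^2_{ts}$. A further assumption that $\sigma^2_p$ is seven percent of $\sigma^2_p + \sigma^2_w$ yields a solution for both $\sigma^2_p$ and $\sigma^2_w$. Finally, $\sigma^2_t$ and $\sigma^2_b$ are set to 1.0 and 0.5, respectively, for all treatment levels.

Table 2-1. Parametric variance components for the factorial combination of heritability (.1 and .25), Type B Correlation (.5 and .8) and dominance to additive variance ratio (.25 and 1.0) for full and half-sib designs. $\sigma^2_t$ and $\sigma^2_b$ were maintained at 1.0 and .5, respectively for all levels and designs.

| Design | Level | $h^2$ | $r_B$ | $\gamma$ | $\sigma^2_{gca}$ | $\sigma^2_{sca}$ | $\sigma^2_{tg}$ | $\sigma^2_{ts}$ | $\sigma^2_p$ or $\sigma^2_{p^*}$ | $\sigma^2_w$ or $\sigma^2_{w^*}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Full | 1 | .1 | .8 | 1.0 | .2500 | .2500 | .0625 | .0625 | .6344 | 8.4281 |
| Full | 2 | .1 | .5 | 1.0 | .2500 | .2500 | .2500 | .2500 | .5950 | 7.9050 |
| Full | 3 | .1 | .8 | .25 | .2500 | .0625 | .0625 | .0156 | .6508 | 8.6461 |
| Full | 4 | .1 | .5 | .25 | .2500 | .0625 | .2500 | .0625 | .6212 | 8.2538 |
| Full | 5 | .25 | .8 | 1.0 | .6250 | .6250 | .1562 | .1562 | .5359 | 7.1203 |
| Full | 6 | .25 | .5 | 1.0 | .6250 | .6250 | .6250 | .6250 | .4376 | 5.8125 |
| Full | 7 | .25 | .8 | .25 | .6250 | .1562 | .1562 | .0391 | .5769 | 7.6649 |
| Full | 8 | .25 | .5 | .25 | .6250 | .1562 | .6250 | .1562 | .5031 | 6.6844 |
| Half | 1 and 3 | .1 | .8 | | .2500 | | .0625 | | .4844 | 9.2031 |
| Half | 2 and 4 | .1 | .5 | | .2500 | | .2500 | | .4750 | 9.0250 |
| Half | 5 and 7 | .25 | .8 | | .6250 | | .1562 | | .4609 | 8.7579 |
| Half | 6 and 8 | .25 | .5 | | .6250 | | .6250 | | .4375 | 8.3125 |

In order to facilitate comparisons of half-sib mating designs with full-sib mating designs, $\sigma^2_{gca}$ and $\sigma^2_{tg}$ retain the same values for given levels of $h^2$ and $r_B$ and the denominator of heritability again is set to 10. To solve for $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$, the assumption that $\sigma^2_{p^*}$ is five percent of $\sigma^2_{p^*} + \sigma^2_{w^*}$ permits a solution for $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$ and maintains $\sigma^2_{p^*}$ approximately equal to and no larger than $\sigma^2_p$ of the full-sib mating designs (Namkoong et al. 1966) for the same levels of $h^2$ and $r_B$. Under the previous definitions all consideration of differences in $\gamma$ changing the magnitudes of $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$ is disallowed. Thus, there are only four parameter sets for the half-sib mating design (Table 2-1).

#### Covariance Matrix for Variance Components

The base algorithm to produce the covariance matrix for variance component estimates is from Giesbrecht (1983) and was rewritten in Fortran for ease of handling the study data. In using this algorithm, we assume that all random variables are independent and normally distributed and that the true variances of the random variables are known.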
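Returning to Table 2-1, the solution steps described for the true variance components can be sketched as follows. The function and key names are illustrative; the denominator of 10 and the plot shares of seven percent (full-sib) and five percent (half-sib) are taken from the text.

```python
def full_sib_components(h2, r_b, gamma, denominator=10.0, plot_share=0.07):
    """True variance components for half-diallel and circular designs."""
    gca = h2 * denominator / 4.0       # from h2 = 4 * gca / denominator
    sca = gamma * gca                  # dominance to additive ratio
    tg = gca * (1.0 - r_b) / r_b       # from r_B = gca / (gca + tg)
    ts = gamma * tg                    # assumed: ts / tg equals gamma
    remainder = denominator - (2 * gca + sca + 2 * tg + ts)
    plot = plot_share * remainder      # plot is 7 percent of plot + within
    return {"gca": gca, "sca": sca, "tg": tg, "ts": ts,
            "plot": plot, "within": remainder - plot}

def half_sib_components(h2, r_b, denominator=10.0, plot_share=0.05):
    """True variance components for the half-sib design."""
    gca = h2 * denominator / 4.0
    tg = gca * (1.0 - r_b) / r_b
    remainder = denominator - (gca + tg)
    plot = plot_share * remainder      # plot is 5 percent of plot + within
    return {"gca": gca, "tg": tg, "plot": plot, "within": remainder - plot}

level_1 = full_sib_components(h2=0.1, r_b=0.8, gamma=1.0)
half_1_and_3 = half_sib_components(h2=0.1, r_b=0.8)
print({k: round(v, 4) for k, v in level_1.items()})
print({k: round(v, 4) for k, v in half_1_and_3.items()})
```

Under these assumptions the sketch reproduces the tabled values to four decimals, for example the plot and within-plot components of full-sib level 1 and of half-sib levels 1 and 3.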
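Under these assumptions, Minimum Norm Quadratic Unbiased Estimation (MINQUE, Rao 1972) using the true variance components as priors (the starting point for the algorithm) becomes MIVQUE (Rao 1971b), which requires normality and the true variance components as priors (Searle 1987), and for a given design the covariance matrix of the variance component estimates becomes fixed. A sketch of the steps from the MIVQUE equation (Eq. 2-10, Giesbrecht 1983, Searle 1987) to the true covariance matrix for variance components estimates is

$$\underset{r\times r}{\{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}}\;\underset{r\times 1}{\hat{\boldsymbol{\sigma}}^2} = \underset{r\times 1}{\{\mathbf{y}'\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{y}\}} \qquad (2\text{-}10)$$

then

$$\hat{\boldsymbol{\sigma}}^2 = \{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}^{-1}\{\mathbf{y}'\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{y}\}$$

and

$$\mathrm{Var}(\hat{\boldsymbol{\sigma}}^2) = \{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}^{-1}\,\mathrm{Var}(\{\mathbf{y}'\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{y}\})\,\{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}^{-1}$$

where each factor on the right is $r \times r$ and

- $\{a_{ij}\}$ is a matrix whose elements are $a_{ij}$ where in the full-sib designs $i$ = 1 to 8 and $j$ = 1 to 8, i.e., there is a row and column for every random variable in the linear model;
- $tr$ is the trace operator, that is, the sum of the diagonal elements of a matrix;
- $\mathbf{Q} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{X}'\mathbf{V}^{-1}$ for $\mathbf{V}$ = the covariance matrix of $\mathbf{y}$ and $\mathbf{X}$ as the design matrix for fixed effects;
- $\mathbf{V}_i = \mathbf{Z}_i\mathbf{Z}_i'$, where $i$ = the random variables test, block, etc.;
- $\hat{\boldsymbol{\sigma}}^2$ is the vector of variance component estimates; and
- $r$ is the number of random variables in the model.

The variance of a quadratic form (where $\mathbf{A}$ is any non-negative definite matrix of proper dimension) under normality is $\mathrm{Var}(\mathbf{y}'\mathbf{A}\mathbf{y}) = 2tr(\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V}) + 4\boldsymbol{\mu}'\mathbf{A}\mathbf{V}\mathbf{A}\boldsymbol{\mu}$, where $\boldsymbol{\mu}$ is the mean vector of $\mathbf{y}$ (Searle 1987); however, MINQUE derivation (Rao 1971) requires that $\mathbf{A}\mathbf{X} = \mathbf{0}$, which in our case is $\mathbf{A}\mathbf{1} = \mathbf{0}$, so the term involving the mean is zero; thus

$$\mathrm{Var}(\{\mathbf{y}'\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{y}\}) = 2\{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}; \qquad (2\text{-}11)$$

and using Eq. 2-10 and Eq. 2-11

$$\mathrm{Var}(\hat{\boldsymbol{\sigma}}^2) = \{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}^{-1}\,2\{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}\,\{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}^{-1}$$

and therefore

$$\mathrm{Var}(\hat{\boldsymbol{\sigma}}^2) = \mathbf{V}_c = 2\{tr(\mathbf{Q}\mathbf{V}_i\mathbf{Q}\mathbf{V}_j)\}^{-1}. \qquad (2\text{-}12)$$

From Eq. 2-12 it is seen that the MIVQUE covariance matrix of the variance component estimates is dependent only on the design matrix (the result of the field design and mating design) and the true variance components; a data vector is not needed.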
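Eq. 2-12 can be illustrated with a small pure-Python sketch. This is not Giesbrecht's algorithm or the Fortran program used in the study: the helper names are hypothetical, an ordinary inverse stands in for the generalized inverse (so X is assumed to have full column rank), and the example design is a balanced one-way random model chosen only because its result is easy to check by hand.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mivque_covariance(X, Zs, sigmas):
    """Return 2 * {tr(Q V_i Q V_j)}^{-1} evaluated at the true components."""
    n = len(X)
    Vs = [matmul(Z, transpose(Z)) for Z in Zs]            # V_i = Z_i Z_i'
    V = [[sum(s * Vi[r][c] for s, Vi in zip(sigmas, Vs)) for c in range(n)]
         for r in range(n)]
    Vinv = inverse(V)
    XtVinv = matmul(transpose(X), Vinv)
    middle = inverse(matmul(XtVinv, X))                   # assumes full column rank
    correction = matmul(matmul(transpose(XtVinv), middle), XtVinv)
    Q = [[Vinv[r][c] - correction[r][c] for c in range(n)] for r in range(n)]
    QVs = [matmul(Q, Vi) for Vi in Vs]
    T = [[trace(matmul(A, B)) for B in QVs] for A in QVs]
    return [[2.0 * v for v in row] for row in inverse(T)]

# Hypothetical balanced design: 4 groups of 3 observations, mean as the only fixed effect.
groups, per_group = 4, 3
n_obs = groups * per_group
X = [[1.0] for _ in range(n_obs)]
Z_group = [[1.0 if obs // per_group == g else 0.0 for g in range(groups)]
           for obs in range(n_obs)]
Z_resid = [[1.0 if r == c else 0.0 for c in range(n_obs)] for r in range(n_obs)]
cov = mivque_covariance(X, [Z_group, Z_resid], [1.0, 2.0])
print(cov)
```

No data vector appears anywhere in the calculation, which is the point made in the text: the covariance matrix follows from the design and the true components alone.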
Covariance Matrix for Linear Combinations of Variance Components and Variance of a Ratio

Once the covariance matrix for the variance component estimates (Eq. 2-12) is created, then the covariance matrix of linear combinations of these variance components is formed as

$$V_k = L'V_{\hat{\sigma}^2}L \tag{2-13}$$

with dimensions (2 x 2) = (2 x r)(r x r)(r x 2), where $V_{\hat{\sigma}^2}$ is the covariance matrix of the variance component estimates (Eq. 2-12) and $L$ specifies the linear combinations of the variance components, which are the combinations of variance components in the denominator and numerator of the genetic ratio being estimated. A Taylor series expansion (first approximation) for the variance of a ratio using the variances of and covariance between numerator and denominator is then applied using the elements of $V_k$ to produce the approximate variance of the three ratio estimates as (Kendall and Stuart 1963):

$$\mathrm{Var(ratio)} \approx (1/D)^2\,V_k(1,1) - 2(N/D^3)\,V_k(1,2) + (N^2/D^4)\,V_k(2,2) \tag{2-14}$$

where the generic ratio is N/D and N and D are the parametric values; $V_k(1,1)$ is the variance of N; $V_k(1,2)$ is the covariance between N and D; and $V_k(2,2)$ is the variance of D.

Comparison Among Estimates of Variances of Ratios

The approximate variances of the three ratio estimates ($h^2$, $r_B$, and $\gamma$) are compared across mating designs with equal (or approximately equal) numbers of observations, across numbers of locations, and across numbers of parents within a mating design, all within a level of genetic determination. The standard for comparison is i. Results are presented by the genetic ratio estimated so that direct comparisons may be made among the mating designs for equal numbers of observations within a number of locations for varying levels of genetic control. Number of genetic entries (number of crosses for full-sib designs and number of half-sib families for half-sib designs) is used as a proxy for number of observations since, for all designs, number of observations equals twenty-four times the number of locations times the number of genetic entries.
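Equations 2-13 and 2-14 combine into a short calculation. The sketch below is illustrative only: the function name `ratio_variance`, the two-component covariance matrix, and the weights are hypothetical values chosen for the example, not quantities from the study. Each row of weights plays the role of one column of $L$.

```python
def ratio_variance(cov, sigma2, num_weights, den_weights):
    """First-order (Taylor series) variance of N/D, following Eq. 2-13 and 2-14.

    cov         -- covariance matrix of the variance component estimates
    sigma2      -- parametric variance components
    num_weights -- linear combination defining the numerator N
    den_weights -- linear combination defining the denominator D
    """
    def quad(u, v):
        # u' cov v: one element of V_k = L' cov L
        return sum(u[i] * cov[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    N = sum(w * s for w, s in zip(num_weights, sigma2))
    D = sum(w * s for w, s in zip(den_weights, sigma2))
    v11 = quad(num_weights, num_weights)   # variance of N
    v12 = quad(num_weights, den_weights)   # covariance of N and D
    v22 = quad(den_weights, den_weights)   # variance of D
    return v11 / D**2 - 2.0 * N * v12 / D**3 + N**2 * v22 / D**4


# Hypothetical two-component example: ratio = sigma2[0] / (sigma2[0] + sigma2[1]).
example_cov = [[0.04, 0.0], [0.0, 0.09]]
approx_var = ratio_variance(example_cov, [1.0, 3.0], [1.0, 0.0], [1.0, 1.0])
```

Because the numerator component also appears in the denominator, the covariance term is positive and reduces the approximate variance, which is why the middle term of Eq. 2-14 carries a minus sign.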
Further, by plotting the two levels of numbers of locations on a single figure, a comparison is made of the utility of replication of a design across increasing numbers of locations. Efficiency plots also permit contrasts of the absolute magnitude of variance of estimation among designs. For a given number of genetic entries and locations, the design with the highest efficiency is the most precise (lowest variance of estimation). Increasing the number of genetic entries or locations always results in greater precision (lower variance of estimation), but is not necessarily as efficient (the reduction in variance may not be sufficient to offset the increase in numbers of observations).

A primary justification for using the efficiency of a design as a criterion is that a more precise estimate of a genetic ratio is obtained by using the mean of two estimates from replication of the small design as two disconnected experiments as opposed to the estimate from a single large design. This is true when 1) the number of observations in the large design (N) equals twice the number of observations in the small design (n), 2) the small design is more efficient, and 3) the variances are homogeneous. This is proven below.

Since $N = n_1 + n_2$ and $n_1 = n_2$, then $N = 2n_1$. By definition $i = 1/(N \cdot \mathrm{Var(Ratio)})$, and $\mathrm{Var(Ratio)} = 1/(i \cdot N)$. The proposition is

$$(\mathrm{Var}_1(\mathrm{Ratio}) + \mathrm{Var}_2(\mathrm{Ratio}))/4.0 < \mathrm{Var}_l(\mathrm{Ratio});$$

substitution gives

$$((1/(n_1 i_s)) + (1/(n_2 i_s)))/4.0 < 1/(N i_l).$$

Simplification yields $1/(2.0\,n_1 i_s) < 1/(N i_l)$, and multiplication by N produces

$$1/i_s < 1/i_l \tag{2-15}$$

which is strictly true so long as $i_s > i_l$, where $i_s$ is the efficiency of the smaller experiment and $i_l$ is the efficiency of the larger experiment.
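The argument leading to Eq. 2-15 can be checked numerically. The sketch below is only an illustration: the efficiencies (0.02 and 0.015) and the design sizes are hypothetical values chosen for the example, and the function names are not from the original work.

```python
def variance_from_efficiency(n_obs, eff):
    """Invert i = 1 / (N * Var(Ratio)) to recover the variance of the ratio."""
    return 1.0 / (n_obs * eff)


def variance_of_mean_of_two(var_small):
    """Variance of the mean of two independent estimates with equal variance."""
    return (var_small + var_small) / 4.0


# Hypothetical designs: 24 observations per entry per location,
# 2 locations, 10 entries (small) versus 20 entries (large).
n_small = 24 * 2 * 10
n_large = 2 * n_small
i_small, i_large = 0.02, 0.015           # assumed: small design more efficient

var_two_small = variance_of_mean_of_two(variance_from_efficiency(n_small, i_small))
var_one_large = variance_from_efficiency(n_large, i_large)

# With i_small > i_large the pair of disconnected small designs is more precise.
two_small_wins = var_two_small < var_one_large
```

Setting `i_small` equal to `i_large` makes the two variances coincide, which is the boundary case of the inequality.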
Results

Heritability

Half-sib designs are almost globally superior to the two full-sib designs in precision of heritability estimates (results not shown for variance but may be seen from efficiencies in Figure 2-1). For designs of equal size, half-sib designs excel with the exception of genetic level three (Figure 2-1c, $h^2$ = 0.1, $r_B$ = 0.8, and $\gamma$ = 0.25). In genetic level three, the circular design provides the most precise estimate of $h^2$ for two-location designs; however, when the design is extended across five locations, the half-sib mating design again provides the most precise estimates. The circular mating design is superior in precision to the half-diallel design across all levels of genetic control and location, even with a relatively large number of crosses per parent (four). Half-sib designs are, in general (seven genetic control levels out of eight, Figure 2-1), more efficient, with the exception of level three across two locations (Figure 2-1c).

For the circular and half-sib mating designs considered, increasing the number of genetic entries always improves the efficiency of the design. However, definite optima exist for the half-diallel mating design for number of genetic entries, i.e., crosses which convert to a specific number of parents. These optima are not constant but tend to be six parents or fewer, and are lower with increasing $h^2$ or number of locations. The six-parent half-diallel is never far from the half-diallel optima, and increasing the number of parents past the optimum results in decreased efficiency.

For half-sib designs with $h^2$ = 0.1, five locations are more efficient than two locations; however, at $h^2$ = 0.25 two locations are most efficient. Further, the number of locations required to efficiently estimate $h^2$ for half-sib designs is determined only by the level of $h^2$ and does not depend on the levels of the other ratios.
Although estimates over larger numbers of observations are more precise (five-location estimates are more precise than two-location estimates), the efficiency (increase in precision per unit observation) declines. Thus, if $h^2$ = 0.25 and estimates of a certain precision are required, disconnected sets of two-location experiments are preferred to five-location experiments. The relative efficiency of five locations versus two locations is enhanced with decreasing $r_B$ (increasing genotype by environment interaction) within a level of $h^2$ (compare Figures 2-1a to 2-1b and 2-1c to 2-1d for $h^2$ = 0.1, and 2-1e to 2-1f and 2-1g to 2-1h for $h^2$ = 0.25). Yet, this enhancement is not sufficient to cause a change in efficiency ranking between the location levels.

The full-sib designs differ markedly from this pattern (Figure 2-1) in that, for these parameter levels, it is never more efficient to increase the number of locations from two to five for heritability estimation. As observed with half-sib designs, for full-sib designs the relative efficiency status of five locations improves with decreasing $r_B$. To further contrast mating designs, note that the efficiency status of full-sib designs relative to the half-sib design improves with decreasing $\gamma$ and increasing $r_B$ (Figures 2-1b versus 2-1c and 2-1f versus 2-1g).

Type B Correlation

As opposed to $h^2$ estimation, no mating design performs at or near the optima for precision of $r_B$ estimates across all levels of genetic control (Figure 2-2). However, the circular mating designs produce globally more precise estimates than those of the half-diallel mating design. In general, the utility of full-sib versus half-sib designs is dependent on the level of $r_B$. The lower $r_B$ value favors half-sib designs while the higher $r_B$ tends to favor full-sib designs (compare Figures 2-2a to 2-2b, 2-2c to 2-2d, 2-2e to 2-2f and 2-2g to 2-2h).
Decreasing $\gamma$ and lowering $h^2$ always improves the relative efficiency of full-sib designs to half-sib designs (compare Figures 2-2c and 2-2d to 2-2e and 2-2f).

Figure 2-3. Efficiency (i) for $\gamma$ plotted against number of genetic entries for four levels of genetic control for circular, half-diallel, and half-sib mating designs across levels of location, where i = 1/(N(Var($\gamma$))) and N = the total number of observations. Panels: (3a) $h^2$ = .1, $r_B$ = .5, $\gamma$ = 1.0; (3b) $h^2$ = .1, $r_B$ = .5, $\gamma$ = .25; (3c) $h^2$ = .25, $r_B$ = .8, $\gamma$ = 1.0; (3d) $h^2$ = .25, $r_B$ = .8, $\gamma$ = .25. Axes: efficiency versus genetic entries. Series: circular, 2 locations; circular, 5 locations; half-diallel, 2 locations; half-diallel, 5 locations.

For estimation of $r_B$, full-sib designs are more efficient than half-sib designs except in the three cases of low $r_B$ (0.5) and high $\gamma$ (1.0) for $h^2$ = 0.1 (Figure 2-2b) and low $r_B$ for $h^2$ = 0.25 (Figures 2-2f and 2-2h). Within full-sib designs the circular design is globally superior to the half-diallel. As with $h^2$ estimation, half-diallel designs have optimal levels for numbers of parents. The six-parent half-diallel is again close to these optima for all genetic levels and numbers of locations.
At low $h^2$ for full-sib designs, planting in two locations is always more efficient than five locations. For half-sib designs at low $h^2$, the relative efficiency of two versus five locations is dependent on the level of $r_B$, with lower $r_B$ favoring replication across more locations. At $h^2$ = 0.25, half-sib designs are more efficient when replicated across five locations. At the higher $h^2$ value, full-sib design efficiency across locations is dependent on the level of $r_B$. With $r_B$ = 0.5 and $h^2$ = 0.25, replication of full-sib designs is for the first time more efficient across five locations than across two locations; however, at the higher $r_B$ level two locations is again the preferred number.

Dominance to Additive Variance Ratio

In comparing the two full-sib designs for relative efficiency in estimating $\gamma$, the circular design is always approximately equal to or, for most cases, superior to the half-diallel design (Figure 2-3). The relative superiority of the circular design is enhanced by decreasing $\gamma$ and $r_B$ (not shown). The half-diallel design again demonstrates optima for number of parents, with the six-parent design being near optimal. Within a mating design the use of two locations is always more efficient than the use of five locations. The magnitude of this superiority escalates with increasing $r_B$ and $h^2$ (Figures 2-3a and 2-3b versus 2-3c and 2-3d).

Discussion

Comparison of Mating Designs

Prior knowledge of genetic control is required to choose the optimal mating and field design for estimation of $h^2$, $r_B$ and $\gamma$. Given that such knowledge may not be available, the choices are then based on the most robust mating designs and field designs for the estimation of certain of the genetic ratios. If $h^2$ is the only ratio desired, then the half-sib mating design is best. Estimation of both $h^2$ and $r_B$ requires a choice between the half-sib and circular designs.
If there is no prior knowledge, then the selection of a mating design is dependent on which ratio has the highest priority. For experiments in which $h^2$ receives the highest weighting, the half-sib design is preferred, and in the alternative case the circular design is the better choice. In the last scenario, information on all three ratios is desired from the same experiment, and in this case the circular design is the better selection since the circular design is almost globally more efficient than the half-diallel design.

After choosing a mating design, the next decision is how many locations per experiment are required to optimize efficiency. For the half-sib design the number of locations required to optimize efficiency is dependent on both the ratio being estimated and the level of genetic control. A broad inference is that for $h^2$ estimation a two-location experiment is more efficient and for $r_B$ a five-location experiment has the better efficiency. Estimation of any of the three ratios with a full-sib design is almost globally more efficient in two-location experiments.

The disparity between the behavior of the half-sib and full-sib designs with respect to the efficiency of location levels can be explained in terms of the genetic connectedness offered by the different designs. Genetic connectedness can be viewed as commonality of parentage among genetic entries. The more entries having a common parent, the more connectedness is present. The half-sib design is only connected across locations by the one common parent in a half-sib family in each replication. Full-sib designs are connected across locations in each replication by the full-sib cross plus the number of parents minus two (half-diallel) or three (circular) for each of the two parents in a cross. The connectedness in a full-sib design means each observation is providing information about many other observations.
The result of this connectedness is that, in general, fewer observations (number of locations) are required for maximum efficiency.

A General Approach to the Estimation Problem

The estimation problems may be viewed in a broader context than the specific solutions in this chapter. The technique for comparison of mating designs and numbers of locations across levels of genetic determination may be construed, for the case of $h^2$ estimation, to be the effect of these factors on the variance of $\sigma^2_{gca}$ estimates. Viewing the variance approximation formula, the conclusion may be reached that the variance of $\sigma^2_{gca}$ estimates is the controlling factor in the variance of $h^2$ estimates, since the other factors at these heritability levels are multiplied by constants which reduce their impact dramatically. Given this conclusion, the variance of $h^2$ estimates is essentially the (3,3) element in $2\{\mathrm{tr}(QV_iQV_j)\}^{-1}$ (Eq. 2-12). Further, since the covariances of the other variance component estimates with $\sigma^2_{gca}$ estimates are small, the variance of $\sigma^2_{gca}$ estimates is basically determined by the magnitude of the (3,3) element of $\{\mathrm{tr}(QV_iQV_j)\}$, which is $\mathrm{tr}(QV_gQV_g)$. Thus, the variance of $h^2$ estimates is minimized by maximizing $\mathrm{tr}(QV_gQV_g)$, with $h^2$ used as an illustration because this simplification is possible.

Considering the impact of changing levels of genetic control while holding the mating and field designs constant, $V_g$ is fixed, the diagonal elements of $V$ are fixed at 11.5 because of our assumptions, and only the off-diagonal elements of $V$ change with genetic control levels. Since $Q$ is a direct function of $V^{-1}$, what we observe in Figure 2-1 comparing a design across levels of genetic control are changes in $V^{-1}$ brought about by changes in the magnitude of the off-diagonal elements of $V$ (covariances among observations).
The effect of positive off-diagonal elements (the linear model specifies that all off-diagonal elements in $V$ are zero or positive) on $V^{-1}$ is to reduce the magnitude of the diagonal elements and often also to produce negative off-diagonal elements. If one increases the magnitude of the off-diagonal elements in $V$, then the magnitude of the diagonal elements of $V^{-1}$ is reduced and the magnitude of negative off-diagonal elements is increased. Since $\mathrm{tr}(QV_gQV_g)$ is the sum of the squared elements of the product of a direct function of $V^{-1}$ and a matrix of non-negative constants ($V_g$), as the diagonal elements of $V^{-1}$ are reduced and the off-diagonal elements become more negative, $\mathrm{tr}(QV_gQV_g)$ must become smaller and the variance of $h^2$ estimates increases.

Mating designs may be compared by the same type of reasoning. Within a constant field design, changes in mating design produce alterations in $V$. Of the three designs the half-sib produces a $V$ matrix with the most zero off-diagonal elements, the circular design next, and the half-diallel the fewest zero off-diagonal elements. Knowing the effect of off-diagonal elements on the variance of $h^2$ estimates, one could surmise that the variance of estimates is reduced in the order of least to most non-zero off-diagonal elements. This tenet is in basic agreement with the results in Figures 2-1 through 2-3.

The effects of $r_B$ and $\gamma$ on the variance of $h^2$ estimates can also be interpreted utilizing the above approach. In the results section of this chapter it is noted that decreasing the magnitude of $r_B$ and/or $\gamma$ causes full-sib designs to rise in efficiency relative to the half-sib design. In accordance with our previous arguments this would be expected since decreasing the magnitude of those two ratios causes a decrease in the magnitude of off-diagonal elements.
More precisely, decreasing $\gamma$ results in the reduction of off-diagonal elements in $V$ of the full-sib designs while not affecting the half-sib design, and decreasing $r_B$ results in the reduction of off-diagonal elements in $V$ of full-sib and half-sib designs. Relative increases in efficiency of full-sib designs result from the elements due to location by additive interaction occurring much less frequently in the half-sib designs; thus, the relative impact of reduction in $r_B$ in half-sib designs is less than that for full-sibs.

Use of the Variance of a Ratio Approximation

Use of Kendall and Stuart's (1963) first approximation (first-term Taylor series approximation) of the variance of a ratio has two major caveats. The approximation depends on large sample properties to approach the true variance of the ratio, i.e., with a small number of levels for random variables the approximation does not necessarily closely approximate the true variance of the ratio. Work by Pederson (1972) suggests that for approximating the variance of $h^2$ at least ten parents are required in diallels before the approximation will converge to the true variance, even after including Taylor series terms past the first derivative. Pederson's work also suggests that the approximation is progressively worse for increasing heritability with low numbers of parents. Using the field design in this chapter (two locations, four blocks and six-tree row-plots), simulation work (10,000 data sets) has demonstrated that, with a heritability of 0.1 and four parents in a half-diallel across two locations, the variance of a ratio approximation yields a variance estimate for $h^2$ of 0.1 while the convergent value for the simulation was 0.08 (Huber unpublished data). One should remember the dependence of the first approximation of the variance of a ratio on large sample properties when applying the technique to real data.
The second caveat is that the range of estimates of the denominator of the ratio cannot pass through zero (Kendall and Stuart 1963). This constraint is of no concern for $h^2$; however, the structure of the $r_B$ and $\gamma$ denominators allows unbiased minimum variance estimates of those denominators to pass through zero, which means that at one point in the distribution of the estimates of the ratios they are undefined (the distributions of these ratio estimates are not continuous). Simulation has shown that the variances of $r_B$ and $\gamma$ are much greater than the approximation would indicate (Huber unpublished data). The discrepancy in variance of the estimates could be partially alleviated through using a variance component estimation technique which restricts estimates to the parameter space $0 < \sigma^2 < \infty$. Nevertheless, because of the two caveats, approximations of the variance of $h^2$, $r_B$ and $\gamma$ estimates should be viewed only on a relative basis for comparisons among designs and not on an absolute scale.

Additionally, the expectation of a ratio does not equal the ratio of the expectations (Hogg and Craig 1978). If a value of genetic ratios is sought so that the value equals the ratio of the expectations, then the appropriate way to calculate the ratio would be to take the mean of variance components or linear combinations of variance components across many experiments and then take the ratio. If the value sought for $h^2$ is the expectation of the ratio, then taking the mean of many $h^2$ estimates is the appropriate approach. Returning to the results from simulated data (10,000 data sets) where the $h^2$ value was set at 0.1, using the ratio of the means of variance components rendered a value of 0.1 for $h^2$, the mean of the $h^2$ estimates returned a value of 0.08, and a Taylor series approximation of the mean of the ratio yielded 0.07 (Pederson 1972).

Conclusions

Results from this study should be interpreted as relative comparisons of the levels of the factors investigated.
However, viewing the optimal design problem as illustrated in the discussion section of this chapter can provide insight to the more general problem. There is no globally most efficient number of locations, parents or mating design for the three ratios estimated even within the restricted range of this study; yet, some general conclusions can be drawn. For estimating $h^2$ the half-sib design is always optimal or close to optimal in terms of variance of estimation and efficiency. In the estimation of $r_B$ and $\gamma$, the circular mating design is always optimal or near optimal in variance reduction and efficiency.

Across numbers of parents within a mating design only the half-diallel shows optima for efficiency. The other mating designs have non-decreasing efficiency plots over the levels of number of parents, so that while there is an optimal number of locations for a level of genetic control, the number of genetic entries per location is limited more by operational than efficiency constraints.

Two locations is a near global optimum over five locations for the full-sib mating designs. Within the half-sib mating design optimality depends on the levels of $h^2$ and $r_B$: 1) for $h^2$ estimation the optimal number of locations is inversely related to the level of $h^2$, i.e., at the higher level two tests were optimal and at the lower level five tests were optimal; and 2) for $r_B$ estimation for the half-sib design, the optimal number of locations was also inversely related to the level of $r_B$.

Means of estimates from disconnected sets provide lower variance of estimation where the smaller experiments have higher efficiencies. Thus, disconnected sets are preferred according to number of locations for all mating designs and according to number of parents for the half-diallel mating design.
In practical consideration of the optimal mating design problem, the results of this study indicate that if $h^2$ estimation is the primary use of a progeny test then the half-sib mating design is the proper choice. Further, the circular mating design is an appropriate choice if the estimation of $r_B$ is more important than $h^2$. Finally, if a full-sib design is required to furnish information about dominance variance, the circular design provides almost globally better efficiencies for $h^2$, $r_B$, and $\gamma$ than the half-diallel.

CHAPTER 3
ORDINARY LEAST SQUARES ESTIMATION OF GENERAL AND SPECIFIC COMBINING ABILITIES FROM HALF-DIALLEL MATING DESIGNS

Introduction

The diallel mating system is an altered factorial design in which the same individuals (or lines) are used as both male and female parents. A full diallel contains all crosses, including reciprocal crosses and selfs, resulting in a total of $p^2$ combinations, where p is the number of parents. Assumptions that reciprocal effects, maternal effects, and paternal effects are negligible lead to the use of the half-diallel mating system (Griffing 1956, method 4), which has p(p-1)/2 parental combinations and is the mating system addressed in this chapter. Half-diallels have been widely used in crop and tree breeding (Sprague and Tatum 1942, Gilbert 1958, Matzinger et al. 1959, Burley et al. 1966, and Squillace 1973) and the widespread use of this mating system continues today (Weir and Zobel 1975, Wilcox et al. 1975, Snyder and Namkoong 1978, Hallauer and Miranda 1981, Singh and Singh 1984, Greenwood et al. 1986, and Weir and Goddard 1986).

Most of the statistical packages available treat fixed effect estimation as the objective of the program with random variables representing nuisance variation.
Within this context a common analysis of half-diallel experiments is conducted by first treating genetic parameters as fixed effects for estimation of general (GCA) and specific (SCA) combining abilities and subsequently as random variables for variance component estimation (used for estimating heritabilities, genetic correlations, and general to specific combining ability variance ratios for determining breeding strategies). This chapter focuses on the estimation of GCA's and SCA's as fixed effects.

The treatment of GCA and SCA as fixed effects in OLS (ordinary least squares) is an entirely appropriate analysis if the comparisons are among parents and crosses in a particular experiment. If, as forest geneticists often wish to do, GCA estimates from disconnected experiments are to be compared, then methods such as checklots must be used to place the estimates on a common basis.

Formulae (Griffing 1956, Falconer 1981, Hallauer and Miranda 1981, and Becker 1975) for hand calculation of general and specific combining abilities are based on a solution to the OLS equations for half-diallels created by sum-to-zero restrictions, i.e., the sum of all effect estimates for an experimental factor equals zero. These formulae will yield correct OLS solutions for sum-to-zero genetic parameters provided the data have no missing cells. If cell (plot) means are used as the basis for the estimation of effects, there must be at least one observation per cell (plot), where a cell is a subclassification of the data defined by one level of every factor (Searle 1987). An example of a cell is the group of observations denoted by $A_iB_j$ for a randomized complete block design with factor A across blocks (B). If the above formulae are applied without accounting for missing cells, incorrect and possibly misleading solutions can result.
The matrix algebra approach is described in this chapter for these reasons: 1) in forest tree breeding applications data sets with missing cells are extremely common; 2) many statistical packages do not allow direct specification of the half-diallel model; 3) the use of a linear model and matrix algebra can yield relevant OLS solutions for any degree of data imbalance; and 4) viewing the mechanics of the OLS approach is an aid to understanding the properties of the estimates.

The objectives of this chapter are to (1) detail the construction of ordinary least squares (OLS) analysis of half-diallel data sets to estimate genetic parameters (GCA and SCA) as fixed effects, (2) recount the assumptions and mathematical features of this type of analysis, (3) facilitate the reader's implementation of OLS analyses for diallels of any degree of imbalance and suggest a method for combining estimates from disconnected experiments, and (4) aid the reader in ascertaining what method is an appropriate analysis for a given data set.

Methods

Linear Model

Plot means are used as the unit of observation for this analysis with unequal numbers of observations per plot. Plot (cell) means are always estimable as long as there is one observation per plot, and linear combinations of these means (least squares means) provide the most efficient way of estimating OLS fixed effects (Yates 1934). Throughout this chapter, estimates are denoted by lower case letters while the parameters are designated by upper case letters, and matrices are in bold print.
Using plot means as observations, a common scalar linear model for an analysis of a half-diallel mating design with p(p-1)/2 crosses planted at a single location in a randomized complete block design with one plot per block is

$$y_{ijk} = \mu + B_i + GCA_j + GCA_k + SCA_{jk} + e_{ijk} \tag{3-1}$$

where $y_{ijk}$ is the mean of the $i$th block for the $jk$th cross; $\mu$ is an overall mean; $B_i$ is the fixed effect of block i for i = 1 to b; $GCA_j$ is the fixed general combining ability effect of the $j$th female parent or $k$th male parent, j or k = 1, ..., p (j ≠ k); $SCA_{jk}$ is the fixed specific combining ability effect of parents j and k; and $e_{ijk}$ is the random error associated with the observation of the $jk$th cross in the $i$th block, where $e_{ijk} \sim (0, \sigma^2_e)$. Cross by block interaction as genotype by environment interaction is treated as confounded with between plot variation as for contiguous plots.

The model in matrix notation is

$$y = X\beta + e \tag{3-2}$$

where $y$ is the vector of observations (n x 1 = n rows and 1 column), where n equals the number of observations; $X$ is the design matrix (n x m) whose function is to select the appropriate parameters for each observation, where m equals the number of fixed effect parameters in the model; $\beta$ is the vector (m x 1) of fixed effect parameters ordered in a column; and $e$ is the vector (n x 1) of deviations (errors) from the expectation associated with each observation.

Ordinary Least Squares Solutions

The matrix representation of an OLS fixed effects solution is

$$b = (X'X)^{-}X'y \tag{3-3}$$

where $b$ is the vector of estimated fixed effect parameters, i.e., an estimate of $\beta$, and $X$ is the design matrix either made full rank by reparameterization, or a generalized inverse of $X'X$ may be used. Inherent in this solution is the ordinary least squares assumption that the variance-covariance matrix ($V$) of the observations ($y$) is equal to $I\sigma^2_e$, where $I$ is an n x n identity matrix. The elements of an identity matrix are 1's on the main diagonal and all other elements are 0.
Multiplying $I$ by $\sigma^2_e$ places $\sigma^2_e$ on the main diagonal. In the covariance matrix for the observations, the variance of the observations appears on the main diagonal and the covariance between observations appears in the off-diagonal elements. Thus, $V = I\sigma^2_e$ states that the variance of the observations is equal to $\sigma^2_e$ for each observation and there are no covariances between the observations (which is one direct result of considering genetic parameters as fixed effects).

Sum-to-Zero Restrictions

The design matrix presented in this chapter is reparameterized by sum-to-zero restrictions to (1) reduce the dimension of the matrices to a minimal size, and (2) yield estimates of fixed effects with the same solution as common formulae in the balanced case. Other restrictions such as set-to-zero could also be applied, so the discussion that follows treats sum-to-zero restrictions as a specific solution to the more general problem, which is finding an inverse for $X'X$. The subscripts 'o' and 's' refer to the overparameterized model and the reparameterized model with sum-to-zero restrictions, respectively.

The matrix $X_o$ of Figure 3-1 is the design matrix for an overparameterized linear model (Milliken and Johnson 1984, page 96). Overparameterization means that the equations are written in more unknowns (parameters, in this case 13) than there are equations (number of observations minus degrees of freedom for error, in this case 12 - 5 = 7) with which to estimate the parameters. Reparameterization as a sum-to-zero matrix overcomes this dilemma by reducing the number of parameters through making some of the parameters linear combinations of others. Sum-to-zero restrictions make the resulting parameters and estimates sum to zero even though the unrestricted parameters (for example, the true GCA values as applied to a broader population) do not necessarily sum to zero within a diallel. This is the problem of comparability of GCA estimates from disconnected experiments.
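The reparameterization described above can be sketched for the four-parent, two-block example. The code below is a minimal illustration under stated assumptions, not the procedure used in the original analysis: the column order (μ, B1, GCA1, GCA2, GCA3, SCA12, SCA13) is taken from the column labels of Figure 3-2, the SCA restrictions assume that the SCA effects sum to zero for each parent, and the function names and synthetic parameter values are hypothetical. With error-free data the OLS solution returns the sum-to-zero parameters exactly.

```python
def sum_to_zero_row(block, j, k):
    """Row of X_s for block `block` and cross j x k (four parents, two blocks)."""
    b1 = 1 if block == 1 else -1                 # B2 = -B1
    gca = [0, 0, 0]
    for parent in (j, k):
        if parent == 4:                          # GCA4 = -(GCA1 + GCA2 + GCA3)
            gca = [g - 1 for g in gca]
        else:
            gca[parent - 1] += 1
    # For p = 4: SCA34 = SCA12, SCA24 = SCA13, SCA14 = SCA23 = -(SCA12 + SCA13)
    sca = {(1, 2): [1, 0], (3, 4): [1, 0],
           (1, 3): [0, 1], (2, 4): [0, 1],
           (1, 4): [-1, -1], (2, 3): [-1, -1]}[(j, k)]
    return [1, b1] + gca + sca


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


crosses = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
X_s = [sum_to_zero_row(i, j, k) for i in (1, 2) for (j, k) in crosses]

# Synthetic, error-free plot means from hypothetical sum-to-zero parameters.
beta_s = [10.0, 1.0, 0.5, -0.2, 0.3, 0.1, -0.4]
y = [sum(x * b for x, b in zip(row, beta_s)) for row in X_s]

# Normal equations X_s'X_s b = X_s'y (Eq. 3-3 with a full-rank design matrix).
XtX = [[sum(r[a] * r[c] for r in X_s) for c in range(7)] for a in range(7)]
Xty = [sum(r[a] * obs for r, obs in zip(X_s, y)) for a in range(7)]
b = solve(XtX, Xty)
gca4 = -(b[2] + b[3] + b[4])                     # recovered from the restriction
```

The eliminated effects (B2, GCA4, and the remaining SCA terms) are recovered afterwards from the restrictions, as in the last line, rather than estimated as separate columns.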
          μ   B1  B2  GCA1 GCA2 GCA3 GCA4 SCA12 SCA13 SCA14 SCA23 SCA24 SCA34
y112      1   1   0   1    1    0    0    1     0     0     0     0     0
y113      1   1   0   1    0    1    0    0     1     0     0     0     0
y114      1   1   0   1    0    0    1    0     0     1     0     0     0
y123      1   1   0   0    1    1    0    0     0     0     1     0     0
y124      1   1   0   0    1    0    1    0     0     0     0     1     0
y134      1   1   0   0    0    1    1    0     0     0     0     0     1
y212      1   0   1   1    1    0    0    1     0     0     0     0     0
y213      1   0   1   1    0    1    0    0     1     0     0     0     0
y214      1   0   1   1    0    0    1    0     0     1     0     0     0
y223      1   0   1   0    1    1    0    0     0     0     1     0     0
y224      1   0   1   0    1    0    1    0     0     0     0     1     0
y234      1   0   1   0    0    1    1    0     0     0     0     0     1

y = X_o β_o, with β_o = (μ, B1, B2, GCA1, GCA2, GCA3, GCA4, SCA12, SCA13, SCA14, SCA23, SCA24, SCA34)'

Figure 3-1. The overparameterized linear model for a four-parent half-diallel planted on a single site in two blocks displayed as matrices. The design matrix ($X_o$) and parameter vector ($\beta_o$) are shown in overparameterized form. 1's and 0's denote the presence or absence of a parameter in the model for the observed means (data vector, $y$). The parameters displayed above the design matrix label the appropriate column for each parameter. Error vector not exhibited.

y = X_s β_s + e_s, with y = (y112, y113, y114, y123, y124, y134, y212, y213, y214, y223, y224, y234)', design-matrix columns and β_s = (μ, B1, GCA1, GCA2, GCA3, SCA12, SCA13)', and e_s = (e112, e113, e114, e123, e124, e134, e212, e213, e214, e223, e224, e234)'

Figure 3-2. The linear model for a four-parent half-diallel planted on a single site in two blocks displayed as matrices. The design matrix ($X_s$) and the parameter vector ($\beta_s$) are presented in sum-to-zero format. The parameters displayed above the design matrix label the appropriate column for each parameter.

To illustrate the concept of sum-to-zero estimates versus population parameters, we use the expectation of a common formula. Becker (1975) gives equation 3-4 (which for balanced cases is equivalent to $g_j = ((p-1)/(p-2))(\bar{Z}_{j.} - \bar{Z}_{..})$) as the estimate for general combining ability for the $j$th line, with p equalling the number of parents and $Z_{jk}$ equalling the site mean of the j x k cross. This equation yields the same solution as the matrix equations with no missing plots or crosses and with a design matrix which contains the sum-to-zero restrictions.
An evaluation of this formula in a four-parent half-diallel planted in b blocks for the GCA of parent 1 is obtained by substituting the expectation of the linear model (equation 3-1) for each observation:

$g_j = \frac{1}{p(p-2)}\left(pZ_{j.} - 2Z_{..}\right)$    (3-4)

$E\{g_1\} = E\left\{\frac{1}{p(p-2)}\left(pZ_{1.} - 2Z_{..}\right)\right\}$

$E\{g_1\} = \frac{3}{4}GCA_1 - \frac{1}{4}(GCA_2 + GCA_3 + GCA_4) + \frac{1}{4}(SCA_{12} + SCA_{13} + SCA_{14}) - \frac{1}{4}(SCA_{23} + SCA_{24} + SCA_{34})$.

The result of equation 3-4 is obviously not $GCA_1$ from the unrestricted model (equation 3-1). Thus, $g_1$, an estimable function and an estimate of parameter $GCA_{1s}$ (the GCA of parent 1 given the sum-to-zero restrictions), does not have the same meaning as $GCA_1$ in the unrestricted model. An estimable function is a linear combination of the observations; but in order for an individual parameter in a model to be estimable, one must devise a linear combination of the observations such that the expectation has a weight of one on the parameter one wishes to estimate while having a weight of zero on all other parameters. A solution such as this does not exist for the individual parameters in the overparameterized model (equation 3-1). So, although the sum-to-zero restricted GCA parameters and estimates are forced to sum to zero for the sample of parents in a given diallel, the unrestricted GCA parameters only sum to zero across the entire population (Falconer 1981), and an evaluation of $gca_{1s}$ demonstrates that the estimate contains other model parameters.

The result of sum-to-zero restrictions is that the degrees of freedom for a factor equals the number of columns (parameters) for that factor in $X_s$ (Figure 3-2). Thus, a generalized inverse for $X_s'X_s$ is not required since the number of columns in the sum-to-zero $X_s$ matrix for each factor equals the degrees of freedom for that factor in the model ($X_s$ is full column rank and provides a solution to equation 3-3).

Components of the Matrix Equation

The equational components of 3-2 are now considered in greater detail.
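The expectation of equation 3-4 given above can be checked by bookkeeping alone. The sketch below is only an illustration (the variable names are ours, not the dissertation's): it applies the weight that equation 3-4 places on each cross mean to the parameters that equation 3-1 puts in that cross mean, using exact fractions. Block terms are left out because every cross mean carries the same average block effect and the weights sum to zero.

```python
import itertools
from fractions import Fraction

p = 4                                   # number of parents
j = 1                                   # parent whose GCA is being estimated
crosses = list(itertools.combinations(range(1, p + 1), 2))

# Equation 3-4 as weights on the cross means Z_jk:
# (p - 2)/(p(p-2)) if the cross contains parent j, otherwise -2/(p(p-2)).
weights = {c: Fraction(p * (j in c) - 2, p * (p - 2)) for c in crosses}

# Each cross mean has expectation mu + GCA_j + GCA_k + SCA_jk (equation 3-1),
# so accumulate the weight that lands on every parameter.
coef = {"mu": Fraction(0)}
for c, w in weights.items():
    coef["mu"] += w
    for parent in c:
        name = f"GCA{parent}"
        coef[name] = coef.get(name, Fraction(0)) + w
    coef[f"SCA{c[0]}{c[1]}"] = w

for name, value in coef.items():
    print(name, value)
```

The printed coefficients are 3/4 on GCA1, -1/4 on the other GCA's, 1/4 on the SCA's involving parent 1 and -1/4 on the rest, with zero weight on the mean.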
Data vector y

Observations (plot means) in the data vector are ordered in the manner demonstrated in Figure 3-1. For our example, Figure 3-1 is the matrix equation of a four-parent half-diallel mating design planted in two randomized complete blocks on a single site. There are six crosses present in each of the two blocks for a total of 12 observations in the data vector, y. The observations are first sorted by block. Second, within each block the observations should be in the same sequence (for simplicity of presentation only). This sequence is obtained by assigning numbers 1 through p to each of the p parents and then sorting all crosses containing parent 1 (whether as male or female) as the primary index in increasing numerical order of the other parent of the cross as the secondary index, which is the order shown in Figure 3-1. Next, all crosses containing parent 2 (primary index, as male or female) in which the other parent in the cross (secondary index) has a number greater than 2 are sorted in the same way by the secondary index. This procedure is followed through using parent p-1 as the primary index.

Design matrix and parameter vector, X and β

The design matrix for a model is conceptually a listing of the parameters present in the model for each observation (Searle 1987, page 243). In Figure 3-1, $X_o$ and $\beta_o$ are exhibited and the parameters in $\beta_o$ are displayed at the tops of the columns of $X_o$ (a visually correct interpretation of the multiplication of a matrix by a vector). For each observation in y, the scalar model (equation 3-1) may be employed to obtain the listing of parameters for that observation (the row of the design matrix corresponding to the particular observation). The convention for design matrices is that the columns for the factors occur in the same order as the factors in the linear model (equation 3-1 and Figure 3-1).
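The ordering and listing rules above can be sketched in a few lines. This is a minimal illustration under our own naming (`X_o`, `crosses`), not code from the dissertation: it writes one row per plot mean by listing which parameters of the scalar model occur in it, and then reports how far the overparameterized matrix falls short of full column rank.

```python
import itertools
import numpy as np

p, b = 4, 2                                   # parents and blocks
crosses = list(itertools.combinations(range(1, p + 1), 2))   # (1,2), (1,3), ..., (3,4)

rows = []
for block in range(1, b + 1):                 # observations sorted by block first
    for j, k in crosses:                      # then by cross within block
        mean = [1]
        blocks = [int(block == i) for i in range(1, b + 1)]
        gca = [int(parent in (j, k)) for parent in range(1, p + 1)]
        sca = [int(c == (j, k)) for c in crosses]
        rows.append(mean + blocks + gca + sca)

X_o = np.array(rows)
print(X_o.shape)                              # (12, 13)
print(np.linalg.matrix_rank(X_o))             # 7
```

Thirteen columns with rank seven is the rank deficiency that the sum-to-zero restrictions remove: one column for the mean, one for block, three for GCA and two for SCA.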
Since design matrices can be devised by first creating the columns pertinent to each factor in the model (submatrices) and then horizontally and/or vertically stacking the submatrices, the discussion of the reparameterized design matrix formulation will proceed by factor.

Mean

The first column of $X_s$ is for μ and is a vector of 1's with the number of rows equalling the number of observations (Figure 3-2). The linear model (equation 3-1) indicates that all observations contain μ, and the deviation of the observations from μ is explained in terms of the factors and interactions in the model plus error.

Block

The number of columns for block is equal to the number of blocks minus one (column 2, $X_s$). Each row of a block submatrix consists of 1's and 0's or -1's according to the identity of the observation for which the row is being formed. The normal convention is that the first column represents block 1 and the second column block 2, etc., through block b-1. Since we have used a sum-to-zero solution ($\sum_i b_i = 0$), the effect due to block b is a linear combination of the other b-1 effects, i.e., $b_b = -\sum_{i=1}^{b-1} b_i$, which in our example is $0 = b_1 + b_2$ and $b_2 = -b_1$. Thus, the row of the block submatrix for an observation in block b (the last block) has a -1 in each of the b-1 columns, signifying that the block b effect is indeed a linear combination of the other b-1 block effects. Columns 2 and 3 of $X_o$ (Figure 3-1) have become column 2 of $X_s$ (Figure 3-2).

General combining ability

This submatrix of $X_s$ is slightly more complex than previous factors as a result of having two levels of a main effect present per observation, i.e., the deviation of an observation from μ is modeled as the result of the GCA's of both the male and female parents (equation 3-1). Again we have imposed a restriction, $\sum_j gca_j = 0$. Since GCA has p-1 degrees of freedom, the submatrix for GCA should have p-1 columns, i.e., $gca_p = -\sum_{j=1}^{p-1} gca_j$.
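The mean and block columns described above can be sketched as follows. The function name and arguments are hypothetical, chosen only for illustration; the coding itself (a 1 in the column of the observation's block, and -1 in every column for observations in the last block) is the one given in the text.

```python
import numpy as np

def mean_and_block_columns(b, n_crosses):
    """Mean column and sum-to-zero block columns for b blocks of n_crosses plot means,
    with observations sorted by block."""
    mean = np.ones((b * n_crosses, 1))
    # One coding row per block: identity rows for blocks 1..b-1, a row of -1's for block b.
    coding = np.vstack([np.eye(b - 1), -np.ones((1, b - 1))])
    # Repeat each block's coding row once for every cross in that block.
    block = np.kron(coding, np.ones((n_crosses, 1)))
    return mean, block

mean, block = mean_and_block_columns(b=2, n_crosses=6)
print(block.ravel())     # six 1's followed by six -1's, i.e. column 2 of Figure 3-2
```

Each block column sums to zero when every block holds the same number of plot means, which is what later makes block orthogonal to the mean in balanced data.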
The GCA submatrix for $X_s$ (columns 3 through 5 in Figure 3-2) is formed from $X_o$ (columns 4 through 7 in Figure 3-1) in the same manner as the block matrix: (1) add minus one to the elements in the other columns along each row containing a one for $gca_p$ (p = 4 in our example); and (2) delete the column from $X_o$ corresponding to $gca_p$. The GCA submatrix has p(p-1)/2 rows (the number of crosses). This, with no missing cells (plots), equals the number of observations per block. To form the GCA factor submatrix for a site, the GCA submatrix is vertically concatenated (stacked on itself) b times. This completes the portion of the $X_s$ matrix for GCA.

Specific combining ability

In order to facilitate construction of the SCA submatrix, a horizontal direct product should be defined. A horizontal direct product, as applied to two column vectors, is the element-by-element product between the two vectors (SAS/IML User's Guide 1985) such that the element in the i-th row of the resulting product vector is the product of the elements in the i-th rows of the two initial vectors. The resultant product vector has dimension n x 1. A horizontal direct product is useful for the formation of interaction or nested factor submatrices where the initial matrices represent the main factors and the resulting matrix represents an interaction or a nested factor (product rule, Searle 1987). (SAS/IML is the registered trademark of the SAS Institute Inc., Cary, North Carolina.)

The SCA submatrix can be formulated from the horizontal direct products of the columns of the GCA submatrix in $X_s$ (Figure 3-2). The results from the GCA columns require manipulation to become the SCA submatrix (since degrees of freedom for SCA do not equal those of an interaction for a half-diallel analysis), but the GCA column products provide a convenient starting point. The column of the SCA submatrix representing the cross between the j-th and the k-th parents ($SCA_{jk}$)
is formed as the product between the $GCA_j$ and $GCA_k$ columns (Figure 3-3). The GCA columns in Figure 3-2 are multiplied in this order: column 1 times column 2 forming the first SCA column, column 1 times column 3 forming the second SCA column, and column 2 times column 3 forming the third SCA column (Figure 3-3).

With four parents (six crosses) there are three degrees of freedom for GCA (p-1) and two degrees of freedom for SCA (6 crosses - 3 for GCA - 1 for the mean). Since SCA has only two degrees of freedom, a sum-to-zero design matrix can have only two columns for SCA. Imposing the restriction that the sum of the SCA's across all parents equals zero is equivalent to making the last column for the SCA submatrix (Figure 3-3) a linear combination of the others (Figure 3-2). The procedure for deleting the third column product is identical to that for the GCA submatrix: add minus one to every element in the rows of the remaining SCA columns in which a one appears in the column which is to be deleted (Figure 3-2, columns 6 and 7). The number of rows in the SCA submatrix equals the number of observations in a block, and the submatrix must be vertically concatenated b times to create the SCA submatrix for a site.

An algebraic evaluation of SCA sum-to-zero restrictions requires that $\sum_j sca_{jk} = 0$ for each k and that $\sum_j\sum_k sca_{jk} = 0$; thus, for observations in the i-th block, with i serving to denote the row of the SCA submatrix in block i, $sca_{i14} = -sca_{i12} - sca_{i13}$ and the entries in the submatrix row for $y_{i14}$ are -1's. The estimate for $sca_{i23}$ equals $sca_{i14}$ because $sca_{i23}$ is the negative of the sum of the independently estimated SCA's ($sca_{i12}$ and $sca_{i13}$) from the restriction that the sum of the SCA's across all parents equals zero. Similarly, by sum-to-zero definition $sca_{i24} = -sca_{i23} - sca_{i12}$ and by substitution $sca_{i24} = -(-sca_{i12} - sca_{i13}) - sca_{i12} = sca_{i13}$. By the same protocol, it can be shown that $sca_{i34} = sca_{i12}$.
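The construction of the GCA and SCA submatrices for one block of the four-parent example can be sketched as below. The array names (`G_o`, `G_s`, `products`, `S_s`) are ours; NumPy's element-wise `*` stands in for the horizontal direct product of two columns. Folding a deleted column into the remaining ones ("add minus one ... in rows containing a one") is written as a subtraction of that column.

```python
import itertools
import numpy as np

p = 4
crosses = list(itertools.combinations(range(1, p + 1), 2))   # 12, 13, 14, 23, 24, 34

# Overparameterized GCA columns: a 1 for each parent of the cross.
G_o = np.array([[int(parent in c) for parent in range(1, p + 1)] for c in crosses])

# Sum-to-zero GCA: subtract the column for parent p from the others, then drop it.
G_s = G_o[:, :-1] - G_o[:, [-1]]

# Intermediate SCA columns: element-wise products of pairs of GCA columns.
pairs = list(itertools.combinations(range(p - 1), 2))        # (GCA1,GCA2), (GCA1,GCA3), (GCA2,GCA3)
products = np.column_stack([G_s[:, j] * G_s[:, k] for j, k in pairs])

# Sum-to-zero SCA: fold the last product column into the others and drop it.
S_s = products[:, :-1] - products[:, [-1]]

print(G_s)
print(products)
print(S_s)
```

The rows of `S_s` read (1, 0), (0, 1), (-1, -1), (-1, -1), (0, 1), (1, 0) for crosses 12, 13, 14, 23, 24 and 34, which is the algebraic result sca14 = sca23 = -sca12 - sca13, sca24 = sca13 and sca34 = sca12.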
The elements in the rows of the SCA submatrix are 1's, -1's and 0's in accordance with the algebraic evaluation. Thus, while it may seem that there should be 6 SCA values (one for each cross), only 2 can be independently estimated and the remaining 4 are linear combinations of the independently estimated SCA's. Again the SCA sum-to-zero estimates are not equal to the parametric population SCA's. An analogous illustration for SCA to that for GCA would show that the estimable function (linear combination of observations) for a given $SCA_{jk}$ contains a variety of other parameters.

OBS.   GCA1xGCA2      GCA1xGCA3      GCA2xGCA3      SCA12  SCA13  SCA23
y12    (1)(1)=1       (1)(0)=0       (1)(0)=0       1      0      0
y13    (1)(0)=0       (1)(1)=1       (0)(1)=0       0      1      0
y14    (0)(-1)=0      (0)(-1)=0      (-1)(-1)=1     0      0      1
y23    (0)(1)=0       (0)(1)=0       (1)(1)=1       0      0      1
y24    (-1)(0)=0      (-1)(-1)=1     (0)(-1)=0      0      1      0
y34    (-1)(-1)=1     (-1)(0)=0      (-1)(0)=0      1      0      0

Figure 3-3. Intermediate result in SCA submatrix generation (SCA columns as horizontal direct products of GCA1, GCA2, and GCA3 columns within a block). The $SCA_{jk}$ column is the horizontal direct product of the columns for $GCA_j$ and $GCA_k$.

Estimation of Fixed Effects

GCA parameters

The GCA parameters can be estimated (without mean, block, and SCA in the design matrix) through the use of equation 3-3, if there are no missing cell means (plots) for any cross and no missing crosses. The design matrix consists only of the GCA submatrix. This design matrix has p-1 columns for the GCA's (the third through the fifth columns of $X_s$). The b vector is an estimate of the GCA portion of $\beta_s$ as in Figure 3-2, and the linear combination for the estimation of $gca_p$ is $gca_p = -\sum_{j=1}^{p-1} gca_j$. Parameters for any of the factors can be estimated independently using the pertinent submatrix as long as there are no missing cell means (plots) and no missing crosses; this uses a property known as orthogonality. Orthogonality requires that the dot product between two vectors equals zero (Schneider 1987, page 168).
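The claim that GCA can be estimated from its own submatrix in balanced data can be checked with a short sketch. Everything below is illustrative: `sum_to_zero_design` is a hypothetical helper that follows the construction described in the text, and the plot means are random numbers, not data from the progeny test. The same GCA estimates are obtained from the full design matrix, from the GCA columns alone, and from Becker's formula (equation 3-4) applied to totals of cross means.

```python
import itertools
import numpy as np

def sum_to_zero_design(p, b):
    """Sum-to-zero X_s (mean, block, GCA, SCA columns) for a balanced p-parent half-diallel."""
    crosses = list(itertools.combinations(range(1, p + 1), 2))
    n = len(crosses)
    G = np.array([[float(q in c) for q in range(1, p + 1)] for c in crosses])
    G = G[:, :-1] - G[:, [-1]]                         # fold parent p into the others
    prods = np.column_stack([G[:, j] * G[:, k]
                             for j, k in itertools.combinations(range(p - 1), 2)])
    S = prods[:, :-1] - prods[:, [-1]]                 # fold the last product column
    B = np.vstack([np.eye(b - 1), -np.ones((1, b - 1))])
    X = np.hstack([np.ones((b * n, 1)),
                   np.kron(B, np.ones((n, 1))),        # block columns
                   np.kron(np.ones((b, 1)), G),        # GCA stacked b times
                   np.kron(np.ones((b, 1)), S)])       # SCA stacked b times
    return X, crosses

p, b = 4, 2
X_s, crosses = sum_to_zero_design(p, b)
y = np.random.default_rng(0).normal(size=X_s.shape[0])   # made-up plot means

# Equation 3-3 with the full design matrix; GCA sits after the mean and b-1 block columns.
b_full = np.linalg.solve(X_s.T @ X_s, X_s.T @ y)
gca_full = b_full[b:b + p - 1]

# Equation 3-3 with the GCA submatrix alone.
G_s = X_s[:, b:b + p - 1]
gca_only = np.linalg.solve(G_s.T @ G_s, G_s.T @ y)

# Becker's formula on site means of the crosses.
Z = y.reshape(b, -1).mean(axis=0)
totals = {j: sum(z for z, c in zip(Z, crosses) if j in c) for j in range(1, p + 1)}
gca_becker = np.array([(p * totals[j] - 2 * Z.sum()) / (p * (p - 2)) for j in range(1, p)])

print(gca_full, gca_only, gca_becker)
print("gca_p =", -gca_full.sum())                      # the estimate not in the b vector
```

The agreement depends on the data being balanced on a plot-mean basis; with a missing plot or cross the three routes no longer coincide.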
The dot product (a scalar) is the sum of the values in a vector obtained from the horizontal direct product of two vectors. For two factors to be orthogonal, the dot products of all the column vectors making up the section of the design matrix for one factor with the column vectors making up the portion of the design matrix for the second must be zero. If all factors in the model are orthogonal, then the $X_s'X_s$ matrix is block diagonal. A block-diagonal $X_s'X_s$ matrix is composed of square factor submatrices (degrees of freedom x degrees of freedom) along the diagonal with all off-diagonal elements not in one of the square factor submatrices equalling zero. A property of block-diagonal matrices is that the inverse can be calculated by inverting each block separately and replacing the original block in the full X'X matrix by the inverted block. Because the blocks can be inverted separately and all other off-diagonal elements of the inverse are zero, the effects for factors which are orthogonal to all other factors may be estimated separately, i.e., there are no functions of other sum-to-zero factors in the sum-to-zero estimates.

Mean, block, GCA and SCA parameters

All parameters are estimated simultaneously by horizontally concatenating the mean, block, GCA, and SCA matrices to create $X_s$. Equation 3-3 is again utilized to solve the system of equations. The b vector for the four-parent example is an estimate of $\beta_s$ of Figure 3-2. Again, one parameter is estimated for each column in the $X_s$ matrix and all parameter estimates not present are linear combinations of the parameter estimates in the b vector. So $b_b$ is equal to $-\sum_{i=1}^{b-1} b_i$ and $gca_p$ is equal to $-\sum_{j=1}^{p-1} gca_j$. The linear combinations for SCA effects can be obtained by reading along the row of the SCA submatrix associated with the observation containing the parameter, i.e., in Figure 3-2 the observation $y_{i14}$ contains the effect $sca_{14}$, which is estimated as the linear combination $-sca_{12} - sca_{13}$.
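The block-diagonal inverse property quoted above is easy to confirm numerically. In this sketch the four blocks are random symmetric positive-definite stand-ins with the sizes of the mean, block, GCA and SCA factors in the four-parent example; they are not taken from an actual X'X, and `assemble` is a helper written only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [1, 1, 3, 2]          # degrees of freedom: mean, block, GCA, SCA

# One symmetric positive-definite block per factor (A'A with A of full column rank).
blocks = []
for k in sizes:
    A = rng.normal(size=(k + 2, k))
    blocks.append(A.T @ A)

def assemble(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    n = sum(blk.shape[0] for blk in blocks)
    out = np.zeros((n, n))
    start = 0
    for blk in blocks:
        k = blk.shape[0]
        out[start:start + k, start:start + k] = blk
        start += k
    return out

XtX = assemble(blocks)
inverse_by_block = assemble([np.linalg.inv(blk) for blk in blocks])

# Inverting block by block gives the inverse of the whole matrix.
print(np.allclose(np.linalg.inv(XtX), inverse_by_block))
```

Because the inverse keeps the same zero pattern, the solution for one factor involves only that factor's own columns and the data, which is why orthogonal factors can be estimated separately.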
This completes the estimation of fixed effect parameters from a data set which is balanced on a plot-mean basis. Since field data sets with such completeness are a rarity in forestry applications, the next step is OLS analysis for various types of data imbalance. Calculations of solutions based on a complete data set and simulated data sets with common types of imbalance are demonstrated in numerical examples.

Numerical Examples

The data set analyzed in the numerical examples is from a five-year-old, six-parent half-diallel slash pine (Pinus elliottii var. elliottii Engelm.) progeny test planted on a single site in four complete blocks. Each cross is represented by a five-tree row plot within each block. Total height in meters and diameter at breast height (dbh in centimeters) are the traits selected for analysis. The data set is presented in Table 3-1 so that the reader may reconstruct the analysis and compare answers with the examples. The numbers 1 through 6 were arbitrarily assigned to the parents for analysis. Because of unequal survival within plots, plot means are used as the unit of observation.

Balanced Data (Plot-mean Basis)

The sum-to-zero design matrix for the balanced data set has (4 blocks) x (15 crosses) = 60 rows (which equals the number of observations in y) and has the following columns: one column for μ, three columns for blocks (b-1), five columns for GCA (p-1), and nine columns for SCA (15 crosses - 5 - 1), for a total of 18 columns. With sixty plot means (degrees of freedom) and 18 degrees of freedom in the model, subtracting 18 from 60 yields 42 degrees of freedom for error, which matches the degrees of freedom for cross by block interaction, thus verifying that degrees of freedom concur with the number of columns in the sum-to-zero design matrix.

To illustrate the principle of orthogonality in the balanced case, the X'X and $(X'X)^{-1}$ matrices may be printed to show that they are block diagonal.
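The dimensions and the block-diagonal structure stated for the balanced example can be reproduced without the field data, since they depend only on the design. The sketch below assumes the construction rules given earlier in the chapter; `sum_to_zero_design` and the slice names are hypothetical helpers introduced for this illustration.

```python
import itertools
import numpy as np

def sum_to_zero_design(p, b):
    """Sum-to-zero X_s (mean, block, GCA, SCA columns) for a balanced p-parent half-diallel."""
    crosses = list(itertools.combinations(range(1, p + 1), 2))
    n = len(crosses)
    G = np.array([[float(q in c) for q in range(1, p + 1)] for c in crosses])
    G = G[:, :-1] - G[:, [-1]]
    prods = np.column_stack([G[:, j] * G[:, k]
                             for j, k in itertools.combinations(range(p - 1), 2)])
    S = prods[:, :-1] - prods[:, [-1]]
    B = np.vstack([np.eye(b - 1), -np.ones((1, b - 1))])
    return np.hstack([np.ones((b * n, 1)),
                      np.kron(B, np.ones((n, 1))),
                      np.kron(np.ones((b, 1)), G),
                      np.kron(np.ones((b, 1)), S)])

p, b = 6, 4
X_s = sum_to_zero_design(p, b)
rank = np.linalg.matrix_rank(X_s)
print(X_s.shape, rank, "error df =", X_s.shape[0] - rank)   # (60, 18) 18 error df = 42

# Column ranges of the four factors: mean, block, GCA, SCA.
factors = {"mean": slice(0, 1), "block": slice(1, 4), "GCA": slice(4, 9), "SCA": slice(9, 18)}
XtX = X_s.T @ X_s
for (name_a, a), (name_b, c) in itertools.combinations(factors.items(), 2):
    # Every off-diagonal block of X'X should be zero when the data are balanced.
    print(name_a, name_b, np.allclose(XtX[a, c], 0))
```

All six factor pairs print `True`, so X'X is block diagonal and, by the property of block-diagonal matrices, so is its inverse.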
In further illustration, the effects within a factor may also be estimated without any other factors in the design matrix and compared to the estimates from the full design matrix. The vectors of parameter estimates for height and dbh (Table 3-2) were calculated from the same $X_s$ matrix because height and dbh measurements were taken on the same trees. In other words, if a height measurement was taken on a tree, a dbh measurement was also taken, so the design matrices are equivalent.

Missing Plot

To illustrate the problem of a missing plot, the cross, parent two by parent three, was arbitrarily deleted in block one (as if observation $y_{123}$ were missing). This deletion prompts adjustments to the factor matrices in order to analyze the new data set. The new vector of observations (y) now has 59 rows. This necessitates deletion of the row of the design matrix ($X_s$) in block 1 which would have been associated with cross 2 x 3. This is the only matrix alteration required for the analysis. Thus, the resultant $X_s$ matrix has 60 - 1 = 59 rows and 18 columns. With 59 means in y and 18 columns in $X_s$, the degrees of freedom for error is 41. Comparisons between results of the analyses (Table 3-2) of the full data set and the data set missing observation $y_{123}$ reveal that for this case the estimates of parameters have been relatively unaffected by the imbalance (magnitudes of GCA's changed only slightly and rankings by GCA were unaffected).

Table 3-1. Data set for numerical examples. Five-year-old slash pine progeny test with a 6-parent half-diallel mating design present on a single site with four randomized complete blocks and a five-tree row plot per cross per block.
                        Mean      Mean      Within-plot variance    Trees
Block  Female  Male     height    DBH       Height     DBH          per plot
                        (m)       (cm)      (m²)       (cm²)
1      1       2        2.6899    3.810     0.9800     3.484        4
1      1       3        1.9080    2.134     1.4277     3.893        5
1      1       5        3.1242    4.445     0.4487     1.656        4
1      1       6        2.4933    3.200     0.8488     5.664        5
1      2       5        1.4783    1.588     0.6556     2.167        4
1      2       6        2.7026    3.471     0.1136     0.344        3
1      3       2        3.0480    4.699     0.2341     0.968        4
1      3       5        3.4991    5.131     0.0945     0.271        5
1      3       6        2.4003    2.794     0.5149     1.548        4
1      4       1        3.3955    4.928     0.1489     0.761        5
1      4       2        3.4290    5.144     0.7943     3.285        4
1      4       3        2.5298    2.984     0.9557     4.188        4
1      4       5        2.4155    3.175     0.5936     2.946        4
1      4       6        3.2004    4.521     1.7034     7.594        5
1      5       6        2.2403    2.794     1.0433     6.280        4
2      1       2        3.5662    5.080     0.9560     2.903        5
2      1       3        2.6335    3.353     0.7695     3.497        5
2      1       5        3.6942    5.893     0.0573     0.432        5
2      1       6        3.4808    4.928     0.9222     2.890        5
2      2       5        3.4260    4.877     0.7017     2.432        5
2      2       6        2.4282    3.302     0.0616     0.452        3
2      3       2        3.0480    4.064     0.0192     0.301        4
2      3       5        2.8895    4.013     0.1957     0.690        5
2      3       6        1.9406    1.863     0.0560     0.408        3
2      4       1        3.0114    3.962     1.9753     6.342        5
2      4       2        3.6454    5.283     0.1731     0.787        5
2      4       3        2.9566    3.861     0.0506     0.174        5
2      4       5        2.8118    4.382     1.1336     5.435        4
2      4       6        3.2674    4.318     1.1211     4.354        5
2      5       6        3.7917    5.893     0.0848     0.497        5
3      1       2        2.2961    2.625     0.3914     1.699        3
3      1       3        2.8956    4.128     1.2926     4.532        4
3      1       5        2.5359    3.607     0.8284     4.303        5
3      1       6        2.9032    3.937     0.8252     4.064        4
3      2       5        2.7737    4.064     0.9829     3.226        2
3      2       6        1.2040    0.635     0.4464     0.806        2
3      3       2        2.9870    4.191     0.9049     2.989        4
3      3       5        2.8407    3.962     0.7309     3.632        5
3      3       6        1.3564    0.000     0.1677     0.000        2
3      4       1        2.6746    3.620     0.8463     2.984        4
3      4       2        2.7066    3.353     0.5590     1.787        5
3      4       3        3.4198    4.623     0.3509     0.690        5
3      4       5        3.3299    4.953     0.4102     1.226        4
3      4       6        3.4564    4.978     0.8369     3.503        5
3      5       6        3.2614    4.826     .
4      1       2        1.8974    2.476     1.0160     3.629        4
4      1       3        1.3005    0.508     0.2019     0.774        3
4      1       5        2.0726    2.540     1.2235     5.097        3
4      1       6        1.8821    1.778     0.4728     3.312        4
4      2       5        1. 64     1.334     0.5354     2.382        4
4      2       6        1.5392    0.635     0.0376     0.806        2
4      3       2        1.8898    2.032     0.7364     1.892        4
4      3       5        2.5146    3.620     0.0876     0.446        4
4      3       6        1.8389    2.201     0.0941     0.280        3
4      4       1        2.3348    2.591     0.3816     2.722        5
4      4       2        1.7272    1.693     2.1640     8.602        3
4      4       3        1.6581    1.524     0.0537     0.903        5
4      4       5        2.1184    2.286     0.3137     2.366        4
4      4       6        1.5545    1.422     0.4803     1.019        5
4      5       6        1.4122    1.693     0.0338     0.150        3

Table 3-2. Numerical results for examples of data imbalance using the OLS techniques presented in the text.

               Balanced(a)        Missing Plot(b)    Missing Cross(c)   Five Missing Crosses(d)
Estimate of(e) Height    DBH      Height    DBH      Height    DBH      Height    DBH
μ              2.5830    3.362    2.5787    3.346    2.5386    3.260    2.4980    3.149
B1             0.1203    0.292    0.1074    0.245    0.1074    0.245    0.1393    0.309
B2             0.5230    0.976    0.5274    0.992    0.5386    1.023    0.6041    1.140
B3             0.1264    0.205    0.1308    0.220    0.1180    0.187    0.0689    0.087
GCA1           0.0706    0.144    0.0760    0.163    0.1260    0.270    0.1361    0.232
GCA2           -.1077    -.180    -.1186    -.220    -.2186    -.434    -.2371    -.493
GCA3           -.1316    -.347    -.1426    -.386    -.2426    -.601    -.3972    -.952
GCA4           0.2489    0.398    0.2544    0.417    0.3044    0.524    0.4241    0.804
GCA5           0.1265    0.489    0.1320    0.509    0.1820    0.616    0.1746    0.646
SCA12          0.0665    0.172    0.0763    0.208    0.1663    0.400
SCA13          -.3374    -.628    -.3277    -.592    -.2377    -.400
SCA14          -.0484    -.128    -.0550    -.152    -.1150    -.280
SCA15          0.0766    0.126    0.0700    0.102    0.0100    -.026
SCA23          0.3995    0.912    0.3600    0.771
SCA24          0.1528    0.289    0.1627    0.324    0.2527    0.517
SCA25          -.3185    -.706    -.3084    -.670    -.2187    -.478
SCA34          -.0592    0.164    -.0493    0.129    0.0406    0.064
SCA35          0.3580    0.677    0.3679    0.712    0.4793    0.905

(a) where (numerical examples are for height) $b_4 = -\sum_{i=1}^{3} b_i = -.7697$; $gca_6 = -\sum_{j=1}^{5} gca_j = -.2067$; $sca_{p6} = -\sum sca_{jk}$ for j or k = p and p = 1, 2, 3, then $sca_{16} = .2428$, $sca_{26} = -.3002$, and $sca_{36} = -.3608$; $sca_{45} = -\sum sca_e = -.2898$, e = independently estimated SCA's 1,...,9; $sca_{46} = sca_{12} + sca_{13} + sca_{15} + sca_{23} + sca_{25} + sca_{35} = .2446$; and $sca_{56} = sca_{12} + sca_{13} + sca_{14} + sca_{23} + sca_{24} + sca_{34} = .1737$.
(b) where the linear combinations for parameter estimates are identical to the balanced example.
(c) where $sca_{p6} = -\sum sca_{jk}$ for j or k = p and p = 1 to 3; $sca_{45} = -\sum sca_e$, e = independently estimated SCA's 1,...,8; $sca_{46} = sca_{12} + sca_{13} + sca_{15} + sca_{25} + sca_{35}$; and $sca_{56} = sca_{12} + sca_{13} + sca_{14} + sca_{24} + sca_{34}$.
(d) where $sca_{16} = -sca_{14} - sca_{15}$, sca. = -sca., sca, = sca,, sca, = sca=5, sca. = sca14 + sca2 + sca4, and scan = the negative of the sum of the four independently estimated SCA's.
(e) where for all cases linear combinations for block and gca are the same as in the balanced case.
Five Missing Crosses, SCA estimates (Height, DBH): -.2041, -.410; 0.0480, 0.094; 0.1920, 0.408; 0.1163, 0.246.

Missing Cross

Another common form of imbalance in diallel data sets, the missing cross, is examined through arbitrary deletion of the 2 x 3 cross from all blocks, i.e., $y_{123}$, $y_{223}$, $y_{323}$, and $y_{423}$ are missing in the data vector. This type of imbalance is representative of a particular cross that could not be made and is therefore missing from all blocks. The matrix manipulations required for this analysis are again presented by factor.

For appropriate SCA restrictions, the data vector and design matrix should be ordered so that the p-th parent has no missing crosses. Since the labeling of a parent as parent p is entirely subjective, any parent with all crosses may be designated as parent p. The previous labelling directions are necessary since we generate the SCA submatrix as horizontal direct products of the columns of the GCA submatrix; and to account for missing crosses, the horizontal direct product for each particular missing parental combination is not calculated, which sets the missing SCA's to zero. If there is a cross missing from those of the p-th parent, we cannot account for the missing cross with this technique (Searle 1987, page 479).

For the mean, block, and GCA submatrices, the adjustment for the missing cross dictates deleting the rows in the submatrices which would have corresponded to the $y_{i23}$ observations. The SCA submatrix must be reformed since a degree of freedom for SCA, and hence a column of the submatrix, has been lost.
The SCA submatrix is reinstituted from the GCA horizontal direct products (remembering that one cross, 2 x 3, no longer exists and therefore that the product GCA2 x GCA3 is inappropriate). Dropping the column for $SCA_{23}$ is equivalent to setting $SCA_{23}$ to zero (Searle 1987) so that the remaining SCA's will sum to zero. After that, the reformation is according to the established pattern. With one missing cross there are now 56 observations and hence 56 degrees of freedom available. The columns of the $X_s$ matrix are now: one for the mean, three for block, five for GCA, and eight for SCA, for a total of 17 columns. The remaining degrees of freedom for error is 39, matching the correct degrees of freedom ((14-1) x (4-1) = 39).

For the missing cross example, $\hat{\mu}$ is no longer equivalent to the mean of the plot means since $\hat{\mu} = 2.5386$ and $(\sum_{ijk} y_{ijk})/N = 2.5715$ where N = 56 (number of plot means). This is the result of GCA effects which are no longer orthogonal to the mean. Check the $X_s'X_s$ matrix or try estimating factors separately and compare to the estimates when all factors are included in $X_s$.

If formulae for balanced data (Becker 1975, Falconer 1981, and Hallauer and Miranda 1981) are applied to unbalanced data (plot-mean basis), estimates of parameters are no longer appropriate because factors in the model are no longer independent (orthogonal). Applying Becker's formula, which uses totals of cross means for a site ($y_{.jk}$), to the missing cross example yields: $gca_1 = .2992$, $gca_2 = -.5649$, $gca_3 = -.5888$, $gca_4 = .4665$, $gca_5 = .3552$, and $gca_6 = .0219$. These answers are very different in magnitude from those in Table 3-2 for this example, and $gca_6$ also has a different sign. Employing these formulae in the analysis of unbalanced data is analogous to matrix estimation of GCA's without the other factors in the model, which is inappropriate.
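The bookkeeping for the missing-cross example, and the loss of orthogonality between the mean and GCA, can be sketched from the design alone. The helper `sum_to_zero_design` and its `missing` argument are hypothetical names for this illustration; the helper follows the rules in the text (rows of the missing cross are never formed, its GCA product is skipped, and parent p keeps all of its crosses).

```python
import itertools
import numpy as np

def sum_to_zero_design(p, b, missing=()):
    """Sum-to-zero X_s for a p-parent half-diallel in b blocks; `missing` lists crosses
    absent from every block (parent p is assumed to have all of its crosses)."""
    crosses = [c for c in itertools.combinations(range(1, p + 1), 2) if c not in missing]
    n = len(crosses)
    G = np.array([[float(q in c) for q in range(1, p + 1)] for c in crosses])
    G = G[:, :-1] - G[:, [-1]]
    # Products only for parental combinations that exist; a skipped product sets that SCA to zero.
    pairs = [c for c in itertools.combinations(range(1, p), 2) if c not in missing]
    S = np.column_stack([G[:, j - 1] * G[:, k - 1] for j, k in pairs])
    S = S[:, :-1] - S[:, [-1]]
    B = np.vstack([np.eye(b - 1), -np.ones((1, b - 1))])
    return np.hstack([np.ones((b * n, 1)),
                      np.kron(B, np.ones((n, 1))),
                      np.kron(np.ones((b, 1)), G),
                      np.kron(np.ones((b, 1)), S)])

X_m = sum_to_zero_design(6, 4, missing=((2, 3),))
rank = np.linalg.matrix_rank(X_m)
print(X_m.shape, rank, "error df =", X_m.shape[0] - rank)   # (56, 17) 17 error df = 39

# Dot products of the mean column with the five GCA columns (columns 4 through 8).
mean_by_gca = (X_m.T @ X_m)[0, 4:9]
print(mean_by_gca)        # non-zero entries for parents 2 and 3: GCA is no longer orthogonal to the mean
```

The non-zero dot products for the two parents of the missing cross are the reason the estimate of μ departs from the simple mean of the plot means in this example.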
Several Missing Crosses

The concluding example (Table 3-2) is a drastically unbalanced data set resulting from the arbitrary deletion of five crosses (1 x 2, 1 x 3, 2 x 3, 3 x 5, and 4 x 5). The matrix manipulation for this example is an extension of the previous one-cross deletion example. Rows corresponding to $y_{i12}$, $y_{i13}$, $y_{i23}$, $y_{i35}$, and $y_{i45}$ are deleted from the mean, block and GCA submatrices for all blocks. The SCA matrix (now 4 columns: 10 crosses - 5 - 1 = 4 degrees of freedom) is again reformed with only the relevant products of the GCA columns. Counting degrees of freedom (columns of the sum-to-zero design matrix), the mean has one, block has three, GCA has five, and SCA has four degrees of freedom for a total of 13. Error has (4-1)(10-1) = 27 degrees of freedom. Totaling degrees of freedom for modeled effects and error yields 40, which equals the number of plot means.

In increasingly unbalanced cases (Table 3-2), the spread among the GCA estimates tends to increase with increasing imbalance (loss of information). This is a general feature of OLS analyses, and the basis for the feature is that the spread among the GCA estimates is due to both the innate spread due to additive genetic effects as well as the error in estimation of the GCA's. When there is less information, GCA estimates tend to be more widely spread due to the increase in the error variance associated with their estimation. This feature has been noted (White and Hodge 1989, page 54) as the tendency to pick as parental winners individuals in a breeding program which are the most poorly tested.

Discussion

After developing the OLS analysis and describing the inherent assumptions of the analysis, there are four important factors to consider in the interpretation of sum-to-zero OLS solutions: (1) the lack of uniqueness of the parameter estimates; (2) the weights given to plot means ($y_{ijk}$) and in turn site means ($y_{.jk}$)
for crosses in data sets with missing crosses in parameter estimation; (3) the arbitrary nature of using a diallel mean (perforce a narrow genetic base) as the mean about which the GCA's sum to zero; and (4) the assumption that the covariance matrix for the observations (V) is $I\sigma^2_e$.

Uniqueness of Estimates

Sum-to-zero restrictions furnish what would appear to be unique estimates of the individual parameters, e.g. $GCA_1$, when, in fact, these individual parameters are not estimable (Graybill 1976, Freund and Littell 1981, and Milliken and Johnson 1984). The lack of estimability is again analogous to attempting to solve a set of equations in n unknowns with t equations where n is greater than t. Therefore, an infinite number of solutions exist for $\beta_o$. There are quantities in this system of equations that are unique (estimable), i.e., the estimate is invariant regardless of the restriction (sum-to-zero or set-to-zero) or generalized inverse (no restrictions) used (Milliken and Johnson 1984), and the estimable functions include sum-to-zero GCA and SCA estimates since they are linear combinations of the observations; but these estimable quantities do not estimate the individual parametric GCA's and SCA's of the overparameterized model (equation 3-1) since there is no unique solution for those parameters.

Weighting of Plot Means and Cross Means in Estimating Parameters

With at least one measurement tree in each plot and with plot means as the unit of observation, use of the matrix approach produces the same results as the basic formulae. The weight placed on each plot mean in the estimation of a parameter can be determined by calculating $(X_s'X_s)^{-1}X_s'$, which can be viewed as a matrix of weights W so that equation 3-3 can be written as b = Wy. The matrix W has these dimensions: the number of rows equals the number of parameters in $\beta_s$, and the number of columns equals the number of plot means in y.
The i-th row of W contains the weights applied to y to estimate the i-th parameter in b ($b_i$). In the discussion which follows $gca_1$ is utilized as $b_i$. If there are no missing plots, the cross mean in every block ($y_{ijk}$) has the same weighting and weights can be combined across blocks to yield the weight on the overall cross mean ($y_{.jk}$). It can be shown that for the balanced numerical example $gca_1$ is calculated by weighting the overall cross means containing parent 1 by 1/6 and weighting all overall cross means not containing parent 1 by -1/12.

        GCA1      GCA2      GCA3      GCA4      GCA5      GCA6      Marginal weight
GCA1              1/6       1/6       1/6       1/6       1/6       5/6
                  .16667    .16667    .16667    .16667    .16667
GCA2    .14583              -1/12     -1/12     -1/12     -1/12     -1/6
        missing             -.08333   -.08333   -.08333   -.08333
GCA3    .14583    missing             -1/12     -1/12     -1/12     -1/6
        missing   missing             -.08333   -.08333   -.08333
GCA4    .18056    -.10417   -.10417             -1/12     -1/12     -1/6
        .22549    .01961    -.11765             -.08333   -.08333
GCA5    .18056    -.10417   -.10417   -.06944             -1/12     -1/6
        .31372    -.27451   missing   missing             -.08333
GCA6    .18056    -.10417   -.10417   -.06944   -.06944             -1/6
        .29412    .08824    -.04902   -.29412   -.20588

Figure 3-4. Weights on overall cross means ($y_{.jk}$) for the three numerical examples for estimation of GCA1. The weights for the balanced example (above the diagonal) are presented in both fractional and decimal form. The weights for the one-cross missing and the five-crosses missing are presented as the upper number and lower number, respectively, in cells below the diagonal. The marginal weights on GCA parameters (right margin) do not change although cells are missing.

Figure 3-4 (above the diagonal) demonstrates the weightings on the overall cross means for the balanced numerical example as well as the marginal weighting on the GCA parameters. These marginal weightings are obtained by summing along a row and/or column as one would to obtain the marginal totals for a parent (Becker 1975).
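The weights in Figure 3-4 for the balanced and one-cross-missing examples can be recomputed from W = (X'X)^{-1}X'. The sketch below rests on our own helper functions (`sum_to_zero_design`, `gca1_weights`, `marginal`), which follow the construction rules of this chapter and assume that parent p keeps all of its crosses; it has not been checked against the five-crosses-missing column, where the outcome depends on exactly how the SCA columns are formed.

```python
import itertools
import numpy as np

def sum_to_zero_design(p, b, missing=()):
    """Sum-to-zero X_s and the list of crosses present (same cross order in every block)."""
    crosses = [c for c in itertools.combinations(range(1, p + 1), 2) if c not in missing]
    n = len(crosses)
    G = np.array([[float(q in c) for q in range(1, p + 1)] for c in crosses])
    G = G[:, :-1] - G[:, [-1]]
    pairs = [c for c in itertools.combinations(range(1, p), 2) if c not in missing]
    S = np.column_stack([G[:, j - 1] * G[:, k - 1] for j, k in pairs])
    S = S[:, :-1] - S[:, [-1]]
    B = np.vstack([np.eye(b - 1), -np.ones((1, b - 1))])
    X = np.hstack([np.ones((b * n, 1)), np.kron(B, np.ones((n, 1))),
                   np.kron(np.ones((b, 1)), G), np.kron(np.ones((b, 1)), S)])
    return X, crosses

def gca1_weights(p, b, missing=()):
    """Weights on the overall cross means used by OLS to estimate gca_1."""
    X, crosses = sum_to_zero_design(p, b, missing)
    W = np.linalg.solve(X.T @ X, X.T)             # W = (X'X)^{-1} X'
    w_plot = W[b]                                 # gca_1 follows the mean and b-1 block rows
    w_cross = w_plot.reshape(b, len(crosses)).sum(axis=0)   # combine weights across blocks
    return dict(zip(crosses, w_cross))

def marginal(weights, parent):
    """Marginal weight on a parent: sum over the crosses that contain it."""
    return sum(w for c, w in weights.items() if parent in c)

balanced = gca1_weights(6, 4)
one_missing = gca1_weights(6, 4, missing=((2, 3),))

print(round(balanced[(1, 2)], 5), round(balanced[(2, 4)], 5))
print(round(one_missing[(1, 2)], 5), round(one_missing[(1, 4)], 5),
      round(one_missing[(2, 4)], 5), round(one_missing[(4, 5)], 5))
print([round(marginal(one_missing, q), 5) for q in range(1, 7)])
```

The individual cross weights shift when the 2 x 3 cross is removed, while the marginal weights stay at 5/6 for parent 1 and -1/6 for every other parent.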
One feature of sum-to-zero solutions is that these marginal weightings will be maintained no matter the imbalance due to missing crosses, as will be seen by considering the numerical examples for a missing cross (Figure 3-4 below the diagonal, upper number) and five missing crosses (Figure 3-4 below the diagonal, lower number). The marginal weights have remained the same as in the balanced case while the weights on the cross means differ among the crosses containing parent 1 and also among the crosses not containing parent 1. In the five missing crosses example, crosses $y_{.24}$ and $y_{.26}$ even receive a positive weighting where in the prior examples they had negative weighting. The expected value in all three examples is $GCA_{1s}$ (for sum-to-zero) despite the apparently nonsensical weightings to cross means with missing crosses; however, the evaluation of the estimates in terms of the original model changes with each new combination of missing cells, i.e., $y_{.24}$ and $y_{.26}$ have a positive weight in the five missing crosses example in $gca_1$ estimation.

Whether this type of estimation is desirable with missing cell (cross) means has been the subject of some discussion (Speed, Hocking and Hackney 1978, Freund 1980, and Milliken and Johnson 1984). The data analyst should be aware of the manner in which sum-to-zero treats the data with missing cell means and decide whether that particular linear combination of cross means estimating the parameter is one of interest, realizing that the meaning of the estimates in terms of the original model is changing.

Diallel Mean

The use of the mean for a half-diallel as the mean around which GCA's sum to zero is not satisfactory in that the diallel mean is the mean of a rather narrow genetically based population, and in particular that the comparisons of interest are not usually confined to the specific parents in a specific diallel on a particular site.
A checklot can be employed to represent a base population against which comparison of half- or full-sib families can be made to provide for comparison of GCA estimates from other tests (van Buijtenen and Bridgwater 1986). Mathematically, when effects are forced to sum to zero around their own mean, the absolute value of the GCA's is reflective of their value relative to the mean of the group. Even if the parents involved in the particular diallel were all far superior to the population mean for GCA, GCA's calculated on an OLS basis would show that some of these GCA's were negative. If the GCA's of the diallel parents were in fact all below the population mean, the opposite and equally undesirable result ensues. For disconnected diallels together on a single site, an OLS analysis would yield GCA estimates that sum to zero within each diallel since parents are nested within diallels. Unless the comparisons of interest are only in the combination of the parents in a specific diallel on a specific site, the checklot alternative is desirable.

A method for obtaining the desired goal of comparable GCA's from disconnected experiments, disregarding the problem of heteroscedasticity, is to form a function from the data which yields GCA estimates properly located on the number scale. Such a function can be formed (using $GCA_1$ as an example) from $gca_1$, the diallel mean, and the checklot mean. From expectations of the scalar linear model (equation 3-1),

$$E\{gca_1\} = \frac{p-1}{p}\,GCA_1 - \frac{1}{p}\sum_{j=2}^{p} GCA_j + \frac{1}{p}\sum_{k=2}^{p} SCA_{1k} - \frac{2}{p(p-2)}\sum_{j=2}^{p-1}\sum_{k=j+1}^{p} SCA_{jk}; \qquad (3\text{-}5)$$

$$E\{\text{diallel mean}\} = \mu + \frac{1}{b}\sum_{i=1}^{b} B_i + \frac{2}{p}\sum_{j=1}^{p} GCA_j + \frac{2}{p(p-1)}\sum_{j=1}^{p-1}\sum_{k=j+1}^{p} SCA_{jk}; \text{ and}$$

$$E\{\text{checklot mean}\} = \mu + \frac{1}{b}\sum_{i=1}^{b} B_i + \tau;$$

where the subscript j on GCA stands for either j or k, and $\tau$ represents the fixed genetic parameter of the checklot. The function used to properly locate $gca_1$ (the subscript rel denotes the relocated $GCA_1$) is $gca_{rel} = gca_1 + (1/2)(\text{diallel mean} - \text{checklot mean})$.
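The balanced-case weights and the relocation function can be checked numerically. The following is a minimal sketch, not the author's program: it assumes negligible SCA and zero block effects, and the function names and the parent values used in the check are hypothetical. The weights $1/p$ and $-2/(p(p-2))$ are those implied by equation 3-5.

```python
import itertools
import numpy as np

def gca1_weights(p):
    """Balanced-case weights on overall cross means for the sum-to-zero gca_1."""
    crosses = list(itertools.combinations(range(1, p + 1), 2))
    # 1/p on crosses containing parent 1, -2/(p(p-2)) on all other crosses.
    w = np.array([1.0 / p if 1 in c else -2.0 / (p * (p - 2)) for c in crosses])
    return crosses, w

def expected_gca_rel(gca, tau, mu=0.0):
    """Expectation of gca_rel for parent 1, assuming SCA and block effects are zero."""
    p = len(gca)
    crosses, w = gca1_weights(p)
    # Expected cross means under y_jk = mu + GCA_j + GCA_k (illustrative values only).
    cross_means = np.array([mu + gca[j - 1] + gca[k - 1] for j, k in crosses])
    gca1_sum_to_zero = w @ cross_means
    diallel_mean = cross_means.mean()
    checklot_mean = mu + tau
    return gca1_sum_to_zero + 0.5 * (diallel_mean - checklot_mean)
```

For six parents the weights are 1/6 and -1/12, and the relocated expectation reduces to $GCA_1 - \tau/2$ whatever the GCA's of the other parents, which is the property that makes estimates from disconnected diallels comparable.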
The expectation of $gca_{rel}$ with negligible SCA is $E\{gca_{rel}\} = GCA_1 - \tau/2$; and since breeding value equals twice GCA, $BV_{rel} = BV_1 - \tau$. If SCA is non-negligible then the expectation is

$$E\{gca_{rel}\} = GCA_1 + \frac{1}{p-1}\sum_{k=2}^{p} SCA_{1k} - \frac{1}{(p-1)(p-2)}\sum_{j=2}^{p-1}\sum_{k=j+1}^{p} SCA_{jk} - \frac{\tau}{2}. \qquad (3\text{-}6)$$

In either case the function provides a reasonable manner by which GCA estimates from disconnected diallels are centered at the same location on a number scale and are then comparable.

Variance and Covariance of Plot Means

The variances of plot means with unequal numbers of trees per plot are by definition unequal, i.e., $Var(\bar{y}_{ijk}) = \sigma^2_p + \sigma^2_w / n_{ijk}$, where $\sigma^2_p$ is the plot variance, $\sigma^2_w$ is the within-plot variance and $n_{ijk}$ is the number of observations per plot. Also, if blocks were considered random, there would be an additional source of variance for plot means due to blocks (as well as a covariance between plot means in the same block), and this could be incorporated into the V matrix with $Var(\bar{y}_{ijk}) = \sigma^2_b + \sigma^2_p + \sigma^2_w / n_{ijk}$. Since the variances of the means in the observation vector are not equal and there is a covariance between the means if blocks are being considered random, best linear unbiased estimates (BLUE) would be secured by weighting each mean by its true associated variance (Searle 1987, page 316). This is the generalized least squares (GLS) approach as

$$\hat{b} = (X_s' V^{-1} X_s)^{-1} X_s' V^{-1} y.$$

The GLS approach relaxes the OLS assumptions of equal variance of and no covariance between the observations (plot means) while still treating genetic parameters as fixed effects. The entries along the diagonal of the V matrix are the variances of the plot means ($Var(\bar{y}_{ijk})$) in the same order as the means in the data vector. The off-diagonal elements of V are $\sigma^2_b$ (the variance due to the random variable block) for elements corresponding to observations in the same block and 0 otherwise. BLUE requires exact knowledge of V; if estimates of $\sigma^2_b$, $\sigma^2_p$, and $\sigma^2_w$ are utilized in the V matrix, estimable functions of $\beta$ approximate BLUE.
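The GLS computation described above can be sketched as follows. This is an illustrative sketch rather than the software used in the study: the function names are hypothetical, the design matrix is assumed to be of full column rank (as with a sum-to-zero parameterization), and the variance components passed in are assumed known.

```python
import numpy as np

def gls(X, V, y):
    """Generalized least squares: b = (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)   # V^-1 X without forming V^-1 explicitly
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

def plot_mean_covariance(n_trees, block, var_b, var_p, var_w):
    """V for plot means with random blocks.

    Diagonal: var_b + var_p + var_w / n; off-diagonal: var_b for plot
    means in the same block, 0 otherwise.
    """
    n_trees = np.asarray(n_trees, dtype=float)
    block = np.asarray(block)
    V = var_b * (block[:, None] == block[None, :]).astype(float)
    V[np.diag_indices_from(V)] += var_p + var_w / n_trees
    return V
```

When every plot mean has the same variance and there is no covariance (V proportional to the identity matrix), `gls` returns the OLS solution; plots with fewer surviving trees receive larger diagonal entries in V and are therefore discounted.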
The OLS assumption that SCA and GCA are fixed effects can also be relaxed to allow for covariances due to genetic relatedness. In particular, the information that means are from the same half- or full-sib family could be included in the V matrix. Relaxation of the zero covariance assumption implies that GCA and SCA are random variables. If GCA and SCA are treated as random variables, then the application of best linear prediction (BLP) or best linear unbiased prediction (BLUP) to the problem would be more appropriate (White and Hodge 1989, page 64). The treatment of the genetic parameters as random variables is consistent with that used in estimating genetic correlations and heritabilities. The V matrix of such an application would include, in addition to the features of the GLS V matrix, the covariance between full-sib or half-sib families added to the off-diagonal elements in V, i.e., if the first and second plot means in the data vector had a covariance due to relationship, then that covariance is inserted twice in the V matrix. The covariance would appear as the second element in the first row and the first element in the second row of V (V is a symmetric matrix). Also the diagonal elements of V would increase by $2\sigma^2_{gca}$ (the variance due to treating GCA as a random variable) + $\sigma^2_{sca}$ (the variance due to treating SCA as a random variable).

Comparison of Prediction and Estimation Methodologies

Which methodology (OLS, GLS, BLP, or BLUP) to apply to an individual data base is a somewhat subjective decision. The decision can be based both on the computational or conceptual complexity of the method and on the magnitude of the data base with which the analyst is working. To aid in this decision, this discussion highlights the differences in the inherent properties and assumptions of the techniques. For all practical purposes the answers from the four techniques will never be equal; however, there are two caveats.
First, OLS estimates equal GLS estimates if all the cell means are known with the same precision (variance) (Searle 1987, page 490). Otherwise, GLS discounts the means that are known with less precision in the calculations and different estimates result. The second caveat is that if the amount of data is infinite, i.e., all cross means are known without error, then all four techniques are equivalent (White and Hodge 1989, pages 104-106). In all other cases BLP and BLUP shrink predictions toward the location parameter(s) and produce predictions which are different from OLS or GLS estimates even with balanced data.

During calculations GLS, BLP, and BLUP place less weight on observations known with less precision, which is intuitively pleasing. With OLS and GLS, forest geneticists treat GCA's and SCA's as fixed effects for estimation and then as random variables for genetic correlations and heritabilities. BLP and BLUP provide a consistent treatment of GCA's and SCA's as random variables while differing in their assumptions about location parameters (fixed effects). In BLP fixed effects are assumed known without error (although they are usually estimated from the data), while with BLUP fixed effects are estimated using GLS. BLP and BLUP techniques also contain the assumption that the covariance matrix of the observations is known without error (most often variances must be estimated).

In many BLUP applications (Henderson 1974), mixed model equations are utilized iteratively to estimate fixed effects and to predict random variables from a data set. A BLUP treatment of fixed effects allows any connectedness between experiments to be utilized in the estimation of the fixed effects. This provides an intuitive advantage of BLUP over BLP in experimentation where connectedness among genetic experiments is available or where the data are so unbalanced that treating the fixed effects as known is less desirable than a GLS estimate of the fixed effects.
An ordering of the four methods by computational and conceptual complexity, from least to most complex, is OLS, GLS, BLP and BLUP. The latter three methods require the estimation of the covariance matrix of the observations either separately (a priori) or iteratively with the fixed effects. Precise estimation of the covariance matrix for observations requires a great number of observations, and the precision of GLS, BLP and BLUP estimations or predictions is affected by the error of estimation of the components of V.

Selection of a method can then be based on weighing the computational complexity and size of the available data base against the advantages offered by each method. Thus, if complexity of the computational problem is of paramount concern, the analyst necessarily would choose OLS. With a small data base (one that does not allow reasonable estimates of variances), the analyst would again choose OLS. With a large data base and no qualms about computational complexity, the analyst can choose between BLP and BLUP based on whether there is sufficient connectedness or imbalance among the experiments to make BLUP advantageous.

Conclusions

Methods of solving for GCA and SCA estimates for balanced (plot-mean basis) and unbalanced data have been presented along with the inherent assumptions of the analysis. The use of plot means and the matrix equations will produce sum-to-zero OLS estimates for GCA and SCA for all types of imbalance. Formulae in the literature which yield OLS solutions for balanced data can yield misleading solutions for unbalanced data because of the loss of orthogonality and because the weightings on site means for crosses (or totals) are constants.

GCA's and SCA's obtained through sum-to-zero restriction are not truly estimates of parametric population GCA's and SCA's. There are an infinite number of solutions for GCA's and SCA's from the system of equations as a result of the overparameterized linear model.
Yet, if the only comparisons of interest are among the specific parents on a particular site, then the estimates calculated by sum-to-zero restrictions are appropriate. Checklots may be used to provide comparability among estimates derived from disconnected sets.

Knowledge of the innate mathematical features of OLS analysis discussed here should help the data analyst decide if OLS is the most desirable technique for the data at hand. It may be desirable to relax OLS assumptions, which are in all likelihood invalid for the covariance matrix of the observations. This could lead to GLS, BLP or BLUP as better alternatives.

CHAPTER 4
VARIANCE COMPONENT ESTIMATION TECHNIQUES COMPARED FOR TWO MATING DESIGNS WITH FOREST GENETIC ARCHITECTURE THROUGH COMPUTER SIMULATION

Introduction

In many applications of quantitative genetics, geneticists are commonly faced with the analysis of data containing a multitude of flaws (e.g., non-normality, imbalance, and heteroscedasticity). Imbalance, as one of these flaws, is intrinsic to quantitative forest genetics research because of the difficulty in making crosses for full-sib tests and the biological realities of long-term field experiments. Few definitive studies have been conducted to establish optimal methods for estimation of variance components from unbalanced data. Simulation studies using simple models (one-way or two-way random models) have been conducted for certain data structures, i.e., imbalance, experimental design, and variance parameters (Corbeil and Searle 1976, Swallow 1981, Swallow and Monahan 1984, interpretations by Littell and McCutchan 1986). The results from these studies indicate that technique optimality is a function of the data structure.

In practice (both historically and still commonplace), estimation of variance components in forest genetics applications has been achieved by using sequentially adjusted sums of squares as an application of Henderson's Method 3 (HM3, Henderson 1953).
Under normality and with balanced data, this technique has the desirable property of being the minimum variance unbiased estimator. If the data are unbalanced, then the only property retained by HM3 estimation is unbiasedness (Searle 1971, Searle 1987 pp. 492, 493, 498). Other estimators have been shown to be locally superior to HM3 in variance or mean square error properties in certain cases (Klotz et al. 1969, Olsen et al. 1976, Swallow 1981, Swallow and Monahan 1984).

Over the last 25 years, there has been a proliferation of variance component estimation techniques including minimum norm quadratic unbiased estimation (MINQUE, Rao 1971a), minimum variance quadratic unbiased estimation (MIVQUE, Rao 1971b), maximum likelihood (ML, Hartley and Rao 1967), and restricted maximum likelihood (REML, Patterson and Thompson 1971). The practical application of these techniques has been impeded by their computational complexity. However, with continuing advances in computer technology and the appearance of better computational algorithms, the application of these procedures continues to become more tractable (Harville 1977, Giesbrecht 1983, Meyer 1989). Whether these methods of analysis are superior to HM3 for many genetics applications remains to be shown.

With balanced data and disregarding negative estimates, all previously mentioned techniques except ML produce the same estimates (Harville 1977). With unbalanced data, each technique produces a different set of variance component estimates. Criteria must then be adopted to discriminate among techniques. Candidate criteria for discrimination include unbiasedness (large number convergence on the parametric value), minimum variance (estimator with the smallest sampling variance), minimum mean square error (minimum of sampling variance plus squared bias, Hogg and Craig 1978), and probability of nearness (probability that sample estimates occur in a certain interval around the parametric value, Pitman 1937).
Negative estimates are also problematic in the estimation of variance components. Five alternatives for dealing with the dilemma of estimates less than zero (outside the natural parameter space of zero to infinity) are (Searle 1971): 1) accept and use the negative estimate, 2) set the negative estimate to zero (producing biased estimates), 3) re-solve the system with the offending component set to zero, 4) use an algorithm which does not allow negative estimates, and 5) use the negative estimate to infer that the wrong model was utilized.

The purpose of this research was to determine if the criteria of unbiasedness, minimum variance, minimum mean square error, and probability of nearness discriminated among several variance component estimation techniques while exploring various alternatives for dealing with negative variance component estimates. In order to make such comparisons, a large number of data sets were required for each experimental level. Using simulated data, this chapter compares variance component estimation techniques for plot-mean and individual observations, two mating systems (modified half-diallel and half-sib) and two sets of parametric variance components. Types of imbalance and levels of factors were chosen to reflect common situations in forest genetics.

Methods

Experimental Approach

For each experimental level 1000 data sets were generated and analyzed by various techniques (Table 4-1), producing numerous sets of variance component estimates for each data set. This workload resulted in enormous computational time being associated with each experimental level.
The overall experimental design for the simulation was originally conceived as a factorial with two types of mating design (half-diallel and half-sib), two sets of true variance components (Table 4-2), two kinds of observations (individual and plot mean) and three types of imbalance: 1) survival levels (80% and 60%, with 80% representing moderate survival and 60% representing poor survival); 2) for full-sib designs, three levels of missing crosses (0, 2, and 5 out of 15 crosses); and 3) for half-sib designs, two levels of connectedness among tests (15 and 10 common families between tests out of 15 families per test). Because of the computational time constraint, the experiment could not be run as a complete factorial and the investigation continued as a partial factorial. In general, the approach was to run levels which were at opposite ends of the imbalance spectrum, i.e., 80% survival and no missing crosses versus 60% survival and 5 missing crosses, within a variance component level. If results were consistent across these treatment combinations, intermediate levels were not run.

Table 4-1. Abbreviation for and description of variance component estimation methods utilized for analyses based on individual observations (if utilized for plot-mean analysis the abbreviation is modified by prefixing a 'P').

Abbreviation | Description | Citation
ML, PML | Maximum Likelihood: estimates not restricted to the parameter space (individual and plot-mean analysis). | Hartley and Rao 1967; Shaw 1987
MODML | Maximum Likelihood: negative estimates set to zero after convergence (individual analysis). | Hartley and Rao 1967
NNML | Maximum Likelihood: if negative estimates appeared at convergence, they were set to zero and the system re-solved (individual analysis). | Hartley and Rao 1967; Miller 1973
REML, PREML | Restricted Maximum Likelihood: estimates not restricted to the parameter space (individual and plot-mean analysis). | Patterson and Thompson 1971; Shaw 1987; Harville 1977
MODREML | Restricted Maximum Likelihood: negative estimates set to zero after convergence (individual analysis). | Patterson and Thompson 1971
NNREML, PNNREML | Restricted Maximum Likelihood: if negative estimates appeared at convergence, they were set to zero and the system re-solved (individual and plot-mean analysis). | Patterson and Thompson 1971; Miller 1983
MIVQUE, PMIVQUE | Minimum Variance Quadratic Unbiased: non-iterative with true (parametric) values of the variance components as priors (individual and plot-mean analysis). | Rao 1971b
MINQUE1, PMINQUE1 | Minimum Norm Quadratic Unbiased: non-iterative with ones as priors for all variance components (individual and plot-mean analysis). | Rao 1971a
TYPE3, PTYPE3 | Sequentially Adjusted Sums of Squares; Henderson's Method 3 (individual and plot-mean analysis). | Henderson 1953
MIVPEN | MIVQUE with a penalty algorithm to prevent negative estimates (individual analysis). | Harville 1977

Designation of a treatment combination is by a five-character alphanumeric field. The first character is either "H" (half-sib) or "D" (half-diallel). The second character denotes the set of parametric variance components, where "1" designates the set of variance components associated with heritability of 0.1 and "2" designates the set of variance components associated with heritability of 0.25 (Table 4-2). The third character is an "S" indicating that the last two characters determine the imbalance level. The fourth character designates the survival level, either "6" for 60% or "8" for 80%. The final character specifies the number of missing crosses (half-diallel) or lack of connectedness (half-sib). The treatment combination 'H1S80' is a half-sib mating design (H), the set of variance components associated with heritability equalling 0.1 (1), 80% survival (8), and 15 common parents across tests (0).

Table 4-2. Sets of true variance components for the half-diallel and half-sib mating designs generated from specification of two levels of single-tree heritability ($h^2$), type B correlation ($r_B$), and non-additive to additive variance ratio (d/a).

Genetic ratios (a): $h^2$, $r_B$, d/a | Mating design | True variance components (b): $\sigma^2_t$ | $\sigma^2_b$ | $\sigma^2_g$ | $\sigma^2_s$ | $\sigma^2_{tg}$ | $\sigma^2_{ts}$ | $\sigma^2_p$ | $\sigma^2_w$
0.1, 0.5, 1.0 | full-sib | 1.0 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | .595 | 7.905
0.1, 0.5, 1.0 | half-sib | 1.0 | 0.5 | 0.25 | NA | 0.25 | NA | .475 | 7.9964
0.25, 0.8, .25 | full-sib | 1.0 | 0.5 | 0.625 | .1562 | .1562 | .0391 | .5769 | 7.6649

a $h^2 = 4\sigma^2_g / \sigma^2_{phenotypic}$; $r_B = 4\sigma^2_g / (4\sigma^2_g + 4\sigma^2_{tg})$; and $\sigma^2_D / \sigma^2_A$ as d/a $= 4\sigma^2_s / 4\sigma^2_g$.
b See definitions in equation 4-1.

Experimental Design for Simulated Data

The mating design for the simulation was either a six-parent half-diallel (no selfs) or a fifteen-parent half-sib. The randomized complete block field design was in three locations (i.e., separate field tests) with four complete blocks per location and six trees per family in a block, where family is a full-sib family for the half-diallel or a half-sib family for the half-sib design. This field design and the mating designs reflect typical designs in forestry applications (Squillace 1973, Wilcox et al. 1975, Bridgwater et al. 1983, Weir and Goddard 1986, Loo-Dinkins et al. 1991) and are also commonly used in other disciplines (Matzinger et al. 1959, Hallauer and Miranda 1981, Singh and Singh 1984). The six trees per family could be considered as contiguous or non-contiguous plots without affecting the results or inferences.
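The five-character treatment designation defined above (for example 'H1S80') can be decoded mechanically. The sketch below is only an illustration; the function name and the dictionary keys are hypothetical, and the final digit is returned as given because its meaning depends on the mating design (number of missing crosses for the half-diallel, level of lack of connectedness for the half-sib).

```python
def decode_treatment(code):
    """Split a five-character treatment designation such as 'H1S80' into its fields."""
    if len(code) != 5 or code[2] != "S":
        raise ValueError(f"not a five-character treatment code: {code!r}")
    mating = {"H": "half-sib", "D": "half-diallel"}[code[0]]
    heritability = {"1": 0.1, "2": 0.25}[code[1]]
    survival = {"6": 0.60, "8": 0.80}[code[3]]
    return {
        "mating_design": mating,
        "heritability": heritability,
        "survival": survival,
        # Missing crosses (half-diallel) or lack-of-connectedness level (half-sib).
        "imbalance_level": int(code[4]),
    }
```

For instance, `decode_treatment("D2S65")` describes a half-diallel with the heritability 0.25 set of variance components, 60% survival and five missing crosses.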
Full-Sib Linear Model

The scalar linear model employed for half-diallel individual observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \qquad (4\text{-}1)$$

where $y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test;
$\mu$ is the population mean;
$t_i$ is the random variable test location ~ NID(0, $\sigma^2_t$);
$b_{ij}$ is the random variable block ~ NID(0, $\sigma^2_b$);
$g_k$ is the random variable female general combining ability (gca) ~ NID(0, $\sigma^2_g$);
$g_l$ is the random variable male gca ~ NID(0, $\sigma^2_g$);
$s_{kl}$ is the random variable specific combining ability (sca) ~ NID(0, $\sigma^2_s$);
$tg_{ik}$ is the random variable test by female gca interaction ~ NID(0, $\sigma^2_{tg}$);
$tg_{il}$ is the random variable test by male gca interaction ~ NID(0, $\sigma^2_{tg}$);
$ts_{ikl}$ is the random variable test by sca interaction ~ NID(0, $\sigma^2_{ts}$);
$p_{ijkl}$ is the random variable plot ~ NID(0, $\sigma^2_p$);
$w_{ijklm}$ is the random variable within-plot ~ NID(0, $\sigma^2_w$);
and there is no covariance between random variables in the model.

This linear model in matrix notation is

$$y = \mu 1 + Z_T e_T + Z_B e_B + Z_G e_G + Z_S e_S + Z_{TG} e_{TG} + Z_{TS} e_{TS} + Z_P e_P + e_W \qquad (4\text{-}2)$$

with dimensions $y$: n x 1; $1$: n x 1; $Z_T$: n x t; $e_T$: t x 1; $Z_B$: n x b; $e_B$: b x 1; $Z_G$: n x g; $e_G$: g x 1; $Z_S$: n x s; $e_S$: s x 1; $Z_{TG}$: n x tg; $e_{TG}$: tg x 1; $Z_{TS}$: n x ts; $e_{TS}$: ts x 1; $Z_P$: n x p; $e_P$: p x 1; $e_W$: n x 1; where y is the observation vector; $Z_i$ is the portion of the design matrix for the $i$th random variable; $e_i$ is the vector of unobservable random effects for the $i$th random variable; 1 is a vector of 1's; and n, t, b, g, s, tg, ts, and p are the number of observations, tests, blocks, gca's, sca's, test by gca interactions, test by sca interactions and plots, respectively.

Utilizing customary assumptions in half-diallel mating designs (Method 4, Griffing 1956), the variance of an individual observation is

$$Var(y_{ijklm}) = \sigma^2_t + \sigma^2_b + 2\sigma^2_g + \sigma^2_s + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w; \qquad (4\text{-}3)$$

and in matrix notation the covariance matrix for the observations is

$$Var(y) = Z_T Z_T' \sigma^2_t + Z_B Z_B' \sigma^2_b + Z_G Z_G' \sigma^2_g + Z_S Z_S' \sigma^2_s + Z_{TG} Z_{TG}' \sigma^2_{tg} + Z_{TS} Z_{TS}' \sigma^2_{ts} + Z_P Z_P' \sigma^2_p + I_n \sigma^2_w \qquad (4\text{-}4)$$

where ' indicates the transpose operator, all matrices of the form $Z_i Z_i'$ are n x n, and $I_n$ is an n x n identity matrix.
Half-sib Linear Model

The scalar linear model for half-sib individual observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + g_k + tg_{ik} + ph_{ijk} + wh_{ijkm} \qquad (4\text{-}5)$$

where $y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test;
$\mu$, $t_i$, $b_{ij}$, $g_k$, and $tg_{ik}$ retain the definitions in Eq. 4-1;
$ph_{ijk}$ is the random variable plot, containing different genotype by environment components than the corresponding term in Eq. 4-1, ~ NID(0, $\sigma^2_{ph}$);
$wh_{ijkm}$ is the random variable within-plot, containing different levels of genotypic and genotype by environment components than the corresponding term in Eq. 4-1, ~ NID(0, $\sigma^2_{wh}$);
and there is no covariance between random variables in the model.

The matrix notation model is

$$y = \mu 1 + Z_T e_T + Z_B e_B + Z_G e_G + Z_{TG} e_{TG} + Z_P e_P + e_W \qquad (4\text{-}6)$$

with dimensions $y$: n x 1; $1$: n x 1; $Z_T$: n x t; $e_T$: t x 1; $Z_B$: n x b; $e_B$: b x 1; $Z_G$: n x g; $e_G$: g x 1; $Z_{TG}$: n x tg; $e_{TG}$: tg x 1; $Z_P$: n x p; $e_P$: p x 1; $e_W$: n x 1.

The variance of an individual observation in half-sib designs is

$$Var(y_{ijkm}) = \sigma^2_t + \sigma^2_b + \sigma^2_g + \sigma^2_{tg} + \sigma^2_{ph} + \sigma^2_{wh} \qquad (4\text{-}7)$$

and

$$Var(y) = Z_T Z_T' \sigma^2_t + Z_B Z_B' \sigma^2_b + Z_G Z_G' \sigma^2_g + Z_{TG} Z_{TG}' \sigma^2_{tg} + Z_P Z_P' \sigma^2_{ph} + I_n \sigma^2_{wh}. \qquad (4\text{-}8)$$

For an observational vector based on plot means, the plot and within-plot random variables were combined by taking the arithmetic mean across the observations within a plot. The resulting plot-means model has a new plot variance term, in place of $\sigma^2_p$ or $\sigma^2_{ph}$, that is a composite of the plot and within-plot variance terms of the individual observation model.

Three estimates of ratios among variance components were determined: 1) single-tree heritability adjusted for test location and block as $\hat{h}^2 = 4\hat{\sigma}^2_g / \hat{\sigma}^2_{phenotypic}$, where $\hat{\sigma}^2_{phenotypic}$ is the estimate of the variance of an individual observation from equations 4-3 and 4-7 with the variance components for test location and block deleted; 2) type B correlation as $\hat{r}_B = 4\hat{\sigma}^2_g / (4\hat{\sigma}^2_g + 4\hat{\sigma}^2_{tg})$; and 3) dominance to additive variance ratio as $\widehat{d/a} = 4\hat{\sigma}^2_s / 4\hat{\sigma}^2_g$.
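As a worked check of these three ratios, the sketch below applies them to the full-sib variance components listed in Table 4-2. The function name and the dictionary keys are hypothetical; the phenotypic variance follows equations 4-3 and 4-7 with the test location and block components deleted.

```python
def genetic_ratios(vc, full_sib=True):
    """Single-tree heritability, type B correlation and d/a from variance components."""
    if full_sib:
        # Equation 4-3 without the test and block components.
        phenotypic = (2 * vc["g"] + vc["s"] + 2 * vc["tg"]
                      + vc["ts"] + vc["p"] + vc["w"])
    else:
        # Equation 4-7 without the test and block components.
        phenotypic = vc["g"] + vc["tg"] + vc["p"] + vc["w"]
    h2 = 4 * vc["g"] / phenotypic
    r_b = 4 * vc["g"] / (4 * vc["g"] + 4 * vc["tg"])
    d_a = 4 * vc["s"] / (4 * vc["g"]) if full_sib else None
    return h2, r_b, d_a

# Full-sib variance components as listed in Table 4-2.
SET1 = {"g": 0.25, "s": 0.25, "tg": 0.25, "ts": 0.25, "p": 0.595, "w": 7.905}
SET2 = {"g": 0.625, "s": 0.1562, "tg": 0.1562, "ts": 0.0391, "p": 0.5769, "w": 7.6649}
```

Applied to `SET1` the function returns the specified ratios of 0.1, 0.5 and 1.0, and applied to `SET2` it returns 0.25, 0.8 and 0.25 to within the rounding of the tabled components.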
Data Generation and Deletion

Data generation was accomplished by using a Cholesky upper-lower decomposition of the covariance matrix for the observations (Goodnight 1979) and a vector of pseudo-random standard normal deviates generated using the Box-Muller transformation with pseudo-random uniform deviates (Knuth 1981, Press et al. 1989). The upper-lower decomposition creates a matrix (U) with the property that Var(y) = U'U. The vector of pseudo-random standard normal deviates (z) has a covariance matrix equal to an identity matrix ($I_n$), where n is the number of observations. The vector of observations is created as y = U'z. Then Var(y) = U'(Var(z))U and, since Var(z) = $I_n$, Var(y) = U'$I_n$U = U'U.

Analyses of survival patterns using data from the Cooperative Forest Genetic Research Program (CFGRP) at the University of Florida were used to develop survival distributions for the simulation. The data sets chosen for survival analysis were from full-sib slash pine (Pinus elliottii var. elliottii Engelm.) tests planted in randomized complete block designs with the families in row plots and were selected because the survival levels were either approximately 60% or 80%. Survival levels for most crosses (full-sib families) clustered around the expected value, i.e., approximately 60% for an average survival level of 60%; however, there were always a few crosses that had much poorer survival than average and also a small number of crosses that had much better survival than average. This survival pattern was consistent across the 50 experiments analyzed. Thus, a lower than average survival level was arbitrarily assigned to certain crosses, a higher than average survival level was assigned to certain crosses, and the average survival level was assigned to most crosses. This modeling of survival pattern was also extended to the half-sib mating design. At 80% survival no missing plots were allowed, and at 60% survival missing plots occurred at random.
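The generation step y = U'z described at the start of this section can be sketched as follows. This is an illustration under stated assumptions rather than the original simulation code: NumPy's lower-triangular Cholesky factor L is used (so U = L' and V = U'U), NumPy's own normal generator replaces the Box-Muller transformation, and the small covariance matrix (three families of two trees, with hypothetical variances 0.25 and 1.0) is invented for the example.

```python
import numpy as np

def simulate_observations(V, n_sets, seed=0):
    """Draw n_sets observation vectors with covariance V via y = U'z, where V = U'U."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(V)                      # lower factor: V = L L', so U = L'
    z = rng.standard_normal((V.shape[0], n_sets))  # independent N(0, 1) deviates
    return (L @ z).T                               # each row is one simulated y = U'z

# Toy covariance: a family effect shared by the two trees of each of three families.
Z_family = np.kron(np.eye(3), np.ones((2, 1)))
V_toy = 0.25 * Z_family @ Z_family.T + 1.0 * np.eye(6)
```

Because Var(z) is an identity matrix, each simulated vector has covariance LL' = V, so the empirical covariance of many simulated vectors approaches `V_toy`.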
Full-sib family deletion simulated crosses which could not be made and were therefore missing from the experiment. When deleting five crosses, the deletion was restricted to a maximum of four crosses per parent to prevent loss of all the crosses in which a single parent appeared, since this would have resulted in changing a six-parent to a five-parent half-diallel.

Tests having only subsets of the half-sib families in common are a frequent occurrence in data analysis at CFGRP. This partial connectedness was simulated by generating data in which only 10 of the 15 families present in a test were common to either one of the other two tests comprising a data set.

Variance Component Estimation Techniques

Two algorithms were utilized for all estimation techniques: sequentially adjusted sums of squares (Milliken and Johnson 1984, p 138) for HM3; and Giesbrecht's algorithm (Giesbrecht 1983) for REML, ML, MINQUE and MIVQUE. Giesbrecht's algorithm is primarily a gradient algorithm (the method of scoring), and as such allows negative estimates (Harville 1977, Giesbrecht 1983). Negative estimates are not a theoretical difficulty with MINQUE or MIVQUE; however, for REML and ML, estimates should be confined to the parameter space. For this reason estimators referred to as REML and ML in this chapter are not truly REML and ML when negative estimates occur; further, there is the possibility that the iterative solution stopped at a local maximum rather than the global maximum. These concerns are commonplace in REML and ML estimation (Corbeil and Searle 1976, Harville 1977, Swallow and Monahan 1984); however, ignoring these two points, these estimators are still referred to as REML and ML.
The basic equation for variance component estimation under normality (Giesbrecht 1983) for MIVQUE, MINQUE and REML is

$$\{tr(Q V_i Q V_j)\}\,\hat{\sigma}^2 = \{y' Q V_i Q y\} \qquad (4\text{-}9)$$

with dimensions r x r, r x 1 and r x 1, respectively; then $\hat{\sigma}^2 = \{tr(Q V_i Q V_j)\}^{-1}\{y' Q V_i Q y\}$; and for ML

$$\{tr(V^{-1} V_i V^{-1} V_j)\}\,\hat{\sigma}^2 = \{y' Q V_i Q y\} \qquad (4\text{-}10)$$

with the same dimensions, where $\{tr(Q V_i Q V_j)\}$ is a matrix whose elements are $tr(Q V_i Q V_j)$, where in the full-sib designs i = 1 to 8 and j = 1 to 8, i.e., there is a row and column for every random variable in the linear model;
tr is the trace operator, that is, the sum of the diagonal elements of a matrix;
$Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ for V as the covariance matrix of y and X as the design matrix for fixed effects;
$V_i = Z_i Z_i'$, where i = the random variables test, block, etc.;
$\hat{\sigma}^2$ is the vector of variance component estimates; and
r is the number of random variables in the model.

The MINQUE estimator used was MINQUE1, i.e., ones as priors for all variance components, calculated by applying Giesbrecht's algorithm non-iteratively. MINQUE1 was chosen because of results demonstrating MINQUE0 (prior of 1 for the error term and of 0 for all others) to be an inferior estimation technique for many cases (Swallow and Monahan 1984, R.C. Littell unpublished data).

With normally distributed uncorrelated random variables, the use of the true values of the variance components as priors in a non-iterative application of Giesbrecht's algorithm produced the MIVQUE solutions (equation 4-9). Obtaining true MIVQUE estimation is a luxury of computer simulation and would not be possible in practice since the true variance components are required (Swallow and Searle 1978). This estimator was included to provide a standard of comparison for other estimators.

An additional MIVQUE-type estimator, referred to as MIVPEN, was also included. MIVPEN was also a non-iterative application of the algorithm with the true variance components as priors; however, this estimator was conditioned on the variance component parameter space and did not allow negative estimates.
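A single non-iterative pass of the equation for MIVQUE, MINQUE and REML can be sketched as below. This is a minimal dense-matrix illustration, not Giesbrecht's implementation: the function name and argument layout are hypothetical, the residual (within) component is assumed to be the last one with $V_r = I_n$, and matrices are formed explicitly, which is only practical for small n.

```python
import numpy as np

def giesbrecht_step(y, X, Z_list, priors):
    """Solve {tr(Q V_i Q V_j)} s2 = {y' Q V_i Q y} once for the given priors.

    Z_list holds the design matrix of each random effect; the residual
    component (V_r = I) is appended automatically as the last component.
    """
    n = len(y)
    V_list = [Z @ Z.T for Z in Z_list] + [np.eye(n)]
    V = sum(s * Vi for s, Vi in zip(priors, V_list))
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    # Q = V^-1 - V^-1 X (X' V^-1 X)^- X' V^-1, using a generalized inverse.
    Q = Vinv - VinvX @ np.linalg.pinv(X.T @ VinvX) @ VinvX.T
    QV = [Q @ Vi for Vi in V_list]
    lhs = np.array([[np.trace(A @ B) for B in QV] for A in QV])
    Qy = Q @ y
    rhs = np.array([Qy @ Vi @ Qy for Vi in V_list])
    return np.linalg.solve(lhs, rhs)
```

Calling the function with a vector of ones as `priors` gives a MINQUE1-type estimate, and calling it with the true components gives a MIVQUE-type estimate. For balanced data the result does not depend on the priors and coincides with the analysis of variance estimates, which is a convenient check on an implementation.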
The non-negative conditioning of MIVPEN was accomplished by adding a penalty algorithm to MIVQUE such that no variance component was allowed to be less than $1 \times 10^{-7}$. Estimates from MIVPEN were equal to MIVQUE for data sets for which there were no negative MIVQUE variance component estimates. When negative MIVQUE estimates occurred, the two techniques were no longer equivalent. The penalty algorithm operated by using $\Delta = \hat{\sigma}^2 - \sigma^2$ and by choosing a scalar weight w such that no element of $\tilde{\sigma}^2$ is less than $1 \times 10^{-7}$. Then $\tilde{\sigma}^2 = \sigma^2 + w\Delta$, where $\Delta$ is the vector of departure from the true values ($\sigma^2$), $1 \times 10^{-7}$ is an arbitrary constant and $\tilde{\sigma}^2$ is the vector of estimated variance components conditioned on non-negativity.

REML estimates were from repeated application of Giesbrecht's algorithm (equation 4-9) in which the estimates from the $k$th iteration become the priors for the $(k+1)$th iteration. The iterations were stopped when the difference between the estimates from the $k$th and $(k+1)$th iterations met the convergence criterion; then the estimates of the $(k+1)$th iteration became the REML estimates. The convergence criterion utilized was $\sum_{i=1}^{r} |\hat{\sigma}^2_{i(k)} - \hat{\sigma}^2_{i(k+1)}| < 1 \times 10^{-4}$. This criterion imposed convergence to the fourth decimal place for all variance components.

Since for this experimental workload it was desired that the simulation run with little analyst intervention and in as few iterations as possible, the robustness of REML solutions obtained from Giesbrecht's algorithm to priors (or starting points) was explored. The difference in solutions starting from two distinct points (a vector of ones and the true values) was compared over 2000 data sets of different structures (imbalance, true variance components, and field design). The results (agreeing with those of Swallow and Monahan 1984) indicated that the difference between the two solutions was entirely dependent on the stringency of the convergence criterion and not on the starting point (priors).
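The MIVPEN penalty step described earlier in this section, $\tilde{\sigma}^2 = \sigma^2 + w\Delta$, can be illustrated with a short sketch. The text states only that w is chosen so that no element falls below $1 \times 10^{-7}$; taking the largest such w in [0, 1] is an assumption made here, as are the function name and the requirement that every true value exceed the floor.

```python
import numpy as np

def penalized_estimates(estimates, true_values, floor=1e-7):
    """Shrink estimates toward the true values until every component is >= floor."""
    estimates = np.asarray(estimates, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    delta = estimates - true_values          # departure from the true values
    w = 1.0                                  # w = 1 leaves the estimates unchanged
    below = estimates < floor
    if below.any():
        # Assumed rule: largest w in [0, 1] with true_values + w * delta >= floor.
        w = float(np.min((floor - true_values[below]) / delta[below]))
    return true_values + w * delta, w
```

When no estimate is below the floor the function returns the unmodified estimates, matching the statement that MIVPEN and MIVQUE agree for data sets without negative MIVQUE estimates.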
Also, the number of iterations required for convergence was greatly decreased by using the true values as priors. Thus, all REML estimates were calculated starting with the true values as priors.

Three alternatives for coping with negative estimates after convergence were used for REML solutions: accept and use the negative estimates (Shaw 1987), arbitrarily set negative estimates to zero, and re-solve the system setting negative estimates to zero (Miller 1973). The first two alternatives are self-explanatory, and the latter is accomplished by re-analyzing those data sets in which the initial unrestricted REML estimates included one or more negative estimates. During re-analysis, if a variance component became negative, it was set to zero (and could never be any value other than zero thereafter) and the iterations continued. This procedure persisted until the convergence criterion was met with a solution in which all variance components were either positive or zero.

Harville (1977) suggested several adaptations of Henderson's mixed model equations (Henderson et al. 1959) which do not allow variance component estimates to become negative; however, the estimates can become arbitrarily close to zero. These techniques were tried against the approach of setting negative estimates to zero after convergence and re-solving the system. Comparison of results using the same data sets indicated that there is little practical advantage in using the approach suggested by Harville, although it is more desirable theoretically. The differences between sets of estimates obtained by the two methods (solving the system with a variance component set to zero versus arbitrarily close to zero) are extremely minor.

ML solutions, as iterative applications of equation 4-10, were calculated from the same starting points and with the same convergence criterion as REML solutions.
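The iterative REML scheme and its convergence criterion can be sketched as follows. This is an illustrative dense-matrix version under stated assumptions, not the program used for the simulation: the function name is hypothetical, the residual component is taken to be the last one with $V_r = I_n$, negative estimates are accepted as they arise (the unrestricted alternative), and no attempt is made to handle a singular V.

```python
import numpy as np

def reml_iterate(y, X, Z_list, start, tol=1e-4, max_iter=100):
    """Repeat the MINQUE-type equation, feeding each solution back as the new priors.

    Stops when the summed absolute change in the estimates is below tol and
    returns (estimates, number of iterations used).
    """
    n = len(y)
    V_list = [Z @ Z.T for Z in Z_list] + [np.eye(n)]   # residual component last
    current = np.asarray(start, dtype=float)
    for iteration in range(1, max_iter + 1):
        V = sum(s * Vi for s, Vi in zip(current, V_list))
        Vinv = np.linalg.inv(V)
        VinvX = Vinv @ X
        Q = Vinv - VinvX @ np.linalg.pinv(X.T @ VinvX) @ VinvX.T
        QV = [Q @ Vi for Vi in V_list]
        lhs = np.array([[np.trace(A @ B) for B in QV] for A in QV])
        Qy = Q @ y
        rhs = np.array([Qy @ Vi @ Qy for Vi in V_list])
        new = np.linalg.solve(lhs, rhs)
        if np.sum(np.abs(new - current)) < tol:        # convergence criterion
            return new, iteration
        current = new
    raise RuntimeError("convergence criterion not met within max_iter iterations")
```

Running the function from a vector of ones and from another starting point on the same data gives a direct way to examine sensitivity to priors; with balanced data both starts reach the analysis of variance solution within two iterations.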
The three negative variance component alternatives explored for ML were to accept and use the negative estimates, to arbitrarily set negative estimates to zero after convergence of the unrestricted solution, and (for half-sib data only) to re-solve the system setting negative variance components to zero.

The algorithm to calculate solutions for HM3 (sequentially adjusted sums of squares) was based on the upper triangular G2 sweep (Goodnight 1979) and Hartley's method of synthesis (Hartley 1967). The equation solved was $E\{MS\} = MS$, where $MS$ is the vector of mean squares and $E\{MS\}$ is its expectation. Negative estimates were accepted and used.

Comparison Among Estimation Techniques

For the simulation, MIVQUE estimates were the basis for all comparisons because MIVQUE is by definition the minimum variance quadratic unbiased estimator. The result of comparing the mean of 1000 MIVQUE estimates for an experimental level with the mean for another technique was termed "apparent bias". The qualifier "apparent" denotes that 1000 data sets were not sufficient to achieve complete convergence to the true values of the variance components.

Sampling variances of estimation were calculated from the 1000 observations within an experimental level and estimation technique for variance components and genetic ratios (single tree heritability, Type B correlation and dominance to additive variance ratio). Mean square error then equalled variance plus squared "apparent bias". While mean square error was investigated, there was never sufficient bias for mean square error to lead to a different decision concerning techniques than sampling variance of the estimates; mean square error is therefore omitted from the remainder of this discussion.

Probability of nearness is the probability that an estimate will lie within a certain interval around the true parameter. The three total interval widths utilized were one-half, equal to, and twice the parameter size.
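The comparison criteria just defined can be sketched in a few lines. This is an illustration under stated assumptions rather than the original analysis code: the function name `summarize` is hypothetical, apparent bias is expressed as a difference of means, the sampling variance uses the n - 1 divisor, and the nearness intervals are taken to be centred on the true value.

```python
from statistics import mean, variance
from typing import Dict, List


def summarize(estimates: List[float], mivque: List[float],
              true_value: float) -> Dict[str, float]:
    """Summaries for one technique within one experimental level."""
    apparent_bias = mean(estimates) - mean(mivque)   # difference form (assumption)
    sampling_variance = variance(estimates)          # n - 1 divisor (assumption)
    result = {
        "apparent_bias": apparent_bias,
        "sampling_variance": sampling_variance,
        "mse": sampling_variance + apparent_bias ** 2,
    }
    # Total interval widths of one-half, one and two times the parameter size,
    # centred on the true value (centring is an assumption).
    for label, width_factor in (("half", 0.5), ("equal", 1.0), ("twice", 2.0)):
        half_width = 0.5 * width_factor * abs(true_value)
        inside = sum(1 for e in estimates if abs(e - true_value) <= half_width)
        result["nearness_" + label] = inside / len(estimates)
    return result


# Five made-up estimates stand in for the 1000 simulated data sets.
stats = summarize(
    estimates=[0.04, 0.11, 0.22, 0.31, -0.08],
    mivque=[0.0, 0.1, 0.2, 0.1, 0.1],
    true_value=0.1,
)
```

Negative estimates fall outside every interval when the true value is positive and the total width is at most twice the parameter, which is one reason the probability of nearness stays low for skewed distributions of estimates.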
The percentage of the 1000 estimates falling within these intervals was calculated for the different estimation techniques within an experimental level for variance components and ratios, and was utilized as an estimate of probability of nearness.

Results are presented by variance component or genetic ratio estimated as a percentage of MIVQUE (except in the case of probability of nearness). MIVQUE estimates represent 100%, with estimates of greater variance having values larger than 100% and "apparently biased" estimates having values different from 100%. The percentages were calculated as 100 times the estimate divided by the MIVQUE value. For the criterion of variance, the lower the percentage the better the estimator performed; for bias, values equalling 100% (0 bias) are preferred; and for probability of nearness, larger percentages (probabilities) are favored since they are indicative of greater density of estimates near the parametric value.

Results and Discussion

Variance Components

Sampling variance of the estimators

For all variance components estimated, REML and ML estimation techniques were consistently equal to or less than MIVQUE for sampling variance of the estimator (Table 4-3). The variance among estimates from these techniques was further reduced by setting the negative components to zero (MODML and MODREML) or by setting negative estimates to zero and re-solving the system (NNREML, NNML, and PNNREML). Variance among MINQUE1 estimates is always equal to or greater than for MIVQUE, as one might expect, since they are, in this application, the same technique with MIVQUE having perfect priors (the true values). Variances for HM3 estimators (TYPE3 and PTYPE3) are either equal to or greater than MIVQUE (HM3 estimates have progressively larger relative variance with higher levels of imbalance).
MIVPEN, although impractical because of the need for the true priors, had much more precise estimates of variance components than other techniques, illustrating what could be accomplished given the true values as priors plus maintaining estimates within the parameter space.

In general, the spread among the percentages for variance of estimation for the estimation techniques is highly dependent on the degree of imbalance and the type of mating system. With increasing imbalance the likelihood-based estimators realized greater advantage for sampling variance of the estimates over HM3 for both mating systems. The most advantageous application of likelihood-based estimators is in the H1S65 case where the imbalance is not only random deletions of individuals but also incomplete connectedness across locations, i.e. the same families are not present in each test (akin to incomplete blocks within a test).

Table 4-3. Sampling variance for the estimates of $\sigma^2_{gca}$ (first number), $\sigma^2_{tg}$ (second number), and $h^2$ (third number where calculated) as a percentage of the MIVQUE estimate by type of estimator and treatment combination; NA is not applied. Values greater than 100 indicate larger variance among 1000 estimates.

| Estimator | D1S80 | D1S65 | D2S65 | H1S80 | H1S65 |
|---|---|---|---|---|---|
| REML | 99.9 / 100.2 / 100.0 | 102.6 / 100.0 / 101.0 | 101.5 / 104.1 / 101.4 | 99.6 / 99.7 / 99.6 | 106.3 / 98.0 / 105.8 |
| ML | 77.3 / 106.9 / 82.5 | 78.2 / 104.8 / 82.9 | 76.4 / 110.7 / 86.4 | 95.9 / 100.8 / 96.2 | 103.9 / 99.1 / 103.8 |
| MINQUE1 | 100.0 / 101.2 / 100.3 | 104.2 / 118.8 / 105.8 | 104.0 / 123.6 / 103.9 | 104.0 / 112.5 / 104.0 | 146.7 / 139.7 / 145.8 |
| NNREML | 80.8 / 67.9 / 76.8 | 71.6 / 48.3 / 64.2 | 95.2 / 54.9 / 92.2 | 88.0 / 78.7 / 87.3 | 68.6 / 48.6 / 67.7 |
| NNML | NA | NA | NA | 83.3 / 79.4 / 83.1 | 65.3 / 48.9 / 64.7 |
| MODML | 58.2 / 12.8 / 58.1 | 50.0 / 81.4 / 46.1 | 69.5 / 81.6 / 72.0 | 84.7 / 86.6 / 83.8 | 74.6 / 68.5 / 71.4 |
| MODREML | 81.5 / 89.1 / 76.4 | 74.5 / 74.0 / 63.5 | 96.1 / 73.7 / 88.9 | 88.9 / 85.4 / 87.7 | 78.1 / 66.9 / 74.3 |
| TYPE3 | 101.0 / 101.1 / 100.5 | 101.0 / 101.0 / 108.4 | 105.5 / 115.5 / 102.9 | 100.6 / 100.9 / 100.4 | 121.0 / 125.6 / 121.6 |
| PREML | 100.3 / 102.7 | 106.3 / 113.5 | 101.7 / 119.8 | 107.5 / 122.0 | 146.9 / 150.7 |
| PML | 77.6 / 109.7 | 81.9 / 117.3 | 77.1 / 127.2 | 103.6 / 123.3 | 143.4 / 151.9 |
| PMINQUE1 | 100.3 / 102.7 | 107.6 / 129.0 | 105.4 / 137.3 | 107.5 / 122.0 | 179.3 / 180.6 |
| PNNREML | 80.9 / 69.8 | 71.1 / 53.2 | 93.9 / 60.5 | 92.7 / 94.0 | 86.6 / 68.1 |
| PTYPE3 | 100.3 / 102.7 / 100.6 | 106.6 / 124.7 / 110.8 | 105.4 / 133.3 / 104.1 | 107.5 / 122.0 / 106.9 | 168.1 / 184.9 / 168.0 |
| MIVPEN | NA | 36.2 / 26.6 / 34.7 | 29.1 / 20.0 / 30.2 | 80.0 / 74.3 / 79.8 | 45.6 / 39.6 / 45.4 |
| PMIVQUE | 100.3 / 102.7 | 104.2 / 114.4 | 102.4 / 117.8 | 107.5 / 122.0 | 146.9 / 150.7 |

An analysis of variance was conducted to determine the importance of the treatment of negative variance component estimates in the variance of estimation for REML and ML estimates. The model of sampling variance of the estimates as a result of mating design, imbalance level, treatment of negative estimates and size of the variance component demonstrated consistently (for all variance components except error) that treatment of negative estimates is an important component of the variance of the estimates (p < .05). The model accounted for up to 99% of the variation in the variance of the variance component estimates with 1) accepting and using negative estimates producing the highest variance; 2) setting the negative components to zero being intermediate; and 3) re-solving the system with negative estimates set to zero providing the lowest variance.

For all estimation techniques, lower variance among estimates was obtained by using individual observations as compared to plot means. The advantage of individual over plot-mean observations increased with increasing imbalance.

Bias

The most consistent performance for bias (Table 4-4) across all variance components was that of TYPE3, known from inherent properties to be unbiased. The consistent convergence of the TYPE3 value to the MIVQUE value indicated that the number of data sets used (1000 per technique and experimental level) was suitable for the purpose of examining bias. The other two consistent performers were REML and MINQUE1.
PTYPE3 (HM3 based on plot means) was unbiased when no plot means were missing, but produced "apparently biased" estimates when plot means were missing.

Among estimators which displayed bias, maximum likelihood estimators (ML and PML) were known to be inherently biased (Harville 1977, Searle 1987) with the amount of bias proportional to the number of degrees of freedom for a factor versus the number of levels for the factor. Other biases resulted from the method of dealing with negative estimates. Living with negative estimates produced the estimators with the least bias. Setting negative variance components to zero resulted in the greatest bias. Intermediate in bias were the estimates resulting from re-solving the system with negative components set to zero.

Table 4-4. Bias for the estimates of $\sigma^2_{gca}$ (first number), $\sigma^2_{tg}$ (second number), and $h^2$ (third number where calculated) as a percentage of the MIVQUE estimate by type of estimator and experimental combination; NA is not applied. Values different from 100 denote "apparent" bias.

| Estimator | D1S80 | D1S65 | D2S65 | H1S80 | H1S65 |
|---|---|---|---|---|---|
| REML | 99.9 / 99.9 / 99.9 | 101.5 / 102.2 / 101.3 | 98.7 / 99.8 / 98.6 | 99.9 / 99.9 / 99.9 | 102.8 / 98.9 / 102.6 |
| ML | 74.6 / 106.5 / 75.5 | 61.6 / 114.6 / 61.8 | 76.0 / 109.7 / 77.9 | 96.2 / 101.3 / 96.3 | 98.2 / 101.8 / 98.2 |
| MINQUE1 | 99.7 / 100.1 / 99.7 | 96.4 / 100.8 / 96.6 | 99.0 / 101.3 / 98.9 | 99.4 / 100.8 / 99.4 | 102.0 / 98.3 / 101.3 |
| NNREML | 107.9 / 93.1 / 108.7 | 116.5 / 92.9 / 118.4 | 98.1 / 92.9 / 98.2 | 101.9 / 100.5 / 102.2 | 107.8 / 102.3 / 107.7 |
| NNML | NA | NA | NA | 101.9 / 100.5 / 98.2 | 107.8 / 102.3 / 103.8 |
| MODML | 86.6 / 109.9 / 87.8 | 90.4 / 129.9 / 91.5 | 79.0 / 127.4 / 79.4 | 98.1 / 101.3 / 99.6 | 114.1 / 122.9 / 112.6 |
| MODREML | 109.5 / 103.7 / 109.5 | 124.2 / 119.8 / 123.2 | 100.6 / 119.2 / 98.4 | 103.1 / 104.6 / 102.9 | 117.8 / 120.6 / 116.2 |
| TYPE3 | 100.1 / 100.2 / 100.0 | 99.4 / 101.0 / 99.5 | 99.6 / 102.4 / 99.3 | 100.2 / 100.2 / 100.2 | 99.6 / 100.9 / 99.7 |
| PREML | 99.7 / 100.1 | 98.7 / 103.6 | 97.7 / 100.2 | 99.5 / 102.4 | 110.6 / 98.3 |
| PML | 74.2 / 106.9 | 58.5 / 116.2 | 73.6 / 111.5 | 95.9 / 103.2 | 105.2 / 102.0 |
| PMINQUE1 | 99.7 / 100.1 | 95.2 / 102.1 | 98.8 / 102.9 | 99.5 / 102.4 | 106.5 / 114.8 |
| PNNREML | 107.9 / 92.9 | 114.5 / 94.0 | 96.7 / 95.0 | 101.8 / 104.5 | 115.6 / 110.2 |
| PTYPE3 | 99.7 / 100.1 / 99.8 | 96.8 / 97.2 / 98.0 | 99.0 / 96.0 / 98.8 | 99.5 / 102.4 / 99.6 | 104.5 / 108.7 / 104.1 |
| MIVPEN | NA | 107.5 / 99.0 / 112.6 | 98.6 / 91.7 / 103.9 | 102.0 / 101.4 / 102.1 | 103.2 / 105.1 / 103.4 |
| PMIVQUE | 99.7 / 100.1 | 97.4 / 101.7 | 99.2 / 100.5 | 99.5 / 102.4 | 106.8 / 98.8 |

Table 4-5. Probability of nearness for $\sigma^2_{gca}$ (first number), $\sigma^2_{tg}$ (second number), and $h^2$ (third number where calculated). The probability interval is equal to the magnitude of the parameter.

| Estimator | D1S80 | D1S65 | D2S65 | H1S80 | H1S65 |
|---|---|---|---|---|---|
| REML | 32.8 / 43.0 / 34.2 | 24.3 / 26.2 / 25.3 | 41.8 / 25.7 / 45.4 | 45.3 / 36.6 / 45.0 | 28.6 / 27.1 / 28.3 |
| ML | 33.6 / 42.9 / 34.6 | 22.3 / 26.4 / 22.3 | 40.7 / 24.8 / 45.0 | 45.4 / 36.2 / 45.7 | 29.2 / 26.7 / 28.2 |
| MINQUE1 | 32.6 / 43.1 / 33.7 | 24.6 / 24.3 / 25.0 | 41.0 / 25.4 / 44.6 | 45.1 / 34.2 / 44.7 | 26.1 / 23.2 / 25.6 |
| NNREML | 33.4 / 44.9 / 34.3 | 23.4 / 28.1 / 24.3 | 41.7 / 25.6 / 46.1 | 45.1 / 38.0 / 45.2 | 29.3 / 28.9 / 29.5 |
| NNML | NA | NA | NA | 45.9 / 37.9 / 46.0 | 29.7 / 29.1 / 29.0 |
| TYPE3 | 34.0 / 42.6 / 35.3 | 23.2 / 27.1 / 23.8 | 42.5 / 24.8 / 45.8 | 45.3 / 37.3 / 45.9 | 27.1 / 25.0 / 27.3 |
| PREML | 32.1 / 42.7 | 20.0 / 26.8 | 41.6 / 24.6 | 43.7 / 32.3 | 24.6 / 20.4 |
| PML | 33.5 / 41.0 | 19.8 / 26.3 | 39.7 / 23.6 | 44.0 / 31.6 | 24.4 / 21.1 |
| PMINQUE1 | 32.1 / 42.7 | 21.4 / 24.8 | 40.4 / 23.1 | 43.7 / 32.3 | 24.5 / 21.9 |
| PNNREML | 31.9 / 43.3 | 19.2 / 28.0 | 41.0 / 23.3 | 43.4 / 33.1 | 26.0 / 21.3 |
| PTYPE3 | 32.1 / 42.7 / 32.6 | 23.3 / 25.4 / 24.1 | 41.7 / 24.1 / 46.0 | 43.7 / 32.3 / 44.6 | 25.2 / 22.4 / 24.6 |
| MIVQUE | 33.6 / 42.9 / 34.8 | 25.7 / 28.6 / 26.8 | 43.7 / 26.4 / 47.7 | 45.1 / 36.9 / 45.4 | 29.2 / 26.3 / 29.4 |
| MIVPEN | NA | 41.1 / 47.0 / 42.4 | 78.5 / 60.3 / 80.5 | 48.4 / 39.2 / 48.7 | 35.6 / 31.2 / 35.3 |
| PMIVQUE | 32.1 / 42.7 | 20.0 / 28.5 | 41.8 / 26.8 | 43.7 / 32.3 | 25.9 / 20.8 |

Probability of nearness

Results for probability of nearness proved to be largely non-discriminatory among techniques (Table 4-5). The low levels of probability density near the parametric values are indicative of the nature of the variance component estimation problem.
Figure 4-1 illustrates the distribution of MIVQUE variance component estimates for $h^2$ (4-1a) and $\sigma^2_{gca}$ (4-1b) for level D1S80. The distributions for all unconstrained variance component estimates have the appearance of a chi-square distribution, positively skewed with the expected value (mean) occurring to the right of the peak probability density and a proportion of the estimates occurring below zero (except error). With increasing imbalance, the variance among estimates increases and the probability of nearness decreases for all interval widths.

Ratios of Variance Components

Single tree heritability

Results for estimates of single tree heritability adjusted for locations and blocks are shown in Tables 4-3 and 4-4 (third number in each cell, if calculated). For these relatively low heritabilities (0.1 and 0.25), the bias and variance properties of the estimated ratio are similar to those for $\sigma^2_{gca}$ estimates (Figure 4-1). This implies that knowing the properties of the numerator of heritability reveals the properties of the ratio (especially true of ratios with expected values of 0.1 and 0.25, Kendall and Stuart 1963, Ch. 10). Variance component estimation techniques which performed well for bias and/or variance among estimates for $\sigma^2_{gca}$ also performed well for $h^2$.

Figure 4-1. Distribution of 1000 MIVQUE estimates of $h^2$ (4-1a) and $\sigma^2_{gca}$ (4-1b) for experimental level D1S80, illustrating the positive skew and similarity of the distributions. The true values are 0.1 for $h^2$ and 0.25 for $\sigma^2_{gca}$. The interval width of the bars is one-half the parametric value. (Horizontal axis: MIVQUE estimates, 1000 data sets.)
Type B correlation and dominance to additive variance ratio

Type B correlation (shown in Tables 4-3 and 4-4 as $\sigma^2_{tg}$) and dominance to additive variance ratio (not shown) estimates both proved too unstable (extremely large variance among estimates) in their original formulations to be useful in discriminating among variance component estimation techniques. This high variance is due to the estimates of the denominators of these ratios approaching zero and to the high variance of the denominators (Table 4-2). These ratios were reformulated with numerators of interest ($4\sigma^2_{tg}$ for additive genetic by test interaction and $4\sigma^2_{sca}$ for dominance variance, respectively) and a denominator equal to the estimate of the phenotypic variance. With this reformulation the variance and bias properties of estimates of the altered ratios are approximated by the properties of estimates of the numerators.

For increasing imbalance, maximum-likelihood-based estimation offers an increasing advantage over HM3, and for all techniques individual observations offer increasing advantage over plot-mean observations for variance of the estimates of these ratios. Bias, other than for inherently biased methods (ML), is associated with the probability of negative estimates, which increases with increasing imbalance. This assertion is supported by comparing the biases of REML, NNREML, and MODREML estimates across imbalance levels.

General Discussion

Observational Unit

Some general conclusions regarding the choice of a variance component estimation methodology can be drawn from the results of this investigation. For any degree of imbalance the use of individual observations is superior to the use of plot means for estimation of variance components or ratios of variance components.
If the data are nearly balanced (close to 100% survival with no missing plots, crosses (full-sib) or lack of connectedness (half-sib)), the properties of the estimation techniques based on individual and plot-mean observations become similar; so if departure from balance is nominal, plot means can be used effectively. However, using individual observations obviates the need for a survey of imbalance in the data since individual observations produce better results than plot means for any of the estimation techniques examined.

Negative Estimates

Drawing on the results of this investigation, the discussion of practical solutions for the negative estimates problem will revolve around two solutions: 1) accepting and using the negative estimates; and 2) re-solving the system with negative estimates set to zero. Given that the property of interest is the true value of a variance component or genetic ratio, often estimated as a mean across data sets, negativity constraints come into play if the component of interest is small in comparison to other underlying variance components in the data, or if the variance of estimates is high due to an inadequate experimental design for variance component estimation. These factors lead to an increased number of negative estimates. If the data structure is such that negative estimates would occur frequently, then accepting negative estimates is a good alternative. If negative estimates tend to occur infrequently, or bias is of less concern than variance among estimates, then re-solving the system with negative estimates set to zero after convergence is the preferable solution. This tactic reduces both bias and variance among estimates below that of arbitrarily setting negative estimates to zero.

Estimation Technique

The primary competitors among estimation techniques that are practically achievable are REML and TYPE3 (HM3).
Both techniques produce estimates with little or no bias; however, REML estimates for the most part have slightly less sampling variance than TYPE3 estimates. If only subsets of the parents are in common across tests, as in the case H1S65, REML has a distinct advantage in variance among estimates over TYPE3. REML has three additional advantages over TYPE3: 1) REML offers generalized least squares estimation of fixed effects while TYPE3 offers ordinary least squares estimation; 2) Best Linear Unbiased Predictions (BLUP) of random variables are inherent in REML solutions, i.e., gca predictions are available, so that in solving for the variance components with REML, fixed effects are estimated and random variables are predicted simultaneously (Harville 1977); and 3) REML offers greater flexibility in model specification, both in univariate and multivariate forms as well as with heterogeneous or correlated error terms. Further, although the likelihood equations for common REML applications are based on normality, the technique has been shown to be robust against the underlying distribution (Westfall 1987, Banks et al. 1985).

Recommendation

If one were to choose a single variance component estimation technique from among those tested which could be applied to any data set with confidence that the estimates had desirable properties (variance, MSE, and bias), that technique would be REML and the basic unit of observation would be the individual. This combination (REML plus individual observations) performed well across mating designs and across types and levels of imbalance. Treatment of negative estimates would be determined by the proposed use of the estimates, that is, by whether unbiasedness (accepting and using the negative estimates) is more important than sampling variance (re-solving the system setting negative estimates to zero). A primary disadvantage of REML and individual observations is that they are both computationally expensive (computer memory and time).
HM3 estimation could replace REML on many data sets and plot means could replace individual observations on some data sets; but general application of these without regard to the data at hand does result in a loss in desirable properties of the estimates in many instances. The computational expense of REML and individual observations ensures that estimates have desirable properties for a broad scope of applications. With the advent of bigger and faster computers and the evolution of better REML algorithms, what was not feasible in the past on most mainframe computers can now be accomplished on personal computers.

CHAPTER 5
GAREML: A COMPUTER ALGORITHM FOR ESTIMATING VARIANCE COMPONENTS AND PREDICTING GENETIC VALUES

Introduction

The computer program described in this chapter, called GAREML for Giesbrecht's algorithm of restricted maximum likelihood estimation (REML), is useful for both estimating variance components and predicting genetic values. GAREML applies the methodology of Giesbrecht (1983) to the problems of REML estimation (Patterson and Thompson 1971) and best linear unbiased prediction (BLUP, Henderson 1973) for univariate (single trait) genetics models. GAREML can be applied to half-sib (open-pollinated or polymix) and full-sib (partial diallels, factorials, half-diallels [no selfs] or disconnected sets of half-diallels) mating designs when planted in single or multiple locations with single or multiple replications per location.

When used for variance component estimation, this program has been shown to provide estimates with desirable properties across types of imbalance commonly encountered in forest genetics field tests (Huber et al. in press) and with varying underlying distributions (Banks et al. 1985, Westfall 1987). GAREML is also useful for determining efficiencies of alternative field and mating designs for the estimation of variance components.
Utilizing the power of mixed-model methodology (Henderson 1984), GAREML provides BLUP of parental general (gca) and specific combining abilities (sca) as well as generalized least squares (GLS) solutions for fixed effects. The application of BLUP to forest genetics problems has been addressed by White and Hodge (1988, 1989). With certain assumptions, the desirable properties of BLUP predictions include maximizing the probability of obtaining correct parental rankings from the data and minimizing the error associated with using the parental values obtained in future applications. GLS fixed effect estimation weights the observations comprising the estimates by their associated variances, approximating best linear unbiased estimation (BLUE) for fixed effects (Searle 1987, p 489-490).

The purpose of this chapter is to describe the theory and use of GAREML in enough detail to facilitate use by other investigators. The program is written in FORTRAN and is not dependent on other analysis programs. An interactive version of this program can be obtained as a stand-alone executable file from the senior author; this file will run on any IBM compatible PC under DOS or WINDOWS[2] operating systems. The size of the problem an investigator can solve will be dependent on the amount of extended memory and hard disk space (for swap files) available for program use. In addition, the FORTRAN source code can be obtained for analysts wishing to compile the program for use on alternate systems (e.g. mainframe computers).

Algorithm

GAREML proceeds by reading the data and forming a design matrix based on the number of levels of factors in the model. Any portions of the design matrix for nested factors or interactions are formed by horizontal direct product. Columns of zeroes in the design matrix (the result of imbalance) are then deleted.
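The design-matrix steps just described (indicator columns per factor level, interaction columns by horizontal direct product, deletion of all-zero columns) can be sketched as below. This is an illustrative Python sketch, not GAREML's FORTRAN code; the function names and the four-observation data set are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def indicator_columns(labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """One 0/1 column per distinct level; one row per observation."""
    levels = sorted(set(labels))
    rows = [[1 if lab == lev else 0 for lev in levels] for lab in labels]
    return levels, rows


def horizontal_direct_product(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Row-wise product: every column of `a` times every column of `b`."""
    return [[x * y for x in row_a for y in row_b] for row_a, row_b in zip(a, b)]


def drop_zero_columns(matrix: List[List[int]]) -> List[List[int]]:
    """Delete columns that are entirely zero (combinations absent from the data)."""
    keep = [j for j in range(len(matrix[0])) if any(row[j] != 0 for row in matrix)]
    return [[row[j] for j in keep] for row in matrix]


tests = ["T1", "T1", "T2", "T2"]
families = ["A", "B", "A", "A"]  # family B is missing from test T2 (imbalance)
_, test_cols = indicator_columns(tests)
_, family_cols = indicator_columns(families)
interaction = drop_zero_columns(horizontal_direct_product(test_cols, family_cols))
```

Here the test by family block would have four columns in balanced data; because family B never occurs in test T2, that column is all zeroes and is removed, leaving three.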
The design matrix columns are in an order specified by Giesbrecht's algorithm: columns for fixed effects are first, followed by the data vector, and the last section of the matrix is for random effects. The design matrix is the only fully formed matrix in the program. All other matrices are symmetric; therefore, to save computational space and time, only the diagonal and the above-diagonal portions of matrices are formed and utilized (i.e., half-stored). A half-stored matrix of the dot products of the design columns is formed and either kept in common memory or stored in temporary disk space so that the matrix is available for recall in the iterative solution process.

[2] Windows is the trademark of the Microsoft Corporation, Redmond, WA.

The algorithm proceeds by modifying the matrix of dot products such that the inverse of the covariance matrix for the observations ($V$) is enclosed by the column specifiers in the dot products, i.e., $X'X$ becomes $X'V^{-1}X$. This transfer is completed without inversion of the total $V$ matrix. The identity used to accomplish this transfer is: if $V_h = \sigma_h Z_h Z_h' + V_{(h+1)}$, where $V_{(h+1)}$ is nonsingular, then

$$V_h^{-1} = V_{(h+1)}^{-1} - \sigma_h V_{(h+1)}^{-1} Z_h \left(I_h + \sigma_h Z_h' V_{(h+1)}^{-1} Z_h\right)^{-1} Z_h' V_{(h+1)}^{-1}. \qquad (5\text{-}1)$$

A compact form of equation 5-1 is obtained by pre-multiplying by $Z_i'$ and post-multiplying by $Z_j$, where $h = 1, \dots, k-1$ ($k$ = the total number of random factors), $\sigma_h$ is the prior associated with random variable $h$, $V_k = \sigma_k I$, $V_1 = V$ and $Z_i$ is the portion of the design matrix for random variable $i$ (Giesbrecht 1983). A partitioned matrix is formed in order to update $V_{(h+1)}^{-1}$ until $V_1^{-1}$, or $V^{-1}$, is obtained. This matrix is of the form:

$$\begin{bmatrix} I_h + \sigma_h Z_h' V_{(h+1)}^{-1} Z_h & \sqrt{\sigma_h}\, Z_h' V_{(h+1)}^{-1} (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1}) \\ \sqrt{\sigma_h}\, (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})' V_{(h+1)}^{-1} Z_h & T_{h+1} \end{bmatrix} \qquad (5\text{-}2)$$

where $T_{h+1} = (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})' V_{(h+1)}^{-1} (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})$. The sweep operator of Goodnight (1979) is applied to the upper left partition of the matrix (equation 5-2) and the result of equation 5-1 is obtained.
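For readers unfamiliar with the sweep operator, a minimal sketch follows. It applies the operator to the cross-products matrix of $(X \mid y)$ for an ordinary least squares problem, which amounts to assuming $V = I$; it does not reproduce the half-stored storage or the sequential update over random effects used by the program, and the data are made up for the example.

```python
from typing import List

Matrix = List[List[float]]


def sweep(matrix: Matrix, k: int) -> Matrix:
    """Return a copy of a square matrix swept on pivot position k."""
    a = [row[:] for row in matrix]
    d = a[k][k]
    if d == 0.0:
        raise ZeroDivisionError("zero pivot: column is linearly dependent")
    a[k] = [v / d for v in a[k]]                 # scale the pivot row
    for i in range(len(a)):
        if i != k:
            b = a[i][k]
            a[i] = [v - b * p for v, p in zip(a[i], a[k])]  # eliminate column k
            a[i][k] = -b / d
    a[k][k] = 1.0 / d
    return a


# Columns of X (intercept, slope) followed by the data vector y.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
augmented = [row + [obs] for row, obs in zip(x, y)]
cross = [[sum(r[i] * r[j] for r in augmented) for j in range(3)] for i in range(3)]

# Sweeping on the columns of X leaves the inverse of X'X in the upper left,
# the coefficient estimates in the y column and the residual sum of squares
# in the lower right corner.
swept = sweep(sweep(cross, 0), 1)
beta = [swept[0][2], swept[1][2]]
rss = swept[2][2]
```

With these data the fitted line passes through all three points, so the swept matrix yields an intercept of 1, a slope of 2 and a residual sum of squares of zero (up to rounding).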
The matrix is sequentially updated and swept until $T_1 = (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})' V^{-1} (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})$ is obtained. $T_1$ is then swept on the columns for fixed effects ($X'V^{-1}X$). This sweep operation produces generalized least squares estimates for fixed effects, results which can be scaled into predictions of random variables, the residual sum of squares and all the necessary ingredients for assembling the equation to solve for the variance components. The equation to be solved for the variance components is

$$\underset{r \times r}{\{\mathrm{tr}(QV_iQV_j)\}}\ \underset{r \times 1}{\hat{\sigma}^2} = \underset{r \times 1}{\{y'QV_iQy\}}, \quad \text{then} \quad \hat{\sigma}^2 = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\{y'QV_iQy\}; \qquad (5\text{-}3)$$

where $\{\mathrm{tr}(QV_iQV_j)\}$ is a matrix whose elements are $\mathrm{tr}(QV_iQV_j)$ for $i = 1$ to $r$ and $j = 1$ to $r$, i.e., there is a row and column for every random variable in the linear model; tr is the trace operator, that is, the sum of the diagonal elements of a matrix; $Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ for $V$ as the covariance matrix of $y$ and $X$ as the design matrix for fixed effects; $V_i = Z_iZ_i'$, where the $i$'s are the random variables; $\hat{\sigma}^2$ is the vector of variance component estimates; and $r$ is the number of random variables in the model ($k-1$).

The entire procedure from forming $T_1$ to solving for the variance components continues until the variance component estimates from the last iteration are no more different from the estimates of the previous iteration than the convergence criterion specifies. The fixed effect estimates and predictions of random variables are then those of the final iteration. The asymptotic covariance matrix for the variance components is obtained as

$$\mathrm{Var}(\hat{\sigma}^2) = 2\{\mathrm{tr}(QV_iQV_j)\}^{-1} \qquad (5\text{-}4)$$

by utilizing intermediate results from the solution for the variance components.

The coefficient matrix of Henderson's mixed model equations is formed in order to calculate the covariance matrix for fixed and random effects. The covariance matrix for observations is constructed using the variance component estimates from Giesbrecht's algorithm.
The coefficient matrix is

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + D^{-1} \end{bmatrix} \qquad (5\text{-}5)$$

where $R$ is the error covariance matrix, which in this application is $I\sigma^2_w$, where $\sigma^2_w$ is the variance of random variable $w$ (equations 5-6 and 5-7); $X$ is the fixed effects design matrix; $Z$ is the random effects design matrix; and $D$ is the covariance matrix for the random variables which, in this application, has variance components on the diagonal and zeroes on the off-diagonal (no covariance among random variables). The generalized inverse of the matrix (equation 5-5) is the error covariance matrix of the fixed effect estimates and random predictions, assuming the covariance matrix for observations is known without error.

Operating GAREML

While GAREML will run in either batch or interactive mode, we focus on the interactive PC-version, which begins by prompting the analyst to answer questions determining the factors to be read from the data. Specifically, the analyst answers these questions: 1) are there multiple locations? 2) are there multiple blocks? 3) are there disconnected sets of full-sibs (usually referring to disconnected half-diallels)? and 4) is the mating design half-sib or full-sib? The program then determines the proper variables to read from the data as well as the most complicated (number of main factors plus interactions) scalar linear model allowed. The most complicated linear model allowed for full-sib observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + \mathit{set} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \qquad (5\text{-}6)$$

where $y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test; $\mu$ is the population mean; $t_i$ is the random or fixed variable test environment; $b_{ij}$ is the random or fixed variable block; $\mathit{set}$ is the random or fixed variable set, i.e., a variable is created so that disconnected sets of half-diallels planted in the same experiment can be analyzed in the same run, or to analyze provenances and families within provenance where provenance equals set (sets are assumed to be across test environments and blocks with families nested within sets, and interactions with set are assumed unimportant); $g_k$ is the random variable female general combining ability (gca); $g_l$ is the random variable male gca; $s_{kl}$ is the random variable specific combining ability (sca); $tg_{ik}$ is the random variable test by female gca interaction; $tg_{il}$ is the random variable test by male gca interaction; $ts_{ikl}$ is the random variable test by sca interaction; $p_{ijkl}$ is the random variable plot; $w_{ijklm}$ is the random variable within-plot; and there is no covariance between random variables in the model.

The assumptions utilized are that the variances for female and male random variables are equal ($\sigma^2_{g_k} = \sigma^2_{g_l} = \sigma^2_{gca}$) and that female and male environmental interactions are the same ($\sigma^2_{tg_{ik}} = \sigma^2_{tg_{il}} = \sigma^2_{tg}$).

The most complicated scalar linear model allowed for half-sib observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + \mathit{set} + g_k + tg_{ik} + p_{ijk} + w_{ijkm} \qquad (5\text{-}7)$$

where $y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test; $\mu$, $t_i$, $b_{ij}$, $\mathit{set}$, $g_k$, and $tg_{ik}$ retain the definitions in the full-sib equation; $p_{ijk}$ is the random variable plot, containing different genotype by environment components than the full-sib model; $w_{ijkm}$ is the random variable within-plot, containing different levels of genotypic and genotype by environment components than the full-sib model; and there is no covariance between random variables in the model.

The analyst builds the linear model by answering further prompts.
If test, block and/or set are in the model, they must be declared as fixed or random effects. When any of the three effects is declared random, the analyst must furnish prior values for the variance. If no prior value is known, 1.0's may be used as priors. Using 1.0's as priors will not affect the values for resulting variance component estimates within the constraints of the convergence criterion; but there may be a time penalty due to increasing the number of iterations required for convergence. All remaining factors in the model are treated as random variables.

To complete the definition of the model, the analyst chooses to include or exclude each possible factor by answering yes or no when prompted. After each yes answer, the program asks for a prior value for the variance. Again, if no known priors exist, 1.0's may be substituted. After the model has been specified, the program counts the number of fixed effects and the number of random effects and asks if the number fits the model expected. A "yes" answer proceeds through the program while a "no" returns the program to the beginning.

GAREML is now ready to read the data file (which must be an ASCII data file) in this order: test, block, set, female, male, and the response variable. The analyst is prompted to furnish a proper FORTRAN format statement for the data. Test, block, set, female and male are read as character variables (A fields) with as many as eight characters per field, while the data vector (response variable) is read as a double precision variable (F field). An example of a format statement for a full-sib mating design across locations and blocks is "(4A8,F10.5)" which reads four character variables sequentially occupying 8 columns each and the response variable beginning in column 33 and ending in column 42 having five decimal places.

After reading the data, GAREML begins to furnish information to the analyst. This information should be scanned to make sure the data read are correct.
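To check a data file against a format statement before running the program, the fixed-width layout implied by "(4A8,F10.5)" can be mimicked in a few lines. This is a hedged Python sketch, not part of GAREML: the function and field names are hypothetical, and it assumes the response is written with an explicit decimal point (FORTRAN applies the implied five decimal places only when no decimal point appears in the field).

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    factors: List[str]  # e.g. test, block, female, male
    response: float


def read_record(line: str, n_factors: int = 4, a_width: int = 8,
                f_width: int = 10) -> Record:
    """Split one fixed-width line laid out like the format "(4A8,F10.5)"."""
    factors = [line[i * a_width:(i + 1) * a_width].strip() for i in range(n_factors)]
    start = n_factors * a_width            # zero-based column 32, i.e. column 33
    response = float(line[start:start + f_width])
    return Record(factors, response)


# Build a sample line: four left-justified 8-column labels, then a 10-column number.
line = "{:<8}{:<8}{:<8}{:<8}{:>10.5f}".format("LOC1", "BLK2", "F017", "M003", 12.345)
record = read_record(line)
```

A line that fails this kind of check (a blank line, a missing value, or a shifted column) is the sort of problem that would otherwise surface only when the counts reported by the program look wrong.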
This information includes the number of parents, the number of full-sib crosses, the number of observations, the maximum number of fixed effect design matrix columns, and the maximum number of random effect design matrix columns. If there is an error at this point, use CTRL-BRK to exit the program. Probable causes of errors are that the data are not in the format specified, missing values are included, blank lines or other similar errors are in the data file, or the model was not correctly specified.

At this point, there are three other prompts concerning the data analysis (number of iterations, convergence criterion and treatment of negative variance components). The number of iterations is arbitrarily set to 30 and can be changed at the analyst's discretion. No warning is issued that the maximum number of iterations has been reached; however, the current iteration number and variance component estimates are output to the screen at the beginning of each iteration. The convergence criterion used is the sum of the absolute values of the differences between variance component estimates for consecutive iterations. The criterion has been set to $1\times10^{-4}$, meaning that convergence is required to the fourth decimal place for all variance components. The convergence criterion should be modified to suit the magnitude of the variances under consideration as well as the practical need for enhanced resolution. Enhanced resolution is obtained at the cost of increasing the number of iterations to convergence.

The analyst must decide whether to accept and use negative estimates or to set negative estimates to zero and re-solve the system. The latter solution results in variance component estimates with lower sampling variance and slight bias. If one is interested in unbiased estimates of variance components that have a high probability of negative estimates, then accepting and using the negative estimates may be the proper course to take.
Interpreting GAREML Output

Analysis is now underway. The priors for each iteration and the iteration number are printed to the screen. GAREML continues to iterate until the convergence criterion is met or the maximum number of iterations is reached. The next time that analyst intervention is required is to provide a name for the output file for variance component estimates. The file name follows normal DOS file naming protocol; however, alternative directories may not be specified, i.e., all outputs will be found in the same directory as the data file.

The program then asks the analyst whether additional outputs are desired. These additional outputs are gca predictions, sca predictions (if applicable), the asymptotic covariance matrix for the variance components, generalized least squares fixed effect estimates, the error covariance matrix of the gca predictions and the error covariance matrix for fixed effects. An answer of yes to the inclusion of an output results in a prompt for a file name. In addition, for gca and sca predictions the analyst may input a different value for $\sigma^2_{gca}$ or $\sigma^2_{sca}$ with which to scale predictions. The discussion which follows furnishes more detailed information concerning GAREML outputs.

Variance Component Estimates

Ignoring concerns about convergence to a global maximum and negative values, variance component estimates are restricted maximum likelihood estimates of Patterson and Thompson (1971). The estimates are robust against starting values (priors), i.e., the same estimates, within the limits of the convergence criterion, can be obtained from diverse priors. However, priors close to the true values will, in general, reduce the number of iterations required to reach convergence. The value of the convergence criterion must be less than or equal to the desired precision for the variance components.
REML variance component estimates from this program have been shown to have more desirable properties (variance and bias) than other commonly used estimation techniques (maximum likelihood, minimum norm quadratic unbiased estimation and Henderson's Method 3) over a wide range of data imbalance. The properties of the estimates are further enhanced by using individual observations as data rather than plot means. The output is labelled by the variance component estimated.

Predictions of Random Variables

The predictions output are for general and specific combining abilities and are approximate best linear unbiased predictions (BLUP) of the random variables. BLUP predictions have several optimal properties: 1) the correlation between the predicted and true values is maximized; 2) if the distribution is multivariate normal, then BLUP maximizes the probability of obtaining the correct rankings (Henderson 1973) and so maximizes the probability of selecting the best candidate from any pair of candidates (Henderson 1977). Predictions are of the form:

$$\hat{u} = \hat{D} Z' V^{-1} (y - X\hat{\beta}) \qquad \text{(5-8)}$$

where $\hat{u}$ is the vector of predictions; $\hat{D}$ is the estimated covariance matrix for random variables from the REML variance component estimates, see equation 5-5; $Z'$ is the transpose of the design matrix for random variables; $y$ is the data vector; $X$ is the design matrix for fixed effects;
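The form of equation 5-8 can be illustrated on a very small balanced example. The sketch below is plain Python and is not GAREML's implementation. It assumes that the variance components are supplied as known values rather than estimated by REML, that V = Z D Z' + R with R the within-plot (residual) covariance matrix, and that the fixed effects are estimated by generalized least squares. The helper names (`transpose`, `matmul`, `solve`, `blup`) and the data are hypothetical.

```python
# Minimal sketch of u_hat = D Z' V^{-1} (y - X beta_hat); names are hypothetical.

def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and nonsingular (true for a positive definite V).
    """
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def blup(y, X, Z, D, R):
    """Return (beta_hat, u_hat) for y = X beta + Z u + e."""
    Zt, Xt = transpose(Z), transpose(X)
    ZDZt = matmul(matmul(Z, D), Zt)
    V = [[v + r for v, r in zip(vrow, rrow)] for vrow, rrow in zip(ZDZt, R)]
    Vinv_X, Vinv_y = solve(V, X), solve(V, y)
    # Generalized least squares estimate of the fixed effects.
    beta = solve(matmul(Xt, Vinv_X), matmul(Xt, Vinv_y))
    fitted = matmul(X, beta)
    resid = [[yi[0] - fi[0]] for yi, fi in zip(y, fitted)]
    u = matmul(matmul(D, Zt), solve(V, resid))
    return beta, u


# Two half-sib families with two trees each; family and error variances of 1.
y = [[1.0], [3.0], [5.0], [7.0]]
X = [[1.0], [1.0], [1.0], [1.0]]                      # overall mean only
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
beta_hat, u_hat = blup(y, X, Z, D, R)
print(beta_hat, u_hat)
```

In this balanced case the predictions are the family-mean deviations from the overall mean (-2 and 2) shrunk toward zero by the factor 2/3, which is the regression toward the mean that distinguishes BLUP from least squares estimates of the same effects.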

Full Text |

PAGE 2 X 237,0$/ 0$7,1* '(6,*16 $1' 237,0$/ 7(&+1,48(6 )25 $1$/<6,6 2) 48$17,7$7,9( 75$,76 ,1 )25(67 *(1(7,&6 %\ '8'/(< $59/( +8%(5 ',66(57$7,21 35(6(17(' 72 7+( *5$'8$7( 6&+22/ 7+( 81,9(56,7< 2) )/25,'$ ,1 3$57,$/ )8/),//0(17 2) 7+( 5(48,5(0(176 )25 7+( '(*5(( 2) '2&725 2) 3+,/2623+< 81,9(56,7< 2) )/25,'$ PAGE 3 $&.12:/('*(0(176 H[SUHVV P\ JUDWLWXGH WR 'UV 7 / :KLWH 5 +RGJH 5 & /LWWHOO 0 $ 'H/RUHQ]R DQG / 5RFNZRRG IRU WKHLU WLPH DQG HIIRUW LQ WKH SXUVXLW RI WKLV ZRUN 7KHLU JXLGDQFH DQG ZLVGRP SURYHG LQYDOXDEOH WR WKH FRPSOHWLRQ RI WKLV SURMHFW IXUWKHU DFNQRZOHGJH 'U %UXFH %RQJDUWHQ IRU KLV HQFRXUDJHPHQW WR FRQWLQXH P\ DFDGHPLF FDUHHU DP JUDWHIXO WR 'U 7 / :KLWH DQG WKH 6FKRRO RI )RUHVW 5HVRXUFHV DQG &RQVHUYDWLRQ DW WKH 8QLYHUVLW\ RI )ORULGD IRU IXQGLQJ WKLV ZRUN H[WHQG VSHFLDO WKDQNV WR *HRUJH %U\DQ DQG 'U 0 $ 'H/RUHQ]R RI WKH 'DLU\ 6FLHQFH 'HSDUWPHQW DQG *UHJ 3RZHOO RI WKH 6FKRRO RI )RUHVW 5HVRXUFHV DQG &RQVHUYDWLRQ IRU WKH XVH RI FRPSXWLQJ IDFLOLWLHV SURJUDPPLQJ KHOS DQG DLG LQ UXQQLQJ WKH VLPXODWLRQV UHTXLUHG 0RVW LPSRUWDQWO\ WKDQN P\ IDPLO\ 1DQF\ -RKQ DQG +HDWKHU IRU WKHLU XQGHUVWDQGLQJ DQG HQFRXUDJHPHQW LQ WKLV HQGHDYRU PAGE 4 7$%/( 2) &217(176 $&.12:/('*(0(176 LL /,67 2) 7$%/(6 YL /,67 2) ),*85(6 YLL $%675$&7 YLLL &+$37(5 ,1752'8&7,21 &+$37(5 7+( ()),&,(1&< 2) +$/)6,% +$/)',$//(/ $1' &,5&8/$5 0$7,1* '(6,*16 ,1 7+( (67,0$7,21 2) *(1(7,& 3$5$0(7(56 :,7+ 9$5,$%/( 180%(56 2) 3$5(176 $1' /2&$7,216 ,QWURGXFWLRQ 0HWKRGV $VVXPSWLRQV &RQFHUQLQJ %ORFN 6L]H 7KH 8VH RI (IILFLHQF\ Lf *HQHUDO 0HWKRGRORJ\ /HYHOV RI *HQHWLF 'HWHUPLQDWLRQ &RYDULDQFH 0DWUL[ IRU 9DULDQFH &RPSRQHQWV &RYDULDQFH 0DWUL[ IRU /LQHDU &RPELQDWLRQV RI 9DULDQFH &RPSRQHQWV DQG 9DULDQFH RI D 5DWLR &RPSDULVRQ $PRQJ (VWLPDWHV RI 9DULDQFHV RI 5DWLRV 5HVXOWV +HULWDELOLW\ 7\SH % &RUUHODWLRQ 'RPLQDQFH WR $GGLWLYH 9DULDQFH 5DWLR 'LVFXVVLRQ &RPSDULVRQ RI 0DWLQJ 'HVLJQV $ *HQHUDO $SSURDFK WR WKH (VWLPDWLRQ 3UREOHP 8VH RI WKH 9DULDQFH RI D 5DWLR $SSUR[LPDWLRQ &RQFOXVLRQV LLL PAGE 5 &+$37(5 25',1$5< /($67 648$5(6 
(67,0$7,21 2) *(1(5$/ $1' 63(&,),& &20%,1,1* $%,/,7,(6 )520 +$/)',$//(/ 0$7,1* '(6,*16 ,QWURGXFWLRQ 0HWKRGV /LQHDU 0RGHO 2UGLQDU\ /HDVW 6TXDUHV 6ROXWLRQV 6XPWR=HUR 5HVWULFWLRQV &RPSRQHQWV RI WKH 0DWUL[ (TXDWLRQ (VWLPDWLRQ RI )L[HG (IIHFWV 1XPHULFDO ([DPSOHV %DODQFHG 'DWD 3ORWPHDQ %DVLVf 0LVVLQJ 3ORW 0LVVLQJ &URVV 6HYHUDO 0LVVLQJ &URVVHV 'LVFXVVLRQ 8QLTXHQHVV RI (VWLPDWHV :HLJKWLQJ RI 3ORW 0HDQV DQG &URVV 0HDQV LQ (VWLPDWLQJ 3DUDPHWHUV 'LDOOHO 0HDQ 9DULDQFH DQG &RYDULDQFH RI 3ORW 0HDQV &RPSDULVRQ RI 3UHGLFWLRQ DQG (VWLPDWLRQ 0HWKRGRORJLHV &RQFOXVLRQV &+$37(5 9$5,$1&( &20321(17 (67,0$7,21 7(&+1,48(6 &203$5(' )25 7:2 0$7,1* '(6,*16 :,7+ )25(67 *(1(7,& $5&+,7(&785( 7+528*+ &20387(5 6,08/$7,21 ,QWURGXFWLRQ 0HWKRGV ([SHULPHQWDO $SSURDFK ([SHULPHQWDO 'HVLJQ IRU 6LPXODWHG 'DWD )XOO6LE /LQHDU 0RGHO +DOIVLE /LQHDU 0RGHO 'DWD *HQHUDWLRQ DQG 'HOHWLRQ 9DULDQFH &RPSRQHQW (VWLPDWLRQ 7HFKQLTXHV &RPSDULVRQ $PRQJ (VWLPDWLRQ 7HFKQLTXHV 5HVXOWV DQG 'LVFXVVLRQ 9DULDQFH &RPSRQHQWV 5DWLRV RI 9DULDQFH &RPSRQHQWV *HQHUDO 'LVFXVVLRQ 2EVHUYDWLRQDO 8QLW 1HJDWLYH (VWLPDWHV (VWLPDWLRQ 7HFKQLTXH 5HFRPPHQGDWLRQ ,9 PAGE 6 &+$37(5 *$5(0/ $ &20387(5 $/*25,7+0 )25 (67,0$7,1* 9$5,$1&( &20321(176 $1' 35(',&7,1* *(1(7,& 9$/8(6 ,QWURGXFWLRQ $OJRULWKP 2SHUDWLQJ *$5(0/ ,QWHUSUHWLQJ *$5(0/ 2XWSXW 9DULDQFH &RPSRQHQW (VWLPDWHV 3UHGLFWLRQV RI 5DQGRP 9DULDEOHV $V\PSWRWLF &RYDULDQFH 0DWUL[ RI 9DULDQFH &RPSRQHQWV )L[HG (IIHFW (VWLPDWHV (UURU &RYDULDQFH 0DWULFHV ([DPSOH 'DWD $QDO\VLV 2XWSXW &RQFOXVLRQV &+$37(5 &21&/86,216 $33(1',; )2575$1 6285&( &2'( )25 *$5(0/ 5()(5(1&( /,67 %,2*5$3+,&$/ 6.(7&+ Y PAGE 7 /,67 2) 7$%/(6 7DEOH 3DUDPHWULF YDULDQFH FRPSRQHQWV 7DEOH 'DWD VHW IRU QXPHULFDO H[DPSOHV 7DEOH 1XPHULFDO UHVXOWV IRU H[DPSOHV 7DEOH $EEUHYLDWLRQ IRU DQG GHVFULSWLRQ RI YDULDQFH FRPSRQHQW HVWLPDWLRQ PHWKRGV 7DEOH 6HWV RI WUXH YDULDQFH FRPSRQHQWV 7DEOH 6DPSOLQJ YDULDQFH IRU WKH HVWLPDWHV 7DEOH %LDV IRU WKH HVWLPDWHV 7DEOH 3UREDELOLW\ RI QHDUQHVV 7DEOH 'DWD IRU H[DPSOH 9, PAGE 8 /,67 2) ),*85(6 )LJXUH (IILFLHQF\ Lf 
IRU K )LJXUH (IILFLHQF\ Lf IRU U% )LJXUH (IILFLHQF\ f IRU )LJXUH 7KH RYHUSDUDPHWHUL]HG OLQHDU PRGHO )LJXUH 7KH OLQHDU PRGHO IRU D IRXUSDUHQW KDOIGLDOOHO )LJXUH ,QWHUPHGLDWH UHVXOW LQ 6& $ VXEPDWUL[ JHQHUDWLRQ )LJXUH :HLJKWV RQ RYHUDOO FURVV PHDQV )LJXUH 'LVWULEXWLRQ RI 0,948( HVWLPDWHV YLL PAGE 9 $EVWUDFW RI 'LVVHUWDWLRQ 3UHVHQWHG WR WKH *UDGXDWH 6FKRRO RI WKH 8QLYHUVLW\ RI )ORULGD LQ 3DUWLDO )XOILOOPHQW RI WKH 5HTXLUHPHQWV IRU WKH 'HJUHH RI 'RFWRU RI 3KLORVRSK\ 237,0$/ 0$7,1* '(6,*16 $1' 237,0$/ 7(&+1,48(6 )25 $1$/<6,6 2) 48$17,7$7,9( 75$,76 ,1 )25(67 *(1(7,&6 %\ 'XGOH\ $UYOH +XEHU 0D\ &KDLUSHUVRQ 7LPRWK\ / :KLWH 0DMRU 'HSDUWPHQW 6FKRRO RI )RUHVW 5HVRXUFHV DQG &RQVHUYDWLRQ )LUVW WKH DV\PSWRWLF FRYDULDQFH PDWUL[ RI WKH YDULDQFH FRPSRQHQW HVWLPDWHV LV XVHG WR FRPSDUH WKUHH FRPPRQ PDWLQJ GHVLJQV IRU HIILFLHQF\ PD[LPL]LQJ WKH YDULDQFH UHGXFLQJ SURSHUW\ RI HDFK REVHUYDWLRQf IRU JHQHWLF SDUDPHWHUV DFURVV QXPEHUV RI SDUHQWV DQG ORFDWLRQV DQG YDU\LQJ JHQHWLF DUFKLWHFWXUHV ,W LV GHWHUPLQHG WKDW WKH FLUFXODU PDWLQJ GHVLJQ LV DOZD\V VXSHULRU LQ HIILFLHQF\ WR WKH KDOIGLDOOHO GHVLJQ )RU VLQJOHWUHH KHULWDELOLW\ WKH KDOIVLE GHVLJQ LV PRVW HIILFLHQW )RU HVWLPDWLQJ W\SH % FRUUHODWLRQ PD[LPXP HIILFLHQF\ LV DFKLHYHG E\ HLWKHU WKH KDOI VLE RU FLUFXODU PDWLQJ GHVLJQ DQG WKDW FKDQJH LQ UDQN IRU HIILFLHQF\ LV GHWHUPLQHG E\ WKH XQGHUO\LQJ JHQHWLF DUFKLWHFWXUH $QRWKHU LQWHQW RI WKLV ZRUN LV FRPSDULQJ DQDO\VLV PHWKRGRORJLHV IRU GHWHUPLQLQJ SDUHQWDO ZRUWK 7KH ILUVW RI WKHVH LQYHVWLJDWLRQV LV RUGLQDU\ OHDVW VTXDUHV DVVXPSWLRQV LQ WKH HVWLPDWLRQ RI SDUHQWDO ZRUWK IRU WKH KDOIGLDOOHO PDWLQJ GHVLJQ ZLWK EDODQFHG DQG XQEDODQFHG GDWD 7KH FRQFOXVLRQ IURP FRPSDULVRQ RI RUGLQDU\ OHDVW VTXDUHV WR DOWHUQDWLYH DQDO\VLV PHWKRGRORJLHV LV WKDW EHVW OLQHDU XQELDVHG SUHGLFWLRQ DQG EHVW OLQHDU SUHGLFWLRQ DUH PRUH DSSURSULDWH WR WKH SUREOHP RI GHWHUPLQLQJ SDUHQWDO ZRUWK YLLL PAGE 10 7KH QH[W DQDO\VLV LQYHVWLJDWLRQ FRQWUDVWV YDULDQFH FRPSRQHQW HVWLPDWLRQ WHFKQLTXHV DFURVV OHYHOV RI LPEDODQFH IRU WKH 
KDOIGLDOOHO DQG KDOIVLE PDWLQJ GHVLJQV IRU WKH HVWLPDWLRQ RI JHQHWLF SDUDPHWHUV ZLWK SORW PHDQV DQG LQGLYLGXDOV XVHG DV WKH XQLW RI REVHUYDWLRQ 7KH FULWHULD IRU GLVFULPLQDWLRQ DUH YDULDQFH RI WKH HVWLPDWHV PHDQ VTXDUH HUURU ELDV DQG SUREDELOLW\ RI QHDUQHVV )RU DOO HVWLPDWLRQ WHFKQLTXHV LQGLYLGXDOV DV WKH XQLW RI REVHUYDWLRQ SURGXFHG HVWLPDWHV ZLWK WKH PRVW GHVLUDEOH SURSHUWLHV 2I WKH HVWLPDWLRQ WHFKQLTXHV H[DPLQHG UHVWULFWHG PD[LPXP OLNHOLKRRG LV WKH PRVW UREXVW WR LPEDODQFH 7KH FRPSXWHU SURJUDP XVHG WR SURGXFH UHVWULFWHG PD[LPXP OLNHOLKRRG HVWLPDWHV RI YDULDQFH FRPSRQHQWV ZDV PRGLILHG WR IRUP D XVHU IULHQGO\ DQDO\VLV SDFNDJH %RWK WKH DOJRULWKP DQG WKH RXWSXWV RI WKH SURJUDP DUH GRFXPHQWHG 2XWSXWV DYDLODEOH IURP WKH SURJUDP LQFOXGH YDULDQFH FRPSRQHQW HVWLPDWHV JHQHUDOL]HG OHDVW VTXDUHV HVWLPDWHV RI IL[HG HIIHFWV DV\PSWRWLF FRYDULDQFH PDWUL[ IRU YDULDQFH FRPSRQHQWV EHVW OLQHDU XQELDVHG SUHGLFWLRQV IRU JHQHUDO DQG VSHFLILF FRPELQLQJ DELOLWLHV DQG WKH HUURU FRYDULDQFH PDWUL[ IRU SUHGLFWLRQV DQG HVWLPDWHV ,; PAGE 11 &+$37(5 ,1752'8&7,21 $QDO\VLV RI TXDQWLWDWLYH WUDLWV LQ IRUHVW JHQHWLF H[SHULPHQWV KDV WUDGLWLRQDOO\ EHHQ DSSURDFKHG DV D WZRSDUW SUREOHP 3DUHQWDO ZRUWK ZRXOG EH HVWLPDWHG DV IL[HG HIIHFWV DQG ODWHU FRQVLGHUHG DV UDQGRP HIIHFWV IRU WKH GHWHUPLQDWLRQ RI JHQHWLF DUFKLWHFWXUH :KLOH WUDGLWLRQDO WKLV DSSURDFK LV PRVW SUREDEO\ VXERSWLPDO JLYHQ WKH SUROLIHUDWLRQ RI DOWHUQDWLYH DQDO\VLV DSSURDFKHV ZLWK HQKDQFHG WKHRUHWLFDO SURSHUWLHV :KLWH DQG +RGJH f ,Q WKLV GLVVHUWDWLRQ HPSKDVLV LV SODFHG RQ WKH KDOIGLDOOHO PDWLQJ GHVLJQ EHFDXVH RI LWV RPQLSUHVHQFH DQG WKH XQLTXHQHVV RI WKH DQDO\VLV SUREOHP WKLV PDWLQJ GHVLJQ SUHVHQWV 7KH KDOI GLDOOHO PDWLQJ GHVLJQ KDV EHHQ DQG FRQWLQXHV WR EH XVHG LQ SODQW VFLHQFHV 6SUDJXH DQG 7DWXP *LOEHUW 0DW]LQJHU HW DO %XUOH\ HW DO 6TXLOODFH :HLU DQG =REHO :LOFR[ HW DO 6Q\GHU DQG 1DPNRRQJ +DOODXHU DQG 0LUDQGD 6LQJK DQG 6LQJK *UHHQZRRG HW DO DQG :HLU DQG *RGGDUG f 7KH XQLTXH IHDWXUH RI WKH KDOIGLDOOHO PDWLQJ V\VWHP ZKLFK KLQGHUV DQDO\VLV ZLWK 
PDQ\ VWDWLVWLFDO SDFNDJHV LV WKDW D VLQJOH REVHUYDWLRQ FRQWDLQV WZR OHYHOV RI WKH VDPH PDLQ HIIHFW 2SWLPDOLW\ RI PDWLQJ GHVLJQ IRU WKH HVWLPDWLRQ RI FRPPRQO\ QHHGHG JHQHWLF SDUDPHWHUV VLQJOHWUHH KHULWDELOLW\ W\SH % FRUUHODWLRQ DQG GRPLQDQFH WR DGGLWLYH YDULDQFH UDWLRf LV H[DPLQHG XWLOL]LQJ WKH DV\PSWRWLF FRYDULDQFH RI WKH YDULDQFH FRPSRQHQWV .HQGDOO DQG 6WXDUW *LHVEUHFKW DQG 0F&XWFKDQ HW DO f 6LQFH JHQHWLF ILHOG H[SHULPHQWV DUH FRPSRVHG RI ERWK D PDWLQJ GHVLJQ DQG D ILHOG GHVLJQ WKH FHQWUDO FRQVLGHUDWLRQ LQ WKLV LQYHVWLJDWLRQ LV ZKLFK PDWLQJ GHVLJQ ZLWK ZKDW ILHOG GHVLJQ KRZ PDQ\ SDUHQWV DQG DFURVV ZKDW QXPEHU RI ORFDWLRQV PAGE 12 ZLWKLQ D UDQGRPL]HG FRPSOHWH EORFN GHVLJQf LV PRVW HIILFLHQW 7KH FULWHULRQ IRU GLVFHUQPHQW DPRQJ GHVLJQV LV WKH HIILFLHQF\ RI WKH LQGLYLGXDO REVHUYDWLRQ LQ UHGXFLQJ WKH YDULDQFH RI WKH HVWLPDWH 3HGHUVRQ f 7KLV TXHVWLRQ LV FRQVLGHUHG XQGHU D UDQJH RI JHQHWLF DUFKLWHFWXUHV ZKLFK VSDQV WKDW UHSRUWHG IRU FRQLIHURXV JURZWK WUDLWV &DPSEHOO 6WRQHF\SKHU HW DO 6Q\GHU DQG 1DPNRRQJ )RVWHU )RVWHU DQG %ULGJZDWHU +RGJH DQG :KLWH >LQ SUHVV@f 7KH LQYHVWLJDWLRQ LQWR RSWLPDO DQDO\VLV SURFHHGV E\ FRQVLGHULQJ WKH RUGLQDU\ OHDVW VTXDUHV 2/6f WUHDWPHQW RI HVWLPDWLQJ SDUHQWDO ZRUWK IRU WKH KDOIGLDOOHO PDWLQJ GHVLJQ 2/6 DVVXPSWLRQV DUH H[DPLQHG LQ GHWDLO WKURXJK WKH XVH RI PDWUL[ DOJHEUD IRU ERWK EDODQFHG DQG XQEDODQFHG GDWD 7KH XVH RI PDWUL[ DOJHEUD LOOXVWUDWHV ERWK WKH XQLTXHQHVV RI WKH SUREOHP DQG WKH LQWHUSUHWDWLRQ RI WKH 2/6 DVVXPSWLRQV &RPSDULVRQV DPRQJ 2/6 JHQHUDOL]HG OHDVW VTXDUHV */6f EHVW OLQHDU XQELDVHG SUHGLFWLRQ %/83f DQG EHVW OLQHDU SUHGLFWLRQ %/3f DUH PDGH RQ D WKHRUHWLFDO EDVLV $OWKRXJK FRQVLGHUDWLRQ RI ILHOG DQG PDWLQJ GHVLJQ RI IXWXUH H[SHULPHQWV LV HVVHQWLDO WKH SUREOHP RI RSWLPDO DQDO\VLV RI FXUUHQW GDWD UHPDLQV ,Q UHVSRQVH WR WKLV QHHG VLPXODWHG GDWD ZLWK GLIIHULQJ OHYHOV RI LPEDODQFH JHQHWLF DUFKLWHFWXUH DQG PDWLQJ GHVLJQ LV XWLOL]HG DV D EDVLV IRU GLVFULPLQDWLQJ DPRQJ YDULDQFH FRPSRQHQW HVWLPDWLRQ WHFKQLTXHV LQ WKH GHWHUPLQDWLRQ RI JHQHWLF 
DUFKLWHFWXUH 7KH OHYHOV RI LPEDODQFH VLPXODWHG UHSUHVHQW WKRVH FRPPRQO\ VHHQ LQ IRUHVW JHQHWLF GDWD DV OHVV WKDQ b VXUYLYDO PLVVLQJ FURVVHV IRU IXOOVLE PDWLQJ GHVLJQV DQG RQO\ VXEVHWV RI SDUHQWV LQ FRPPRQ DFURVV ORFDWLRQ IRU KDOIVLE PDWLQJ GHVLJQV 7KH WZR PDWLQJ GHVLJQV DUH KDOIVLE DQG KDOIGLDOOHO ZLWK D VXEVHW RI WKH SUHYLRXVO\ XVHG JHQHWLF DUFKLWHFWXUHV 7KH ILHOG GHVLJQ LV D UDQGRPL]HG FRPSOHWH EORFN ZLWK ILIWHHQ IDPLOLHV SHU EORFN DQG VL[ WUHHV SHU IDPLO\ SHU EORFN 7KH IRXU FULWHUD XVHG WR GLVFULPLQDWH DPRQJ YDULDQFH FRPSRQHQW HVWLPDWLRQ WHFKQLTXHV DUH SUREDELOLW\ RI QHDUQHVV 3LWWPDQ f ELDV YDULDQFH RI WKH HVWLPDWHV DQG PHDQ VTXDUH HUURU +RJJ DQG &UDLJ f PAGE 13 7KH WHFKQLTXHV FRPSDUHG IRU YDULDQFH FRPSRQHQW HVWLPDWLRQ DUH PLQLPXP YDULDQFH TXDGUDWLF XQELDVHG HVWLPDWLRQ 5DR Ef PLQLPXP QRUP TXDGUDWLF XQELDVHG HVWLPDWLRQ 5DR Df UHVWULFWHG PD[LPXP OLNHOLKRRG 3DWWHUVRQ DQG 7KRPSVRQ f PD[LPXP OLNHOLKRRG +DUWOH\ DQG 5DR f DQG +HQGHUVRQfV PHWKRG +HQGHUVRQ f 7KHVH WHFKQLTXHV DUH FRPSDUHG XVLQJ WKH LQGLYLGXDO DQG SORW PHDQV DV WKH XQLW RI REVHUYDWLRQ )XUWKHU WKUHH DOWHUQDWLYHV DUH H[SORUHG IRU GHDOLQJ ZLWK QHJDWLYH YDULDQFH FRPSRQHQW HVWLPDWHV ZKLFK DUH DFFHSW DQG OLYH ZLWK QHJDWLYH HVWLPDWHV VHW QHJDWLYH HVWLPDWHV WR ]HUR DQG UHVROYH WKH V\VWHP VHWWLQJ QHJDWLYH FRPSRQHQWV WR ]HUR 7KH DOJRULWKP XVHG IRU WKH PHWKRG ZKLFK SURYLGHG HVWLPDWHV ZLWK RSWLPDO SURSHUWLHV DFURVV H[SHULPHQWDO OHYHOV ZDV FRQYHUWHG WR D XVHU IULHQGO\ SURJUDP 7KLV SURJUDP SURYLGLQJ UHVWULFWHG PD[LPXP OLNHOLKRRG YDULDQFH FRPSRQHQW HVWLPDWHV XVHV *LHVEUHFKWfV DOJRULWKP f 'RFXPHQWDWLRQ RI WKH DOJRULWKP DQG H[SODQDWLRQ RI WKH SURJUDPfV RXWSXW DUH SURYLGHG DORQJ ZLWK WKH )RUWUDQ VRXUFH FRGH DSSHQGL[f PAGE 14 &+$37(5 7+( ()),&,(1&< 2) +$/)6,% +$/)',$//(/ $1' &,5&8/$5 0$7,1* '(6,*16 ,1 7+( (67,0$7,21 2) *(1(7,& 3$5$0(7(56 :,7+ 9$5,$%/( 180%(56 2) 3$5(176 $1' /2&$7,216 ,QWURGXFWLRQ ,Q IRUHVW WUHH LPSURYHPHQW JHQHWLF WHVWV DUH HVWDEOLVKHG IRU IRXU SULPDU\ SXUSRVHV f UDQNLQJ SDUHQWV f VHOHFWLQJ IDPLOLHV RU 
LQGLYLGXDOV f HVWLPDWLQJ JHQHWLF SDUDPHWHUV DQG f GHPRQVWUDWLQJ JHQHWLF JDLQ =REHO DQG 7DOEHUW f :KLOH WKH IRXU SXUSRVHV DUH QRW PXWXDOO\ H[FOXVLYH D WHVW GHVLJQ RSWLPDO IRU RQH SXUSRVH LV PRVW SUREDEO\ QRW RSWLPDO IRU DOO %XUGRQ DQG 6KHOERXUQH :KLWH f $ EUHHGHU WKHQ PXVW SULRULWL]H WKH SXUSRVHV IRU ZKLFK D JLYHQ WHVW LV HVWDEOLVKHG DQG FKRRVH D GHVLJQ EDVHG RQ WKHVH SULRULWLHV :LWKLQ D JHQHWLF WHVW GHVLJQ WKHUH DUH WZR SULPDU\ FRPSRQHQWV PDWLQJ GHVLJQ DQG ILHOG GHVLJQ 7KHUH KDYH EHHQ VHYHUDO LQYHVWLJDWLRQV RI RSWLPDO GHVLJQV IRU WKHVH WZR FRPSRQHQWV HLWKHU VHSDUDWHO\ RU VLPXOWDQHRXVO\ XQGHU YDULRXV FULWHULD 7KHVH FULWHULD KDYH LQFOXGHG WKH HIILFLHQW DQGRU SUHFLVH HVWLPDWLRQ RI KHULWDELOLW\ 3HGHUVRQ 1DPNRRQJ DQG 5REHUGV 3HSSHU DQG 1DPNRRQJ 0F&XWFKDQ HW DO 0F&XWFKDQ HW DO f SUHFLVH HVWLPDWLRQ RI YDULDQFH FRPSRQHQWV %UDDWHQ 3HSSHU f DQG HIILFLHQW VHOHFWLRQ RI SURJHQ\ YDQ %XLMWHQHQ :KLWH DQG +RGJH YDQ %XLMWHQHQ DQG %XUGRQ /RR'LQNLQV HW DO f ,QFRUSRUDWHG ZLWKLQ WKLV ERG\ RI UHVHDUFK KDV EHHQ D ZLGH UDQJH RI JHQHWLF DQG HQYLURQPHQWDO YDULDQFH SDUDPHWHUV DQG ILHOG DQG PDWLQJ GHVLJQV +RZHYHU WKH PRGHOV LQ SUHYLRXV LQYHVWLJDWLRQV KDYH EHHQ SULPDULO\ FRQVWUDLQHG WR FRQVLGHUDWLRQ RI WHVWLQJ LQ D VLQJOH PAGE 15 HQYLURQPHQW ZLWK D FRUUHVSRQGLQJ OLPLWHG QXPEHU RI IDFWRUV LQ WKH PRGHO LH JHQRW\SH E\ HQYLURQPHQW LQWHUDFWLRQ DQGRU GRPLQDQFH YDULDQFH DUH XVXDOO\ QRW FRQVLGHUHG 7KLV FKDSWHU IRFXVHV RQ RSWLPDO PDWLQJ GHVLJQV WKURXJK FRQVLGHUDWLRQ RI WKUHH FRPPRQ PDWLQJ GHVLJQV KDOI VLE KDOIGLDOOHO DQG FLUFXODU ZLWK IRXU FURVVHV SHU SDUHQWf IRU HVWLPDWLRQ RI JHQHWLF SDUDPHWHUV ZLWK D ILHOG GHVLJQ H[WHQGLQJ DFURVV PXOWLSOH ORFDWLRQV ,Q WKLV FKDSWHU WKH DSSURDFK WR WKH RSWLPDO GHVLJQ SUREOHP LV WR PDLQWDLQ WKH EDVLF ILHOG GHVLJQ ZLWKLQ ORFDWLRQV DV UDQGRPL]HG FRPSOHWH EORFN ZLWK IRXU EORFNV DQG D VL[WUHH URZSORW UHSUHVHQWLQJ HDFK JHQHWLF HQWU\ ZLWKLQ D EORFN QRWHG DV RQH RI WKH PRVW FRPPRQ ILHOG GHVLJQV E\ /RR'LQNLQV HW DO f 7KH QXPEHU RI IDPLOLHV LQ D EORFN QXPEHU RI ORFDWLRQV PDWLQJ 
GHVLJQ DQG QXPEHU RI SDUHQWV ZLWKLQ D PDWLQJ GHVLJQ DUH DOORZHG WR FKDQJH 6LQFH RSWLPDOLW\ EHVLGHV EHLQJ D IXQFWLRQ RI WKH ILHOG DQG PDWLQJ GHVLJQV LV DOVR D IXQFWLRQ RI WKH XQGHUO\LQJ JHQHWLF SDUDPHWHUV DOO GHVLJQV DUH H[DPLQHG DFURVV D UDQJH RI OHYHOV RI JHQHWLF GHWHUPLQDWLRQ DV YDU\LQJ OHYHOV RI KHULWDELOLW\ JHQRW\SH E\ HQYLURQPHQW LQWHUDFWLRQ DQG GRPLQDQFHf UHIOHFWLQJ HVWLPDWHV IRU PDQ\ HFRQRPLFDOO\ LPSRUWDQW WUDLWV LQ FRQLIHUV &DPSEHOO 6WRQHF\SKHU HW DO 6Q\GHU DQG 1DPNRRQJ )RVWHU )RVWHU DQG %ULGJZDWHU +RGJH DQG :KLWH LQ SUHVVff )RU HDFK GHVLJQ DQG OHYHO RI JHQHWLF GHWHUPLQDWLRQ D 0LQLPXP 9DULDQFH 4XDGUDWLF 8QELDVHG (VWLPDWLRQ 0,948(f WHFKQLTXH DQG DQ DSSUR[LPDWLRQ RI WKH YDULDQFH RI D UDWLR .HQGDOO DQG 6WXDUW *LHVEUHFKW DQG 0F&XWFKDQ HW DO f DUH DSSOLHG WR HVWLPDWH WKH YDULDQFH RI HVWLPDWHV RI KHULWDELOLW\ DGGLWLYH WR DGGLWLYH SOXV DGGLWLYH E\ HQYLURQPHQW YDULDQFH UDWLR DQG GRPLQDQFH WR DGGLWLYH YDULDQFH UDWLR 7KHVH WHFKQLTXHV XVH WKH WUXH FRYDULDQFH PDWUL[ RI WKH YDULDQFH FRPSRQHQW HVWLPDWHV XWLOL]LQJ RQO\ WKH NQRZQ SDUDPHWHUV DQG WKH WHVW GHVLJQ DQG SUHFOXGLQJ WKH QHHG IRU VLPXODWHG RU UHDO GDWDf DQG D 7D\ORU VHULHV DSSUR[LPDWLRQ RI WKH YDULDQFH RI D UDWLR 7KH UHODWLYH HIILFLHQFLHV RI GLIIHUHQW WHVW GHVLJQV DUH FRPSDUHG RQ WKH EDVLV RI L WKH PAGE 16 HIILFLHQF\ RI DQ LQGLYLGXDO REVHUYDWLRQ LQ UHGXFLQJ WKH YDULDQFH RI DQ HVWLPDWH 3HGHUVRQ f 7KXV WKLV UHVHDUFK H[SORUHV ZKLFK PDWLQJ GHVLJQ QXPEHU RI SDUHQWV DQG QXPEHU RI ORFDWLRQV LV PRVW HIILFLHQW SHU XQLW RI REVHUYDWLRQ LQ HVWLPDWLQJ KHULWDELOLW\ DGGLWLYH WR DGGLWLYH SOXV DGGLWLYH E\ HQYLURQPHQW YDULDQFH UDWLR DQG GRPLQDQFH WR DGGLWLYH YDULDQFH UDWLR IRU VHYHUDO YDULDQFH VWUXFWXUHV UHSUHVHQWDWLYH RI FRQLIHURXV JURZWK WUDLWV 0HWKRGV $VVXPSWLRQV &RQFHUQLQJ %ORFN 6L]H $V RSSRVHG WR 0F&XWFKDQ HW DO f ZKHUH EORFN VL]HV ZHUH KHOG FRQVWDQW DQG LQFOXGLQJ PRUH IDPLOLHV UHVXOWHG LQ IHZHU REVHUYDWLRQV SHU IDPLO\ SHU EORFN LQ WKLV FKDSWHU WKH EORFNV DUH DOORZHG WR H[SDQG WR DFFRPRGDWH LQFUHDVLQJ QXPEHUV RI IDPLOLHV 7KLV 
H[SDQVLRQ LV DOORZHG ZLWKRXW LQFUHDVLQJ HLWKHU WKH YDULDQFH DPRQJ EORFN RU WKH YDULDQFH ZLWKLQ EORFNV )RU WKH WKUHH PDWLQJ GHVLJQV ZKLFK DUH GLVFXVVHG WKH DGGLWLRQ RI RQH SDUHQW WR WKH KDOIVLE GHVLJQ LQFUHDVHV EORFN VL]H E\ WUHHV SORW IRU D KDOIVLE IDPLO\f WKH DGGLWLRQ RI D SDUHQW WR WKH FLUFXODU GHVLJQ LQFUHDVHV EORFN VL]H E\ WUHHV WZR SORWV IRU IXOOVLE IDPLOLHVf DQG WKH DGGLWLRQ RI D SDUHQW WR WKH KDOIGLDOOHO GHVLJQ LQFUHDVHV EORFN VL]H E\ S ZKHUH S LV WKH QXPEHU RI SDUHQWV EHIRUH WKH DGGLWLRQ RU WKHUH DUH S QHZ IXOOVLE IDPLOLHV SHU EORFNf 7KHUHIRUH EORFN VL]H LV GHWHUPLQHG E\ WKH PDWLQJ GHVLJQ DQG WKH QXPEHU RI SDUHQWV $OO FRPSDULVRQV DPRQJ PDWLQJ GHVLJQV DQG QXPEHUV RI ORFDWLRQV DUH IRU HTXDO EORFN VL]HV LH HTXDO QXPEHUV RI REVHUYDWLRQV SHU ORFDWLRQ 7KLV UHVXOWV LQ FRPSDULQJ PDWLQJ GHVLJQV ZLWK XQHTXDO QXPEHUV RI SDUHQWV LQ WKH GHVLJQV DQG FRPSDULQJ WZR ORFDWLRQ H[SHULPHQWV DJDLQVW ILYH ORFDWLRQ H[SHULPHQWV ZLWK HTXDO QXPEHUV RI REVHUYDWLRQV SHU ORFDWLRQ EXW XQHTXDO WRWDO QXPEHUV RI REVHUYDWLRQV PAGE 17 7KH 8VH RI (IILFLHQF\ Lf (IILFLHQF\ LV WKH WRRO E\ ZKLFK FRPSDULVRQV DUH PDGH DQG LV WKH HIILFDF\ RI WKH LQGLYLGXDO REVHUYDWLRQV LQ DQ H[SHULPHQW LQ ORZHULQJ WKH YDULDQFH RI SDUDPHWHU HVWLPDWHV $Q LQFUHDVLQJ HIILFLHQF\ LQGLFDWHV WKDW IRU LQFUHDVLQJ H[SHULPHQWDO VL]H WKH DGGLWLRQDO REVHUYDWLRQV KDYH HQKDQFHG WKH YDULDQFH UHGXFLQJ SURSHUW\ RI DOO REVHUYDWLRQV (IILFLHQF\ LV FDOFXODWHG DV L 19DU[ff ZKHUH 1 LV WKH WRWDO QXPEHU RI REVHUYDWLRQV DQG 9DU[f LV WKH YDULDQFH RI D JHQHULF SDUDPHWHU HVWLPDWH ,QFUHDVLQJ 1 DOZD\V UHVXOWV LQ D UHGXFWLRQ RI WKH YDULDQFH RI HVWLPDWLRQ DOO RWKHU WKLQJV EHLQJ HTXDO PAGE 18 *HQHUDO 0HWKRGRORJ\ 6HWV RI WUXH YDULDQFH FRPSRQHQWV DUH FDOFXODWHG LQ DFFRUGDQFH ZLWK D VWDWHG OHYHO RI JHQHWLF FRQWURO DQG WKH GHVLJQ PDWUL[ LV JHQHUDWHG LQ FRUUHVSRQGHQFH ZLWK WKH ILHOG DQG PDWLQJ GHVLJQ .QRZLQJ WKH GHVLJQ PDWUL[ DQG D VHW RI WUXH YDULDQFH FRPSRQHQWV D WUXH FRYDULDQFH FRYDULDQFHf PDWUL[ RI YDULDQFH FRPSRQHQW HVWLPDWHV LV JHQHUDWHG 2QFH WKH 
FRYDULDQFH PDWUL[ RI WKH YDULDQFH FRPSRQHQWV LV LQ KDQG WKH YDULDQFH RI DQG FRYDULDQFHV EHWZHHQ DQ\ OLQHDU FRPELQDWLRQV RI WKH YDULDQFH FRPSRQHQW HVWLPDWHV DUH FDOFXODWHG )URP WKH FRYDULDQFH PDWUL[ IRU OLQHDU FRPELQDWLRQV WKH YDULDQFH RI JHQHWLF UDWLRV DV UDWLRV RI OLQHDU FRPELQDWLRQV RI YDULDQFH FRPSRQHQWV DUH DSSUR[LPDWHG E\ D 7D\ORU VHULHV H[SDQVLRQ 6LQFH GHILQLWLRQ RI D VHW RI YDULDQFH FRPSRQHQWV DQG IRUPDWLRQ RI WKH GHVLJQ PDWUL[ DUH GHSHQGHQW RQ WKH OLQHDU PRGHO HPSOR\HG GLVFXVVLRQ RI VSHFLILF PHWKRGRORJ\ EHJLQV ZLWK OLQHDU PRGHOV /LQHDU 0RGHOV +DOIGLDOOHO DQG FLUFXODU GHVLJQV 7KH VFDODU OLQHDU PRGHO HPSOR\HG IRU KDOIGLDOOHO DQG FLUFXODU PDWLQJ GHVLJQV LV \cMNOP + E\ JN J 6X WJLN WJX W6Z SLMNO ZLMNOP ZKHUH \LMNOP LV WKH P REVHUYDWLRQ RI WKH NO FURVV LQ WKH M EORFN RI WKH L WHVW + LV WKH SRSXODWLRQ PHDQ WL LV WKH UDQGRP YDULDEOH WHVW HQYLURQPHQW a 1,'L2Rf EM LV WKH UDQGRP YDULDEOH EORFN a 1,'UEf JN LV WKH UDQGRP YDULDEOH IHPDOH JHQHUDO FRPELQLQJ DELOLW\ JFDf a 1,'UJFJ LV WKH UDQGRP YDULDEOH PDOH JFD a 1,'DJFDf VX LV WKH UDQGRP YDULDEOH VSHFLILF FRPELQLQJ DELOLW\ VHDf a 1,'AIIAf WJcM LV WKH UDQGRP YDULDEOH WHVW E\ IHPDOH JFD LQWHUDFWLRQ a 1,'L2Af PAGE 19 WJLL LV WKH UDQGRP YDULDEOH WHVW E\ PDOH JFD LQWHUDFWLRQ a 1,'ADA WV0 LV WKH UDQGRP YDULDEOH WHVW E\ VHD LQWHUDFWLRQ a 1,'DSLMNO LV WKH UDQGRP YDULDEOH SORW a 1,'&2Af :MMNLP LV WKH UDQGRP YDULDEOH ZLWKLQ SORW a 1,'FUZf DQG WKHUH LV QR FRYDULDQFH EHWZHHQ UDQGRP YDULDEOHV LQ WKH PRGHO 7KLV OLQHDU PRGHO LQ PDWUL[ QRWDWLRQ LV GLPHQVLRQV EHORZ PRGHO FRPSRQHQWf \ =MM&M =%H% =*H* =VHV =AHA =AAWV =3H3 HZ Q[O D[O R[W W[O Q[E E[O Q[J J[O Q[V V[O Q[WJ WJ[O R[WV WVMHO Q[S S[O Q[O ZKHUH \ LV WKH REVHUYDWLRQ YHFWRU =M LV WKH SRUWLRQ RI WKH GHVLJQ PDWUL[ IRU WKH LfÂ§ UDQGRP YDULDEOH H LV WKH YHFWRU RI XQREVHUYDEOH UDQGRP HIIHFWV IRU WKH LfÂ§ UDQGRP YDULDEOH LV D YHFWRU RI OfV DQG Q W E J V WJ WV DQG S DUH WKH QXPEHU RI REVHUYDWLRQV WHVWV EORFNV JFDfV VHDfV WHVW E\ JFD LQWHUDFWLRQV WHVW E\ VHD LQWHUDFWLRQV DQG SORWV 
UHVSHFWLYHO\ 8WLOL]LQJ FXVWRPDU\ DVVXPSWLRQV LQ KDOIGLDOOHO PDWLQJ GHVLJQV 0HWKRG *ULIILQJ f WKH YDULDQFH RI DQ LQGLYLGXDO REVHUYDWLRQ LV 9DU\LMOGUW 7E "JFD DA A Dr GY D DQG LQ PDWUL[ QRWDWLRQ WKH FRYDULDQFH PDWUL[ IRU WKH REVHUYDWLRQV LV 9DU\f =S=IR =%=ÂR? =*=ÂR& =V=\f A*AR? =A=AR =3=fUS ,R ZKHUH f LQGLFDWHV WKH WUDQVSRVH RSHUDWRU DOO PDWULFHV RI WKH IRUP =Af DUH Q[Q DQG ,Q LV DQ Q[Q LGHQWLW\ PDWUL[ +DOIVLE GHVLJQ 7KH VFDODU OLQHDU PRGHO IRU KDOIVLE PDWLQJ GHVLJQV LV \cMNP IW Wc E\ JN WJr 3rMN ZrMNP PAGE 20 ZKHUH \LMNP LV WKH P REVHUYDWLRQ RI WKH N KDOIVLE IDPLO\ LQ WKH MfÂ§ EORFN RI WKH LfÂ§ WHVW U Wc EcM JN DQG WJA UHWDLQ WKH GHILQLWLRQ LQ (T SrLMN LV WKH UDQGRP YDULDEOH SORW FRQWDLQLQJ GLIIHUHQW JHQRW\SH E\ HQYLURQPHQW FRPSRQHQWV WKDQ (T a 1,'Â2pf ZrMNP LV WKH UDQGRP YDULDEOH ZLWKLQ SORW FRQWDLQLQJ GLIIHUHQW OHYHOV RI JHQRW\SLF DQG JHQRW\SH E\ HQYLURQPHQW FRPSRQHQWV WKDQ (T a 1,'I2pr}f DQG WKHUH LV QR FRYDULDQFH EHWZHHQ UDQGRP YDULDEOHV LQ WKH PRGHO 7KH PDWUL[ QRWDWLRQ PRGHO LV \ QO =7Wr7 I =JJ I =T&T I A[JAWJ f =MHS Ir UHG UXH DUW WUO UXE KUO UXJ J[O LXWJ WJ[O UXS SUO UXO 7KH YDULDQFH RI DQ LQGLYLGXDO REVHUYDWLRQ LQ KDOIVLE GHVLJQV LV 9DU\LMND UE UJFD D? FUS RA DQG 9DU\f 7A/?D? =%=ÂUE =*= Â[JFD =A=ADA =3=3fDS f /HYHOV RI *HQHWLF 'HWHUPLQDWLRQ (LJKW OHYHOV RI JHQHWLF GHWHUPLQDWLRQ DUH GHULYHG IURP D IDFWRULDO FRPELQDWLRQ RI WZR OHYHOV RI HDFK RI WKUHH JHQHWLF UDWLRV KHULWDELOLW\ K RUJUD DJFD DVFD ODA DA DS UZf IRU IXOOVLE PRGHOV DQG K UJFD DJFD D? 
RUS RAf IRU KDOIVLE PRGHOVf DGGLWLYH WR DGGLWLYH SOXV DGGLWLYH E\ HQYLURQPHQW YDULDQFH UDWLR U% UJFD UJFD DAf 7\SH % FRUUHODWLRQ RI %XUGRQ f DQG GRPLQDQFH WR DGGLWLYH YDULDQFH UDWLR RA UJF7KH OHYHOV HPSOR\HG IRU HDFK UDWLR DUH K DQG U% DQG DQG DQG 7R JHQHUDWH VHWV RI WUXH YDULDQFH FRPSRQHQWV 7DEOH f IRU KDOIGLDOOHO DQG FLUFXODU PDWLQJ GHVLJQV IURP WKH IDFWRULDO FRPELQDWLRQV RI JHQHWLF SDUDPHWHUV WKH GHQRPLQDWRU RI K LV VHW WR DUELWUDULO\ EXW ZLWKRXW ORVV RI JHQHUDOLW\f ZKLFK JLYHQ WKH OHYHO RI K OHDGV WR WKH PAGE 21 VROXWLRQ IRU DLDc 6ROYLQJ IRU HUJFD DQG NQRZLQJ \ \LHOGV WKH YDOXH IRU RA .QRZLQJ WKH OHYHO RI U% DQG DOORZV WKH HTXDWLRQ IRU U% WR EH VROYHG IRU DA $Q DVVXPSWLRQ WKDW WKH UDWLR RI 7DEOH 3DUDPHWULF YDULDQFH FRPSRQHQWV IRU WKH IDFWRULDO FRPELQDWLRQ RI KHULWDELOLW\ DQG f 7\SH % &RUUHODWLRQ DQG f DQG GRPLQDQFH WR DGGLWLYH YDULDQFH UDWLR DQG f IRU IXOO DQG KDOIVLE GHVLJQV D[ DQG D? ZHUH PDLQWDLQHG DW DQG UHVSHFWLYHO\ IRU DOO OHYHOV DQG GHVLJQV 'HVLJQ /HYHO K I% R} )XOO +DOI DQG DQG DQG DQG HTXDOV WKH UDWLR RI Dr D? SHUPLWV D VROXWLRQ IRU XWH $ IXUWKHU DVVXPSWLRQ WKDW US LV VHYHQ SHUFHQW RI DS DZ \LHOGV D VROXWLRQ IRU ERWK DS DQG DZ )LQDOO\ FU DQG D? 
DUH VHW WR DQG UHVSHFWLYHO\ IRU DOO WUHDWPHQW OHYHOV ,Q RUGHU WR IDFLOLWDWH FRPSDULVRQV RI KDOIVLE PDWLQJ GHVLJQV ZLWK IXOOVLE PDWLQJ GHVLJQV DJFD DQG DA UHWDLQ WKH VDPH YDOXHV IRU JLYHQ OHYHOV RI K DQG U% DQG WKH GHQRPLQDWRU RI KHULWDELOLW\ DJDLQ LV VHW WR 7R VROYH IRU US DQG RA WKH DVVXPSWLRQ WKDW [S LV ILYH SHUFHQW RI D3 Dr SHUPLWV D VROXWLRQ IRU XS DQG D: DQG PDLQWDLQV US DSSUR[LPDWHO\ HTXDO WR DQG QR ODUJHU WKDQ DS RI WKH IXOOVLE PDWLQJ GHVLJQV 1DPNRRQJ HW DO f IRU WKH VDPH OHYHOV RI PAGE 22 K DQG U% 8QGHU WKH SUHYLRXV GHILQLWLRQV DOO FRQVLGHUDWLRQ RI GLIIHUHQFHV LQ \ FKDQJLQJ WKH PDJQLWXGHV RI DS DQG DA LV GLVDOORZHG 7KXV WKHUH DUH RQO\ IRXU SDUDPHWHU VHWV IRU WKH KDOI VLE PDWLQJ GHVLJQ 7DEOH f &RYDULDQFH 0DWUL[ IRU 9DULDQFH &RPSRQHQWV 7KH EDVH DOJRULWKP WR SURGXFH WKH FRYDULDQFH PDWUL[ IRU YDULDQFH FRPSRQHQW HVWLPDWHV LV IURP *LHVEUHFKW f DQG ZDV UHZULWWHQ LQ )RUWUDQ IRU HDVH RI KDQGOLQJ WKH VWXG\ GDWD ,Q XVLQJ WKLV DOJRULWKP ZH DVVXPH WKDW DOO UDQGRP YDULDEOHV DUH LQGHSHQGHQW DQG QRUPDOO\ GLVWULEXWHG DQG WKDW WKH WUXH YDULDQFHV RI WKH UDQGRP YDULDEOHV DUH NQRZQ 8QGHU WKHVH DVVXPSWLRQV 0LQLPXP 1RUP 4XDGUDWLF 8QELDVHG (VWLPDWLRQ 0,148( 5DR f XVLQJ WKH WUXH YDULDQFH FRPSRQHQWV DV SULRUV WKH VWDUWLQJ SRLQW IRU WKH DOJRULWKPf EHFRPHV 0,948( 5DR Ef ZKLFK UHTXLUHV QRUPDOLW\ DQG WKH WUXH YDULDQFH FRPSRQHQWV DV SULRUV 6HDUOH f DQG IRU D JLYHQ GHVLJQ WKH FRYDULDQFH PDWUL[ RI WKH YDULDQFH FRPSRQHQW HVWLPDWHV EHFRPHV IL[HG $ VNHWFK RI WKH VWHSV IURP WKH 0,948( HTXDWLRQ (T *LHVEUHFKW 6HDUOH f WR WKH WUXH FRYDULDQFH PDWUL[ IRU YDULDQFH FRPSRQHQWV HVWLPDWHV LV ^WUL49M49Mf`A ^\f49M4\` U[U U[O U[O WKHQ Âr ^WU4 9 c4 9Mf`n ^\f49M4\` DQG 9DURf ^WU49t9An9DU4\f49t\0WU49t9Mf` U[U U[U U[U U[U ZKHUH ^DM LV D PDWUL[ ZKRVH HOHPHQWV DUH DcM ZKHUH LQ WKH IXOOVLE GHVLJQV L WR DQG M O WR LH WKHUH LV D URZ DQG FROXPQ IRU HYHU\ UDQGRP YDULDEOH LQ WKH OLQHDU PRGHO PAGE 23 WU LV WKH WUDFH RSHUDWRU WKDW LV WKH VXP RI WKH GLDJRQDO HOHPHQWV RI D PDWUL[ 4 9n 9n;;f9,;fn;f9n IRU 9 WKH FRYDULDQFH 
PDWUL[ RI \ DQG ; DV WKH GHVLJQ PDWUL[ IRU IL[HG HIIHFWV 9c =M=f ZKHUH L WKH UDQGRP YDULDEOHV WHVW EORFN HWF D LV WKH YHFWRU RI YDULDQFH FRPSRQHQW HVWLPDWHV DQG U LV WKH QXPEHU RI UDQGRP YDULDEOHV LQ WKH PRGHO 7KH YDULDQFH RI D TXDGUDWLF IRUP ZKHUH $ LV DQ\ QRQQHJDWLYH GHILQLWH PDWUL[ RI SURSHU GLPHQVLRQf XQGHU QRUPDOLW\ LV 9DU\f$\f WU$9$9f Wf$r 6HDUOH f KRZHYHU 0,148( GHULYDWLRQ 5DR f UHTXLUHV WKDW $; ZKLFK LQ RXU FDVH LV $ DQG LV HTXLYDOHQW WR [Of$OL WKXV 9DU^\f49M4\`f ^WU49L49Mf` DQG XVLQJ (T DQG (T 9DUAf ^WUW49M49AAWUW49L49S+WUW49L49Mf` DQG WKHUHIRUH 9DUAf 9YF ^WU49L49Mf` )URP (T LW LV VHHQ WKDW WKH 0,948( FRYDULDQFH PDWUL[ RI WKH YDULDQFH FRPSRQHQW HVWLPDWHV LV GHSHQGHQW RQO\ RQ WKH GHVLJQ PDWUL[ WKH UHVXOW RI WKH ILHOG GHVLJQ DQG PDWLQJ GHVLJQf DQG WKH WUXH YDULDQFH FRPSRQHQWV D GDWD YHFWRU LV QRW QHHGHG &RYDULDQFH 0DWUL[ IRU /LQHDU &RPELQDWLRQV RI 9DULDQFH &RPSRQHQWV DQG 9DULDQFH RI D 5DWLR 2QFH WKH FRYDULDQFH PDWUL[ IRU WKH YDULDQFH FRPSRQHQW HVWLPDWHV (Tf LV FUHDWHG WKHQ WKH FRYDULDQFH PDWUL[ RI OLQHDU FRPELQDWLRQV RI WKHVH YDULDQFH FRPSRQHQWV LV IRUPHG DV 9N /f9YF/ [ [U U[U U[ PAGE 24 ZKHUH / VSHFLILHV WKH OLQHDU FRPELQDWLRQV RI WKH YDULDQFH FRPSRQHQWV ZKLFK DUH WKH FRPELQDWLRQV RI YDULDQFH FRPSRQHQWV LQ WKH GHQRPLQDWRU DQG QXPHUDWRU RI WKH JHQHWLF UDWLR EHLQJ HVWLPDWHG $ 7D\ORU VHULHV H[SDQVLRQ ILUVW DSSUR[LPDWLRQf IRU WKH YDULDQFH RI D UDWLR XVLQJ WKH YDULDQFHV RI DQG FRYDULDQFH EHWZHHQ QXPHUDWRU DQG GHQRPLQDWRU LV WKHQ DSSOLHG XVLQJ WKH HOHPHQWV RI 9N WR SURGXFH WKH DSSUR[LPDWH YDULDQFH RI WKH WKUHH UDWLR HVWLPDWHV DV .HQGDOO DQG 6WXDUW f 9DUUDWLRf O'f9NOOff 1'f9NOff O63'f9Nff ZKHUH WKH JHQHULF UDWLR LV 1' DQG 1 DQG DUH WKH SDUDPHWULF YDOXHV 9NOOf LV WKH YDULDQFH RI 1 9NOf LV WKH FRYDULDQFH EHWZHHQ 1 DQG DQG 9Nf LV WKH YDULDQFH RI &RPSDULVRQ $PRQJ (VWLPDWHV RI 9DULDQFHV RI 5DWLRV 7KH DSSUR[LPDWH YDULDQFHV RI WKH WKUHH UDWLR HVWLPDWHV K U% DQG \f DUH FRPSDUHG DFURVV PDWLQJ GHVLJQV ZLWK HTXDO RU DSSUR[LPDWHO\ HTXDOf QXPEHUV RI REVHUYDWLRQV DFURVV QXPEHUV 
RI ORFDWLRQV DQG DFURVV QXPEHUV RI SDUHQWV ZLWKLQ D PDWLQJ GHVLJQ DOO ZLWKLQ D OHYHO RI JHQHWLF GHWHUPLQDWLRQ 7KH VWDQGDUG IRU FRPSDULVRQ LV L 5HVXOWV DUH SUHVHQWHG E\ WKH JHQHWLF UDWLR HVWLPDWHG VR WKDW GLUHFW FRPSDULVRQV PD\ EH PDGH DPRQJ WKH PDWLQJ GHVLJQV IRU HTXDO QXPEHUV RI REVHUYDWLRQV ZLWKLQ D QXPEHU RI ORFDWLRQV IRU YDU\LQJ OHYHOV RI JHQHWLF FRQWURO 1XPEHU RI JHQHWLF HQWULHV QXPEHU RI FURVVHV IRU IXOOVLE GHVLJQV DQG QXPEHU RI KDOIVLE IDPLOLHV IRU KDOIVLE GHVLJQVf LV XVHG DV D SUR[\ IRU QXPEHU RI REVHUYDWLRQV VLQFH IRU DOO GHVLJQV QXPEHU RI REVHUYDWLRQV HTXDOV WZHQW\IRXU WLPHV WKH QXPEHU RI ORFDWLRQV WLPHV WKH QXPEHU RI JHQHWLF HQWULHV )XUWKHU E\ SORWWLQJ WKH WZR OHYHOV RI QXPEHUV RI ORFDWLRQV RQ D VLQJOH ILJXUH D PAGE 25 FRPSDULVRQ LV PDGH RI WKH XWLOLW\ RI UHSOLFDWLRQ RI D GHVLJQ DFURVV LQFUHDVLQJ QXPEHUV RI ORFDWLRQV (IILFLHQF\ SORWV DOVR SHUPLW FRQWUDVWV RI WKH DEVROXWH PDJQLWXGH RI YDULDQFH RI HVWLPDWLRQ DPRQJ GHVLJQV )RU D JLYHQ QXPEHU RI JHQHWLF HQWULHV DQG ORFDWLRQV WKH GHVLJQ ZLWK WKH KLJKHVW HIILFLHQF\ LV WKH PRVW SUHFLVH ORZHVW YDULDQFH RI HVWLPDWLRQf ,QFUHDVLQJ WKH QXPEHU RI JHQHWLF HQWULHV RU ORFDWLRQV DOZD\V UHVXOWV LQ JUHDWHU SUHFLVLRQ ORZHU YDULDQFH RI HVWLPDWLRQf EXW LV QRW QHFHVVDULO\ DV HIILFLHQW WKH UHGXFWLRQ LQ YDULDQFH ZDV QRW VXIILFLHQW WR RIIVHW WKH LQFUHDVH LQ QXPEHUV RI REVHUYDWLRQVf $ SULPDU\ MXVWLILFDWLRQ IRU XVLQJ WKH HIILFLHQF\ RI D GHVLJQ DV D FULWHULRQ LV WKDW D PRUH SUHFLVH HVWLPDWH RI D JHQHWLF UDWLR LV REWDLQHG E\ XVLQJ WKH PHDQ RI WZR HVWLPDWHV IURP UHSOLFDWLRQ RI WKH VPDOO GHVLJQ DV WZR GLVFRQQHFWHG H[SHULPHQWV DV RSSRVHG WR WKH HVWLPDWH IURP VLQJOH ODUJH GHVLJQ 7KLV LV WUXH ZKHQ f WKH QXPEHU RI REVHUYDWLRQV LQ WKH ODUJH GHVLJQ 1f HTXDOV WZLFH WKH QXPEHU RI REVHUYDWLRQV LQ VPDOO GHVLJQ Qcf f WKH VPDOO GHVLJQ LV PRUH HIILFLHQW DQG f WKH YDULDQFHV DUH KRPRJHQHRXV 7KLV LV SURYHQ EHORZ 6LQFH 1 Q Q DQG QL Q WKHQ 1 Q %\ GHILQLWLRQ L 1r9DU 5DWLRfff DQG 9DU5DWLRf r1f 7KH SURSRVLWLRQ LV 9DUV5DWLRf 9DUV5DWLRff 9DU5DWLRf 
VXEVWLWXWLRQ JLYHV OQLrf Qrfff 1r_ff 6LPSOLILFDWLRQ \LHOGV OrQrff O1rLMff DQG PXOWLSOLFDWLRQ E\ 1 SURGXFHV +L WM ZKLFK LV VWULFWO\ WUXH VR ORQJ DV L% ZKHUH L LV WKH HIILFLHQF\ RI WKH VPDOOHU H[SHULPHQW DQG L LV WKH HIILFLHQF\ RI WKH ODUJHU H[SHULPHQW PAGE 26 ()),&,(1&< ()),&,(1&< Df K U 9 Ef K U% 9 Hf K U% 9 K Uf 9 &LUFXODU ORFDWLRQV +DOIGLDOOG ORFDWLRQV +DOIVLE ORFDWLRQV &LUFXODU ORFDWLRQV +DOIGLDOOHO ORFDWLRQV +DOIVLE ORFDWLRQV f Jf Kf U 9 )LJXUH (IILFLHQF\ Lf IRU K SORWWHG DJDLQVW QXPEHU RI JHQHWLF HQWULHV IRU OHYHOV WKURXJK IRU JHQHWLF FRQWURO IRU FLUFXODU KDOI GLDOOHO DQG KDOIVLE PDWLQJ GHVLJQV DFURVV OHYHOV RI ORFDWLRQ ZKHUH L O19DUKfff DQG 1 WKH WRWDO QXPEHU RI REVHUYDWLRQV Lr D? PAGE 27 5HVXOWV +HULWDELOLWY +DOIVLE GHVLJQV DUH DOPRVW JOREDOO\ VXSHULRU WR WKH WZR IXOOVLE GHVLJQV LQ SUHFLVLRQ RI KHULWDELOLW\ HVWLPDWHV UHVXOWV QRW VKRZQ IRU YDULDQFH EXW PD\ EH VHHQ IURP HIILFLHQFLHV LQ )LJXUH f )RU GHVLJQV RI HTXDO VL]H KDOIVLE GHVLJQV H[FHO ZLWK WKH H[FHSWLRQ RI JHQHWLF OHYHO WKUHH )LJXUH OF K U% DQG \ f ,Q JHQHWLF OHYHO WKUHH WKH FLUFXODU GHVLJQ SURYLGHV WKH PRVW SUHFLVH HVWLPDWH RI K IRU WZR ORFDWLRQ GHVLJQV KRZHYHU ZKHQ WKH GHVLJQ LV H[WHQGHG DFURVV ILYH ORFDWLRQV WKH KDOIVLE PDWLQJ GHVLJQ DJDLQ SURYLGHV WKH PRVW SUHFLVH HVWLPDWHV 7KH FLUFXODU PDWLQJ GHVLJQ LV VXSHULRU LQ SUHFLVLRQ WR WKH KDOIGLDOOHO GHVLJQ DFURVV DOO OHYHOV RI JHQHWLF FRQWURO DQG ORFDWLRQ HYHQ ZLWK D UHODWLYHO\ ODUJH QXPEHU RI FURVVHV SHU SDUHQW IRXUf +DOIVLE GHVLJQV DUH LQ JHQHUDO VHYHQ JHQHWLF FRQWURO OHYHOV RXW RI HLJKW )LJXUH f PRUH HIILFLHQW ZLWK WKH H[FHSWLRQ RI OHYHO WKUHH DFURVV WZR ORFDWLRQV )LJXUH OFf )RU WKH FLUFXODU DQG KDOIVLE PDWLQJ GHVLJQV FRQVLGHUHG LQFUHDVLQJ WKH QXPEHU RI JHQHWLF HQWULHV DOZD\V LPSURYHV WKH HIILFLHQF\ RI WKH GHVLJQ +RZHYHU GHILQLWH RSWLPD H[LVW IRU WKH KDOIGLDOOHO PDWLQJ GHVLJQ IRU QXPEHU RI JHQHWLF HQWULHV LH FURVVHV ZKLFK FRQYHUW WR D VSHFLILF QXPEHU RI SDUHQWV 7KHVH RSWLPD DUH QRW FRQVWDQW EXW WHQG WR EH VL[ SDUHQWV RU OHVV ORZHU ZLWK LQFUHDVLQJ K RU 
number of locations). The six-parent half-diallel is never far from the half-diallel optima, and increasing the number of parents past the optimum results in decreased efficiency.

For half-sib designs with h² = […], five locations are more efficient than two locations; however, at h² = […], two locations are most efficient. Further, the number of locations required to efficiently estimate h² for half-sib designs is determined only by the level of h² and does not depend on the levels of the other ratios. Although estimates over larger numbers of observations are more precise (five-location estimates are more precise than two-location estimates), the efficiency (increase in precision per unit observation) declines. Thus, if h² = […] and estimates of a certain precision are required, disconnected sets of two-location experiments are preferred to five-location experiments. The relative efficiency of five locations versus two locations is enhanced with decreasing r_B (increasing genotype by environment interaction) within a level of h² (compare Figures 1a to 1b and 1c to 1d for h² = […], and 1e to 1f and 1g to 1h for h² = […]).

Figure […]. Efficiency (i) for r_B plotted against number of genetic entries for levels […] through […] of genetic control for circular, half-diallel and half-sib mating designs across levels of location, where i = 1/(N(Var(r_B))) and N = the total number of observations. Vertical axis: efficiency; horizontal axis: genetic entries; series: circular, half-diallel and half-sib designs at each number of locations.

Figure […]. Efficiency (i) for γ plotted against number of genetic entries for four levels of genetic control for circular, half-diallel and half-sib mating designs across levels of location, where i = 1/(N(Var(γ))) and N = the total number of observations. Panels (a) through (d); vertical axis: efficiency; horizontal axis: genetic entries; series: circular and half-diallel designs at each number of locations.

For estimation
of r_B, full-sib designs are more efficient than half-sib designs except in the three cases of low r_B ([…]) and high γ ([…]) for h² = […] (Figure […]b) and low r_B for h² = […] (Figures […]f and […]h). Within full-sib designs, the circular design is globally superior to the half-diallel. As with h² estimation, half-diallel designs have optimal levels for numbers of parents. The six-parent half-diallel is again close to these optima for all genetic levels and numbers of locations.

At low h², for full-sib designs planting in two locations is always more efficient than five locations. For half-sib designs at low h², the relative efficiency of two versus five locations is dependent on the level of r_B, with lower r_B favoring replication across more locations. At h² = […], half-sib designs are more efficient when replicated across five locations. At the higher h² value, full-sib design efficiency across locations is dependent on the level of r_B. With r_B = […] and h² = […], replication of full-sib designs is, for the first time, more efficient across five locations than across two locations; however, at the higher r_B level two locations is again the preferred number.

Dominance to Additive Variance Ratio

In comparing the two full-sib designs for relative efficiency in estimating γ, the circular design is always approximately equal to or, for most cases, superior to the half-diallel design (Figure […]). The relative superiority of the circular design is enhanced by decreasing γ and r_B (not shown). The half-diallel design again demonstrates optima for number of parents, with the six-parent design being near optimal.

Within a mating design, the use of two locations is always more efficient than the use of five locations. The magnitude of this superiority escalates with increasing r_B and h² (Figures […]a and […]b versus […]c and […]d).

Discussion

Comparison of Mating Designs

A priori knowledge of genetic control is required to choose the optimal mating and field design for estimation of h², r_B and γ. Given that such knowledge may not be available, the choices are then based on the most robust mating
designs and field designs for the estimation of certain of the genetic ratios. If h² is the only ratio desired, then the half-sib mating design is best. Estimation of both h² and r_B requires a choice between the half-sib and circular designs. If there is no prior knowledge, then the selection of a mating design is dependent on which ratio has the highest priority. For experiments in which h² received highest weighting, the half-sib design is preferred, and in the alternative case the circular design is the better choice. In the last scenario, information on all three ratios is desired from the same experiment, and in this case the circular design is the better selection since the circular design is almost globally more efficient than the half-diallel design.

After choosing a mating design, the next decision is how many locations per experiment are required to optimize efficiency. For the half-sib design, the number of locations required to optimize efficiency is dependent on both the ratio being estimated and the level of genetic control. A broad inference is that for h² estimation a two-location experiment is more efficient, and for r_B a five-location experiment has the better efficiency. Estimation of any of the three ratios with a full-sib design is almost globally more efficient in two-location experiments.

The disparity between the behavior of the half-sib and full-sib designs with respect to the efficiency of location levels can be explained in terms of the genetic connectedness offered by the different designs. Genetic connectedness can be viewed as commonality of parentage among genetic entries. The more entries having a common parent, the more connectedness is present. The half-sib design is only connected across locations by the one common parent in a half-sib family in each replication. Full-sib designs are connected across locations in each replication by the full-sib cross plus the number of parents minus two (half-diallel) or three (circular) for each of the two parents in a cross. The
connectedness in a full-sib design means each observation is providing information about many other observations. The result of this connectedness is that, in general, fewer observations (number of locations) are required for maximum efficiency.

A General Approach to the Estimation Problem

The estimation problems may be viewed in a broader context than the specific solutions in this chapter. The technique for comparison of mating designs and numbers of locations across levels of genetic determination may be construed, for the case of h² estimation, to be the effect of these factors on the variance of σ²_g estimates. Viewing the variance approximation formula, the conclusion may be reached that the variance of σ²_g estimates is the controlling factor in the variance of h² estimates, since the other factors at these heritability levels are multiplied by constants which reduce their impact dramatically.

Given this conclusion, the variance of h² estimates is essentially the ([…],[…]) element in {tr(QV_iQV_j)}^-1 (Eq. […]). Further, since the covariances of the other variance component estimates with σ²_g estimates are small, the variance of σ²_g estimates is basically determined by the magnitude of the ([…],[…]) element of {tr(QV_iQV_j)}, which is tr(QV_gQV_g). Thus, the variance of h² estimates is minimized by maximizing tr(QV_gQV_g), with h² used as an illustration because this simplification is possible.

Considering the impact of changing levels of genetic control while holding the mating and field designs constant, V_g is fixed, the diagonal elements of V are fixed at […] because of our assumptions, and only the off-diagonal elements of V change with genetic control levels. Since Q is a direct function of V, what we observe in Figure […], comparing a design across levels of genetic control, are changes in V^-1 brought about by changes in the magnitude of the off-diagonal elements of V (covariances among observations). The effect of positive (the linear model specifies that all off-diagonal elements in V are zero or positive) off-diagonal elements
on V^-1 is to reduce the magnitude of the diagonal elements and often also to produce negative off-diagonal elements. If one increases the magnitude of the off-diagonal elements in V, then the magnitude of the diagonal elements of V^-1 is reduced and the magnitude of negative off-diagonal elements is increased. Since tr(QV_gQV_g) is the sum of the squared elements of the product of a direct function of V^-1 and a matrix of non-negative constants (V_g), as the diagonal elements of V^-1 are reduced and the off-diagonal elements become more negative, tr(QV_gQV_g) must become smaller and the variance of h² estimates increases.

Mating designs may be compared by the same type of reasoning. Within a constant field design, changes in mating design produce alterations in V. Of the three designs, the half-sib produces a V matrix with the most zero off-diagonal elements, the circular design next, and the half-diallel the fewest zero off-diagonal elements. Knowing the effect of off-diagonal elements on the variance of h² estimates, one could surmise that the variance of estimates is reduced in the order of least to most non-zero off-diagonal elements. This tenet is in basic agreement with the results in Figures […] through […].

The effects of r_B and γ on the variance of h² estimates can also be interpreted utilizing the above approach. In the results section of this chapter, it is noted that decreasing the magnitude of r_B and/or γ causes full-sib designs to rise in efficiency relative to the half-sib design. In accordance with our previous arguments, this would be expected since decreasing the magnitude of those two ratios causes a decrease in the magnitude of off-diagonal elements. More precisely, decreasing γ results in the reduction of off-diagonal elements in V of the full-sib designs while not affecting the half-sib design, and decreasing r_B results in the reduction of off-diagonal elements in V of full-sib and half-sib designs. Relative increases in efficiency of full-sib designs result from the elements due to location by
additive interaction occurring much less frequently in the half-sib designs; thus, the relative impact of reduction in r_B in half-sib designs is less than that for full-sibs.

Use of the Variance of a Ratio Approximation

Use of Kendall and Stuart's first approximation (first-term Taylor series approximation) of the variance of a ratio has two major caveats. The approximation depends on large sample properties to approach the true variance of the ratio, i.e., with a small number of levels for random variables the approximation does not necessarily closely approximate the true variance of the ratio. Work by Pederson suggests that for approximating the variance of h², at least ten parents are required in diallels before the approximation will converge to the true variance, even after including Taylor series terms past the first derivative. Pederson's work also suggests that the approximation is progressively worse for increasing heritability with low numbers of parents. Using the field design in this chapter (two locations, four blocks and six-tree row-plots), simulation work ([…] data sets) has demonstrated that with a heritability of […], using four parents in a half-diallel across two locations, the variance of a ratio approximation yields a variance estimate for h² of […], while the convergent value for the simulation was […] (Huber, unpublished data). One should remember the dependence of the first approximation of the variance of a ratio on large sample properties when applying the technique to real data.

The second caveat is that the range of estimates of the denominator of the ratio cannot pass through zero (Kendall and Stuart). This constraint is of no concern for h²; however, the structure of r_B and γ denominators allows unbiased minimum variance estimates of those denominators to pass through zero, which means that at one point in the distribution of the estimates of the ratios they are undefined (the distributions of these ratio estimates are not continuous). Simulation has shown that the variances of
r_B and γ are much greater than the approximation would indicate (Huber, unpublished data). The discrepancy in variance of the estimates could be partially alleviated through using a variance component estimation technique which restricts estimates to the parameter space (σ² from 0 to ∞). Nevertheless, because of the two caveats, approximations of the variance of h², r_B and γ estimates should be viewed only on a relative basis for comparisons among designs and not on an absolute scale.

Additionally, the expectation of a ratio does not equal the ratio of the expectations (Hogg and Craig). If a value of genetic ratios is sought so that the value equals the ratio of the expectations, then the appropriate way to calculate the ratio would be to take the mean of variance components or linear combinations of variance components across many experiments and then take the ratio. If the value sought for h² is the expectation of the ratio, then taking the mean of many h² estimates is the appropriate approach. Returning to the results from simulated data ([…] data sets) where the h² value was set at […], using the ratio of the means of variance components rendered a value of […] for h²; the mean of the h² estimates returned a value of […]; and a Taylor series approximation of the mean of the ratio yielded […] (Pederson).

Conclusions

Results from this study should be interpreted as relative comparisons of the levels of the factors investigated. However, viewing the optimal design problem as illustrated in the discussion section of this chapter can provide insight into the more general problem. There is no globally most efficient number of locations, parents or mating design for the three ratios estimated, even within the restricted range of this study; yet some general conclusions can be drawn. For estimating h², the half-sib design is always optimal or close to optimal in terms of variance of estimation and efficiency. In the estimation of r_B and γ, the circular mating design is always optimal or near optimal in variance reduction and efficiency.
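The large-sample caveat on the first approximation of the variance of a ratio, and the distinction between the ratio of means and the mean of ratios, can be explored with a small simulation. The sketch below is illustrative only and is not the simulation used in this study: it assumes two independent "variance component" estimates distributed as scaled chi-squares, and the component values, degrees of freedom, seed and function names are all hypothetical.

```python
import numpy as np


def ratio_variance_first_order(mu_a, mu_b, var_a, var_b, cov_ab):
    """First-term Taylor series (delta-method) variance of the ratio A/B."""
    ratio = mu_a / mu_b
    return ratio ** 2 * (
        var_a / mu_a ** 2 + var_b / mu_b ** 2 - 2.0 * cov_ab / (mu_a * mu_b)
    )


def simulate_components(sigma2_a, sigma2_e, df, n_draws, rng):
    """Draw scaled chi-square estimates of two components.

    Returns the numerator A and the denominator B = A + E, so that A/B
    plays the role of a heritability-like ratio.
    """
    a = sigma2_a * rng.chisquare(df, n_draws) / df
    e = sigma2_e * rng.chisquare(df, n_draws) / df
    return a, a + e


rng = np.random.default_rng(1993)       # arbitrary seed
sigma2_a, sigma2_e = 1.0, 3.0           # hypothetical components; true ratio 0.25

results = {}
for df in (5, 200):                     # few versus many levels of the random effect
    a, b = simulate_components(sigma2_a, sigma2_e, df, 200_000, rng)
    var_a = 2.0 * sigma2_a ** 2 / df    # variance of a scaled chi-square estimate
    var_e = 2.0 * sigma2_e ** 2 / df
    results[df] = {
        # Cov(A, A + E) = Var(A) because A and E are independent here
        "approx_var": ratio_variance_first_order(
            sigma2_a, sigma2_a + sigma2_e, var_a, var_a + var_e, var_a
        ),
        "simulated_var": np.var(a / b),
        "ratio_of_means": a.mean() / b.mean(),
        "mean_of_ratios": np.mean(a / b),
    }

for df, summary in results.items():
    print(df, {name: round(float(value), 5) for name, value in summary.items()})
```

Comparing `approx_var` with `simulated_var` at each value of `df` shows how far the first approximation sits from the simulated variance for that sample size, and comparing `ratio_of_means` with `mean_of_ratios` shows the two different targets a "ratio estimate" can have.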
Across numbers of parents within a mating design, only the half-diallel shows optima for efficiency. The other mating designs have non-decreasing efficiency plots over the level of number of parents so that, while there is an optimal number of locations for a level of genetic control, the number of genetic entries per location is limited more by operational than efficiency constraints.

Two locations is a near global optimum over five locations for the full-sib mating designs. Within the half-sib mating design, optimality depends on the levels of h² and r_B: 1) for h² estimation, the optimal number of locations is inversely related to the level of h², i.e., at the higher level two tests were optimal and at the lower level five tests were optimal; and 2) for r_B estimation for the half-sib design, the optimal number of locations was also inversely related to the level of r_B.

Means of estimates from disconnected sets provide lower variance of estimation where the smaller experiments have higher efficiencies. Thus, disconnected sets are preferred according to number of locations for all mating designs and according to number of parents for the half-diallel mating design.

In practical consideration of the optimal mating design problem, the results of this study indicate that, if h² estimation is the primary use of a progeny test, then the half-sib mating design is the proper choice. Further, the circular mating design is an appropriate choice if the estimation of r_B is more important than h². Finally, if a full-sib design is required to furnish information about dominance variance, the circular design provides almost globally better efficiencies for h², r_B and γ than the half-diallel.

CHAPTER 3
ORDINARY LEAST SQUARES ESTIMATION OF GENERAL AND SPECIFIC COMBINING ABILITIES FROM HALF-DIALLEL MATING DESIGNS

Introduction

The diallel mating system is an altered factorial design in which the same individuals (or lines) are used as both male and female parents. A full diallel contains all crosses, including reciprocal
crosses and selfs, resulting in a total of p² combinations, where p is the number of parents. Assumptions that reciprocal effects, maternal effects and paternal effects are negligible lead to the use of the half-diallel mating system (Griffing, method 4), which has p(p-1)/2 parental combinations and is the mating system addressed in this chapter. Half-diallels have been widely used in crop and tree breeding (Sprague and Tatum; Gilbert; Matzinger et al.; Burley et al.; and Squillace), and the widespread use of this mating system continues today (Weir and Zobel; Wilcox et al.; Snyder and Namkoong; Hallauer and Miranda; Singh and Singh; Greenwood et al.; and Weir and Goddard).

Most of the statistical packages available treat fixed effect estimation as the objective of the program, with random variables representing nuisance variation. Within this context, a common analysis of half-diallel experiments is conducted by first treating genetic parameters as fixed effects for estimation of general (GCA) and specific (SCA) combining abilities and subsequently as random variables for variance component estimation (used for estimating heritabilities, genetic correlations and general to specific combining ability variance ratios for determining breeding strategies). This chapter focuses on the estimation of GCA's and SCA's as fixed effects.

The treatment of GCA and SCA as fixed effects in OLS (ordinary least squares) is an entirely appropriate analysis if the comparisons are among parents and crosses in a particular experiment. If, as forest geneticists often wish to do, GCA estimates from disconnected experiments are to be compared, then methods such as checklots must be used to place the estimates on a common basis.

Formulae (Griffing; Falconer; Hallauer and Miranda; and Becker) for hand calculation of general and specific combining abilities are based on a solution to the OLS equations for half-diallels created by sum-to-zero restrictions, i.e., the sum of all effect estimates for an experimental factor equals zero. These
formulae will yield correct OLS solutions for sum-to-zero genetic parameters provided the data have no missing cells. If cell (plot) means are used as the basis for the estimation of effects, there must be at least one observation per cell (plot), where a cell is a subclassification of the data defined by one level of every factor (Searle). An example of a cell is the group of observations denoted by AB_ij for a randomized complete block design with factor A across blocks (B). If the above formulae are applied without accounting for missing cells, incorrect and possibly misleading solutions can result.

The matrix algebra approach is described in this chapter for these reasons: 1) in forest tree breeding applications, data sets with missing cells are extremely common; 2) many statistical packages do not allow direct specification of the half-diallel model; 3) the use of a linear model and matrix algebra can yield relevant OLS solutions for any degree of data imbalance; and 4) viewing the mechanics of the OLS approach is an aid to understanding the properties of the estimates.

The objectives of this chapter are to 1) detail the construction of ordinary least squares (OLS) analysis of half-diallel data sets to estimate genetic parameters (GCA and SCA) as fixed effects; 2) recount the assumptions and mathematical features of this type of analysis; 3) facilitate the reader's implementation of OLS analyses for diallels of any degree of imbalance and suggest a method for combining estimates from disconnected experiments; and 4) aid the reader in ascertaining what method is an appropriate analysis for a given data set.

Methods

Linear Model

Plot means are used as the unit of observation for this analysis, with unequal numbers of observations per plot. Plot (cell) means are always estimable as long as there is one observation per plot, and linear combinations of these means (least squares means) provide the most efficient way of estimating OLS fixed effects […] e_ijk is the random error associated with
the observation of the jk-th cross in the i-th block, where e_ijk ~ (0, σ²_e). Cross by block interaction, as genotype by environment interaction, is treated as confounded with between-plot variation, as for contiguous plots.

The model in matrix notation is

\[ y = X\beta + e \]

where y is the vector of observations (n x 1; n rows and 1 column), where n equals the number of observations; X is the design matrix (n x m), whose function is to select the appropriate parameters for each observation, where m equals the number of fixed effect parameters in the model; β is the vector (m x 1) of fixed effect parameters ordered in a column; and e is the vector (n x 1) of deviations (errors) from the expectation associated with each observation.

Ordinary Least Squares Solutions

The matrix representation of an OLS fixed effects solution is

\[ b = (X'X)^{-1}X'y \]

where b is the vector of estimated fixed effect parameters, i.e., an estimate of β; and X is the design matrix, either made full rank by reparameterization, or a generalized inverse of X'X may be used.

Inherent in this solution is the ordinary least squares assumption that the variance-covariance matrix (V) of the observations (y) is equal to Iσ²_e, where I is an n x n identity matrix. The elements of an identity matrix are 1's on the main diagonal and all other elements are 0. Multiplying I by σ²_e places σ²_e on the main diagonal. In the covariance matrix for the observations, the variance of the observations appears on the main diagonal and the covariance between observations appears in the off-diagonal elements. Thus, V = Iσ²_e states that the variance of the observations is equal to σ²_e for each observation and there are no covariances between the observations (which is one direct result of considering genetic parameters as fixed effects).

Sum-to-Zero Restrictions

The design matrix presented in this chapter is reparameterized by sum-to-zero restrictions to 1) reduce the dimension of the matrices to a minimal size; and 2) yield estimates of fixed effects with the same solution as common formulae in the balanced case.
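The OLS solution b = (X'X)^-1 X'y with a sum-to-zero design matrix can be sketched numerically for a four-parent half-diallel in two blocks. The example below is a minimal illustration, not the author's program: the plot means are simulated, parents are labelled 0 to 3 for indexing convenience, the SCA coding is written for p = 4 only, and all variable names are assumptions. It checks that, for balanced data, the GCA estimates from the matrix solution agree with the usual hand-calculation formula g_j = (pZ_j. - 2Z..)/(p(p-2)), where Z_j. is the sum of cross means involving parent j and Z.. is the sum of all cross means.

```python
import itertools

import numpy as np

P, N_BLOCKS = 4, 2                                   # four parents, two blocks
crosses = list(itertools.combinations(range(P), 2))  # (0,1), (0,2), ..., (2,3)
n_crosses = len(crosses)                             # p(p-1)/2 = 6

# GCA submatrix for one block: p - 1 columns; the last parent is coded -1
# in every column, so that its effect is minus the sum of the others.
gca = np.zeros((n_crosses, P - 1))
for row, cross in enumerate(crosses):
    for parent in cross:
        if parent < P - 1:
            gca[row, parent] += 1.0
        else:
            gca[row, :] -= 1.0

# SCA submatrix (valid for p = 4 only): element-by-element products of GCA
# column pairs, then the last product is folded into the first two as -1's.
products = np.column_stack(
    [gca[:, 0] * gca[:, 1], gca[:, 0] * gca[:, 2], gca[:, 1] * gca[:, 2]]
)
sca = products[:, :2] - products[:, [2]]

# Block submatrix: b - 1 columns; the last block is coded -1.
block = np.zeros((N_BLOCKS, N_BLOCKS - 1))
block[:-1, :] = np.eye(N_BLOCKS - 1)
block[-1, :] = -1.0
block = np.repeat(block, n_crosses, axis=0)

# Full design matrix: mean, block, GCA, SCA (observations sorted by block).
X = np.column_stack(
    [
        np.ones(N_BLOCKS * n_crosses),
        block,
        np.tile(gca, (N_BLOCKS, 1)),
        np.tile(sca, (N_BLOCKS, 1)),
    ]
)

rng = np.random.default_rng(0)                       # arbitrary seed
y = 10.0 + rng.normal(size=N_BLOCKS * n_crosses)     # simulated plot means

# OLS solution; X has full column rank, so no generalized inverse is needed.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Hand-calculation formula applied to cross means over blocks.
z = y.reshape(N_BLOCKS, n_crosses).mean(axis=0)
z_parent = np.array(
    [sum(z[c] for c, cross in enumerate(crosses) if j in cross) for j in range(P)]
)
g_formula = (P * z_parent - 2.0 * z.sum()) / (P * (P - 2))

print("GCA from OLS:    ", b[2:5], -b[2:5].sum())
print("GCA from formula:", g_formula)
```

Because the data are balanced, the factor columns are mutually orthogonal and the two sets of GCA values coincide; removing a row of `X` and the matching element of `y` to mimic a missing plot is a quick way to see where the formula and the matrix solution part ways.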
Other restrictions, such as set-to-zero, could also be applied, so the discussion that follows treats sum-to-zero restrictions as a specific solution to the more general problem, which is finding an inverse for X'X.

The subscripts "o" and "s" refer to the overparameterized model and the reparameterized model with sum-to-zero restrictions, respectively. The matrix X_o of Figure […] is the design matrix for an overparameterized linear model (Milliken and Johnson). Overparameterization means that the equations are written in more unknowns (parameters; in this case […]) than there are equations (number of observations minus degrees of freedom for error; in this case […]) with which to estimate the parameters. Reparameterization as a sum-to-zero matrix overcomes this dilemma by reducing the number of parameters through making some of the parameters linear combinations of others.

Sum-to-zero restrictions make the resulting parameters and estimates sum to zero even though the unrestricted parameters, for example the true GCA values as applied to a broader population, do not necessarily sum to zero within a diallel. This is the problem of comparability of GCA estimates from disconnected experiments.

Figure […]. The overparameterized linear model for a four-parent half-diallel planted on a single site in two blocks, displayed as matrices. The design matrix (X_o) and parameter vector (β_o) are shown in overparameterized form. 1's and 0's denote the presence or absence of a parameter in the model for the observed means (data vector y). The parameters displayed above the design matrix label the appropriate column for each parameter. Error vector not exhibited.

Figure […]. The linear model for a four-parent half-diallel planted on a single site in two blocks, displayed as matrices. The
design matrix (X_s) and the parameter vector (β_s) are presented in sum-to-zero format. The parameters displayed above the design matrix label the appropriate column for each parameter.

To illustrate the concept of sum-to-zero estimates versus population parameters, we use the expectation of a common formula. Becker gives equation […], which for balanced cases is equivalent to

\[ g_j = \frac{1}{p(p-2)}\left(pZ_{j.} - 2Z_{..}\right) \]

as the estimate for general combining ability for the j-th line, with p equalling the number of parents and Z_jk equalling the site mean of the j x k cross. This equation yields the same solution as the matrix equations with no missing plots or crosses and with a design matrix which contains the sum-to-zero restrictions. An evaluation of this formula in a four-parent half-diallel planted in b blocks for the GCA of parent 1 is obtained by substituting the expectation of the linear model (equation […]) for each observation:

\[ g_1 = \frac{1}{p(p-2)}\left(pZ_{1.} - 2Z_{..}\right) \]

\[ E\{g_1\} = E\left\{\frac{1}{p(p-2)}\left(pZ_{1.} - 2Z_{..}\right)\right\} \]

\[ E\{g_1\} = \tfrac{3}{4}GCA_1 - \tfrac{1}{4}(GCA_2 + GCA_3 + GCA_4) + \tfrac{1}{4}(SCA_{12} + SCA_{13} + SCA_{14}) - \tfrac{1}{4}(SCA_{23} + SCA_{24} + SCA_{34}) \]

The result of equation […] is obviously not GCA_1 from the unrestricted model (equation […]). Thus g_1, an estimable function and an estimate of parameter GCA_s1 (the estimate of the GCA of parent 1 given the sum-to-zero restrictions), does not have the same meaning as GCA_1 in the unrestricted model. An estimable function is a linear combination of the observations; but in order for an individual parameter in a model to be estimable, one must devise a linear combination of the observations such that the expectation has a weight of one on the parameter one wishes to estimate while having a weight of zero on all other parameters. A solution such as this does not exist for the individual parameters in the overparameterized model (equation […]). So, although the sum-to-zero restricted GCA parameters and estimates are forced to sum to zero for the sample of parents in a given diallel, the unrestricted GCA parameters only sum to zero across the entire population (Falconer), and an evaluation of GCA_s demonstrates that the estimate contains other
model parameters.

The result of sum-to-zero restrictions is that the degrees of freedom for a factor equals the number of columns (parameters) for that factor in X_s (Figure […]). Thus, a generalized inverse for X_s'X_s is not required, since the number of columns in the sum-to-zero X matrix for each factor equals the degrees of freedom for that factor in the model (X_s is full column rank and provides a solution to equation […]).

Components of the Matrix Equation

The equational components of equation […] are now considered in greater detail.

Data vector y

Observations (plot means) in the data vector are ordered in the manner demonstrated in Figure […]. For our example, Figure […] is the matrix equation of a four-parent half-diallel mating design planted in two randomized complete blocks on a single site. There are six crosses present in the two blocks for a total of 12 observations in the data vector y. The observations are first sorted by block. Second, within each block the observations should be in the same sequence (for simplicity of presentation only). This sequence is obtained by assigning numbers (1 through p) to each of the p parents and then sorting all crosses containing parent 1 (whether as male or female) as the primary index, in descending numerical order by the other parent of the cross as the secondary index. Next, all crosses containing parent 2 (primary index, as male or female) in which the other parent in the cross (secondary index) has a number greater than 2 are then also sorted in descending order by the secondary index. This procedure is followed through using parent p - 1 as the primary index.

Design matrix and parameter vector X and β

The design matrix for a model is conceptually a listing of the parameters present in the model for each observation (Searle). In Figure […], y and β are exhibited, and the parameters in β are displayed at the tops of the columns of X (a visually correct interpretation of the multiplication of a matrix by a vector). For each observation in y, the scalar model (equation […]) may be
employed to obtain the listing of parameters for that observation (the row of the design matrix corresponding to the particular observation). The convention for design matrices is that the columns for the factors occur in the same order as the factors in the linear model (equation […] and Figure […]). Since design matrices can be devised by first creating the columns pertinent to each factor in the model (submatrices) and then horizontally and/or vertically stacking the submatrices, the discussion of the reparameterized design matrix formulation will proceed by factor.

Mean. The first column of X_s is for μ and is a vector of 1's with the number of rows equalling the number of observations (Figure […]). The linear model (equation […]) indicates that all observations contain μ, and the deviation of the observations from μ is explained in terms of the factors and interactions in the model plus error.

Block. The number of columns for block is equal to the number of blocks minus one (column 2, X_s). Each row of a block submatrix consists of 1's and 0's, or -1's, according to the identity of the observation for which the row is being formed. The normal convention is that the first column represents block 1, the second column block 2, etc., through block b - 1. Since we have used a sum-to-zero solution (Σ_i b_i = 0), the effect due to block b is a linear combination of the other b - 1 effects, i.e., b_b = -Σ b_i (summed over i = 1 to b - 1), which in our example is b_2 = -b_1 and b_1 = b_1. Thus, the row of the block submatrix for an observation in block b (the last block) has a -1 in each of the b - 1 columns, signifying that the block b effect is indeed a linear combination of the other b - 1 block effects. Columns 2 and 3 of X_o (Figure […]) have become column 2 of X_s (Figure […]).

General combining ability. This submatrix of X_s is slightly more complex than previous factors as a result of having two levels of a main effect present per observation, i.e., the deviation of an observation from μ is modeled as the result of the GCA's of both the male and female parents (equation […]). Again we have imposed a restriction,
Σ_j gca_j = 0. Since GCA has p - 1 degrees of freedom, the submatrix for GCA should have p - 1 columns, i.e., gca_p = -Σ gca_j (summed over j = 1 to p - 1). The GCA submatrix for X_s (columns 3 through 5 in Figure […]) is formed from X_o (columns 4 through 7 in Figure […]) in the same manner as the block matrix: 1) add minus one to the elements in the other columns along each row containing a one for gca_p (p = 4 in our example); and 2) delete the column from X_o corresponding to gca_p. The GCA submatrix has p(p-1)/2 rows (the number of crosses). This, with no missing cells (plots), equals the number of observations per block. To form the GCA factor submatrix for a site, the GCA submatrix is vertically concatenated (stacked on itself) b times. This completes the portion of the X_s matrix for GCA.

Specific combining ability. In order to facilitate construction of the SCA submatrix, a horizontal direct product should be defined. A horizontal direct product, as applied to two column vectors, is the element-by-element product between the two vectors (SAS/IML User's Guide), such that the element in the i-th row of the resulting product vector is the product of the elements in the i-th rows of the two initial vectors. The resultant product vector has dimension n x 1. A horizontal direct product is useful for the formation of interaction or nested factor submatrices, where the initial matrices represent the main factors and the resulting matrix represents an interaction or a nested factor (product rule, Searle). (SAS/IML is the registered trademark of the SAS Institute Inc., Cary, North Carolina.)

The SCA submatrix can be formulated from the horizontal direct products of the columns of the GCA submatrix in X_s (Figure […]). The results from the GCA columns require manipulation to become the SCA submatrix (since degrees of freedom for SCA do not equal those of an interaction for a half-diallel analysis), but the GCA column products provide a convenient starting point. The column of the SCA submatrix representing the cross between the j-th and the k-th parents (SCA_jk) is formed as the product
between the GCA_j and GCA_k columns (Figure […]). The GCA columns in Figure […] are multiplied in this order: column 3 times column 4, forming the first SCA column; column 3 times column 5, forming the second SCA column; and column 4 times column 5, forming the third SCA column (Figure […]).

With four parents (six crosses) there are three degrees of freedom for GCA (p - 1) and two degrees of freedom for SCA (6 crosses - 3 for GCA - 1 for the mean). Since SCA has only two degrees of freedom, a sum-to-zero design matrix can have only two columns for SCA. Imposing the restriction that the sum of the SCA's across all parents equals zero is equivalent to making the last column for the SCA submatrix (Figure […]) a linear combination of the others (Figure […]). The procedure for deleting the third column product is identical to that for the GCA submatrix: add minus one to every element in the rows of the remaining SCA columns in which a one appears in the column which is to be deleted (Figure […], columns […] and […]). The number of rows in the SCA submatrix equals the number of observations in a block, and the submatrix must be vertically concatenated b times to create the SCA submatrix for a site.

An algebraic evaluation of SCA sum-to-zero restrictions requires that Σ_j sca_jk = 0 for each k and that Σ_j Σ_k sca_jk = 0; thus, for observations in the i-th block, with i serving to denote the row of the SCA submatrix in block i, sca_i14 = -sca_i12 - sca_i13, and entries in the submatrix row for y_i14 are -1's. The estimate for sca_i23 equals sca_i14 because sca_i23 is the negative of the sum of the independently estimated SCA's (sca_i12 and sca_i13) from the restriction that the sum of the SCA's across all parents equals zero. Similarly, by sum-to-zero definition, sca_i24 = -sca_i12 - sca_i23, and by substitution sca_i24 = -sca_i12 - (-sca_i12 - sca_i13) = sca_i13. By the same protocol it can be shown that sca_i34 = sca_i12. The elements in the rows of the SCA submatrix are 1's, -1's and 0's in accordance with the algebraic evaluation. Thus, while it may seem that there should be 6 SCA values (one for each cross), only 2 can be independently estimated and the remaining 4 are linear
combinations of the independently estimated SCA's. Again, the SCA sum-to-zero estimates are not equal to the parametric population SCA's. An analogous illustration for SCA to that for GCA would show that the estimable function (linear combination of observations) for a given $SCA_{jk}$ contains a variety of other parameters.

[Figure: horizontal direct products of the GCA columns (GCA x GCA), by observation, and the resulting SCA columns.]

[...] independently using the pertinent submatrix, as long as there are no missing cell means (plots) and no missing crosses; this uses a property known as orthogonality. Orthogonality requires that the dot product between two vectors equals zero (Schneider). The dot product (a scalar) is the sum of the values in a vector obtained from the horizontal direct product of two vectors. For two factors to be orthogonal, the dot products of all the column vectors making up the section of the design matrix for one factor with the column vectors making up the portion of the design matrix for the second must be zero. If all factors in the model are orthogonal, then the X'X matrix is block diagonal. A block-diagonal X'X matrix is composed of square factor submatrices (degrees of freedom x degrees of freedom) along the diagonal, with all off-diagonal elements not in one of the square factor submatrices equalling zero. A property of block-diagonal matrices is that the inverse can be calculated by inverting each block separately and replacing the original block in the full X'X matrix by the inverted block. Because the blocks can be inverted separately and all other off-diagonal elements of the inverse are zero, the effects for factors which are orthogonal to all other factors may be estimated separately, i.e., there are no functions of other sum-to-zero factors in the sum-to-zero estimates.

Mean, block, GCA and SCA parameters

All parameters are estimated simultaneously by horizontally concatenating the mean, block, GCA and SCA matrices to create X. Equation [...] is again utilized to solve the system of equations. The b vector for the four-parent example is an
estimate of $\beta$ of Figure [...]. Again, one parameter is estimated for each column in the X matrix, and all parameter estimates not present are linear combinations of the parameter estimates in the b vector. So $B_b$ is equal to $-\sum_{i=1}^{b-1} B_i$ and $gca_p$ is equal to $-\sum_{j=1}^{p-1} gca_j$. The linear combinations for SCA effects can be obtained by reading along the row of the SCA submatrix associated with the observation containing the parameter, i.e., in Figure [...] the observation [...] contains the effect $sca_{[...]}$, which is estimated as the linear combination $-sca_{12} - sca_{13}$.

This completes the estimation of fixed effect parameters from a data set which is balanced on a plot-mean basis. Since field data sets with such completeness are a rarity in forestry applications, the next step is OLS analysis for various types of data imbalance. Calculations of solutions based on a complete data set and simulated data sets with common types of imbalance are demonstrated in numerical examples.

Numerical Examples

The data set analyzed in the numerical examples is from a five-year-old, six-parent half-diallel slash pine (Pinus elliottii var. elliottii Engelm.) progeny test planted on a single site in four complete blocks. Each cross is represented by a five-tree row plot within each block. Total height in meters and diameter at breast height (dbh, in centimeters) are the traits selected for analysis. The data set is presented in Table [...] so that the reader may reconstruct the analysis and compare answers with the examples. The numbers 1 through 6 were arbitrarily assigned to the parents for analysis. Because of unequal survival within plots, plot means are used as the unit of observation.

Balanced Data (Plot-mean Basis)

The sum-to-zero design matrix for the balanced data set has 60 (4 blocks x 15 crosses) rows (which equals the number of observations in y) and has the following columns: one column for $\mu$, three columns for blocks ($b-1$), five columns for GCA ($p-1$), and nine columns for SCA (15 crosses - 5 - 1), for a total of 18 columns. With sixty plot means (60 degrees of freedom) and 18 degrees of freedom in the
model, subtracting 18 from 60 yields 42 degrees of freedom for error, which matches the degrees of freedom for cross by block interaction, thus verifying that degrees of freedom concur with the number of columns in the sum-to-zero design matrix.

To illustrate the principle of orthogonality in the balanced case, the X'X and $(X'X)^{-1}$ matrices may be printed to show that they are block diagonal. In further illustration, the effects within a factor may also be estimated without any other factors in the design matrix and compared to the estimates from the full design matrix.

The vectors of parameter estimates for height and dbh (Table [...]) were calculated from the same X matrix because height and dbh measurements were taken on the same trees. In other words, if a height measurement was taken on a tree, a dbh measurement was also taken, so the design matrices are equivalent.

Missing Plot

To illustrate the problem of a missing plot, the cross parent two by parent three was arbitrarily deleted in block one (as if observation $y_{123}$ were missing). This deletion prompts adjustments to the factor matrices in order to analyze the new data set. The new vector of observations (y) now has 59 rows. This necessitates deletion of the row of the design matrix (X) in block 1 which would have been associated with cross 2 x 3. This is the only matrix alteration required for the analysis. Thus, the resultant X matrix has 59 rows and 18 columns. With 59 means in y and 18 columns in X, the degrees of freedom for error is 41. Comparisons between results of the analyses (Table [...]) of the full data set and the data set missing observation $y_{123}$ reveal that, for this case, the estimates of parameters have been relatively unaffected by the imbalance (magnitudes of GCA's changed only slightly and rankings by GCA were unaffected).

Table [...]. Data set for numerical examples. Five-year-old slash pine progeny test with a 6-parent half-diallel mating design present on a single site with four randomized complete blocks and a five-tree row plot per cross per block.
Columns: Block; Female; Male; Mean Height (meters); Mean DBH (centimeters); Within-Plot Variance, Height (m$^2$); Within-Plot Variance, DBH (cm$^2$); Trees per Plot.
[...]

Table [...]. Numerical results for examples of data imbalance using the OLS techniques presented in the text.
Columns: Estimate; Balanced (a); Missing Plot (b); Missing Cross (c); Five Missing Crosses (d); each with Height and DBH.
Rows: $\mu$; block effects; GCA effects; SCA effects.
[...]
(a) [...] (numerical examples are for height)
(b) where the linear combinations for parameter estimates are identical to the balanced example.
(c) [...]
(d) [...] the negative of the sum of the four independently estimated sca's.
For all cases, linear combinations for block and gca are the same as in the balanced case.

Missing Cross

Another common form of imbalance in diallel data sets, the missing cross, is examined through arbitrary deletion of the [...] x [...] cross from all blocks, i.e., the observations for that cross in blocks 1 through 4 are missing in the data vector. This type of imbalance is representative of a particular cross that could not be made and is therefore missing from all blocks. The matrix manipulations required for this analysis are again presented by factor.

For appropriate SCA restrictions, the data vector and design matrix should be ordered so that the $p$-th parent has no missing crosses. Since the labeling of a parent as parent $p$ is entirely subjective, any parent with all crosses may be designated as parent $p$. The previous labelling directions are necessary since we generate the SCA submatrix as horizontal direct products of the columns of the GCA submatrix, and to account for missing crosses, the horizontal direct product for each particular
missing parental combination is not calculated, which sets the missing SCA's to zero. If there is a cross missing from those of the $p$-th parent, we cannot account for the missing cross with this technique (Searle).

For the mean, block and GCA submatrices, the adjustment for the missing cross dictates deleting the rows in the submatrices which would have corresponded to the missing observations. The SCA submatrix must be reformed, since a degree of freedom for SCA, and hence a column of the submatrix, has been lost. The SCA submatrix is reinstituted from the GCA horizontal direct products (remembering that one cross, [...] x [...], no longer exists and therefore that the corresponding product of GCA columns is inappropriate). Dropping the column for the missing SCA is equivalent to setting that SCA to zero (Searle) so that the remaining SCA's will sum-to-zero. After that, the reformation is according to the established pattern.

With one missing cross there are now 56 observations and hence 56 degrees of freedom available. The columns of the X matrix are now: one for the mean, three for block, five for GCA and eight for SCA, for a total of 17 columns. The remaining degrees of freedom for error is 39, matching the correct degrees of freedom ($(14-1) \times (4-1) = 39$).

For the missing cross example, $\mu$ is no longer equivalent to the mean of the plot means, since $\mu$ = [...] and $(\sum_{ijk} y_{ijk})/N$ = [...] (where $N$ = number of plot means). This is the result of GCA effects which are no longer orthogonal to the mean. Check the X'X matrix, or try estimating factors separately and compare to the estimates when all factors are included in X.

If formulae for balanced data (Becker; Falconer; and Hallauer and Miranda) are applied to unbalanced data (plot-mean basis), estimates of parameters are no longer appropriate because factors in the model are no longer independent (orthogonal). Applying Becker's formula, which uses totals of cross means for a site ($\bar{y}_{.jk}$), to the missing cross example yields $gca_1$ = [...], $gca_2$ = [...], $gca_3$ = [...], $gca_4$ = [...], $gca_5$ = [...] and $gca_6$ = [...]. These answers are very different in magnitude from those in Table [...] for this example, and $gca_{[...]}$ also has
a different sign. Employing these formulae in the analysis of unbalanced data is analogous to matrix estimation of GCA's without the other factors in the model, which is inappropriate.

Several Missing Crosses

The concluding example (Table [...]) is a drastically unbalanced data set resulting from the arbitrary deletion of five crosses ([...]). The matrix manipulation for this example is an extension of the previous one-cross deletion example. Rows corresponding to the observations for the deleted crosses are removed from the mean, block and GCA submatrices for all blocks. The SCA matrix, now 4 columns (10 crosses - 5 - 1 = 4 degrees of freedom), is again reformed with only the relevant products of the GCA columns. Counting degrees of freedom (columns of the sum-to-zero design matrix), the mean has one, block has three, GCA has five and SCA has four degrees of freedom, for a total of 13. Error has $(10-1)(4-1) = 27$ degrees of freedom. Totaling degrees of freedom for modeled effects and error yields 40, which equals the number of plot means.

In increasingly unbalanced cases (Table [...]), the spread among the GCA estimates tends to increase with increasing imbalance (loss of information). This is a general feature of OLS analyses, and the basis for the feature is that the spread among the GCA estimates is due both to the innate spread from additive genetic effects and to the error in estimation of the GCA's. When there is less information, GCA estimates tend to be more widely spread due to the increase in the error variance associated with their estimation. This feature has been noted (White and Hodge) as the tendency to pick as parental winners individuals in a breeding program which are the most poorly tested.

Discussion

After developing the OLS analysis and describing the inherent assumptions of the analysis, there are four important factors to consider in the interpretation of sum-to-zero OLS solutions: 1) the lack of uniqueness of the parameter estimates; 2) the weights given to plot means ($y_{ijk}$) and in turn site means ($\bar{y}_{.jk}$) for crosses
in data sets with missing crosses in parameter estimation; 3) the arbitrary nature of using a diallel mean (perforce a narrow genetic base) as the mean about which the GCA's sum-to-zero; and 4) the assumption that the covariance matrix for the observations (V) is $I\sigma^2_e$.

Uniqueness of Estimates

Sum-to-zero restrictions furnish what would appear to be unique estimates of the individual parameters, e.g., GCA, when in fact these individual parameters are not estimable (Graybill; Freund and Littell; and Milliken and Johnson). The lack of estimability is again analogous to attempting to solve a set of equations in $n$ unknowns with $t$ equations where $n$ is greater than $t$. Therefore, an infinite number of solutions exist for $\beta$. There are quantities in this system of equations that are unique (estimable), i.e., the estimate is invariant regardless of the restriction (sum-to-zero or set-to-zero) or generalized inverse (no restrictions) used (Milliken and Johnson). The estimable functions include sum-to-zero GCA and SCA estimates, since they are linear combinations of the observations; but these estimable quantities do not estimate the individual parametric GCA's and SCA's of the overparameterized model (equation [...]), since there is no unique solution for those parameters.

Weighting of Plot Means and Cross Means in Estimating Parameters

With at least one measurement tree in each plot and with plot means as the unit of observation, use of the matrix approach produces the same results as the basic formulae. The weight placed on each plot mean in the estimation of a parameter can be determined by calculating $(X'X)^{-1}X'$, which can be viewed as a matrix of weights W, so that equation [...] can be written as $b = Wy$. The matrix W has these dimensions: the number of rows equals the number of parameters in $\beta$, and the number of columns equals the number of plot means in y. The $i$-th row of W contains the weights applied to y to estimate the $i$-th parameter in b ($b_i$). In the discussion which follows, $gca_1$ is utilized as $b_i$. If there are no missing
plots, the cross mean in every block ($y_{ijk}$) has the same weighting, and weights can be combined across blocks to yield the weight on the overall cross mean ($\bar{y}_{.jk}$). It can be shown that, for the balanced numerical example, $gca_1$ is calculated by weighting the overall cross means containing parent 1 by [...] and weighting all overall cross means not containing parent 1 by [...].

Figure [...]. Weights on overall cross means ($\bar{y}_{.jk}$) for the three numerical examples for estimation of $GCA_1$. The weights for the balanced example (above the diagonal) are presented in both fractional and decimal form. The weights for the one-cross missing and the five-crosses missing examples are presented as the upper number and lower number, respectively, in cells below the diagonal. The marginal weights on GCA parameters (right margin) do not change although cells are missing.

Figure [...] (above the diagonal) demonstrates the weightings on the overall cross means for the balanced numerical example as well as the marginal weighting on the GCA parameters. These marginal weightings are obtained by summing along a row and/or column, as one would to obtain the marginal totals for a parent (Becker). One feature of sum-to-zero solutions is that these marginal weightings will be maintained no matter the imbalance due to missing crosses, as will be seen by considering the numerical examples for a missing cross (Figure [...], below the diagonal, upper number) and five missing crosses (Figure [...], below the diagonal, lower number). The marginal weights have remained the same as in the balanced case, while the weights on the cross means differ among the crosses containing parent 1 and also among the crosses not containing parent 1. In the five missing crosses example, crosses $\bar{y}_{.[...]}$ and $\bar{y}_{.[...]}$ even receive a positive weighting where in the prior examples they had negative weighting. The expected value in all three examples is $GCA_{1s}$ ($s$ for sum-to-zero) despite the apparently
nonsensical weightings to cross means with missing crosses; however, the evaluation of the estimates in terms of the original model changes with each new combination of missing cells, i.e., $\bar{y}_{.[...]}$ and $\bar{y}_{.[...]}$ have a positive weight in the five missing crosses example in $GCA_1$ estimation.

Whether this type of estimation is desirable with missing cell (cross) means has been the subject of some discussion (Speed, Hocking and Hackney; Freund; and Milliken and Johnson). The data analyst should be aware of the manner in which sum-to-zero treats the data with missing cell means and decide whether that particular linear combination of cross means estimating the parameter is one of interest, realizing that the meaning of the estimates in terms of the original model is changing.

Diallel Mean

The use of the mean for a half-diallel as the mean around which GCA's sum-to-zero is not satisfactory, in that the diallel mean is the mean of a rather narrow genetically based population and, in particular, that the comparisons of interest are not usually confined to the specific parents in a specific diallel on a particular site. A checklot can be employed to represent a base population against which comparison of half- or full-sib families can be made, to provide for comparison of GCA estimates from other tests (van Buijtenen and Bridgwater).

Mathematically, when effects are forced to sum-to-zero around their own mean, the absolute value of the GCA's is reflective of their value relative to the mean of the group. Even if the parents involved in the particular diallel were all far superior to the population mean for GCA, GCA's calculated on an OLS basis would show that some of these GCA's were negative. If the GCA's of the diallel parents were in fact all below the population mean, the opposite and equally undesirable result ensues. For disconnected diallels together on a single site, an OLS analysis would yield GCA estimates that sum-to-zero within each diallel, since parents are nested within diallels. Unless the
comparisons of interest are only in the combination of the parents in a specific diallel on a specific site, the checklot alternative is desirable.

A method for obtaining the desired goal of comparable GCA's from disconnected experiments, disregarding the problem of heteroscedasticity, is to form a function from the data which yields GCA estimates properly located on the number scale. Such a function can be formed (using $GCA_1$ as an example) from $gca_{1s}$, the diallel mean and the checklot mean. From expectations of the scalar linear model (equation [...]):

$GCA_{1s} = \frac{p-1}{p} GCA_1 - \frac{1}{p} \sum_{j=2}^{p} GCA_j + \frac{1}{p-2} \sum_{k} SCA_{1k} - \frac{2}{p(p-2)} \sum_{j} \sum_{k} SCA_{jk}$;

$E\{\text{diallel mean}\} = \mu + \sum_i B_i / b + \frac{2}{p} \sum_{j} GCA_j + \frac{2}{p(p-1)} \sum_{j} \sum_{k} SCA_{jk}$; and

$E\{\text{checklot mean}\} = \mu + \sum_i B_i / b + \tau$;

where $j$ for GCA is $j$ or $k$, and $\tau$ represents the fixed genetic parameter of the checklot.

The function used to properly locate $GCA_{1rel}$ (the subscript rel denotes the relocated GCA) is:

$gca_{rel} = gca_{1s} + \frac{1}{2}(\text{diallel mean} - \text{checklot mean})$.

The expectation of $gca_{rel}$ with negligible SCA is:

$GCA_{1rel} = GCA_1 - \tau/2$,

and since breeding value equals twice GCA, $BV_{rel} = BV - \tau$. If SCA is non-negligible, then the expectation is:

$GCA_{rel} = GCA_1 + \frac{1}{p-2} \sum_{k} SCA_{1k} - \frac{1}{(p-1)(p-2)} \sum_{j} \sum_{k} SCA_{jk} - \tau/2$.

In either case, the function provides a reasonable manner by which GCA estimates from disconnected diallels are centered at the same location on a number scale and are then comparable.

Variance and Covariance of Plot Means

The variances of plot means with unequal numbers of trees per plot are by definition unequal, i.e., $Var(\bar{y}_{ijk}) = \sigma^2_p + \sigma^2_w / n_{ijk}$, where $\sigma^2_p$ is plot variance, $\sigma^2_w$ is the within-plot variance and $n_{ijk}$ is the number of observations per plot. Also, if blocks were considered random, there would be an additional source of variance for plot means due to blocks (as well as a covariance between plot means in the same block), and this could be incorporated into the V matrix with $Var(\bar{y}_{ijk}) = \sigma^2_b + \sigma^2_p + \sigma^2_w / n_{ijk}$. Since the variances of the means in the observation vector are not equal, and there is a covariance between the means if blocks are being considered random, best linear unbiased estimates (BLUE) would be secured by weighting each mean by its true associated variance (Searle). This is the generalized least squares (GLS) approach:

$b = (X'V^{-1}X)^{-1} X'V^{-1} y$.

The GLS approach relaxes the OLS assumptions of equal variance of and no covariance between the observations (plot means) while still treating genetic parameters as fixed effects. The entries along the diagonal of the V matrix are the variances of the plot means ($Var(\bar{y}_{ijk})$) in the same order as means in the data vector. The off-diagonal elements of V would be either 0 or $\sigma^2_b$ (the variance due to the random variable block) for elements corresponding to observations in the same block. BLUE requires exact knowledge of V; if estimates of $\sigma^2_p$, $\sigma^2_b$ and $\sigma^2_w$ are utilized in the V matrix, estimable functions of $\beta$ approximate BLUE.

The OLS assumption that SCA and GCA are fixed effects can also be relaxed to allow for covariances due to genetic relatedness. In particular, the information that means are from the same half- or full-sib family could be included in the V matrix. Relaxation of the zero covariance assumption implies that GCA and SCA are random variables. If GCA and SCA are treated as random variables, then the application of best linear prediction (BLP) or best linear unbiased prediction (BLUP) to the problem would be more appropriate (White and Hodge). The treatment of the genetic parameters as random variables is consistent with that used in estimating genetic correlations and heritabilities. The V matrix of such an application would include, in addition to the features of the GLS V matrix, the covariance between full-sib or half-sib families added to the off-diagonal elements in V, i.e., if the first and second plot means in the data vector had a covariance due to relationship, then that covariance is inserted twice in the V matrix. The
covariance would appear as the second element in the first row and the first element in the second row of V (V is a symmetric matrix). Also, the diagonal elements of V would increase by $2\sigma^2_{gca}$ (the variance due to treating GCA as a random variable) + $\sigma^2_{sca}$ (the variance due to treating SCA as a random variable).

Comparison of Prediction and Estimation Methodologies

Which methodology (OLS, GLS, BLP or BLUP) to apply to individual data bases is a somewhat subjective decision. The decision can be based both on the computational or conceptual complexity of the method and on the magnitude of the data base with which the analyst is working. To aid in this decision, this discussion highlights the differences in the inherent properties and assumptions of the techniques.

For all practical purposes, the answers from the four techniques will never be equal; however, there are two caveats. First, OLS estimates equal GLS estimates if all the cell means are known with the same precision (variance) (Searle). Otherwise, GLS discounts the means that are known with less precision in the calculations, and different estimates result. The second caveat is that if the amount of data is infinite, i.e., all cross means are known without error, then all four techniques are equivalent (White and Hodge). In all other cases, BLP and BLUP shrink predictions toward the location parameter(s) and produce predictions which are different from OLS or GLS estimates, even with balanced data.

During calculations GLS, BLP and BLUP place less weight on observations known with less precision, which is intuitively pleasing. With OLS and GLS, forest geneticists treat GCA's and SCA's as fixed effects for estimation and then as random variables for genetic correlations and heritabilities. BLP and BLUP provide a consistent treatment of GCA's and SCA's as random variables while differing in their assumptions about location parameters (fixed effects). In BLP, fixed effects are assumed known without error (although they are usually estimated from
the data), while with BLUP fixed effects are estimated using GLS. BLP and BLUP techniques also contain the assumption that the covariance matrix of the observations is known without error (most often variances must be estimated). In many BLUP applications (Henderson), mixed model equations are utilized iteratively to estimate fixed effects and to predict random variables from a data set. A BLUP treatment of fixed effects allows any connectedness between experiments to be utilized in the estimation of the fixed effects. This provides an intuitive advantage of BLUP over BLP in experimentation where connectedness among genetic experiments is available, or where the data are so unbalanced that treating the fixed effects as known is less desirable than a GLS estimate of the fixed effects.

An ordering of computational complexity and conceptual complexity, from least to most complex, of the four methods is OLS, GLS, BLP and BLUP. The latter three methods require the estimation of the covariance matrix of the observations, either separately (a priori) or iteratively with the fixed effects. Precise estimation of the covariance matrix for observations requires a great number of observations, and the precision of GLS, BLP and BLUP estimations or predictions is affected by the error of estimation of the components of V.

Selection of a method can then be based on weighing the computational complexity and size of the available data base against the advantages offered by each method. Thus, if complexity of the computational problem is of paramount concern, the analyst necessarily would choose OLS. With a small data base (one that does not allow reasonable estimates of variances), the analyst would again choose OLS. With a large data base and no qualms with computational complexity, the analyst can choose between BLP and BLUP based on whether there is sufficient connectedness or imbalance among the experiments to make BLUP advantageous.

Conclusions

Methods of solving for GCA and SCA estimates for
balanced (plot-mean basis) and unbalanced data have been presented along with the inherent assumptions of the analysis. The use of plot means and the matrix equations will produce sum-to-zero OLS estimates for GCA and SCA for all types of imbalance. Formulae in the literature which yield OLS solutions for balanced data can yield misleading solutions for unbalanced data because of the loss of orthogonality, and also because weightings on site means for crosses (or totals) are constants.

GCA's and SCA's obtained through sum-to-zero restriction are not truly estimates of parametric population GCA's and SCA's. There are an infinite number of solutions for GCA's and SCA's from the system of equations as a result of the overparameterized linear model.

CHAPTER [...]
VARIANCE COMPONENT ESTIMATION TECHNIQUES COMPARED FOR TWO MATING DESIGNS WITH FOREST GENETIC ARCHITECTURE THROUGH COMPUTER SIMULATION

Introduction

In many applications of quantitative genetics, geneticists are commonly faced with the analysis of data containing a multitude of flaws (e.g., non-normality, imbalance and heteroscedasticity). Imbalance, as one of these flaws, is intrinsic to quantitative forest genetics research because of the difficulty in making crosses for full-sib tests and the biological realities of long-term field experiments.

Few definitive studies have been conducted to establish optimal methods for estimation of variance components from unbalanced data. Simulation studies using simple models (one-way or two-way random models) have been conducted for certain data structures, i.e., imbalance, experimental design and variance parameters (Corbeil and Searle; Swallow; Swallow and Monahan; interpretations by Littell and McCutchan). The results from these studies indicate that technique optimality is a function of the data structure.

In practice (both historically and still commonplace), estimation of variance components in forest genetics applications has been achieved by using sequentially adjusted sums of squares as an
application of Henderson's Method 3 (HM3; Henderson). Under normality and with balanced data, this technique has the desirable property of being the minimum variance unbiased estimator. If the data are unbalanced, then the only property retained by HM3 estimation is unbiasedness (Searle [...]; Searle [...], pp. [...]). Other estimators have been shown to be locally superior to HM3 in variance or mean square error properties in certain cases (Klotz et al.; Olsen et al.; Swallow; Swallow and Monahan).

Over the last [...] years there has been a proliferation of variance component estimation techniques, including minimum norm quadratic unbiased estimation (MINQUE; Rao [...]a), minimum variance quadratic unbiased estimation (MIVQUE; Rao [...]b), maximum likelihood (ML; Hartley and Rao) and restricted maximum likelihood (REML; Patterson and Thompson). The practical application of these techniques has been impeded by their computational complexity. However, with continuing advances in computer technology and the appearance of better computational algorithms, the application of these procedures continues to become more tractable (Harville; Giesbrecht; Meyer). Whether these methods of analysis are superior to HM3 for many genetics applications remains to be shown.

With balanced data and disregarding negative estimates, all previously mentioned techniques except ML produce the same estimates (Harville). With unbalanced data, each technique produces a different set of variance component estimates. Criteria must then be adopted to discriminate among techniques. Candidate criteria for discrimination include unbiasedness (large-number convergence on the parametric value), minimum variance (the estimator with the smallest sampling variance), minimum mean square error (the minimum of sampling variance plus squared bias; Hogg and Craig) and probability of nearness (the probability that sample estimates occur in a certain interval around the parametric value; Pitman).

Negative estimates are also problematic in the estimation of variance components. Five alternatives
for dealing with the dilemma of estimates less than zero (outside the natural parameter space of zero to infinity) are (Searle): 1) accept and use the negative estimate; 2) set the negative estimate to zero (producing biased estimates); 3) re-solve the system with the offending component set to zero; 4) use an algorithm which does not allow negative estimates; and 5) use the negative estimate to infer that the wrong model was utilized.

The purpose of this research was to determine if the criteria of unbiasedness, minimum variance, minimum mean square error and probability of nearness discriminated among several variance component estimation techniques while exploring various alternatives for dealing with negative variance component estimates. In order to make such comparisons, a large number of data sets were required for each experimental level. Using simulated data, this chapter compares variance component estimation techniques for plot-mean and individual observations, two mating systems (modified half-diallel and half-sib) and two sets of parametric variance components. Types of imbalance and levels of factors were chosen to reflect common situations in forest genetics.

Methods

Experimental Approach

For each experimental level, [...] data sets were generated and analyzed by various techniques (Table [...]), producing numerous sets of variance component estimates for each data set. This workload resulted in enormous computational time being associated with each experimental level. The overall experimental design for the simulation was originally conceived as a factorial with two types of mating design (half-diallel and half-sib), two sets of true variance components (Table [...]), two kinds of observations (individual and plot mean) and three types of imbalance: 1) survival levels ([...]% and [...]%, with [...]% representing moderate survival and [...]% representing poor survival); 2) for full-sib designs, three levels of missing crosses ([...], [...] and [...] out of [...] crosses); and 3) for half-sib designs, two levels of connectedness among tests ([...] and [...] common
families between tests out of [...] families per test). Because of the computational time constraint, the experiment could not be run as a complete factorial, and the investigation continued as a partial factorial.

Table [...]. Abbreviation for and description of variance component estimation methods utilized for analyses based on individual observations (if utilized for plot-mean analysis, the abbreviation is modified by prefixing a "P").

Abbreviation: Description (Citation)
ML, PML: Maximum Likelihood; estimates not restricted to the parameter space; individual and plot-mean analysis (Hartley and Rao; Shaw)
MODML: Maximum Likelihood; negative estimates set to zero after convergence; individual analysis (Hartley and Rao)
NNML: Maximum Likelihood; if negative estimates appeared at convergence, they were set to zero and the system re-solved; individual analysis (Hartley and Rao; Miller)
REML, PREML: Restricted Maximum Likelihood; estimates not restricted to the parameter space; individual and plot-mean analysis (Patterson and Thompson; Shaw; Harville)
MODREML: Restricted Maximum Likelihood; negative estimates set to zero after convergence; individual analysis (Patterson and Thompson)
NNREML, PNNREML: Restricted Maximum Likelihood; if negative estimates appeared at convergence, they were set to zero and the system re-solved; individual and plot-mean analysis (Patterson and Thompson; Miller)
MIVQUE, PMIVQUE: Minimum Variance Quadratic Unbiased; non-iterative, with true (parametric) values of the variance components as priors; individual and plot-mean analysis (Rao [...]b)
MINQUE, PMINQUE: Minimum Norm Quadratic Unbiased; non-iterative, with ones as priors for all variance components; individual and plot-mean analysis (Rao [...]a)
TYPE, PTYPE: Sequentially Adjusted Sums of Squares (Henderson's Method 3); individual and plot-mean analysis (Henderson)
MIVPEN: MIVQUE with a penalty algorithm to prevent negative estimates; individual analysis (Harville)

In general, the approach was to run levels which were at opposite ends of the imbalance spectrum, i.e., [...]% survival and
no missing crosses versus […]% survival and […] missing crosses, within a variance component level. If results were consistent across these treatment combinations, intermediate levels were not run.

Designation of a treatment combination is by a five-character alphanumeric field. The first character is either H (half-sib) or D (half-diallel). The second character denotes the set of parametric variance components, where […] designated the set of variance components associated with heritability of […] and […] designated the set of variance components associated with heritability of […] (Table […]). The third character is an S, indicating that the last two characters determine the imbalance level. The fourth character designates the survival level, either […] for […]% or […] for […]%. The final character specifies the number of missing crosses (half-diallel) or lack of connectedness (half-sib). The treatment combination "H[…]S[…][…]" is a half-sib mating design (H), the set of variance components associated with heritability equalling […] ([…]), […]% survival ([…]), and […] common parents across tests ([…]).

Table […]. Sets of true variance components for the half-diallel and half-sib mating designs generated from specification of two levels of single-tree heritability ($h^2$), type B correlation ($r_B$), and non-additive to additive variance ratio ($d/a$).

[Table body not recoverable. Column groups: Genetic Ratios (a), Mating Design, True Variance Components (b). Rows: full-sib and half-sib designs at each heritability level, with NA entries in the half-sib rows.]

a: $h^2 = 4\sigma^2_g / \sigma^2_{phenotypic}$, $r_B = \sigma^2_g / (\sigma^2_g + \sigma^2_{tg})$, and $d/a = \sigma^2_s / \sigma^2_g$.
b: See definitions in equation […].

Experimental Design for Simulated Data

The mating design for the simulation was either a six-parent half-diallel (no selfs) or a fifteen-parent half-sib. The randomized complete block field design was in three locations (i.e., separate field tests) with four complete blocks per location and six trees per family in a block, where family is a full-sib family for the half-diallel or a half-sib family for the half-sib design. This field design and the mating designs reflect typical designs in forestry applications (Squillace; Wilcox et al.; Bridgwater et al.; Weir and Goddard; Loo-Dinkins et al.) and are also commonly used in other disciplines (Matzinger et al.; Hallauer and Miranda; Singh and Singh). The six trees per family could be considered as contiguous or non-contiguous plots without affecting the results or inferences.

Full-Sib Linear Model

The scalar linear model employed for half-diallel individual observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm}$$

where $y_{ijklm}$ is the $m$-th observation of the $kl$-th cross in the $j$-th block of the $i$-th test;
$\mu$ is the population mean;
$t_i$ is the random variable test location ~ NID($0, \sigma^2_t$);
$b_{ij}$ is the random variable block ~ NID($0, \sigma^2_b$);
$g_k$ is the random variable female general combining ability (gca) ~ NID($0, \sigma^2_g$);
$g_l$ is the random variable male gca ~ NID($0, \sigma^2_g$);
$s_{kl}$ is the random variable specific combining ability (sca) ~ NID($0, \sigma^2_s$);
$tg_{ik}$ is the random variable test by female gca interaction ~ NID($0, \sigma^2_{tg}$);
$tg_{il}$ is the random variable test by male gca interaction ~ NID($0, \sigma^2_{tg}$);
$ts_{ikl}$ is the random variable test by sca interaction ~ NID($0, \sigma^2_{ts}$);
$p_{ijkl}$ is the random variable plot ~ NID($0, \sigma^2_p$);
$w_{ijklm}$ is the random variable within-plot ~ NID($0, \sigma^2_w$); and
there is no covariance between random variables in the model.

This linear model in matrix notation is (dimensions below model component)

$$\underset{n \times 1}{y} = \underset{n \times 1}{\mu \mathbf{1}} + \underset{n \times t}{Z_T}\,\underset{t \times 1}{e_T} + \underset{n \times b}{Z_B}\,\underset{b \times 1}{e_B} + \underset{n \times g}{Z_G}\,\underset{g \times 1}{e_G} + \underset{n \times s}{Z_S}\,\underset{s \times 1}{e_S} + \underset{n \times tg}{Z_{TG}}\,\underset{tg \times 1}{e_{TG}} + \underset{n \times ts}{Z_{TS}}\,\underset{ts \times 1}{e_{TS}} + \underset{n \times p}{Z_P}\,\underset{p \times 1}{e_P} + \underset{n \times 1}{e_W}$$

where $y$ is the observation vector; $Z_i$ is the portion of the
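design matrix for the $i$-th random variable; $e_i$ is the vector of unobservable random effects for the $i$-th random variable; $\mathbf{1}$ is a vector of 1's; and $n$, $t$, $b$, $g$, $s$, $tg$, $ts$ and $p$ are the number of observations, tests, blocks, gca's, sca's, test by gca interactions, test by sca interactions and plots, respectively.

Utilizing customary assumptions in half-diallel mating designs (Method […], Griffing), the variance of an individual observation is

$$\mathrm{Var}(y_{ijklm}) = \sigma^2_t + \sigma^2_b + 2\sigma^2_g + \sigma^2_s + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w$$

and in matrix notation the covariance matrix for the observations is

$$\mathrm{Var}(y) = Z_T Z_T' \sigma^2_t + Z_B Z_B' \sigma^2_b + Z_G Z_G' \sigma^2_g + Z_S Z_S' \sigma^2_s + Z_{TG} Z_{TG}' \sigma^2_{tg} + Z_{TS} Z_{TS}' \sigma^2_{ts} + Z_P Z_P' \sigma^2_p + I_n \sigma^2_w$$

where $'$ indicates the transpose operator, all matrices of the form $Z_i Z_i'$ are $n \times n$, and $I_n$ is an $n \times n$ identity matrix.

Half-Sib Linear Model

The scalar linear model for half-sib individual observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + g_k + tg_{ik} + ph_{ijk} + wh_{ijkm}$$

where $y_{ijkm}$ is the $m$-th observation of the $k$-th half-sib family in the $j$-th block of the $i$-th test;
$\mu$, $t_i$, $b_{ij}$, $g_k$ and $tg_{ik}$ retain the definition in Eq. […];
$ph_{ijk}$ is the random variable plot, containing different genotype by environment components than the corresponding term in Eq. […], ~ NID($0, \sigma^2_{ph}$);
$wh_{ijkm}$ is the random variable within-plot, containing different levels of genotypic and genotype by environment components than the corresponding term in Eq. […], ~ NID($0, \sigma^2_{wh}$); and
there is no covariance between random variables in the model.

The matrix notation model is (dimensions below model component)

$$\underset{n \times 1}{y} = \underset{n \times 1}{\mu \mathbf{1}} + \underset{n \times t}{Z_T}\,\underset{t \times 1}{e_T} + \underset{n \times b}{Z_B}\,\underset{b \times 1}{e_B} + \underset{n \times g}{Z_G}\,\underset{g \times 1}{e_G} + \underset{n \times tg}{Z_{TG}}\,\underset{tg \times 1}{e_{TG}} + \underset{n \times p}{Z_{PH}}\,\underset{p \times 1}{e_{PH}} + \underset{n \times 1}{e_{WH}}$$

The variance of an individual observation in half-sib designs is

$$\mathrm{Var}(y_{ijkm}) = \sigma^2_t + \sigma^2_b + \sigma^2_g + \sigma^2_{tg} + \sigma^2_{ph} + \sigma^2_{wh}$$

and

$$\mathrm{Var}(y) = Z_T Z_T' \sigma^2_t + Z_B Z_B' \sigma^2_b + Z_G Z_G' \sigma^2_g + Z_{TG} Z_{TG}' \sigma^2_{tg} + Z_{PH} Z_{PH}' \sigma^2_{ph} + I_n \sigma^2_{wh}$$

For an observational vector based on plot means, the plot and within-plot random variables were combined by taking the arithmetic mean across the observations within a plot. The resulting plot-means model has a new plot term ($\sigma^2_{\bar{p}}$ or $\sigma^2_{\bar{p}h}$) that is a composite of the plot and within-plot variance terms of the individual observation model.

Three estimates of ratios among variance components were determined: (1) single tree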
heritability adjusted for test location and block as

$$\hat{h}^2 = 4\hat{\sigma}^2_g / \hat{\sigma}^2_{phenotypic}$$

where $\hat{\sigma}^2_{phenotypic}$ is the estimate of the variance of an individual observation from equations […] and […] with the variance components for test location and block deleted; (2) type B correlation as

$$\hat{r}_B = \hat{\sigma}^2_g / (\hat{\sigma}^2_g + \hat{\sigma}^2_{tg})$$

and (3) dominance to additive variance ratio as

$$\widehat{d/a} = \hat{\sigma}^2_s / \hat{\sigma}^2_g$$

Data Generation and Deletion

Data generation was accomplished by using a Cholesky upper-lower decomposition of the covariance matrix for the observations (Goodnight) and a vector of pseudo-random standard normal deviates generated using the Box-Muller transformation with pseudo-random uniform deviates (Knuth; Press et al.). The upper-lower decomposition creates a matrix ($U$) with the property that $\mathrm{Var}(y) = U'U$. The vector of pseudo-random standard normal deviates ($z$) has a covariance matrix equal to an identity matrix ($I_n$), where $n$ is the number of observations. The vector of observations is created as $y = U'z$. Then $\mathrm{Var}(y) = U'\,\mathrm{Var}(z)\,U$, and since $\mathrm{Var}(z) = I_n$, $\mathrm{Var}(y) = U' I_n U = U'U$.

Analyses of survival patterns using data from the Cooperative Forest Genetic Research Program (CFGRP) at the University of Florida were used to develop survival distributions for the simulation. The data sets chosen for survival analysis were from full-sib slash pine (Pinus elliottii var. elliottii Engelm.) tests planted in randomized complete block designs with the families in row plots, and were selected because the survival levels were either approximately […]% or […]%. Survival levels for most crosses (full-sib families) clustered around the expected value, i.e., approximately […]% for an average survival level of […]%; however, there were always a few crosses that had much poorer survival than average and also a small number of crosses that had much better survival than average. This survival pattern was consistent across the experiments analyzed. Thus, a lower than average survival level was arbitrarily assigned to certain crosses, a higher than average survival level was assigned to certain crosses, and the average survival
level assigned to most crosses. This modeling of survival pattern was also extended to the half-sib mating design. At […]% survival no missing plots were allowed, and at […]% survival missing plots occurred at random.

Full-sib family deletion simulated crosses which could not be made and were therefore missing from the experiment. When deleting five crosses, the deletion was restricted to a maximum of four crosses per parent to prevent loss of all the crosses in which a single parent appeared, since this would have resulted in changing a six-parent to a five-parent half-diallel.

Tests having only subsets of the half-sib families in common are a frequent occurrence in data analysis at CFGRP. This partial connectedness was simulated by generating data in which only […] of the […] families present in a test were common to either one of the other two tests comprising a data set.

Variance Component Estimation Techniques

Two algorithms were utilized for all estimation techniques: sequentially adjusted sums of squares (Milliken and Johnson, p. […]) for HM, and Giesbrecht's algorithm (Giesbrecht) for REML, ML, MINQUE and MIVQUE. Giesbrecht's algorithm is primarily a gradient algorithm (the method of scoring) and as such allows negative estimates (Harville; Giesbrecht). Negative estimates are not a theoretical difficulty with MINQUE or MIVQUE; however, for REML and ML, estimates should be confined to the parameter space. For this reason, estimators referred to as REML and ML in this chapter are not truly REML and ML when negative estimates occur; further, there is the possibility that the iterative solution stopped at a local maximum, not the global maximum. These concerns are commonplace in REML and ML estimation (Corbeil and Searle; Harville; Swallow and Monahan); however, ignoring these two points, these estimators are still referred to as REML and ML.

The basic equation for variance component estimation under normality (Giesbrecht) for MIVQUE, MINQUE and REML is

$$\underset{r \times r}{\{\mathrm{tr}(Q V_i Q V_j)\}}\;\underset{r \times 1}{\hat{\boldsymbol{\sigma}}^2} = \underset{r \times 1}{\{y' Q V_i Q y\}}$$

then

$$\hat{\boldsymbol{\sigma}}^2 = \{\mathrm{tr}(Q V_i Q V_j)\}^{-1} \{y' Q V_i Q y\}$$

and for ML

$$\underset{r \times r}{\{\mathrm{tr}(V^{-1} V_i V^{-1} V_j)\}}\;\underset{r \times 1}{\hat{\boldsymbol{\sigma}}^2} = \underset{r \times 1}{\{y' Q V_i Q y\}}$$

where $\{\mathrm{tr}(Q V_i Q V_j)\}$ is a matrix whose elements are $\mathrm{tr}(Q V_i Q V_j)$, where in the full-sib designs $i$ = 1 to […] and $j$ = 1 to […], i.e., there is a row and column for every random variable in the linear model;
tr is the trace operator, that is, the sum of the diagonal elements of a matrix;
$Q = V^{-1} - V^{-1} X (X' V^{-1} X)^{-} X' V^{-1}$ for $V$ as the covariance matrix of $y$ and $X$ as the design matrix for fixed effects;
$V_i = Z_i Z_i'$, where $i$ = the random variables test, block, etc.;
$\hat{\boldsymbol{\sigma}}^2$ is the vector of variance component estimates; and
$r$ is the number of random variables in the model.

The MINQUE estimator used was MINQUE[…], i.e., ones as priors for all variance components, calculated by applying Giesbrecht's algorithm non-iteratively. MINQUE[…] was chosen because of results demonstrating MINQUEO (prior of […] for the error term and of […] for all others) to be an inferior estimation technique for many cases (Swallow and Monahan; R.C. Littell, unpublished data).

With normally distributed, uncorrelated random variables, the use of the true values of the variance components as priors in a non-iterative application of Giesbrecht's algorithm produced the MIVQUE solutions (equation […]). Obtaining true MIVQUE estimation is a luxury of computer simulation and would not be possible in practice, since the true variance components are required (Swallow and Searle). This estimator was included to provide a standard of comparison for other estimators.

An additional MIVQUE-type estimator, referred to as MIVPEN, was also included. MIVPEN was also a non-iterative application of the algorithm with the true variance components as priors; however, this estimator was conditioned on the variance component parameter space and did not allow negative estimates. The non-negative conditioning of MIVPEN was accomplished by adding a penalty algorithm to MIVQUE such that no variance component was allowed to be less than $1 \times 10^{-[…]}$. Estimates from MIVPEN were equal to MIVQUE for data sets for which there were no negative MIVQUE variance component
estimates. When negative MIVQUE estimates occurred, the two techniques were no longer equivalent. The penalty algorithm operated by using $\Delta = \hat{\boldsymbol{\sigma}}^2 - \boldsymbol{\sigma}^2$ and by choosing a scalar weight $w$ such that no element of $\hat{\boldsymbol{\sigma}}^2_c$ is less than $1 \times 10^{-[…]}$. Then $\hat{\boldsymbol{\sigma}}^2_c = \boldsymbol{\sigma}^2 + w\Delta$, where $\Delta$ is the vector of departure from the true values ($\boldsymbol{\sigma}^2$), $1 \times 10^{-[…]}$ is an arbitrary constant, and $\hat{\boldsymbol{\sigma}}^2_c$ is the vector of estimated variance components conditioned on non-negativity.

REML estimates were from repeated application of Giesbrecht's algorithm (equation […]) in which the estimates from the $k$-th iteration become the priors for the ($k$+1)-th iteration. The iterations were stopped when the difference between the estimates from the $k$-th and ($k$+1)-th iterations met the convergence criterion; then the estimates of the ($k$+1)-th iteration became the REML estimates. The convergence criterion utilized required the difference between successive estimates, $\hat{\sigma}^2_{i(k+1)} - \hat{\sigma}^2_{i(k)}$, to be smaller in magnitude than $1 \times 10^{-[…]}$. This criterion imposed convergence to the fourth decimal place for all variance components.

Since for this experimental workload it was desired that the simulation run with little analyst intervention and in as few iterations as possible, the robustness of REML solutions obtained from Giesbrecht's algorithm to priors (or starting points) was explored. The difference in solutions starting from two distinct points (a vector of ones and the true values) was compared over […] data sets of different structures (imbalance, true variance components and field design). The results, agreeing with those of Swallow and Monahan, indicated that the difference between the two solutions was entirely dependent on the stringency of the convergence criterion and not on the starting point (priors). Also, the number of iterations required for convergence was greatly decreased by using the true values as priors. Thus, all REML estimates were calculated starting with the true values as priors.

Three alternatives for coping with negative estimates after convergence were used for REML solutions: (1) accept and use the negative estimates (Shaw); (2) arbitrarily set negative estimates to zero; and (3) re-solve
the system setting negative estimates to zero (Miller). The first two alternatives are self-explanatory, and the latter is accomplished by re-analyzing those data sets in which the initial unrestricted REML estimates included one or more negative estimates. During re-analysis, if a variance component became negative, it was set to zero (could never be any value other than zero) and the iterations continued. This procedure persisted until the convergence criterion was met with a solution in which all variance components were either positive or zero.

Harville suggested several adaptations of Henderson's mixed model equations (Henderson et al.) which do not allow variance component estimates to become negative; however, the estimates can become arbitrarily close to zero. After trial of these techniques versus the "set the negative estimates to zero after convergence and re-solve the system" approach, comparison of results using the same data sets indicates that there is little practical advantage (although more desirable theoretically) in using the approach suggested by Harville. The differences between sets of estimates obtained by the two methods are extremely minor (solving the system with a variance component set to zero versus arbitrarily close to zero).

ML solutions, as iterative applications of equation […], were calculated from the same starting points and with the same convergence criterion as REML solutions. The three negative variance component alternatives explored for ML were (1) to accept and use the negative estimates, (2) to arbitrarily set negative estimates to zero after converging to a solution for the former, and (3) (for half-sib data only) to re-solve the system setting negative variance components to zero.

The algorithm to calculate solutions for HM (sequentially adjusted sums of squares) was based on the upper triangular sweep (Goodnight) and Hartley's method of synthesis (Hartley). The equation solved was $E\{MS\}\,\hat{\boldsymbol{\sigma}}^2 = MS$, where $MS$ is the vector of mean squares and $E\{MS\}$ is their
expectation. The alternative used for negative estimates was to accept and use the negative estimates.

Comparison Among Estimation Techniques

For the simulation, MIVQUE estimates were the basis for all comparisons because MIVQUE is, by definition, the minimum variance quadratic unbiased estimator. The results of comparing the mean of […] MIVQUE estimates for an experimental level to the means for other techniques were termed "apparent bias". Apparent bias denotes that […] data sets were not sufficient to achieve complete convergence to the true values of the variance components. Sampling variances of estimation were calculated from the […] observations within an experimental level and estimation technique for variance components and genetic ratios (single tree heritability, type B correlation, and dominance to additive variance ratio). Mean square error then equalled variance plus squared apparent bias. While mean square error was investigated, there was never sufficient bias for mean square error to lead to a different decision concerning techniques than sampling variance of the estimates, so mean square error was deleted from the remainder of this discussion.

Probability of nearness is the probability that an estimate will lie within a certain interval around the true parameter. The three total interval widths utilized were one-half, equal to, and twice the parameter size. The percentages of estimates falling within these intervals were calculated for the different estimation techniques within an experimental level for variance components and ratios, and utilized as an estimate of probability of nearness.

Results are presented by variance component or genetic ratio estimated as a percentage of MIVQUE (except in the case of probability of nearness). MIVQUE estimates represent 100%, with estimates with greater variance having values larger than 100% and apparently biased estimates having values different from 100%. The percentages were calculated as equal to 100 times the estimate divided by the MIVQUE value.
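The comparison criteria described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names are hypothetical, the sample variance (divisor n - 1) is assumed for "sampling variance", and the nearness interval is assumed to be centred on the true parameter, with its total width given as a multiple of the parameter size.

```python
from statistics import mean, variance


def percent_of_reference(value, reference):
    """Express a value as a percentage of the reference (MIVQUE) value."""
    return 100.0 * value / reference


def probability_of_nearness(estimates, true_value, width_factor):
    """Fraction of estimates inside an interval around the true value.

    Assumption: the interval is centred on the true parameter and its
    total width is width_factor times the parameter size (the text uses
    one-half, one and two).
    """
    half_width = 0.5 * width_factor * abs(true_value)
    inside = sum(1 for e in estimates if abs(e - true_value) <= half_width)
    return inside / len(estimates)


def compare_to_mivque(estimates, mivque_estimates):
    """Summarize one technique against MIVQUE for one experimental level."""
    est_mean = mean(estimates)
    ref_mean = mean(mivque_estimates)
    est_var = variance(estimates)          # sample variance (assumption)
    ref_var = variance(mivque_estimates)
    apparent_bias = est_mean - ref_mean    # departure from the MIVQUE mean
    return {
        "bias_pct": percent_of_reference(est_mean, ref_mean),
        "variance_pct": percent_of_reference(est_var, ref_var),
        "mse": est_var + apparent_bias ** 2,
    }


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions.
    technique = [1.0, 2.0]
    mivque = [0.5, 1.0, 1.5]
    print(compare_to_mivque(technique, mivque))
    print(probability_of_nearness([0.4, 0.8, 1.0, 1.3, 2.5], 1.0, 1.0))
```

Under these assumptions a `variance_pct` above 100 marks a technique with larger variance among estimates than MIVQUE, and a `bias_pct` different from 100 marks apparent bias.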
For the criterion of variance, the lower the percentage, the better the estimator performed; for bias, values equalling 100% (0% bias) are preferred; and for probability of nearness, larger percentages (probabilities) are favored since they are indicative of greater density of estimates near the parametric value.

Results and Discussion

Variance Components

Sampling variance of the estimators

For all variance components estimated, REML and ML estimation techniques were consistently equal to or less than MIVQUE for sampling variance of the estimator (Table […]). The variance among estimates from these techniques was further reduced by setting the negative components to zero (MODML and MODREML) or setting negative estimates to zero plus re-solving the system (NNREML, NNML and PNNREML). Variance among MINQUE estimates is always equal to or greater than for MIVQUE, as one might expect since they are, in this application, the same technique with MIVQUE having perfect priors (the true values). Variances for HM estimators (TYPE and PTYPE) are either equal to or greater than MIVQUE. HM estimates have progressively larger relative variance with higher levels of imbalance. MIVPEN, although impractical because of the need for the true priors, had much more precise estimates of variance components than other techniques, illustrating what could be accomplished given the true values as priors plus maintaining estimates within the parameter space.

In general, the spread among the percentages for variance of estimation for the estimation techniques is highly dependent on the degree of imbalance and the type of mating system. With increasing imbalance, the likelihood-based estimators realized greater advantage for sampling variance of the estimates over HM for both mating systems. The most advantageous application of likelihood-based estimators is in the H[…]S[…][…] case, where the imbalance is not only random deletions of individuals but also incomplete connectedness across locations, i.e., the same families are not present in each test (akin to incomplete blocks within a test).

Table […]. Sampling variance for the estimates of $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number) and $h^2$ (third number, where calculated) as a percentage of the MIVQUE estimate by type of estimator and treatment combination. NA is not applied. Values greater than 100 indicate larger variance among estimates.

[Table values not recoverable. Columns: three half-diallel (D) and two half-sib (H) treatment combinations. Rows: REML, ML, MINQUE, NNREML, NNML, MODML, MODREML, TYPE, PREML, PML, PMINQUE, PNNREML, PTYPE, MIVPEN, PMIVQUE.]

An analysis of variance was conducted to determine the importance of the treatment of negative variance component estimates in the variance of estimation for REML and ML estimates. The model of sampling variance of the estimates as a result of mating design, imbalance level, treatment of negative estimates, and size of the variance component demonstrated consistently (for all variance components except error) that treatment of negative estimates is an important component of the variance of the estimates (p […]). The model accounted for up to […]% of the variation in the variance of the variance component estimates, with (1) accepting and using negative estimates producing the highest variance, (2) setting the negative components to zero being intermediate, and (3) re-solving the system with negative estimates set to zero providing the lowest variance.

For all estimation techniques, lower variance among estimates was obtained by using individual observations as compared to plot means. The advantage of individual over plot-mean observations increased with increasing imbalance.

Bias

The most consistent performance for bias (Table […]) across all variance components was TYPE, known from inherent properties to be unbiased. The consistent convergence of the TYPE value to the MIVQUE value indicated that the number of data sets used per technique and experimental level ([…]) was suitable for the purpose of examining bias. The other two consistent performers were REML and MINQUE. PTYPE (HM based on plot means) was unbiased when no plot means
were missing but produced apparently biased estimates when plot means were missing.

Table […]. Bias for the estimates of $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number) and $h^2$ (third number, where calculated) as a percentage of the MIVQUE estimate by type of estimator and experimental combination. NA is not applied. Values different from 100 denote apparent bias.

[Table values not recoverable. Columns: three half-diallel (D) and two half-sib (H) treatment combinations. Rows: REML, ML, MINQUE, NNREML, NNML, MODML, MODREML, TYPE, PREML, PML, PMINQUE, PNNREML, PTYPE, MIVPEN, PMIVQUE.]

Table […]. Probability of nearness for $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number) and $h^2$ (third number, where calculated). The probability interval is equal to the magnitude of the parameter.

[Table values not recoverable. Columns: three half-diallel (D) and two half-sib (H) treatment combinations. Rows: REML, ML, MINQUE, NNREML, NNML, TYPE, PREML, PML, PMINQUE, PNNREML, PTYPE, MIVQUE, MIVPEN, PMIVQUE.]

Among estimators which displayed bias, maximum likelihood estimators (ML and PML) were known to be inherently biased (Harville; Searle), with the amount of bias proportional to the number of degrees of freedom for a factor versus the number of levels for the factor. Other biases resulted from the method of dealing with negative estimates. Living with negative estimates produced the estimators with the least bias. Setting negative variance components to zero resulted in the greatest bias. Intermediate in bias were the estimates resulting from re-solving the system with negative components set to zero.

Probability of nearness

Results for probability of nearness proved to be largely non-discriminatory among techniques (Table […]). The low levels of probability density near the parametric values are indicative of the nature of the variance component estimation problem. Figure […] illustrates the distribution of MIVQUE variance component estimates for $h^2$ (1a) and $\sigma^2_g$ (1b) for level D[…]S[…][…]. The distributions for all unconstrained variance component estimates have the appearance of a chi-square distribution, positively skewed with the expected value (mean) occurring to the right of the peak probability density and a proportion of
the estimates occurring below zero (except error). With increasing imbalance, the variance among estimates increases and the probability of nearness decreases for all interval widths.

Ratios of Variance Components

Single tree heritability

Results for estimates of single tree heritability adjusted for locations and blocks are shown in Tables […] and […] (third number from the top in each cell, if calculated). For these relatively low heritabilities ([…] and […]), the bias and variance properties of the estimated ratio are similar to those for $\sigma^2_g$ estimates (Figure […]). This implies that knowing the properties of the numerator of heritability reveals the properties of the ratio (especially true of ratios with expected values of […] and […]; Kendall and Stuart, Ch. […]). Variance component estimation techniques which performed well for bias and/or variance among estimates for $\sigma^2_g$ also performed well for $h^2$.

Figure […]. Distribution of MIVQUE estimates of $h^2$ (1a) and $\sigma^2_g$ (1b) for experimental level D[…]S[…][…], illustrating the positive skew and similarity of the distributions. The true values are […] for $h^2$ and […] for $\sigma^2_g$. The interval width of the bars is one-half the parametric value. [Histograms not reproduced; axes: MIVQUE estimates versus percent of data sets.]

Type B correlation and dominance to additive variance ratio

Type B correlation (Tables […] and […], as $\sigma^2_{tg}$) and dominance to additive variance ratio (not shown) estimates both proved to be too unstable (extremely large variance among estimates) in their original formulations to be useful in discrimination among variance component estimation techniques. This high variance is due to the estimates of the denominators of these ratios approaching zero and to the high variance of the denominator of ratios (Table […]). These ratios were reformulated with numerators of interest ($\sigma^2_{tg}$ for additive genetic by test interaction and $\sigma^2_s$ for dominance variance, respectively) and a denominator equal to the estimate of the phenotypic variance. With this reformulation, the variance and bias properties of estimates of the altered
ratios are approximated by the properties of estimates of the numerators.

For increasing imbalance, maximum-likelihood-based estimation offers an increasing advantage over HM, and for all techniques individual observations offer increasing advantage over plot-mean observations for variance of the estimates of these ratios. Bias, other than that of inherently biased methods (ML), is associated with the probability of negative estimates, which is increased by increasing imbalance. This assertion is supported by comparing the biases of REML, NNREML and MODREML estimates across imbalance levels.

General Discussion

Observational Unit

Some general conclusions regarding the choice of a variance component estimation methodology can be drawn from the results of this investigation. For any degree of imbalance, the use of individual observations is superior to the use of plot means for estimation of variance components or ratios of variance components. If the data are nearly balanced (close to 100% survival with no missing plots, crosses (full-sib) or lack of connectedness (half-sib)), the properties of the estimation techniques based on individual and plot-mean observations become similar, so if departure from balance is nominal, plot means can be used effectively. However, using individual observations obviates the need for a survey of imbalance in the data, since individual observations produce better results than plot means for any of the estimation techniques examined.

Negative Estimates

Drawing on the results of this investigation, the discussion of practical solutions for the negative estimates problem will revolve around two solutions: (1) accepting and using the negative estimates and (2) re-solving the system with negative estimates set to zero. Given that the property of interest is the true value of a variance component or genetic ratio, often estimated as a mean across data sets, negativity constraints come into play if the component of interest is small in comparison to other underlying variance components
in the data or the variance of estimates is high due to an inadequate experimental design for variance component estimation. These factors lead to an increased number of negative estimates. If the data structure is such that negative estimates would occur frequently, then accepting negative estimates is a good alternative. If negative estimates tend to occur infrequently, or bias is of less concern than variance among estimates, then re-solving the system after convergence yields negative estimates is the preferable solution. This tactic reduces both bias and variance among estimates below that of arbitrarily setting negative estimates to zero.

Estimation Technique

The primary competitors among estimation techniques that are practically achievable are REML and TYPE (HM). Both techniques produce estimates with little or no bias; however, REML estimates, for the most part, have slightly less sampling variance than TYPE estimates. If only subsets of the parents are in common across tests, as in the case H[…]S[…][…], REML has a distinct advantage in variance among estimates over TYPE.

REML does have three additional advantages over TYPE, which are: (1) REML offers generalized least squares estimation of fixed effects while TYPE offers ordinary least squares estimation; (2) Best Linear Unbiased Predictions (BLUP) of random variables are inherent in REML solutions, i.e., gca predictions are available, and thus, in solving for the variance components with REML, fixed effects are estimated and random variables are predicted simultaneously (Harville); and (3) REML offers greater flexibility in the model specification, both in univariate and multivariate forms, as well as heterogeneous or correlated error terms. Further, although the likelihood equations for common REML applications are based on normality, the technique has been shown to be robust against the underlying distribution (Westfall; Banks et al.).

Recommendation

If one were to choose a single variance component estimation technique from among those
tested which could be applied to any data set with confidence that the estimates had desirable properties (variance, MSE and bias), that technique would be REML and the basic unit of observation would be the individual. This combination (REML plus individual observations) performed well across mating design and types and levels of imbalance. Treatment of negative estimates would be determined by the proposed use of the estimates, that is, whether unbiasedness (accepting and using the negative estimates) is more important than sampling variance (re-solving the system setting negative estimates to zero).

A primary disadvantage of REML and individual observations is that they are both computationally expensive (computer memory and time). HM estimation could replace REML on many data sets and plot means could replace individual observations on some data sets, but general application of these without regard to the data at hand does result in a loss in desirable properties of the estimates in many instances. The computational expense of REML and individual observations ensures that estimates have desirable properties for a broad scope of applications. With the advent of bigger and faster computers and the evolution of better REML algorithms, what was not feasible in the past on most mainframe computers can now be accomplished on personal computers.

CHAPTER […]
GAREML: A COMPUTER ALGORITHM FOR ESTIMATING VARIANCE COMPONENTS AND PREDICTING GENETIC VALUES

Introduction

The computer program described in this chapter, called GAREML for Giesbrecht's algorithm of restricted maximum likelihood estimation (REML), is useful for both estimating variance components and predicting genetic values. GAREML applies the methodology of Giesbrecht to the problems of REML estimation (Patterson and Thompson) and best linear unbiased prediction (BLUP; Henderson) for univariate (single trait) genetics models. GAREML can be applied to half-sib (open-pollinated or polymix) and full-sib (partial diallels, factorials,
half-diallels [no selfs] or disconnected sets of half-diallels) mating designs when planted in single or multiple locations with single or multiple replications per location. When used for variance component estimation, this program has been shown to provide estimates with desirable properties across types of imbalance commonly encountered in forest genetics field tests (Huber et al., in press) and with varying underlying distributions (Banks et al.; Westfall). GAREML is also useful for determining efficiencies of alternative field and mating designs for the estimation of variance components.

Utilizing the power of mixed-model methodology (Henderson), GAREML provides BLUP of parental general (gca) and specific (sca) combining abilities as well as generalized least squares (GLS) solutions for fixed effects. The application of BLUP to forest genetics problems has been addressed by White and Hodge. With certain assumptions, the desirable properties of BLUP predictions include maximizing the probability of obtaining correct parental rankings from the data and minimizing the error associated with using the parental values obtained in future applications. GLS fixed effect estimation weights the observations comprising the estimates by their associated variances, approximating best linear unbiased estimation (BLUE) for fixed effects (Searle, p. […]).

The purpose of this chapter is to describe the theory and use of GAREML in enough detail to facilitate use by other investigators. The program is written in FORTRAN and is not dependent on other analysis programs. An interactive version of this program can be obtained as a stand-alone executable file from the senior author; this file will run on any IBM compatible PC under DOS or WINDOWS operating systems. The size of the problem an investigator can solve will be dependent on the amount of extended memory and hard disk space (for swap files) available for program use. In addition, the FORTRAN source code can be obtained for analysts wishing to compile
the program for use on alternate systems (e.g., mainframe computers).

Algorithm

GAREML proceeds by reading the data and forming a design matrix based on the number of levels of factors in the model. Any portions of the design matrix for nested factors or interactions are formed by horizontal direct product. Columns of zeroes in the design matrix (the result of imbalance) are then deleted. The design matrix columns are in an order specified by Giesbrecht's algorithm: columns for fixed effects are first, followed by the data vector, and the last section of the matrix is for random effects.

The design matrix is the only fully formed matrix in the program. All other matrices are symmetric; therefore, to save computational space and time, only the diagonal and the above-diagonal portions of matrices are formed and utilized (i.e., half-stored). A half-stored matrix of the dot products of the design columns is formed and either kept in common memory or stored in temporary disk space so that the matrix is available for recall in the iterative solution process.

(Windows is the trademark of the Microsoft Corporation, Redmond, WA.)

The algorithm proceeds by modifying the matrix of dot products such that the inverse of the covariance matrix for the observations ($V$) is enclosed by the column specifiers in the dot products, as $X'X$ becoming $X'V^{-1}X$. This transfer is completed without inversion of the total $V$ matrix. The identity used to accomplish this transfer is: if $V_h = \sigma_h^2 Z_h Z_h' + V_{h+1}$, where $V_{h+1}$ is nonsingular, then

$$V_h^{-1} = V_{h+1}^{-1} - \sigma_h^2 V_{h+1}^{-1} Z_h \left(I + \sigma_h^2 Z_h' V_{h+1}^{-1} Z_h\right)^{-1} Z_h' V_{h+1}^{-1}.$$

A compact form of this identity is obtained by premultiplying by $Z_i'$ and postmultiplying by $Z_j$, where $h = k, k-1, \ldots, 1$ ($k$ = the total number of random factors); $\sigma_h^2$ is the prior associated with random variable $h$; $V_{k+1} = \sigma_{k+1}^2 I$; $V_1 = V$; and $Z_i$ is the portion of the design matrix for random variable $i$ (Giesbrecht).

A partitioned matrix is formed in order to update $V_{h+1}^{-1}$ until $V_1^{-1}$ (that is, $V^{-1}$) is obtained. This matrix is of the form

$$\begin{bmatrix} I + \sigma_h^2 Z_h' V_{h+1}^{-1} Z_h & \sigma_h Z_h' V_{h+1}^{-1} [X \mid y \mid Z_1 \mid \cdots \mid Z_k] \\ \sigma_h [X \mid y \mid Z_1 \mid \cdots \mid Z_k]' V_{h+1}^{-1} Z_h & T_{h+1} \end{bmatrix}$$

where $\sigma_h$ is the square root of the prior $\sigma_h^2$ and

$$T_{k+1} = [X \mid y \mid Z_1 \mid \cdots \mid Z_k]' V_{k+1}^{-1} [X \mid y \mid Z_1 \mid \cdots \mid Z_k].$$

The sweep operator of Goodnight is applied to the upper left partition of this matrix, and the result of the identity above is obtained: the lower right partition becomes $T_h$. The matrix is sequentially updated and swept until

$$T_1 = [X \mid y \mid Z_1 \mid \cdots \mid Z_k]' V^{-1} [X \mid y \mid Z_1 \mid \cdots \mid Z_k]$$

is obtained. $T_1$ is then swept on the columns for fixed effects ($X'V^{-1}X$). This sweep operation produces generalized least squares estimates for fixed effects, results which can be scaled into predictions of random variables, the residual sum of squares, and all the necessary ingredients for assembling the equation to solve for the variance components.

The equation to be solved for the variance components is

$$\{\mathrm{tr}(QV_iQV_j)\}\,\hat{\sigma}^2 = \{y'QV_iQy\},$$

in which the coefficient matrix is $r \times r$ and both vectors are $r \times 1$; then

$$\hat{\sigma}^2 = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\{y'QV_iQy\},$$

where $\{\mathrm{tr}(QV_iQV_j)\}$ is a matrix whose elements are $\mathrm{tr}(QV_iQV_j)$ with $i = 1$ to $r$ and $j = 1$ to $r$, i.e., there is a row and column for every random variable in the linear model; tr is the trace operator, that is, the sum of the diagonal elements of a matrix; $Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ for $V$ as the covariance matrix of $y$ and $X$ as the design matrix for fixed effects; $V_i = Z_iZ_i'$, where the $i$'s are the random variables; $\hat{\sigma}^2$ is the vector of variance component estimates; and $r$ is the number of random variables in the model ($k+1$).

The entire procedure, from forming $T_{k+1}$ to solving for the variance components, continues until the variance component estimates from the last iteration are no more different from the estimates of the previous iteration than the convergence criterion specifies. The fixed effect estimates and predictions of random variables are then those of the final iteration.

The asymptotic covariance matrix for the variance components is obtained as

$$\mathrm{Var}(\hat{\sigma}^2) = 2\{\mathrm{tr}(QV_iQV_j)\}^{-1}$$

by utilizing intermediate results from the solution for the variance components.

The coefficient matrix of Henderson's mixed model equations is formed in order to calculate the covariance matrix for fixed and random effects. The covariance matrix for observations is constructed using the variance components estimates from
Giesbrecht's algorithm. The coefficient matrix is

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + D^{-1} \end{bmatrix}$$

where $R$ is the error covariance matrix, which in this application is $I\sigma_w^2$, where $\sigma_w^2$ is the variance of random variable $w$ (the within-plot term of the linear models given below); $X$ is the fixed effects design matrix; $Z$ is the random effects design matrix; and $D$ is the covariance matrix for the random variables, which in this application has variance components on the diagonal and zeroes on the off-diagonal (no covariance among random variables). The generalized inverse of this coefficient matrix is the error covariance matrix of the fixed effect estimates and random predictions, assuming the covariance matrix for observations is known without error.

Operating GAREML

While GAREML will run in either batch or interactive mode, we focus on the interactive PC version, which begins by prompting the analyst to answer questions determining the factors to be read from the data. Specifically, the analyst answers yes or no to these questions: 1) are there multiple locations? 2) are there multiple blocks? 3) are there disconnected sets of full-sibs? (i.e., usually referring to disconnected half-diallels) and 4) is the mating design half-sib or full-sib? The program then determines the proper variables to read from the data as well as the most complicated (number of main factors plus interactions) scalar linear model allowed. The most complicated linear model allowed for full-sib observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + set_c + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm}$$

where $y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test;
$\mu$ is the population mean;
$t_i$ is the random or fixed variable test environment;
$b_{ij}$ is the random or fixed variable block;
$set_c$ is the random or fixed variable set, i.e., a variable is created so that disconnected sets of half-diallels planted in the same experiment can be analyzed in the same run, or to analyze provenances and families within provenance, where provenance equals set (sets are assumed to be across test environments and blocks with families nested within sets, and interactions with set are assumed unimportant);
$g_k$ is the random variable female general combining ability (gca);
$g_l$ is the random variable male gca;
$s_{kl}$ is the random variable specific combining ability (sca);
$tg_{ik}$ is the random variable test by female gca interaction;
$tg_{il}$ is the random variable test by male gca interaction;
$ts_{ikl}$ is the random variable test by sca interaction;
$p_{ijkl}$ is the random variable plot;
$w_{ijklm}$ is the random variable within-plot; and
there is no covariance between random variables in the model.

The assumptions utilized are that the variances for female and male random variables are equal ($\sigma^2_{g_k} = \sigma^2_{g_l} = \sigma^2_g$) and that female and male environmental interactions are the same ($\sigma^2_{tg_{ik}} = \sigma^2_{tg_{il}} = \sigma^2_{tg}$).

The most complicated scalar linear model allowed for half-sib observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + set_c + g_k + tg_{ik} + ph_{ijk} + wh_{ijkm}$$

where $y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test;
$\mu$, $t_i$, $b_{ij}$, $set_c$, $g_k$ and $tg_{ik}$ retain the definitions in the full-sib equation;
$ph_{ijk}$ is the random variable plot, containing different genotype by environment components than the full-sib model;
$wh_{ijkm}$ is the random variable within-plot, containing different levels of genotypic and genotype by environment components than the full-sib model; and
there is no covariance between random variables in the model.

The analyst builds the linear model by answering further prompts. If test, block and/or set are in the model, they must be declared as fixed or random effects. When any of the three effects is declared random, the analyst must furnish prior values for the variance. If no prior value is known, arbitrary values may be used as priors. Using arbitrary priors will not affect the values for resulting variance component estimates within the constraints of the convergence criterion, but there may be a time penalty due to increasing the number of iterations required for convergence. All remaining factors in the model are treated as random variables. To complete the definition of the model, the analyst chooses to include or exclude
each possible factor by answering yes or no when prompted. After each yes answer the program asks for a prior value for the variance. Again, if no known priors exist, arbitrary values may be substituted. After the model has been specified, the program counts the number of fixed effects and the number of random effects and asks if the number fits the model expected. A yes answer proceeds through the program, while a no returns the program to the beginning.

GAREML is now ready to read the data file (which must be an ASCII data file) in this order: test, block, set, female, male, and the response variable. The analyst is prompted to furnish a proper FORTRAN format statement for the data. Test, block, set, female and male are read as character variables (A fields) with as many as eight characters per field, while the data vector (response variable) is read as a double precision variable (F field). An example of a format statement for a full-sib mating design across locations and blocks is one that reads four character variables, sequentially occupying the same number of columns each, followed by the response variable in an F field having five decimal places.

After reading the data, GAREML begins to furnish information to the analyst. This information should be scanned to make sure the data read are correct. This information includes the number of parents, the number of full-sib crosses, the number of observations, the maximum number of fixed effect design matrix columns, and the maximum number of random effect design matrix columns. If there is an error at this point, use CTRL-BRK
to exit the program. Probable causes of errors are: the data are not in the format specified; missing values are included; blank lines or other similar errors are in the data file; or the model was not correctly specified.

At this point there are three other prompts concerning the data analysis (number of iterations, convergence criterion, and treatment of negative variance components). The maximum number of iterations is preset to an arbitrary default and can be changed at the analyst's discretion. No warning is issued that the maximum number of iterations has been reached; however, the current iteration number and variance component estimates are output to the screen at the beginning of each iteration. The convergence criterion used is the sum of the absolute values of the differences between variance component estimates for consecutive iterations. The criterion has been set to $1 \times 10^{-4}$, meaning that convergence is required to the fourth decimal place for all variance components. The convergence criterion should be modified to suit the magnitude of the variances under consideration as well as the practical need for enhanced resolution. Enhanced resolution is obtained at the cost of increasing the number of iterations to convergence.

The analyst must decide whether to accept and use negative estimates or to set negative estimates to zero and re-solve the system. The latter solution results in variance component estimates with lower sampling variance and slight bias. If one is interested in unbiased estimates of variance components that have a high probability of negative estimates, then accepting and using the negative estimates may be the proper course to take.

Interpreting GAREML Output

Analysis is now underway. The priors for each iteration and the iteration number are printed out to the screen. GAREML continues to iterate until the convergence criterion is met or the maximum number of iterations is reached. The next time that analyst intervention is required is to provide a name for the output file for
variance component estimates. The file name follows normal DOS file naming protocol; however, alternative directories may not be specified, i.e., all outputs will be found in the same directory as the data file.

The program will now quiz the analyst to determine if additional outputs are desired. These additional outputs are: gca predictions, sca predictions (if applicable), the asymptotic covariance matrix for the variance components, generalized least squares fixed effect estimates, the error covariance matrix of the gca predictions, and the error covariance matrix for fixed effects. An answer of yes to the inclusion of an output will result in a prompt for a file name. In addition, for gca and sca predictions, the analyst may input a different value for $\sigma^2_{gca}$ or $\sigma^2_{sca}$ with which to scale predictions. The discussion which follows furnishes more detailed information concerning GAREML outputs.

Variance Component Estimates

Ignoring concerns about convergence to a global maximum and negative values, variance component estimates are restricted maximum likelihood estimates of Patterson and Thompson. The estimates are robust against starting values (priors), i.e., the same estimates, within the limits of the convergence criterion, can be obtained from diverse priors. However, priors close to the true values will, in general, reduce the number of iterations required to reach convergence. The value of the convergence criterion must be less than or equal to the desired precision for the variance components.

REML variance component estimates from this program have been shown to have more desirable properties (variance and bias) than other commonly used estimation techniques (maximum likelihood, minimum norm quadratic unbiased estimation, and Henderson's Method) over a wide range of data imbalance. The properties of the estimates are further enhanced by using individual observations as data rather than plot means.

The output is labelled by the variance component estimated.

Predictions of Random Variables

The
predictions output are for general and specific combining abilities and approximate best linear unbiased predictions (BLUP) of the random variables. BLUP predictions have several optimal properties: 1) the correlation between the predicted and true values is maximized; 2) if the distribution is multivariate normal, then BLUP maximizes the probability of obtaining the correct rankings (Henderson) and so maximizes the probability of selecting the best candidate from any pair of candidates (Henderson).

Predictions are of the form

$$\hat{u} = \hat{D} Z' \hat{V}^{-1}(y - X\hat{\beta})$$

where $\hat{u}$ is the vector of predictions;
$\hat{D}$ is the estimated covariance matrix for random variables from the REML variance component estimates (the matrix $D$ of the mixed model coefficient matrix);
$Z'$ is the transpose of the design matrix for random variables;
$y$ is the data vector;
$X$ is the design matrix for fixed effects;
$\hat{\beta}$ is the vector of fixed effect estimates; and
$\hat{V}$ is the estimated covariance matrix for observations from REML variance component estimates.

NOTE: if predictions are desired based on prior values for the variance components, restrict the number of iterations so that the priors are not updated, after having input the desired values as priors. Predictions are output as a labelled vector.

Asymptotic Covariance Matrix of Variance Components

The output for the asymptotic covariance matrix (AVCM) of variance components is $\mathrm{Var}(\hat{\sigma}^2) = 2\{\mathrm{tr}(QV_iQV_j)\}^{-1}$. This output represents the variance of repeated minimum variance quadratic unbiased variance component estimates using the same experimental design, if the estimates are equal to the true values. This technique has been used for simulation work to define optimal mating and field designs (McCutchan et al.).

The AVCM is used to create the asymptotic variance of linear combinations of estimates of variance components as

$$\mathrm{Var}(L'\hat{\sigma}^2) = L'\,\mathrm{Var}(\hat{\sigma}^2)\,L$$

where $L$ specifies the linear combination(s) of variance components, $\hat{\sigma}^2$ is the vector of variance component estimates, and $\mathrm{Var}(\hat{\sigma}^2)$ is the AVCM. The diagonal elements of $L'\,\mathrm{Var}(\hat{\sigma}^2)\,L$ are the variances of the linear combinations and the off-diagonal
elements are the covariances between the linear combinations. These values are then useful for Taylor series approximation of the variance of a ratio of linear combinations, such as heritability.

AVCM is output as a vector (half-stored matrix) and each row of the output is labelled.

Fixed Effect Estimates

Fixed effect estimates are those of generalized least squares and are in a set-to-zero format. Set-to-zero format (commonly seen in SAS output) is characterized by the last level of a main effect or nested effect being set to zero. These estimates are approximately best linear unbiased estimates (BLUE) of the fixed effects because the covariance matrix for observations was estimated and not known without error. Kackar and Harville have shown for a broad class of variance estimators that the fixed effects estimates are still unbiased. The word "best" in BLUE refers to the property of minimum variance within the class of unbiased estimators. Generalized least squares estimates in set-to-zero format for fixed effects are of the form

$$\hat{\beta} = (X'\hat{V}^{-1}X)^{-}X'\hat{V}^{-1}y$$

where $X$, $\hat{V}$ and $y$ are as defined for the prediction equation. Fixed effect estimates are output as a labelled vector.

(SAS is the registered trademark of SAS Institute Inc., Cary, North Carolina.)

Error Covariance Matrices

The error covariance matrices for predictions and fixed effect estimates are obtained by producing a generalized inverse of the coefficient matrix of the mixed model equations (Henderson; McLean). Since all covariance matrices are symmetric, the output is in the form of a vector, which is equivalent to a half-stored matrix. Output for error of gca predictions is labelled while the error of fixed effects is not. The labelling on gca errors makes the unlabelled output for fixed effect variance self-explanatory.

The error covariance matrix for gca predictions can be converted to the covariance matrix for gca predictions by forming the covariance matrix for the gca random variables and subtracting the error covariance matrix. The covariance matrix for predictions has been denoted as
$\mathrm{Var}(\hat{g})$ by White and Hodge.

Example

The following discussion involves the analysis of a simulated data set in order to further demonstrate the outputs of GAREML.

Data

The data (see the table below) were generated using a six-parent half-diallel mating design and a randomized complete block field design. The field design is in two locations with four complete blocks per location and two trees per family per block. The data were generated from specified values of individual tree heritability, Type B correlation, dominance to additive variance ratio, and the population mean. After a balanced data set was generated, a fixed percentage of the observations was randomly deleted (simulating incomplete survival). The data set comprises a small number of observations and, while not an optimal application of GAREML, serves well as an illustration.

Table. Data for example of GAREML operation. L, Bl, F, M, T and RV stand for location, block, female, male, tree and response variable, respectively. A proper FORTRAN read format would consist of four A fields, each followed by a T (tab) position, and a final F field.

[Table body: columns L, Bl, F, M, T, RV; the numeric entries are not recoverable from the source text.]

Analysis

The analysis was carried out with two different linear models using individual observations as the data. Model 1 contained eight sources of variation and was the full-sib linear model without the variable set. In model 1, test environment and blocks within test are declared fixed. The subsequent model (model 2) has all random effects except the mean. Variance components are estimated with model 1 receiving two different treatments of negative estimates, i.e., live with the negative estimates (model 1A) or re-solve the system setting negative estimates to zero (model 1B). The different models and methods for dealing with negative estimates are demonstrated so that the reader can see a range of outputs from GAREML.

Output

Variance component estimates. The variance component estimates are output under the following labels (the numeric values are not recoverable from the source text).

Model 1A:
SIGMA SQUARED GCA
SIGMA SQUARED SCA
SIGMA SQUARED LOCxGCA
SIGMA SQUARED LOCxSCA
SIGMA SQUARED BLOCKxFAM
SIGMA SQUARED ERROR

Model 1B:
SIGMA SQUARED GCA
SIGMA SQUARED SCA
SIGMA SQUARED LOCxGCA
SIGMA SQUARED LOCxSCA
SIGMA SQUARED BLOCKxFAM
SIGMA SQUARED ERROR

and Model 2:
SIGMA SQUARED LOCATION
SIGMA SQUARED BLOCK(LOC)
SIGMA SQUARED GCA
SIGMA SQUARED SCA
SIGMA SQUARED LOCxGCA
SIGMA SQUARED LOCxSCA
SIGMA SQUARED BLOCKxFAM
SIGMA SQUARED ERROR

These variance component estimates illustrate outputs for the random model, the mixed model, and the alternatives for dealing with negative estimates.

Fixed effect estimates. Fixed effect estimates are output under the following labels (values not recoverable). For model 1B: MU, two LOCATION rows, and eight BLOCK(LOC) rows; and for model 2: MU.

The interpretation of fixed effect estimates for model 1B is that blocks 1 through 4 belong with location 1, and the fourth block is set to zero. Blocks 5 through 8 are those of location 2, and the eighth block is set to zero, as is location 2. Sets of blocks within location can always be determined by the last block within a location being set to zero. The interpretation of set-to-zero is: MU is the mean of the fourth block (labelled block 8) in location two, and any estimable function of the fixed effects can be generated from these estimates. An example of an estimable function would be the site mean of location 1. This mean would be estimated as

MU + LOCATION 1 + (1/4)[BLOCK(LOC) 1 + BLOCK(LOC) 2 + BLOCK(LOC) 3 + BLOCK(LOC) 4].

MU of model 2 is the estimate of the general mean across sites if all other factors are random. All of these estimates are the result of generalized least squares estimation.

Asymptotic covariance matrix for the variance components. The asymptotic covariance matrix for the variance components in model 1B would appear with the following row labels (values not recoverable):

ASYMPTOTIC VARIANCE COVARIANCE MATRIX
GCA GCA
GCA SCA
GCA LOCxGCA
GCA LOCxSCA
GCA BLOCKxFAM
GCA ERROR
SCA SCA
SCA LOCxGCA
SCA LOCxSCA
SCA BLOCKxFAM
SCA ERROR
LOCxGCA LOCxGCA
LOCxGCA LOCxSCA
LOCxGCA BLOCKxFAM
LOCxGCA ERROR
LOCxSCA LOCxSCA
LOCxSCA BLOCKxFAM
LOCxSCA ERROR
BLOCKxFAM BLOCKxFAM
BLOCKxFAM ERROR
ERROR ERROR

This matrix, as are all other matrices output, is half-stored. The output is read as follows: GCA GCA is the asymptotic variance of the gca variance component. The next row, labelled GCA SCA, is the asymptotic covariance between the estimates of the gca variance component and the sca variance component. Thus, the next four rows are asymptotic covariances of gca variance estimates with the other random variables in the model. The other rows are read in a like manner, and if the analyst wished to array the output as a matrix, all necessary components are at hand.

Predictions of random variables. All predictions of random variables are appropriately labelled according to the character name read from the data and, for model 1B, would appear as six rows labelled GCA (from the gca output, one per parent) and fifteen rows labelled SCA (from the sca output, one per cross); the predicted values are not recoverable from the source text.

All these predictions are approximately best linear unbiased predictions, and are approximate because the variance components were estimated from the same data.

Error covariance matrix of the predictions. The error covariance matrix of the predictions is output as a half-stored matrix with each row appropriately labelled. This matrix for model 1B appears under the heading

THE ERROR VARIANCE COVARIANCE MATRIX FOR GCA ARRAYED AS A VECTOR

[Rows of the matrix are not recoverable from the source text.]

The labelling of the output is interpreted identically to that for the asymptotic variance covariance matrix for the variance components. Those rows which contain a parental name twice are the error variance for that parental prediction, and those rows containing two different parental names are the error covariance for the two parental predictions. In this unbalanced case, the reader will see that some parents have more error associated with their predictions than others, i.e., compare the error variances of different parents. This is true because of the varying number of observations associated with the prediction for each parent and also the varying distribution of those observations across tests
and blocks. If one assumes that the estimate for gca variance from the data equals the true variance for gca, then the correlation of the prediction with the true value, $\mathrm{Corr}(g, \hat{g})$ (White and Hodge), for a given parent is equal to

$$\sqrt{1 - \frac{\text{error variance of the parent's prediction}}{\hat{\sigma}^2_{gca}}}.$$

Error covariance matrix for the fixed effects. The error covariance matrix for the fixed effects is output as a half-stored matrix. The output is not labelled; however, one only has to know the total number of levels for all fixed effects to assign labels if needed. The primary use of this matrix is to estimate the variance of estimable functions of the fixed effects. If $l$ denotes the vector containing the specification of an estimable function and $V_b$ denotes the error covariance matrix for fixed effects, then the variance of an estimable function is equal to $l'V_bl$, where, for the mean of a test, $l'$ holds the coefficients of the corresponding estimable function.

Conclusions

GAREML is an analytical tool for use with models common to forest genetics. The properties of the variance component estimation algorithm have been documented by simulation studies, and the algorithm presents solutions as restricted maximum likelihood estimates. Many other outputs are available from the program, including best linear unbiased predictions, generalized least squares estimates of fixed effects, error covariance matrices of predictions and estimates, and the asymptotic covariance matrix for variance component estimates.

GAREML is not intended to be used as a black box. The program has many potential uses: variance component estimation, parental evaluation, progeny evaluation, and simulated evaluation of mating and field design. However, thoughtful interpretation of the outputs is needed in order to realize the power and utility of the program.

CONCLUSIONS

Optimal mating design for the determination of genetic architecture was explored. General conclusions were reached through comparison of the half-diallel, half-sib and circular mating designs. In particular, the comparison of the half-diallel and circular designs is pertinent to the
establishment of future progeny tests in which full-sib families are desired. Across the experimental levels examined, the circular mating design provides more efficient estimates of parameters for genetic architecture than the half-diallel design.

If an estimate of the variance in general combining abilities is required, the half-sib design is more efficient than the circular mating design over most of the experimental levels examined. This pattern of efficiency argues for complementary mating designs involving half-sib designs (open-pollinated or polycross) to estimate general combining ability and a second design (full-sib mating) to generate crosses from which to make selections. Complementary mating designs do require a greater monetary and temporal commitment. If this type of commitment is not justified or possible, then the circular mating design should be used to generate full-sib families and estimate genetic parameters simultaneously.

Considering field design in combination with mating design, full-sib designs reach maximum efficiency for genetic parameter estimation in fewer replicates across locations than half-sib designs. For any specific case of field design and the half-sib mating design, a priori knowledge of the genetic architecture is required to choose the optimal field design for number of locations.

In cases where maximum efficiency of an experimental design is obtained and the precision of genetic parameter estimates is still less than desired, the optimal use of experimental units would be disconnected sets of experiments at maximum efficiency, with the parameter estimate then being a mean of the estimates from the disconnected experiments. Of the three mating designs, only the half-diallel exhibits efficiency optima for number of parents. The optimum for number of parents in half-diallels is always close to, and never larger than, six parents, with the fluctuation resulting from the genetic architecture. Thus, for half-diallels, for maximum efficiency in
genetic parameter estimation, the number of parents should not exceed six, and desired parameter precision should be obtained by using disconnected sets of six parents. Optima for number of locations exist for all mating designs, and maximum efficiency would again be obtained by replicating an experiment only for the optimal number of locations. A parameter estimate of the desired precision would be calculated as a mean of disconnected experiments.

Optimal analysis was dealt with in two stages: estimating parental worth and estimation of variance components (or genetic architecture). The estimation of parental worth was examined for the half-diallel mating design. It is argued on theoretical grounds and in generality that best linear unbiased prediction and best linear prediction are more suited to the problem of parental evaluation than ordinary least squares.

Using simulated data for two mating designs (half-diallel and half-sib), variance component estimation techniques were compared with varying levels of data imbalance and two levels of genetic control. In estimating variance components (or genetic ratios such as heritability), four criteria were adopted for discrimination among estimation techniques (probability of nearness, bias, mean square error, and variance of estimation). Of the four, only bias and variance of estimation proved informative. Bias proved useful in discriminating among treatments of negative estimates, with accepting and living with the negative estimates having the least bias, re-solving the system with negative estimates set to zero intermediate in bias, and setting negative estimates to zero producing the most bias. Variance of estimation also was discriminatory among treatments of negative estimates, with accepting and living with negative estimates having the highest variance, setting negative estimates to zero intermediate in variance, and re-solving the system setting negative estimates to zero having the lowest variance. Variance of estimation was also discriminatory
among units of observation and variance component estimation techniques. Of the two units of observation used (individuals and plot means), individual observations produced estimates with better properties across all levels of imbalance, mating designs, and variance component estimation techniques. Of the variance component estimation techniques contrasted, restricted maximum likelihood produced estimates with the best properties (bias and variance of estimation) across all mating designs, levels of genetic control, and levels of imbalance. Therefore, it is proposed that restricted maximum likelihood estimation with individual observations as data should be utilized.

With the recommendation to use restricted maximum likelihood, the program used to analyze the simulated data was rewritten into a user-friendly format able to analyze both full-sib and half-sib data. Additional outputs (other than variance components) were also added as options. These outputs include general and specific combining ability predictions, the asymptotic covariance matrix for variance components, generalized least squares estimates of fixed effects, and the covariance matrices for predictions and estimates.

APPENDIX
FORTRAN SOURCE CODE FOR GAREML

C******THIS PROGRAM PRODUCES REML AND MIVQUE VARIANCE*************
C****COMPONENT ESTIMATES BY STARTING ITERATION FROM THE***********
C****TRUE VALUES OF THE PARAMETERS THROUGH THE USE OF*************
C***************GIESBRECHT'S ALGORITHM***************************
C PARAMETERS DETERMINE THE PROGRAM DIMENSIONS. ANY CHANGE IN
C PARAMETER SIZE DECLARATION SHOULD BE GLOBAL SINCE THEY ARE
C ALSO SPECIFIED IN THE SUBROUTINES
      PROGRAM MAIN
      PARAMETER (
C NOBSER IS THE MAXIMUM NUMBER OF OBSERVATIONS
     N NOBSER = ...,
C NOBL IS THE MAXIMUM NUMBER OF BLOCKS PER LOCATION
     N NOBL = ...,
C NOCR IS THE MAXIMUM NUMBER OF FULL-SIB CROSSES
     N NOCR = ...,
C NOBH IS THE MAXIMUM NUMBER OF FIXED EFFECT LEVELS INCLUDING THE MEAN
     N NOBH = ...,
C NVARBH DIMENSIONS THE VARIANCE COVARIANCE MATRIX FOR FIXED
C EFFECTS
     N NVARBH = ((NOBH*(NOBH-1))/2)+NOBH,
C NOGCA IS THE MAXIMUM NUMBER OF PARENTS
     N NOGCA = ...,
C NOVARG DIMENSIONS THE VARIANCE COVARIANCE MATRIX FOR GCA
     N NOVARG = ((NOGCA*(NOGCA-1))/2)+NOGCA,
C NOX IS THE MAXIMUM NUMBER OF COLUMNS FOR FIXED EFFECTS PLUS
C RANDOM EFFECTS
C PLUS ONE FOR THE DATA
     N NOX = ...,
C NOCBS IS THE MAXIMUM NUMBER OF LEVELS FOR THE RANDOM EFFECT
C HAVING THE GREATEST NUMBER, USUALLY CROSS BY BLOCK OR PLOT
C COMBINATIONS
     N NOCBS = ...,
C NTOT IS THE TOTAL NUMBER OF COLUMNS OF NOX PLUS NOCBS
     N NTOT = NOX + NOCBS,
C OTHER PARAMETERS USE THE PREVIOUS DECLARATIONS TO ALLOCATE
C SUFFICIENT SIZE TO SYMMETRIC MATRICES STORED AS VECTORS
     N NIZED = NOX*NOCBS,
     N NIXPX = ((NOX*(NOX-1))/2)+NOX,
     N NSIP = NOX ... NOCBS,
     N NIZEP = ((NSIP*(NSIP-1))/2)+NSIP)
      COMMON /CMN.../ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
     N NCOLB,NCOLX,NCOLCB,NCL(...),NORAN,NOFIX,NCLFIX,
     N NCLRAN,NCOLSE,NRAN(...)
      COMMON /CMN.../
     N ...
      ...
      PRINT *, 'WARNING YOU HAVE JUST ENTERED THE TWILIGHT ZONE OF
     N VARIANCE COMPONENTS'
      PRINT *, 'ANSWER Y FOR YES OR N FOR NO TO THE FOLLOWING QUESTIONS'
      WRITE(...)
      FORMAT(' *****...*****')
      ...
      WRITE(...)
      FORMAT(' PLEASE TRY AGAIN')
      PRINT *,' FIRST THE FACTORS TO BE READ FROM THE DATA WILL BE DETE
     N RMINED'
      PRINT *, ' DOES THE DATA HAVE MULTIPLE LOCATIONS?'
      READ(...) DTERM(...)
      IF(DTERM(...).NE. ...
      ...
      IF (DTERM(...).EQ.'F') THEN
      GO TO ...
      ENDIF
      DTERM(...) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR LOCATION?'
      READ(...) PRI(...)
      FORMAT(F...)
      IF (DTERM(...).EQ.'N') GO TO ...
      PRINT *, ' BLOCK IS FIXED OR RANDOM? '
      READ(...) DTERM(...)
      IF((DTERM(...).NE.'F').AND.(DTERM(...).NE.'R')) THEN
      WRITE(...)
      GO TO ...
      ENDIF
      IF (DTERM(...).EQ.'F') THEN
      GO TO ...
      ENDIF
      DTERM(...) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR BLOCK?'
      READ(...) PRI(...)
      IF (DTERM(...).EQ.'N') GO TO ...
      PRINT *, ' SETS ARE FIXED OR RANDOM?'
      READ(...) DTERM(...)
      IF((DTERM(...).NE.'F').AND.(DTERM(...).NE.'R')) THEN
      WRITE(...)
      GO TO ...
      ENDIF
      IF (DTERM(...).EQ.'F') THEN
      GO TO ...
      ENDIF
      DTERM(...) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR SETS?'
      READ(...) PRI(...)
      PRINT *, ' ALL OTHER FACTORS ARE CONSIDERED RANDOM'
      PRINT *, ' ANSWER Y FOR YES OR N FOR NO FOR INCLUSION OF THE FACTO
     N R IN THE MODEL'
      WRITE(...)
      PRINT *, ' IS GCA IN THE MODEL? '
      READ(...) DTERM(...)
      IF(DTERM(...).NE. ...
      ...
      ENDIF
      IF (DTERM(...).EQ.'N') GO TO ...
C     PRINT *, ' GCA IS FIXED OR RANDOM?'
C     INPUT *, DTERM(...)
C     IF (DTERM(...).EQ.'F') THEN
C
C     GO TO ...
C     ENDIF
      DTERM(...) = 'R'
      PRINT *, ' WHAT IS THE PRIOR FOR GCA?'
      READ(...) PRI(...)
      IF(DUM.EQ.'H') THEN
      DTERM(...) = 'N'
      GO TO ...
      ENDIF
      PRINT *, ' IS SCA IN THE MODEL?'
      READ(...) DTERM(...)
      IF(DTERM(...).NE. ...
      ...
      PRINT *, ' WHAT IS THE PRIOR FOR LOCATIONxGCA?'
      READ(...) PRI(...)
      IF(DUM.EQ.'H') THEN
      DTERM(...) = 'N'
      GO TO ...
      ENDIF
      PRINT *, ' IS LOCATIONxSCA IN THE MODEL?'
      READ(...) DTERM(...)
      IF(DTERM(...).NE. ...
      ...
      FORMAT(' THE NUMBER OF FIXED FACTORS PLUS THE MEAN ',...
     N ' THE NUMBER OF RANDOM FACTORS PLUS ERROR ',...)
      PRINT *, ' DO THESE LEVELS MATCH YOUR INTENDED MODEL? Y OR N '
      READ(...) DUMDUM
      IF (DUMDUM.EQ.'N') THEN
      PRINT *, ' RETURNING TO INITIALIZATION OF MODEL'
      PRINT *, ' TO EXIT PROGRAM USE CONTROL-BREAK'
      GO TO ...
      ENDIF
      PRINT *, ' THE INPUT DATA SET NAME IS'
      READ(...) FLNAME
      FORMAT(A...)
      WRITE(...)
      FORMAT(' THE FORMAT OF THE DATA IS (REMEMBERING PARENTHESES)')
      READ(...) FMAT
      FORMAT(A...)
      OPEN (..., FILE=FLNAME, STATUS='OLD')
      NOBS = ...
      IF(DUM.EQ.'H') GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N').AND.(DTERM(...)
     N .EQ.'N')) GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N')) GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N')) GO TO ...
      IF(DTERM(...).EQ.'N') GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N')) GO TO ...
      IF(DTERM(...).EQ.'N') GO TO ...
      IF(DTERM(...).EQ.'N') GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),BLOCK(NOBS),SET(NOBS),
     N F(NOBS),M(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),BLOCK(NOBS),F(NOBS),M(NOBS),
     N MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) BLOCK(NOBS),SET(NOBS),F(NOBS),M(NOBS),
     N MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) F(NOBS),M(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) SET(NOBS),F(NOBS),M(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) BLOCK(NOBS),F(NOBS),M(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),F(NOBS),M(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),SET(NOBS),F(NOBS),M(NOBS),
     N MEAN(NOBS)
      GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N').AND.(DTERM(...)
     N .EQ.'N')) GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N')) GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N')) GO TO ...
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ.'N')) GO TO ...
      IF(DTERM(...).EQ.'N') GO TO ...
      IF(DTERM(...).EQ.'N') GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),BLOCK(NOBS),SET(NOBS),
     N F(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),BLOCK(NOBS),F(NOBS),
     N MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) F(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) SET(NOBS),F(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) BLOCK(NOBS),F(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),F(NOBS),MEAN(NOBS)
      GO TO ...
      READ (..., FMT=FMAT, END=...) TEST(NOBS),SET(NOBS),F(NOBS),
     N MEAN(NOBS)
      NOBS = NOBS ...
      GO TO ...
      NOBS = NOBS ...
      CLOSE(1)
      WRITE(...) NOBS
      FORMAT(' THE NUMBER OF OBSERVATIONS IS ',...)
      IF(DUM.EQ.'H') GO TO ...
      DO ... I = ...,NOBS
      FM(I) = F(I)//M(I)
      CONTINUE
      DO ...
      IF(DTERM(I).EQ.'N') GO TO ...
      IF(DTERM(I).EQ.'R') THEN
      K = ...
      RANNAM(K) = NAME(I)
      ENDIF
      CONTINUE
      RANNAM(K ...) = NAME(...)
      DO ... I = 1,NOCR
      FMVEC(I) = '...'
      CONTINUE
      DO ...
      IF(PRI(I).GT. ...) THEN
      J = ...
      SIG(J) = PRI(I)
      ENDIF
      CONTINUE
      NCOLT = ...
      NCOLB = ...
      NCOLSE = ...
      NCOLTB = ...
      NCOLG = ...
      NCOLS = ...
      NCOLGT = ...
      NCOLST = ...
      NCOLCB = ...
      IF(DTERM(...).EQ.'N') GO TO ...
      CALL NOCOL(TEST,NOBS,LOCO,NCOLT)
      NCL(...) = NCOLT
      IF(DTERM(...).EQ.'N') GO TO ...
      CALL NOCOL(BLOCK,NOBS,REP,NCOLB)
      IF(DTERM(...).EQ.'N') GO TO ...
      CALL NOCOL(SET,NOBS,DISSET,NCOLSE)
      NCL(...) = NCOLSE
      IF((DTERM(...).EQ.'N').AND.(DTERM(...).EQ. ...
      ...
      DO ... ,NCOLS
      IF(FMVEC(K).LT.FMVEC(J)) GO TO ...
      NT = FMVEC(K)
      FMVEC(K) = FMVEC(J)
      FMVEC(J) = NT
      CONTINUE
      IF(DUM.EQ.'H') NCOLS = ...
      NCL(...) = NCOLS
      NCOLST = NCOLS*NCOLT
      NCOLGT = NCOLG*NCOLT
      NCOLCB = NCOLS*NCOLTB
      IF(DUM.EQ.'H') NCOLCB = NCOLG*NCOLTB
      IF(DTERM(...).EQ.'N') NCOLGT = ...
      IF(DTERM(...).EQ.'N') NCOLST = ...
      IF(DTERM(...).EQ.'N') NCOLCB = ...
      NCL(...) = NCOLGT
      NCL(...) = NCOLST
      NCL(...) = NCOLCB
      WRITE(...) NCOLG
      FORMAT(' NUMBER OF PARENTS IS ',...)
      WRITE(...) NCOLS
      FORMAT(' NUMBER OF FULL-SIB CROSSES IS ',...)
      NCLFIX = ...
      NCLRAN = ...
      DO ...
      IF(DTERM(I).EQ.'F') THEN
      NCLFIX = NCLFIX + NCL(I)
      GO TO ...
      ENDIF
      NCLRAN = NCLRAN + NCL(I)
      CONTINUE
      WRITE(...) NCLFIX,NCLRAN
      FORMAT(' FIXED EFFECT COLUMNS ',...
     N ' RANDOM EFFECT COLUMNS ',...)
      CVERG = ...
      PRINT *,' THE CONVERGENCE CRITERION FOR VARIANCE COMPONENTS (WHICH
     N EQUALS'
      PRINT *,' THE SUM
2) 7+( $%62/87( '(9,$7,216 ,6 6(7 72 f 35,17 rf ,) <28 :,6+ 72 &+$1*( 7<3( < ,) 127 7<3( 1 f 5($'f '80'80 )250$7$Of ,)'80'80(4f1ff *2 72 35,17rf 7+( &219(5*(1&( &5,7(5,21 ,6 f 5($'f &9(5* 1&2/; 1&/),; 1&/5$1 12,76 PAGE 127 35,17rf 7+( 180%(5 2) ,7(5$7,216 $//2:(' ,6 6(7 72 f 35,17rf '2 <28 :,6+ 72 &+$1*( 7+,6" < 25 1f f 5($'f '80'80 ,)'80'80(4f PAGE 128 &$// /6:36/15$115$1 125$1f '2 O125$1 5(0/,f 62/,125$1 f &217,18( =$* '2 O125$1 =$* =$* '$%65(0/f'80,ff &217,18( '2 O125$1 6,*,f 5(0/,f &217,18( ,)=$*/7&9(5*f *2 72 &217,18( ,)'80%(4f1ff *2 72 ,)'80%(4f PAGE 129 ,)'80'80(4f PAGE 130 '2 ,)'7(50,f(4f)ff 7+(1 '2 1&/,f ,),(4Of 7+(1 :5,7()07 f /2&2.f%+$7-f (1',) ,)(4f 7+(1 :5,7()07 f ',66(7.f%+$7-f (1',) :5,7()07 f 1$0(,f.%+$7-f &217,18( (1',) &217,18( )250$7$7)f )250$7$,)f &/26(f '2 O129$5* 9$5*,f &217,18( '2 19$5%+ 9$5%+,f &217,18( 35,17 '2 <28 '(6,5( 7+( $6<03727,& 9$5,$1&( &29$5,$1&(f 35,17 0$75,; )25 9$5,$1&( &20321(176" < 25 1f f 5($'f '80'80 ,)'80'80(4f1ff *2 72 35,17 rf :+$7 ,6 7+( ),/(1$0( )25 9$59&f" f 5($'f )/1$0( 23(1),/( )/1$0(67$786 f81.12:1ff :5,7(f )250$7f $6<03727,& 9$5,$1&( &29$5,$1&( 0$75,;ff '2 O125$1 '2 ,125$1 62/,-f 62/,-fr :5,7(f 5$11$0,f5$11$0-f62/,-f &217,18( &217,18( )250$7$ 7$ 7)f 35,17rf'2 <28 '(6,5( 7+( (5525 9$5,$1&( &29$5,$1&( 0$75,; )25 1*&$" < 25 1f f 5($'f '80'80 ,)'80'80(4f1ff *2 72 35,17 rf :+$7 ,6 7+( ),/(1$0( )25 (9$5*+$7f" f 5($'f )/1$0( 23(1),/( )/1$0(67$786 f81.12:1ff &$// 9$5;9$5*9$5%+f :5,7( f PAGE 131 . '2 O1&2/* '2 ,1&2/* . :5,7(f 9$5*.f3$5(17,f3$5(17-f &217,18( &217,18( )250$7f7+( (5525 9$5,$1&( &29$5,$1&( 0$75,; )25 *&$ $55$<(' 1$6 $ 9(&725ff )250$7)7$7$f &/26(,2f 35,17 r f '2 <28 '(6,5( 7+( 9$5,$1&( &29$5,$1&( 0$75,; )25 ),;(' 1 ())(&76" < 25 1f f 5($'f '80% ,)'80%(4f1ff *2 72 ,)'80'80(4f1ff &$// 9$5;9$5*9$5%+f 35,17 r f :+$7 ,6 7+( ),/(1$0( )25 9$5%(7$+$7f" f 5($'f )/1$0( 23(1OO),/( )/1$0(67$786 f81.12:1ff '2 1&/),; '2 ,1&/),; . 
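The stopping rule of the main program can be illustrated briefly. Its prompts describe a convergence criterion equal to the sum of the absolute deviations between successive variance component estimates, together with a limit on the number of iterations. The Python sketch below only illustrates that rule and is not a translation of GAREML: `update`, `tol` and `max_iter` are hypothetical names, their default values are placeholders (the program's own defaults are not legible in this copy), and the toy update merely stands in for one REML iteration.

```python
def sum_abs_deviation(previous, current):
    """Sum of |current - previous| over all variance components."""
    return sum(abs(c - p) for p, c in zip(previous, current))


def iterate_until_converged(update, start, tol=1e-4, max_iter=30):
    """Repeat `update` until successive estimates differ by less than `tol`.

    `update` maps the current estimates to new ones (a stand-in for one
    REML iteration). `tol` and `max_iter` are placeholder defaults.
    Returns (estimates, iterations_used, converged).
    """
    estimates = list(start)
    for iteration in range(1, max_iter + 1):
        new_estimates = list(update(estimates))
        deviation = sum_abs_deviation(estimates, new_estimates)
        estimates = new_estimates
        if deviation < tol:
            return estimates, iteration, True
    return estimates, max_iter, False


# Toy update (assumption): move each estimate halfway toward a fixed target.
TARGET = [2.0, 5.0]


def toy_update(estimates):
    return [(e + t) / 2.0 for e, t in zip(estimates, TARGET)]


final, used, converged = iterate_until_converged(
    toy_update, [1.0, 1.0], tol=1e-6, max_iter=100
)
```

Returning the convergence flag alongside the estimates lets a caller distinguish a converged solution from one that simply ran out of iterations.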
SUBROUTINE LSWP(X,NROWX,NCOLXX,NSTA,NEND)
C SUBROUTINE LSWP SWPS THE DESIGNATED COLUMNS OF A MATRIX X AND
C RETURNS THE SWEPT MATRIX AS X
C NSWP DEFINES THE PIVOT COLUMNS FOR SWP
C DMIN, I.E., IF LESS THAN FULL RANK MATRICES ARE ENCOUNTERED DMIN MUST BE
C EMPLOYED TO ZERO THE ROW AND COLUMN ASSOCIATED WITH THE DEPENDENCY TO
C PRODUCE A GENERALIZED INVERSE
C BACKWARD ELIMINATION
C SAVING ABOVE DIAGONAL ENTRIES FOR MULTIPLICATION WEIGHTS
C ZEROING ABOVE DIAGONAL ENTRIES FOR INSERTION OF INVERSE VALUES
C DOING ROW OPERATIONS TO CREATE ABOVE DIAGONAL ENTRIES FOR INVERSE

SUBROUTINE DESIGN
C DESIGN CREATES DESIGN MATRICES FOR MAIN EFFECTS AND INTERACTIONS
C AND FORMS THE NORMAL EQUATIONS
C FORMING THE MATRIX TO BE SWP TO PRODUCE YQVQY AND VQVQ
C TK = X'*INV(VK)*X COMPLETED
C R = SIG(I)*Z(I)'*INV(VK)*Z(I) HAS BEEN FORMED
C Z(I)'*INV(VK)*X*SQRT(SIG(I)) HAS BEEN FORMED
C PORTIONS OF TK ARE SELECTED AND MULTIPLIED AND THE TRACE
C CALCULATED TO FORM VQVQ
C FORMING VECTOR OF FIXED EFFECTS ESTIMATES
C FORMING VECTORS OF PREDICTIONS
C FORMING YQVQY

DESIGN calls VECSWP and VCSWP to sweep the half-stored matrices P and TK, takes the fixed effect estimates BHAT and the predictions GCA and SCA from TK, and fills VQVQ and YQVQY.

SUBROUTINE NOCOL
C THIS FUNCTION COUNTS THE NUMBER OF ENTRIES FOR AN EFFECT

SUBROUTINE NOPAR
C THIS FUNCTION COUNTS THE NUMBER OF ENTRIES FOR PARENTS
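The sweep that LSWP applies can be sketched in a few lines. The listing's comments say that LSWP sweeps designated pivot columns of a matrix and that a pivot no larger than DMIN causes the associated row and column to be zeroed, which yields a generalized inverse. The Python sketch below follows the standard sweep operator as described in Goodnight's tutorial cited in the reference list. It is an illustration under assumptions rather than a port of the Fortran: the function name `sweep`, the default `dmin` and the use of an absolute value in the pivot test are choices made here.

```python
def sweep(matrix, pivots, dmin=1e-8):
    """Return a copy of `matrix` swept on each pivot index in `pivots`.

    Sweeping a nonsingular square matrix on every pivot gives its inverse.
    A pivot whose magnitude is at most `dmin` is treated as a dependency:
    its row and column are set to zero (generalized inverse behaviour).
    """
    a = [list(map(float, row)) for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    for k in pivots:
        d = a[k][k]
        if abs(d) <= dmin:
            for j in range(n_cols):
                a[k][j] = 0.0
            for i in range(n_rows):
                a[i][k] = 0.0
            continue
        # Scale the pivot row by the pivot.
        for j in range(n_cols):
            a[k][j] /= d
        # Eliminate the pivot column from every other row.
        for i in range(n_rows):
            if i == k:
                continue
            b = a[i][k]
            for j in range(n_cols):
                a[i][j] -= b * a[k][j]
            a[i][k] = -b / d
        a[k][k] = 1.0 / d
    return a


# Sweeping all pivots of a 2 x 2 matrix returns its inverse.
inverse = sweep([[4.0, 2.0], [2.0, 3.0]], pivots=[0, 1])

# Sweeping the X'X block of [[X'X, X'y], [y'X, y'y]] leaves the least squares
# estimate in the top-right cell and the residual sum of squares in the
# bottom-right cell (here x = [1, 1], y = [1, 3]).
swept = sweep([[2.0, 4.0], [4.0, 10.0]], pivots=[0])
estimate, residual_ss = swept[0][1], swept[1][1]
```

Working on a copy keeps the input intact, whereas the comments in the listing indicate that LSWP returns the swept matrix in place of X.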
SUBROUTINE VECSWP(VEC,NROWX,NCOLXX,NSTA,NEND)
C**VECSWP PRODUCES AN INVERSE OF A SYMMETRIC MATRIX STORED AS A VECTOR

SUBROUTINE VCSWP(VEC,NROWX,NCOLXX,NSTA,NEND,NDF)
C**VCSWP PRODUCES AN INVERSE OF A SYMMETRIC MATRIX STORED AS A VECTOR

VCSWP differs from VECSWP in carrying the additional argument NDF, which is adjusted when a pivot no larger than DMIN is encountered.

FUNCTION NVEC(NROWS,NCOLXX)
C******NVEC COUNTS THE PROPER POSITION OF AN ELEMENT
C******IN THE HALF STORED MATRIX (AS A VECTOR)
C******ACCORDING TO ITS NORMAL ROW COLUMN POSITION
C******IN THE ORIGINAL MATRIX

SUBROUTINE XPRIMX(TEST,BLOCK,SET,F,M,FM)
C FORMING DESIGN MATRIX FOR TEST
C FORMING DESIGN MATRIX FOR BLOCK
C FORMING DESIGN MATRIX FOR GCA
C X = MU | ... | T | TB | G | S | GT | ST | CB COMPLETED

XPRIMX reports its progress with the following messages:

FINISHED FORMING THE DESIGN MATRIX
NOW CHECKING FOR NULL COLUMNS
FINISHED CHECKING FOR NULL COLUMNS
THERE WERE ... NULL COLUMNS
NOW DELETING NULL COLUMNS
FORMING DOT PRODUCTS OF DESIGN COLUMNS
FORMING DOT PRODUCTS OF DESIGN COLUMNS AND THE DATA VECTOR
ALL DOT PRODUCTS HAVE NOW BEEN FORMED
SAVING X PRIME X MATRIX FOR FUTURE ITERATIONS
X PRIME X IS STORED

SUBROUTINE VARX(VARG,VARBH)
C ALGORITHM MODIFIED TO OUTPUT VARIANCE COVARIANCE OF PREDICTIONS

VARX forms the equations, sweeps them with CALL VECSWP, and copies the relevant portions of TK into VARG (GCA predictions) and VARBH (fixed effect estimates).
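The half storage of symmetric matrices that NVEC, VECSWP and VCSWP rely on can be sketched as follows. The listing's comments state that symmetric matrices are stored as vectors and that NVEC returns the position of an element in the half-stored matrix from its ordinary row and column position. The Python sketch below shows one such scheme, with the upper triangle stored row by row and 0-based indices. That layout and the function names are assumptions made for illustration, since the exact index arithmetic of NVEC is not legible in this copy.

```python
def packed_length(n):
    """Number of stored elements for an n x n symmetric matrix."""
    return n * (n + 1) // 2


def row_offset(row, n):
    """Count of stored elements that precede the diagonal element of `row`."""
    return row * n - row * (row - 1) // 2


def packed_index(i, j, n):
    """Position of element (i, j) in the row-wise packed upper triangle."""
    if i > j:
        i, j = j, i  # symmetry: (i, j) and (j, i) share one stored value
    return row_offset(i, n) + (j - i)


def pack_upper(matrix):
    """Store the upper triangle of a symmetric matrix as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


symmetric = [
    [4.0, 1.0, 2.0],
    [1.0, 5.0, 3.0],
    [2.0, 3.0, 6.0],
]
packed = pack_upper(symmetric)
value = packed[packed_index(2, 1, 3)]  # same stored element as (1, 2)
```

Half storage holds n(n+1)/2 values instead of n squared, which is what the PARAMETER declarations that size the variance-covariance vectors appear to compute.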
REFERENCE LIST

Banks BD, Mao IL & Walter JP. Robustness of the restricted maximum likelihood estimator derived under normality as applied to data with skewed distributions. J Dairy Sci.
Becker WA. Manual of Quantitative Genetics. Washington State Univ Press, Pullman, WA.
Braaten MO. The union of partial diallel mating designs and incomplete block environmental designs. North Carolina State Univ, Inst of Stat Mimeo Series.
Bridgwater FE, Talbert JT & Jahromi S. Index selection for increased dry weight in a young loblolly pine population. Silvae Genet.
Burdon RD. Genetic correlation as a concept for studying genotype-environment interaction in forest tree breeding. Silvae Genet.
Burdon RD & Shelbourne CJA. Breeding populations for recurrent selection: Conflicts and possible solutions. N Z J For Sci.
Burley, Burrows PM, Armitage FB & Barnes RD. Progeny test designs for Pinus patula in Rhodesia. Silvae Genet.
Campbell. Genetic variability in juvenile height-growth of Douglas-fir. Silvae Genet.
Corbeil RR & Searle SR. A comparison of variance component estimators. Biometrics.
Falconer DS. Introduction to Quantitative Genetics. Longman & Co, New York.
Freund R & Littell RC. SAS for Linear Models. SAS Institute Inc, Cary, NC.
Giesbrecht FG. Efficient procedure for computing MINQUE of variance components and generalized least squares estimates of fixed effects. Commun Statist Theor Meth.
Gilbert NEG. Diallel cross in plant breeding. Heredity.
Goodnight JH. A tutorial on the sweep operator. Amer Stat.
Graybill FA. Theory and Application of the Linear Model. Duxbury Press, North Scituate, MA.
Greenwood MS, Lambeth CC & Hunt JL. Accelerated breeding and potential impact upon breeding programs. In: Southern Cooperative Series Bulletin. Louisiana Ag Experiment Station, Baton Rouge, LA.
Griffing B. Concept of general and specific combining ability in relation to diallel crossing systems. Aust J Biol Sci.
Hallauer AR & Miranda JB. Quantitative Genetics in Maize Breeding. Iowa State Univ Press, Ames.
Hartley HO. Expectations, variances and covariances of ANOVA mean squares by synthesis. Biometrics.
Hartley HO & Rao JNK. Maximum likelihood estimation for the mixed analysis of variance model. Biometrika.
Harville DA. Maximum likelihood approaches to variance component estimation and to related problems. J Amer Stat Assoc.
Henderson CR. Estimation of variance and covariance components. Biometrics.
Henderson CR. Sire evaluation and expected genetic advance. In: Animal Breeding and Genetics Symposium in Honor of Lush. Animal Sci Assoc Amer, Champaign, Ill.
Henderson CR. General flexibility of linear model techniques for sire evaluation. J Dairy Sci.
Henderson CR. Best linear unbiased prediction of breeding values not in the model for records. J Dairy Sci.
Henderson CR. Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Ontario, CAN.
Henderson CR, Kempthorne O, Searle SR & Von Krosigk CN. Estimation of environmental and genetic trends from records subject to culling. Biometrics.
Hodge GR & White TL. (in press). Genetic parameter estimates for growth traits at different ages in slash pine. Silvae Genet.
Hogg RV & Craig AT. Introduction to Mathematical Statistics. Fourth edition. Macmillan Publ Co, New York.
Miller J. Asymptotic properties and computation of maximum likelihood estimates in the mixed model of the analysis of variance. Tech Rep, Department of Statistics, Stanford Univ, Stanford, CA.
Milliken GA & Johnson DE. Analysis of Messy Data: Designed Experiments. Lifetime Learning Pub, Belmont, CA.
Namkoong, Snyder EB & Stonecypher RW. Heritability and gain concepts for evaluating breeding systems such as seedling orchards. Silvae Genet.
Namkoong & Roberds JH. Choosing mating designs to efficiently estimate genetic variance components for trees. Silvae Genet.
Olsen A, Seely & Birkes. Invariant quadratic unbiased estimation of two variance components. Ann Stat.
Patterson HD & Thompson R. Recovery of interblock information when block sizes are unequal. Biometrika.
Pederson DG. A comparison of four experimental designs for the estimation of heritability. Theoret Appl Genet.
Pepper WD. Choosing plant-mating design allocations to estimate genetic variance components in the absence of prior knowledge of the relative magnitudes. Biometrics.
Pepper WD & Namkoong. Comparing efficiency of balanced mating design for progeny testing. Silvae Genet.
Pitman EJG. The closest estimates of statistical parameters. Proc Cambr Philos Soc.
Press WH, Flannery BP, Teukolsky SA & Vetterling WT. Numerical Recipes: The Art of Scientific Computing (Fortran version). Cambridge Univ Press, New York.
Schneider DM. Linear Algebra: A Concrete Introduction. Macmillan Pub Co, New York.
Weir R & Goddard RE. Advanced generation operational breeding programs for loblolly and slash pine. In: Southern Coop Series Bull. Louisiana Agric Exp Stn, Baton Rouge, LA.
Weir R & Zobel B. Managing genetic resources for the future: a plan for the NC State Industry Cooperative Tree Improvement Program. In: Proc South For Tree Improve Conf, June, Raleigh, NC.
Westfall PH. A comparison of variance component estimates for arbitrary underlying distributions. J Amer Stat Assoc.
White TL. A conceptual framework for tree improvement programs. New Forests.
White TL & Hodge GR. Practical uses of breeding values in tree improvement programs and their prediction from progeny test data. In: Proc South For Tree Improve Conf, Texas A & M Univ, College Station, TX.
White TL & Hodge GR. Best linear prediction of breeding values in a forest tree improvement program. Theor Appl Genet.
White TL & Hodge GR. Predicting Breeding Values with Applications in Forest Tree Improvement. Kluwer Academic Pub, Dordrecht, The Netherlands.
Wilcox MD, Shelbourne CJA & Firth A. General and specific combining ability in eight selected clones of radiata pine. N Z J For Sci.

BIOGRAPHICAL SKETCH

Dudley Arvle Huber was born in December 1948 in Fulton County, Georgia, to Dudley and Dorothy Huber. His basic education was in the Stephens County school system. He entered Georgia Institute of Technology to study chemical engineering and later transferred to the University of Georgia in the forestry program, where he received a Bachelor of Science degree. He then served in the U. S. Navy and, after service, reentered the University of Georgia, receiving a Master of Science degree. After several years of self employment and employment at the University of Georgia, he began a Doctor of Philosophy program. He is currently employed as operations geneticist for Southern Forest Tree Improvement by Weyerhaeuser Company.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Timothy L. White, Chairman, Associate Professor of Forest Resources and Conservation
Michael A. DeLorenzo, Associate Professor of Dairy Science
Assistant Research Scientist of Forest Resources and Conservation
Ramon C. Littell, Professor of Statistics
Donald L. Rockwood, Professor of Forest Resources and Conservation

This dissertation was submitted to the Graduate Faculty of the School of Forest Resources and Conservation in the College of Agriculture and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1993

Director, Forest Resources and Conservation
Dean, Graduate School
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
CHAPTER 2 THE EFFICIENCY OF HALF-SIB, HALF-DIALLEL AND CIRCULAR MATING DESIGNS IN THE ESTIMATION OF GENETIC PARAMETERS WITH VARIABLE NUMBERS OF PARENTS AND LOCATIONS
    Introduction
    Methods
    Assumptions Concerning Block Size
    The Use of Efficiency (i)
    General Methodology
    Levels of Genetic Determination
    Covariance Matrix for Variance Components
    Covariance Matrix for Linear Combinations of Variance Components and Variance of a Ratio
    Comparison Among Estimates of Variances of Ratios
    Results
    Heritability
    Type B Correlation
    Dominance to Additive Variance Ratio
    Discussion
    Comparison of Mating Designs
    A General Approach to the Estimation Problem
    Use of the Variance of a Ratio Approximation
    Conclusions
CHAPTER 3 ORDINARY LEAST SQUARES ESTIMATION OF GENERAL AND SPECIFIC COMBINING ABILITIES FROM HALF-DIALLEL MATING DESIGNS
    Introduction
    Methods
    Linear Model
    Ordinary Least Squares Solutions
    Sum-to-Zero Restrictions
    Components of the Matrix Equation
    Estimation of Fixed Effects
    Numerical Examples
    Balanced Data (Plot-mean Basis)
    Missing Plot
    Missing Cross
    Several Missing Crosses
    Discussion
    Uniqueness of Estimates
    Weighting of Plot Means and Cross Means in Estimating Parameters
    Diallel Mean
    Variance and Covariance of Plot Means
    Comparison of Prediction and Estimation Methodologies
    Conclusions
CHAPTER 4 VARIANCE COMPONENT ESTIMATION TECHNIQUES COMPARED FOR TWO MATING DESIGNS WITH FOREST GENETIC ARCHITECTURE THROUGH COMPUTER SIMULATION
    Introduction
    Methods
    Experimental Approach
    Experimental Design for Simulated Data
    Full-Sib Linear Model
    Half-sib Linear Model
    Data Generation and Deletion
    Variance Component Estimation Techniques
    Comparison Among Estimation Techniques
    Results and Discussion
    Variance Components
    Ratios of Variance Components
    General Discussion
    Observational Unit
    Negative Estimates
    Estimation Technique
    Recommendation
CHAPTER 5 GAREML: A COMPUTER ALGORITHM FOR ESTIMATING VARIANCE COMPONENTS AND PREDICTING GENETIC VALUES
    Introduction
    Algorithm
    Operating GAREML
    Interpreting GAREML Output
    Variance Component Estimates
    Predictions of Random Variables
    Asymptotic Covariance Matrix of Variance Components
    Fixed Effect Estimates
    Error Covariance Matrices
    Example
    Data
    Analysis
    Output
    Conclusions
CHAPTER 6 CONCLUSIONS
APPENDIX FORTRAN SOURCE CODE FOR GAREML
REFERENCE LIST
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table 2-1. Parametric variance components
Table 3-1. Data set for numerical examples
Table 3-2. Numerical results for examples
Table 4-1. Abbreviation for and description of variance component estimation methods
Table 4-2. Sets of true variance components
Table 4-3. Sampling variance for the estimates
Table 4-4. Bias for the estimates
Table 4-5. Probability of nearness
Table 5-1. Data for example

LIST OF FIGURES

Figure 2-1. Efficiency (i) for h²
Figure 2-2. Efficiency (i) for r_B
Figure 2-3. Efficiency (i) for γ
Figure 3-1. The overparameterized linear model
Figure 3-2. The linear model for a four-parent half-diallel
Figure 3-3. Intermediate result in SCA submatrix generation
Figure 3-4. Weights on overall cross means
Figure 4-1. Distribution of 1000 MIVQUE estimates

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OPTIMAL MATING DESIGNS AND OPTIMAL TECHNIQUES FOR ANALYSIS OF QUANTITATIVE TRAITS IN FOREST GENETICS

By Dudley Arvle Huber

May 1993

Chairperson: Timothy L. White
Major Department: School of Forest Resources and Conservation

First, the asymptotic covariance matrix of the variance component estimates is used to compare three common mating designs for efficiency (maximizing the variance reducing property of each observation) for genetic parameters across numbers of parents and locations and varying genetic architectures. It is determined that the circular mating design is always superior in efficiency to the half-diallel design. For single-tree heritability, the half-sib design is most efficient. For estimating type B correlation, maximum efficiency is achieved by either the half-sib or the circular mating design, and the change in rank for efficiency is determined by the underlying genetic architecture.

Another intent of this work is comparing analysis methodologies for determining parental worth. The first of these investigations is ordinary least squares assumptions in the estimation of parental worth for the half-diallel mating design with balanced and unbalanced data. The conclusion from comparison of ordinary least squares to alternative analysis methodologies is that best linear unbiased prediction and best linear prediction are more appropriate to the problem of determining parental worth.
The next analysis investigation contrasts variance component estimation techniques across levels of imbalance for the half-diallel and half-sib mating designs for the estimation of genetic parameters with plot means and individuals used as the unit of observation. The criteria for discrimination are variance of the estimates, mean square error, bias and probability of nearness. For all estimation techniques, individuals as the unit of observation produced estimates with the most desirable properties. Of the estimation techniques examined, restricted maximum likelihood is the most robust to imbalance.

The computer program used to produce restricted maximum likelihood estimates of variance components was modified to form a user friendly analysis package. Both the algorithm and the outputs of the program are documented. Outputs available from the program include variance component estimates, generalized least squares estimates of fixed effects, asymptotic covariance matrix for variance components, best linear unbiased predictions for general and specific combining abilities and the error covariance matrix for predictions and estimates.

CHAPTER 1
INTRODUCTION

Analysis of quantitative traits in forest genetic experiments has traditionally been approached as a two-part problem. Parental worth would be estimated as fixed effects and later considered as random effects for the determination of genetic architecture. While traditional, this approach is most probably sub-optimal given the proliferation of alternative analysis approaches with enhanced theoretical properties (White and Hodge 1989).

In this dissertation emphasis is placed on the half-diallel mating design because of its omnipresence and the uniqueness of the analysis problem this mating design presents. The half-diallel mating design has been and continues to be used in plant sciences (Sprague and Tatum 1942, Gilbert 1958, Matzinger et al. 1959, Burley et al. 1966, Squillace 1973, Weir and Zobel 1975, Wilcox et al. 1975, Snyder and Namkoong 1978, Hallauer and Miranda 1981, Singh and Singh 1984, Greenwood et al. 1986, and Weir and Goddard 1986). The unique feature of the half-diallel mating system which hinders analysis with many statistical packages is that a single observation contains two levels of the same main effect.

Optimality of mating design for the estimation of commonly needed genetic parameters (single-tree heritability, type B correlation and dominance to additive variance ratio) is examined utilizing the asymptotic covariance of the variance components (Kendall and Stuart 1963, Giesbrecht 1983 and McCutchan et al. 1989). Since genetic field experiments are composed of both a mating design and a field design, the central consideration in this investigation is which mating design with what field design (how many parents and across what number of locations within a randomized complete block design) is most efficient. The criterion for discernment among designs is the efficiency of the individual observation in reducing the variance of the estimate (Pederson 1972). This question is considered under a range of genetic architectures which spans that reported for coniferous growth traits (Campbell 1972, Stonecypher et al. 1973, Snyder and Namkoong 1978, Foster 1986, Foster and Bridgwater 1986, Hodge and White [in press]).

The investigation into optimal analysis proceeds by considering the ordinary least squares (OLS) treatment of estimating parental worth for the half-diallel mating design. OLS assumptions are examined in detail through the use of matrix algebra for both balanced and unbalanced data. The use of matrix algebra illustrates both the uniqueness of the problem and the interpretation of the OLS assumptions. Comparisons among OLS, generalized least squares (GLS), best linear unbiased prediction (BLUP) and best linear prediction (BLP) are made on a theoretical basis.
Although consideration of field and mating design of future experiments is essential, the problem of optimal analysis of current data remains. In response to this need, simulated data with differing levels of imbalance, genetic architecture and mating design are utilized as a basis for discriminating among variance component estimation techniques in the determination of genetic architecture. The levels of imbalance simulated represent those commonly seen in forest genetic data: less than 100% survival, missing crosses for full-sib mating designs and only subsets of parents in common across locations for half-sib mating designs. The two mating designs are half-sib and half-diallel with a subset of the previously used genetic architectures. The field design is a randomized complete block with fifteen families per block and six trees per family per block. The four criteria used to discriminate among variance component estimation techniques are probability of nearness (Pittman 1937), bias, variance of the estimates and mean square error (Hogg and Craig 1978).

The techniques compared for variance component estimation are minimum variance quadratic unbiased estimation (Rao 1971b), minimum norm quadratic unbiased estimation (Rao 1971a), restricted maximum likelihood (Patterson and Thompson 1971), maximum likelihood (Hartley and Rao 1967) and Henderson's method 3 (Henderson 1953). These techniques are compared using the individual and plot means as the unit of observation. Further, three alternatives are explored for dealing with negative variance component estimates: accept and live with negative estimates, set negative estimates to zero, and re-solve the system setting negative components to zero.

The algorithm used for the method which provided estimates with optimal properties across experimental levels was converted to a user friendly program.
This program, which provides restricted maximum likelihood variance component estimates, uses Giesbrecht's algorithm (1983). Documentation of the algorithm and explanation of the program's output are provided along with the Fortran source code (appendix).

CHAPTER 2
THE EFFICIENCY OF HALF-SIB, HALF-DIALLEL AND CIRCULAR MATING DESIGNS IN THE ESTIMATION OF GENETIC PARAMETERS WITH VARIABLE NUMBERS OF PARENTS AND LOCATIONS

Introduction

In forest tree improvement, genetic tests are established for four primary purposes: 1) ranking parents, 2) selecting families or individuals, 3) estimating genetic parameters, and 4) demonstrating genetic gain (Zobel and Talbert 1984). While the four purposes are not mutually exclusive, a test design optimal for one purpose is most probably not optimal for all (Burdon and Shelbourne 1971, White 1987). A breeder then must prioritize the purposes for which a given test is established and choose a design based on these priorities.

Within a genetic test design there are two primary components: mating design and field design. There have been several investigations of optimal designs for these two components either separately or simultaneously under various criteria. These criteria have included the efficient and/or precise estimation of heritability (Pederson 1972, Namkoong and Roberds 1974, Pepper and Namkoong 1978, McCutchan et al. 1985, McCutchan et al. 1989), precise estimation of variance components (Braaten 1965, Pepper 1983), and efficient selection of progeny (van Buijtenen 1972, White and Hodge 1987, van Buijtenen and Burdon 1990, Loo-Dinkins et al. 1990).

Incorporated within this body of research has been a wide range of genetic and environmental variance parameters and field and mating designs.
However, the models in previous investigations have been primarily constrained to consideration of testing in a single environment with a correspondingly limited number of factors in the model, i.e., genotype by environment interaction and/or dominance variance are usually not considered. This chapter focuses on optimal mating designs through consideration of three common mating designs (half-sib, half-diallel, and circular with four crosses per parent) for estimation of genetic parameters with a field design extending across multiple locations.

In this chapter the approach to the optimal design problem is to maintain the basic field design within locations as randomized complete block with four blocks and a six-tree row-plot representing each genetic entry within a block (noted as one of the most common field designs by Loo-Dinkins et al. 1990). The number of families in a block, number of locations, mating design and number of parents within a mating design are allowed to change. Since optimality, besides being a function of the field and mating designs, is also a function of the underlying genetic parameters, all designs are examined across a range of levels of genetic determination (as varying levels of heritability, genotype by environment interaction and dominance) reflecting estimates for many economically important traits in conifers (Campbell 1972, Stonecypher et al. 1973, Snyder and Namkoong 1978, Foster 1986, Foster and Bridgwater 1986, Hodge and White [in press]).

For each design and level of genetic determination, a Minimum Variance Quadratic Unbiased Estimation (MIVQUE) technique and an approximation of the variance of a ratio (Kendall and Stuart 1963, Giesbrecht 1983 and McCutchan et al. 1989) are applied to estimate the variance of estimates of heritability, additive to additive plus additive by environment variance ratio, and dominance to additive variance ratio.
These techniques use the true covariance matrix of the variance component estimates (utilizing only the known parameters and the test design and precluding the need for simulated or real data) and a Taylor series approximation of the variance of a ratio. The relative efficiencies of different test designs are compared on the basis of i (the efficiency of an individual observation in reducing the variance of an estimate, Pederson 1972). Thus this research explores which mating design, number of parents and number of locations is most efficient per unit of observation in estimating heritability, additive to additive plus additive by environment variance ratio, and dominance to additive variance ratio for several variance structures representative of coniferous growth traits.

Methods

Assumptions Concerning Block Size

As opposed to McCutchan et al. (1985), where block sizes were held constant and including more families resulted in fewer observations per family per block, in this chapter the blocks are allowed to expand to accommodate increasing numbers of families. This expansion is allowed without increasing either the variance among blocks or the variance within blocks. For the three mating designs which are discussed, the addition of one parent to the half-sib design increases block size by 6 trees (one plot for a half-sib family), the addition of a parent to the circular design increases block size by 12 trees (two plots for full-sib families), and the addition of a parent to the half-diallel design increases block size by 6p trees (where p is the number of parents before the addition, i.e., there are p new full-sib families per block). Therefore, block size is determined by the mating design and the number of parents. All comparisons among mating designs and numbers of locations are for equal block sizes, i.e., equal numbers of observations per location.
This results in comparing mating designs with unequal numbers of parents in the designs and comparing two-location experiments against five-location experiments with equal numbers of observations per location but unequal total numbers of observations.

The Use of Efficiency (i)

Efficiency is the tool by which comparisons are made and is the efficacy of the individual observations in an experiment in lowering the variance of parameter estimates. An increasing efficiency indicates that for increasing experimental size the additional observations have enhanced the variance reducing property of all observations. Efficiency is calculated as $i = 1 / (N \cdot \mathrm{Var}(x))$, where N is the total number of observations and Var(x) is the variance of a generic parameter estimate. Increasing N always results in a reduction of the variance of estimation, all other things being equal. Yet the change in efficiency with increasing N is dependent on whether the reduction in variance is adequate to offset the increase in N which caused the reduction.

Comparing a previous efficiency with that obtained by increasing N, i.e., increasing the number of parents in a mating design or increasing the number of locations in which an experiment is planted: since

$$i_o = \frac{1}{N \cdot \mathrm{Var}(x)}, \qquad (2\text{-}1)$$

then

$$N \cdot \mathrm{Var}(x) = \frac{1}{i_o} \quad \text{and} \quad (N + \Delta N)(\mathrm{Var}(x) + \Delta \mathrm{Var}(x)) = \frac{1}{i_n};$$

if $i_o$ (the old efficiency) $= i_n$ (the new efficiency), then $\Delta \mathrm{Var}(x) / \mathrm{Var}(x) = -\Delta N / (N + \Delta N)$;

if $i_o < i_n$, then $\Delta \mathrm{Var}(x) / \mathrm{Var}(x) < -\Delta N / (N + \Delta N)$; and

if $i_o > i_n$, then $\Delta \mathrm{Var}(x) / \mathrm{Var}(x) > -\Delta N / (N + \Delta N)$;

where $\Delta$ denotes the change in magnitude. Viewing equation 2-1, if N is held constant and one design has a higher efficiency (i), the design must also produce parameter estimates which have a lower variance.

General Methodology

Sets of true variance components are calculated in accordance with a stated level of genetic control and the design matrix is generated in correspondence with the field and mating design.
Knowing the design matrix and a set of true variance components, a true covariance matrix of the variance component estimates is generated. Once the covariance matrix of the variance components is in hand, the variances of and covariances between any linear combinations of the variance component estimates are calculated. From the covariance matrix for linear combinations, the variances of genetic ratios, as ratios of linear combinations of variance components, are approximated by a Taylor series expansion. Since definition of a set of variance components and formation of the design matrix are dependent on the linear model employed, discussion of specific methodology begins with linear models.

Linear Models

Half-diallel and circular designs

The scalar linear model employed for half-diallel and circular mating designs is

$$y_{ijklm} = \mu + t_i + b_{ij} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \qquad (2\text{-}2)$$

where
$y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test;
$\mu$ is the population mean;
$t_i$ is the random variable test environment ~ NID(0, $\sigma^2_t$);
$b_{ij}$ is the random variable block ~ NID(0, $\sigma^2_b$);
$g_k$ is the random variable female general combining ability (gca) ~ NID(0, $\sigma^2_{gca}$);
$g_l$ is the random variable male gca ~ NID(0, $\sigma^2_{gca}$);
$s_{kl}$ is the random variable specific combining ability (sca) ~ NID(0, $\sigma^2_{sca}$);
$tg_{ik}$ is the random variable test by female gca interaction ~ NID(0, $\sigma^2_{tg}$);
$tg_{il}$ is the random variable test by male gca interaction ~ NID(0, $\sigma^2_{tg}$);
$ts_{ikl}$ is the random variable test by sca interaction ~ NID(0, $\sigma^2_{ts}$);
$p_{ijkl}$ is the random variable plot ~ NID(0, $\sigma^2_p$);
$w_{ijklm}$ is the random variable within plot ~ NID(0, $\sigma^2_w$); and
there is no covariance between random variables in the model.
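This linear model in matrix notation is

$$y = \mu\mathbf{1} + Z_T e_T + Z_B e_B + Z_G e_G + Z_S e_S + Z_{TG} e_{TG} + Z_{TS} e_{TS} + Z_P e_P + e_W \qquad (2\text{-}3)$$

with dimensions, term by term, $n \times 1$ for $y$; $n \times 1$ for $\mu\mathbf{1}$; $n \times t$ and $t \times 1$; $n \times b$ and $b \times 1$; $n \times g$ and $g \times 1$; $n \times s$ and $s \times 1$; $n \times tg$ and $tg \times 1$; $n \times ts$ and $ts \times 1$; $n \times p$ and $p \times 1$; and $n \times 1$ for $e_W$,

where
$y$ is the observation vector;
$Z_i$ is the portion of the design matrix for the $i$th random variable;
$e_i$ is the vector of unobservable random effects for the $i$th random variable;
$\mathbf{1}$ is a vector of 1's; and
n, t, b, g, s, tg, ts, and p are the numbers of observations, tests, blocks, gca's, sca's, test by gca interactions, test by sca interactions and plots, respectively.

Utilizing customary assumptions in half-diallel mating designs (Method 4, Griffing 1956), the variance of an individual observation is

$$\mathrm{Var}(y_{ijklm}) = \sigma^2_t + \sigma^2_b + 2\sigma^2_{gca} + \sigma^2_{sca} + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w$$

and, in matrix notation, the covariance matrix of the observations is

$$\mathrm{Var}(y) = Z_T Z_T' \sigma^2_t + Z_B Z_B' \sigma^2_b + Z_G Z_G' \sigma^2_{gca} + Z_S Z_S' \sigma^2_{sca} + Z_{TG} Z_{TG}' \sigma^2_{tg} + Z_{TS} Z_{TS}' \sigma^2_{ts} + Z_P Z_P' \sigma^2_p + I_n \sigma^2_w$$

where $'$ indicates the transpose, each $Z_i Z_i'$ is $n \times n$, and $I_n$ is an $n \times n$ identity matrix.

Half-sib design

The scalar linear model for half-sib mating designs is

$$y_{ijkm} = \mu + t_i + b_{ij} + g_k + tg_{ik} + p^*_{ijk} + w^*_{ijkm} \qquad (2\text{-}6)$$

where
$y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test;
$\mu$, $t_i$, $b_{ij}$, $g_k$, and $tg_{ik}$ retain the definition in Eq. 2-2;
$p^*_{ijk}$ is the random variable plot containing different genotype by environment components than Eq. 2-2 ~ NID(0, $\sigma^2_{p^*}$);
$w^*_{ijkm}$ is the random variable within plot containing different levels of genotypic and genotype by environment components than Eq. 2-2 ~ NID(0, $\sigma^2_{w^*}$); and
there is no covariance between random variables in the model.

The matrix notation model is

$$y = \mu\mathbf{1} + Z_T e_T + Z_B e_B + Z_G e_G + Z_{TG} e_{TG} + Z_{P^*} e_{P^*} + e_{W^*} \qquad (2\text{-}7)$$

with dimensions, term by term, $n \times 1$; $n \times 1$; $n \times t$ and $t \times 1$; $n \times b$ and $b \times 1$; $n \times g$ and $g \times 1$; $n \times tg$ and $tg \times 1$; $n \times p$ and $p \times 1$; and $n \times 1$.

The variance of an individual observation in half-sib designs is

$$\mathrm{Var}(y_{ijkm}) = \sigma^2_t + \sigma^2_b + \sigma^2_{gca} + \sigma^2_{tg} + \sigma^2_{p^*} + \sigma^2_{w^*}$$

and $\mathrm{Var}(y)$ follows in matrix notation exactly as for the full-sib designs, using the design matrices of Eq. 2-7.

Levels of Genetic Determination

Eight levels of genetic determination are derived from a factorial combination of two levels of each of three genetic ratios: heritability ($h^2 = 4\sigma^2_{gca} / (2\sigma^2_{gca} + \sigma^2_{sca} + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w)$ for full-sib designs and $h^2 = 4\sigma^2_{gca} / (\sigma^2_{gca} + \sigma^2_{tg} + \sigma^2_{p^*} + \sigma^2_{w^*})$ for half-sib designs), Type B correlation ($r_B = 4\sigma^2_{gca} / (4\sigma^2_{gca} + 4\sigma^2_{tg})$), and dominance to additive variance ratio ($\gamma = 4\sigma^2_{sca} / 4\sigma^2_{gca}$).

To generate sets of true variance components (Table 2-1) for half-diallel and circular mating designs from the factorial combinations of genetic parameters, the denominator of $h^2$ is set to 10 (arbitrarily, but without loss of generality) which, given the level of $h^2$, leads to the solution for $\sigma^2_{gca}$. Solving for $\sigma^2_{gca}$ and knowing $\gamma$ yields the value for $\sigma^2_{sca}$. Knowing the level of $r_B$ and $\sigma^2_{gca}$ allows the equation for $r_B$ to be solved for $\sigma^2_{tg}$. An assumption that $\gamma$ equals the ratio $\sigma^2_{ts} / \sigma^2_{tg}$ permits a solution for $\sigma^2_{ts}$. A further assumption that [...] The values of $\sigma^2_t$ and $\sigma^2_b$ are maintained at 1.0 and 0.5, respectively, for all treatment levels.

Table 2-1. Parametric variance components for the factorial combination of heritability (.1 and .25), Type B correlation (.5 and .8) and dominance to additive variance ratio (.25 and 1.0) for full and half-sib designs. $\sigma^2_t$ and $\sigma^2_b$ were maintained at 1.0 and .5, respectively, for all levels and designs. For half-sib rows the last two columns are $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$.

| Design | Level | $h^2$ | $r_B$ | $\gamma$ | $\sigma^2_{gca}$ | $\sigma^2_{sca}$ | $\sigma^2_{tg}$ | $\sigma^2_{ts}$ | $\sigma^2_p$ | $\sigma^2_w$ |
| Full | 1 | .1 | .8 | 1.0 | .2500 | .2500 | .0625 | .0625 | .6344 | 8.4281 |
| Full | 2 | .1 | .5 | 1.0 | .2500 | .2500 | .2500 | .2500 | .5950 | 7.9050 |
| Full | 3 | .1 | .8 | .25 | .2500 | .0625 | .0625 | .0156 | .6508 | 8.6461 |
| Full | 4 | .1 | .5 | .25 | .2500 | .0625 | .2500 | .0625 | .6212 | 8.2538 |
| Full | 5 | .25 | .8 | 1.0 | .6250 | .6250 | .1562 | .1562 | .5359 | 7.1203 |
| Full | 6 | .25 | .5 | 1.0 | .6250 | .6250 | .6250 | .6250 | .4376 | 5.8125 |
| Full | 7 | .25 | .8 | .25 | .6250 | .1562 | .1562 | .0391 | .5769 | 7.6649 |
| Full | 8 | .25 | .5 | .25 | .6250 | .1562 | .6250 | .1562 | .5031 | 6.6844 |
| Half | 1 and 3 | .1 | .8 | - | .2500 | - | .0625 | - | .4844 | 9.2031 |
| Half | 2 and 4 | .1 | .5 | - | .2500 | - | .2500 | - | .4750 | 9.0250 |
| Half | 5 and 7 | .25 | .8 | - | .6250 | - | .1562 | - | .4609 | 8.7579 |
| Half | 6 and 8 | .25 | .5 | - | .6250 | - | .6250 | - | .4375 | 8.3125 |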
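
The solution sequence for the full-sib parameter sets of Table 2-1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's code: the function and argument names are hypothetical, and the share of the remaining variance assigned to plots (`plot_share = 0.07`) is inferred from the values printed in Table 2-1, since the sentence stating that assumption is incomplete in this copy.

```python
def full_sib_components(h2, r_b, gamma, denom=10.0, plot_share=0.07):
    """Solve for true variance components from the three genetic ratios.

    denom      : denominator of h2 (set to 10 in the text).
    plot_share : ASSUMPTION inferred from Table 2-1, the fraction of the
                 remaining (plot + within-plot) variance assigned to plots.
    """
    gca = h2 * denom / 4.0          # h2 = 4*gca / denom
    sca = gamma * gca               # gamma = 4*sca / (4*gca)
    tg = gca * (1.0 - r_b) / r_b    # r_B = 4*gca / (4*gca + 4*tg)
    ts = gamma * tg                 # ts / tg is assumed equal to gamma
    # Whatever is left of the h2 denominator is plot plus within-plot variance.
    rest = denom - (2.0 * gca + sca + 2.0 * tg + ts)
    plot = plot_share * rest
    within = rest - plot
    return {"gca": gca, "sca": sca, "tg": tg, "ts": ts, "p": plot, "w": within}


# Levels 1 and 8 of Table 2-1 (full-sib designs).
level1 = full_sib_components(h2=0.1, r_b=0.8, gamma=1.0)
level8 = full_sib_components(h2=0.25, r_b=0.5, gamma=0.25)
```

Under these assumptions the function reproduces the printed rows for levels 1 and 8 to the four decimals shown in the table.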
In order to facilitate comparisons of half-sib mating designs with full-sib mating designs, $\sigma^2_{gca}$ and $\sigma^2_{tg}$ retain the same values for given levels of $h^2$ and $r_B$, and the denominator of heritability again is set to 10. To solve for [...] $h^2$ and $r_B$. Under the previous definitions all consideration of differences in $\gamma$ changing the magnitudes of $\sigma^2_{p^*}$ and $\sigma^2_{w^*}$ is disallowed. Thus, there are only four parameter sets for the half-sib mating design (Table 2-1).

Covariance Matrix for Variance Components

The base algorithm to produce the covariance matrix for variance component estimates is from Giesbrecht (1983) and was rewritten in Fortran for ease of handling the study data. In using this algorithm, we assume that all random variables are independent and normally distributed and that the true variances of the random variables are known. Under these assumptions, Minimum Norm Quadratic Unbiased Estimation (MINQUE, Rao 1972) using the true variance components as priors (the starting point for the algorithm) becomes MIVQUE (Rao 1971b), which requires normality and the true variance components as priors (Searle 1987), and for a given design the covariance matrix of the variance component estimates becomes fixed.
A sketch of the steps from the MIVQUE equation (Eq. 2-10, Giesbrecht 1983, Searle 1987) to the true covariance matrix for variance component estimates is

$$\{\mathrm{tr}(Q V_i Q V_j)\}\,\hat{\sigma}^2 = \{y' Q V_i Q y\} \qquad (2\text{-}10)$$

with dimensions $r \times r$, $r \times 1$ and $r \times 1$; then

$$\hat{\sigma}^2 = \{\mathrm{tr}(Q V_i Q V_j)\}^{-1} \{y' Q V_i Q y\}$$

and

$$\mathrm{Var}(\hat{\sigma}^2) = \{\mathrm{tr}(Q V_i Q V_j)\}^{-1}\, \mathrm{Var}(\{y' Q V_i Q y\})\, \{\mathrm{tr}(Q V_i Q V_j)\}^{-1}$$

where all matrices are $r \times r$ and
$\{a_{ij}\}$ is a matrix whose elements are $a_{ij}$ where, in the full-sib designs, i = 1 to 8 and j = 1 to 8, i.e., there is a row and column for every random variable in the linear model;
tr is the trace operator, that is, the sum of the diagonal elements of a matrix;
$Q = V^{-1} - V^{-1} X (X' V^{-1} X)^{-} X' V^{-1}$ for $V$ = the covariance matrix of $y$ and $X$ as the design matrix for fixed effects;
$V_i = Z_i Z_i'$, where i = the random variables test, block, etc.;
$\hat{\sigma}^2$ is the vector of variance component estimates; and
r is the number of random variables in the model.

The variance of a quadratic form (where $A$ is any non-negative definite matrix of proper dimension) under normality is $\mathrm{Var}(y'Ay) = 2\,\mathrm{tr}(AVAV) + 4\boldsymbol{\mu}'AVA\boldsymbol{\mu}$, where $\boldsymbol{\mu} = \mathbf{1}\mu$ is the mean vector of $y$ (Searle 1987); however, MINQUE derivation (Rao 1971) requires that $AX = 0$, which in our case is $A\mathbf{1} = 0$, so the term involving the mean vanishes. Thus

$$\mathrm{Var}(\{y' Q V_i Q y\}) = 2\{\mathrm{tr}(Q V_i Q V_j)\}; \qquad (2\text{-}11)$$

and using Eq. 2-10 and Eq. 2-11

$$\mathrm{Var}(\hat{\sigma}^2) = \{\mathrm{tr}(Q V_i Q V_j)\}^{-1}\, 2\{\mathrm{tr}(Q V_i Q V_j)\}\, \{\mathrm{tr}(Q V_i Q V_j)\}^{-1}$$

and therefore

$$\mathrm{Var}(\hat{\sigma}^2) = V_{vc} = 2\{\mathrm{tr}(Q V_i Q V_j)\}^{-1}. \qquad (2\text{-}12)$$

From Eq. 2-12 it is seen that the MIVQUE covariance matrix of the variance component estimates is dependent only on the design matrix (the result of the field design and mating design) and the true variance components; a data vector is not needed.
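
As a rough illustration of Eq. 2-12, the following Python sketch builds $2\{\mathrm{tr}(QV_iQV_j)\}^{-1}$ from design matrices and true variance components. It is not the Fortran program described in the text: the function name `mivque_covariance` is hypothetical, NumPy is assumed to be available, and the example design (a balanced one-way random model with 5 groups of 4 observations) is chosen only because its answer is known in closed form.

```python
import numpy as np


def mivque_covariance(z_list, x, sigma2):
    """Covariance matrix of variance component estimates, 2 * {tr(Q Vi Q Vj)}^-1.

    z_list : design matrices Z_i, one per random effect (residual included
             as an identity matrix).
    x      : design matrix for fixed effects (a column of ones here).
    sigma2 : true variance components, in the same order as z_list.
    """
    v_i = [z @ z.T for z in z_list]                       # V_i = Z_i Z_i'
    v = sum(s * vi for s, vi in zip(sigma2, v_i))         # V = sum sigma2_i V_i
    v_inv = np.linalg.inv(v)
    xt_v_inv = x.T @ v_inv
    q = v_inv - v_inv @ x @ np.linalg.pinv(xt_v_inv @ x) @ xt_v_inv
    r = len(v_i)
    t = np.empty((r, r))
    for i in range(r):
        for j in range(r):
            t[i, j] = np.trace(q @ v_i[i] @ q @ v_i[j])
    return 2.0 * np.linalg.inv(t)


# Illustrative design only: a = 5 groups, n = 4 observations per group.
a, n = 5, 4
z_group = np.kron(np.eye(a), np.ones((n, 1)))
z_resid = np.eye(a * n)
x = np.ones((a * n, 1))
cov = mivque_covariance([z_group, z_resid], x, sigma2=[1.0, 2.0])
```

No data vector enters the calculation, which is the point made in the text. For this balanced layout the result should agree with the familiar analysis of variance sampling variances, which gives an independent check on the sketch.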
Covariance Matrix for Linear Combinations of Variance Components and Variance of a Ratio

Once the covariance matrix for the variance component estimates (Eq. 2-12) is created, then the covariance matrix of linear combinations of these variance components is formed as

$$V_k = L' V_{vc} L \qquad (2\text{-}13)$$

with dimensions $2 \times 2$, $2 \times r$, $r \times r$ and $r \times 2$, where $L$ specifies the linear combinations of the variance components which are the combinations of variance components in the denominator and numerator of the genetic ratio being estimated. A Taylor series expansion (first approximation) for the variance of a ratio using the variances of and covariance between numerator and denominator is then applied using the elements of $V_k$ to produce the approximate variance of the three ratio estimates as (Kendall and Stuart 1963):

$$\mathrm{Var}(\text{ratio}) \approx \left(\frac{1}{D}\right)^2 V_k(1,1) - 2\,\frac{N}{D^3}\, V_k(1,2) + \frac{N^2}{D^4}\, V_k(2,2) \qquad (2\text{-}14)$$

where the generic ratio is $N/D$ and $N$ and $D$ are the parametric values; $V_k(1,1)$ is the variance of $N$; $V_k(1,2)$ is the covariance between $N$ and $D$; and $V_k(2,2)$ is the variance of $D$.

Comparison Among Estimates of Variances of Ratios

The approximate variances of the three ratio estimates ($h^2$, $r_B$, and $\gamma$) are compared across mating designs with equal (or approximately equal) numbers of observations, across numbers of locations, and across numbers of parents within a mating design, all within a level of genetic determination. The standard for comparison is i. Results are presented by the genetic ratio estimated so that direct comparisons may be made among the mating designs for equal numbers of observations within a number of locations for varying levels of genetic control. Number of genetic entries (number of crosses for full-sib designs and number of half-sib families for half-sib designs) is used as a proxy for number of observations since, for all designs, number of observations equals twenty-four times the number of locations times the number of genetic entries.
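
The two steps in Eq. 2-13 and Eq. 2-14 can be illustrated with a short Python sketch. The helper names and every number below are hypothetical (a made-up two-component covariance matrix, not values from this study); the sketch only shows how the linear combinations and the first-order ratio approximation fit together.

```python
def combination_covariance(num_coef, den_coef, v_vc):
    """Return the 2 x 2 matrix L' V_vc L for numerator and denominator
    coefficient vectors (Eq. 2-13), using plain Python lists."""
    def quad(a, b):
        return sum(a[i] * v_vc[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    return [[quad(num_coef, num_coef), quad(num_coef, den_coef)],
            [quad(den_coef, num_coef), quad(den_coef, den_coef)]]


def ratio_variance(num, den, v_k):
    """First-order Taylor approximation of Var(num / den), Eq. 2-14."""
    return (v_k[0][0] / den ** 2
            - 2.0 * num * v_k[0][1] / den ** 3
            + num ** 2 * v_k[1][1] / den ** 4)


# Hypothetical inputs: two variance components and their covariance matrix.
components = [1.0, 3.0]
v_vc = [[0.04, -0.01],
        [-0.01, 0.09]]
num_coef = [4.0, 0.0]   # numerator   = 4 * component 1
den_coef = [1.0, 1.0]   # denominator = component 1 + component 2

num = sum(c * s for c, s in zip(num_coef, components))
den = sum(c * s for c, s in zip(den_coef, components))
v_k = combination_covariance(num_coef, den_coef, v_vc)
var_ratio = ratio_variance(num, den, v_k)
```

With these made-up inputs the numerator and denominator both equal 4, and the three terms of Eq. 2-14 are 0.04, 0.015 and 0.006875, so `var_ratio` comes to 0.031875.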
Further, by plotting the two levels of numbers of locations on a single figure, a comparison is made of the utility of replication of a design across increasing numbers of locations. Efficiency plots also permit contrasts of the absolute magnitude of variance of estimation among designs. For a given number of genetic entries and locations, the design with the highest efficiency is the most precise (lowest variance of estimation). Increasing the number of genetic entries or locations always results in greater precision (lower variance of estimation), but is not necessarily as efficient (the reduction in variance was not sufficient to offset the increase in numbers of observations).

A primary justification for using the efficiency of a design as a criterion is that a more precise estimate of a genetic ratio is obtained by using the mean of two estimates from replication of the small design as two disconnected experiments as opposed to the estimate from a single large design. This is true when 1) the number of observations in the large design ($N$) equals twice the number of observations in the small design ($n_1$), 2) the small design is more efficient, and 3) the variances are homogeneous. This is proven below.

Since $N = n_1 + n_2$ and $n_1 = n_2$, then $N = 2n_1$. By definition $i = 1 / (N \cdot \mathrm{Var}(\text{Ratio}))$, and $\mathrm{Var}(\text{Ratio}) = 1 / (i \cdot N)$. The proposition is

$$(\mathrm{Var}_s(\text{Ratio}) + \mathrm{Var}_s(\text{Ratio})) / 4.0 < \mathrm{Var}_l(\text{Ratio});$$

substitution gives

$$\left(\frac{1}{n_1 i_s} + \frac{1}{n_1 i_s}\right) / 4.0 < \frac{1}{N i_l}.$$

Simplification yields

$$\frac{1}{2.0\, n_1 i_s} < \frac{1}{N i_l};$$

and multiplication by $N$ produces

$$\frac{1}{i_s} < \frac{1}{i_l} \qquad (2\text{-}15)$$

which is strictly true so long as $i_s > i_l$, where $i_s$ is the efficiency of the smaller experiment and $i_l$ is the efficiency of the larger experiment.
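
The efficiency definition (Eq. 2-1) and the replication argument (Eq. 2-15) can be checked numerically with a small Python sketch. The function names and the efficiencies and experiment sizes below are hypothetical values chosen for illustration, not results from this study.

```python
def efficiency(n_obs, variance):
    """Efficiency i = 1 / (N * Var(x)), Eq. 2-1."""
    return 1.0 / (n_obs * variance)


def variance_from_efficiency(n_obs, eff):
    """Invert Eq. 2-1: Var(x) = 1 / (i * N)."""
    return 1.0 / (eff * n_obs)


# Hypothetical small design, more efficient than the large one.
n_small, i_small = 2400, 0.010
n_large, i_large = 2 * n_small, 0.008

var_small = variance_from_efficiency(n_small, i_small)
var_large = variance_from_efficiency(n_large, i_large)

# Variance of the mean of two independent small-design estimates.
var_mean_of_two = (var_small + var_small) / 4.0

# Boundary case: equal efficiencies give equal variances.
var_large_equal_eff = variance_from_efficiency(n_large, i_small)
```

Here the mean of two small experiments has a lower variance than the single large experiment because $i_s > i_l$; when the two efficiencies are equal the two variances coincide, which is the boundary of inequality 2-15.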
[Figure 2-1. Efficiency (i) for $h^2$ plotted against number of genetic entries for levels 1 through 8 of genetic control (panels 1a through 1h) for circular, half-diallel, and half-sib mating designs across levels of location (2 and 5 locations), where $i = 1/(N(\mathrm{Var}(h^2)))$ and N = the total number of observations.]

Results

Heritability

Half-sib designs are almost globally superior to the two full-sib designs in precision of heritability estimates (results not shown for variance but may be seen from efficiencies in Figure 2-1). For designs of equal size, half-sib designs excel with the exception of genetic level three (Figure 2-1c, $h^2$ = 0.1, $r_B$ = 0.8, and $\gamma$ = 0.25). In genetic level three, the circular design provides the most precise estimate of $h^2$ for two-location designs; however, when the design is extended across five locations, the half-sib mating design again provides the most precise estimates. The circular mating design is superior in precision to the half-diallel design across all levels of genetic control and location, even with a relatively large number of crosses per parent (four). Half-sib designs are, in general (seven genetic control levels out of eight, Figure 2-1), more efficient, the exception being level three across two locations (Figure 2-1c).

For the circular and half-sib mating designs considered, increasing the number of genetic entries always improves the efficiency of the design. However, definite optima exist for the half-diallel mating design for number of genetic entries, i.e., crosses which convert to a specific number of parents. These optima are not constant but tend to be six parents or fewer, and are lower with increasing $h^2$ or number of locations.
The six-parent half-diallel is never far from the half-diallel optima, and increasing the number of parents past the optimum results in decreased efficiency.

For half-sib designs with $h^2$ = 0.1, five locations are more efficient than two locations; however, at $h^2$ = 0.25 two locations are most efficient. Further, the number of locations required to efficiently estimate $h^2$ for half-sib designs is determined only by the level of $h^2$ and does not depend on the levels of the other ratios. Although estimates over larger numbers of observations are more precise (five-location estimates are more precise than two-location estimates), the efficiency (increase in precision per unit observation) declines. Thus, if $h^2$ = 0.25 and estimates of a certain precision are required, disconnected sets of two-location experiments are preferred to five-location experiments. The relative efficiency of five locations versus two locations is enhanced with decreasing $r_B$ (increasing genotype by environment interaction) within a level of $h^2$ (compare Figures 2-1a to 2-1b and 2-1c to 2-1d for $h^2$ = 0.1, and 2-1e to 2-1f and 2-1g to 2-1h for $h^2$ = 0.25). Yet, this enhancement is not sufficient to cause a change in efficiency ranking between the location levels.

The full-sib designs differ markedly from this pattern (Figure 2-1) in that, for these parameter levels, it is never more efficient to increase the number of locations from two to five for heritability estimation. As observed with half-sib designs, for full-sib designs the relative efficiency status of five locations improves with decreasing $r_B$. To further contrast mating designs, note that the efficiency status of full-sib designs relative to the half-sib design improves with decreasing $\gamma$ and increasing $r_B$ (Figures 2-1b versus 2-1c and 2-1f versus 2-1g).

Type B Correlation

As opposed to $h^2$ estimation, no mating design performs at or near the optima for precision of $r_B$ estimates across all levels of genetic control (Figure 2-2).
However, the circular mating designs produce globally more precise estimates than those of the half-diallel mating design. In general, the utility of full-sib versus half-sib designs is dependent on the level of $r_B$. The lower $r_B$ value favors half-sib designs while the higher $r_B$ tends to favor full-sib designs (compare Figures 2-2a to 2-2b, 2-2c to 2-2d, 2-2e to 2-2f and 2-2g to 2-2h). Decreasing $\gamma$ and lowering $h^2$ always improves the relative efficiency of full-sib designs to half-sib designs (compare Figures 2-2c and 2-2d to 2-2e and 2-2f).

[Figure 2-2. Efficiency (i) for $r_B$ plotted against number of genetic entries for levels 1 through 8 of genetic control (panels 2a through 2h) for circular, half-diallel, and half-sib mating designs across levels of location (2 and 5 locations), where $i = 1/(N(\mathrm{Var}(r_B)))$ and N = the total number of observations.]

[Figure 2-3. Efficiency (i) for $\gamma$ plotted against number of genetic entries for four levels of genetic control for circular, half-diallel, and half-sib mating designs across levels of location (2 and 5 locations), where $i = 1/(N(\mathrm{Var}(\gamma)))$ and N = the total number of observations. Panels: (3a) $h^2$ = .1, $r_B$ = .5, $\gamma$ = 1.0; (3b) $h^2$ = .1, $r_B$ = .5, $\gamma$ = .25; (3c) $h^2$ = .25, $r_B$ = .8, $\gamma$ = 1.0; (3d) $h^2$ = .25, $r_B$ = .8, $\gamma$ = .25.]

For estimation of $r_B$, full-sib designs are more efficient than half-sib designs except in the three cases of low $r_B$ (0.5) and high $\gamma$ (1.0) for $h^2$ = 0.1 (Figure 2-2b) and low $r_B$ for $h^2$ = 0.25 (Figures 2-2f and 2-2h).
Within full-sib designs the circular design is globally superior to the half-diallel. As with h2 estimation, half-diallel designs have optimal levels for numbers of parents. The six-parent half-diallel is again close to these optima for all genetic levels and numbers of locations. At low h2 for full-sib designs, planting in two locations is always more efficient than five locations. For half-sib designs at low h2, the relative efficiency of two versus five locations is dependent on the level of rB with lower rB favoring replication across more locations. At h2 = 0.25, half-sib designs are more efficient when replicated across five locations. At the higher h2 value, full-sib design efficiency across locations is dependent on the level of rB. With rB = 0.5 and h2 = 0.25, replication of full-sib designs is for the first time more efficient across five locations than across two locations; however, at the higher rB level two locations is again the preferred number.

Dominance to Additive Variance Ratio

In comparing the two full-sib designs for relative efficiency in estimating γ, the circular design is always approximately equal to or, for most cases, superior to the half-diallel design (Figure 2-3). The relative superiority of the circular design is enhanced by decreasing γ and rB (not shown). The half-diallel design again demonstrates optima for number of parents with the six-parent design being near optimal. Within a mating design the use of two locations is always more efficient than the use of five locations. The magnitude of this superiority escalates with increasing rB and h2 (Figures 2-3a and 2-3b versus 2-3c and 2-3d).

Discussion

Comparison of Mating Designs

A priori knowledge of genetic control is required to choose the optimal mating and field design for estimation of h2, rB and γ.
Given that such knowledge may not be available, the choices are then based on the most robust mating designs and field designs for the estimation of certain of the genetic ratios. If h2 is the only ratio desired, then the half-sib mating design is best. Estimation of both h2 and rB requires a choice between the half-sib and circular designs. If there is no prior knowledge, then the selection of a mating design is dependent on which ratio has the highest priority. For experiments in which h2 receives the highest weighting, the half-sib design is preferred, and in the alternative case the circular design is the better choice. In the last scenario information on all three ratios is desired from the same experiment, and in this case the circular design is the better selection since the circular design is almost globally more efficient than the half-diallel design. After choosing a mating design, the next decision is how many locations per experiment are required to optimize efficiency. For the half-sib design the number of locations required to optimize efficiency is dependent on both the ratio being estimated and the level of genetic control. A broad inference is that for h2 estimation a two-location experiment is more efficient and for rB a five-location experiment has the better efficiency. Estimation of any of the three ratios with a full-sib design is almost globally more efficient in two-location experiments. The disparity between the behavior of the half-sib and full-sib designs with respect to the efficiency of location levels can be explained in terms of the genetic connectedness offered by the different designs. Genetic connectedness can be viewed as commonality of parentage among genetic entries. The more entries having a common parent, the more connectedness is present. The half-sib design is only connected across locations by the one common parent in a half-sib family in each replication.
Full-sib designs are connected across locations in each replication by the full-sib cross plus the number of parents minus two (half-diallel) or three (circular) for each of the two parents in a cross. The connectedness in a full-sib design means each observation is providing information about many other observations. The result of this connectedness is that, in general, fewer observations (number of locations) are required for maximum efficiency.

A General Approach to the Estimation Problem

The estimation problems may be viewed in a broader context than the specific solutions in this chapter. The technique for comparison of mating designs and numbers of locations across levels of genetic determination may be construed, for the case of h2 estimation, to be the effect of these factors on the variance of $\sigma^2_g$ estimates. Viewing the variance approximation formula, the conclusion may be reached that the variance of $\sigma^2_g$ estimates is the controlling factor in the variance of h2 estimates since the other factors at these heritability levels are multiplied by constants which reduce their impact dramatically. Given this conclusion, the variance of h2 estimates is essentially the (3,3) element in $2\{tr(QV_iQV_j)\}^{-1}$ (Eq. 2-11). Further, since the covariances of the other variance component estimates with $\sigma^2_g$ estimates are small, the variance of $\sigma^2_g$ estimates is basically determined by the magnitude of the (3,3) element of $\{tr(QV_iQV_j)\}$ which is $tr(QV_gQV_g)$. Thus, the variance of h2 estimates is minimized by maximizing $tr(QV_gQV_g)$, with h2 used as an illustration because this simplification is possible. Considering the impact of changing levels of genetic control, while holding the mating and field designs constant, $V_g$ is fixed, the diagonal elements of V are fixed at 11.5 because of our assumptions, and only the off-diagonal elements of V change with genetic control levels.
Since Q is a direct function of $V^{-1}$, what we observe in Figure 2-1 comparing a design across levels of genetic control are changes in $V^{-1}$ brought about by changes in the magnitude of the off-diagonal elements of V (covariances among observations). The effect of positive off-diagonal elements (the linear model specifies that all off-diagonal elements in V are zero or positive) on $V^{-1}$ is to reduce the magnitude of the diagonal elements and often also to result in negative off-diagonal elements. If one increases the magnitude of the off-diagonal elements in V, then the magnitude of the diagonal elements of $V^{-1}$ is reduced and the magnitude of negative off-diagonal elements is increased. Since $tr(QV_gQV_g)$ is the sum of the squared elements of the product of a direct function of $V^{-1}$ and a matrix of non-negative constants ($V_g$), as the diagonal elements of $V^{-1}$ are reduced and the off-diagonal elements become more negative, $tr(QV_gQV_g)$ must become smaller and the variance of h2 estimates increases. Mating designs may be compared by the same type of reasoning. Within a constant field design changes in mating design produce alterations in V. Of the three designs the half-sib produces a V matrix with the most zero off-diagonal elements, the circular design next, and the half-diallel the fewest zero off-diagonal elements. Knowing the effect of off-diagonal elements on the variance of h2 estimates, one could surmise that the variance of estimates is reduced in the order of least to most non-zero off-diagonal elements. This tenet is in basic agreement with the results in Figures 2-1 through 2-3. The effects of rB and γ on the variance of h2 estimates can also be interpreted utilizing the above approach. In the results section of this chapter it is noted that decreasing the magnitude of rB and/or γ causes full-sib designs to rise in efficiency relative to the half-sib design.
In accordance with our previous arguments this would be expected since decreasing the magnitude of those two ratios causes a decrease in the magnitude of off-diagonal elements. More precisely, decreasing γ results in the reduction of off-diagonal elements in V of the full-sib designs while not affecting the half-sib design, and decreasing rB results in the reduction of off-diagonal elements in V of full-sib and half-sib designs. Relative increases in efficiency of full-sib designs result from the elements due to location by additive interaction occurring much less frequently in the half-sib designs; thus, the relative impact of reduction in rB in half-sib designs is less than that for full-sibs.

Use of the Variance of a Ratio Approximation

Use of Kendall and Stuart's (1963) first approximation (first-term Taylor series approximation) of the variance of a ratio has two major caveats. The approximation depends on large sample properties to approach the true variance of the ratio, i.e., with a small number of levels for random variables the approximation does not necessarily closely approximate the true variance of the ratio. Work by Pederson (1972) suggests that for approximating the variance of h2 at least ten parents are required in diallels before the approximation will converge to the true variance even after including Taylor series terms past the first derivative. Pederson's work also suggests that the approximation is progressively worse for increasing heritability with low numbers of parents. Using the field design in this chapter (two locations, four blocks and six-tree row-plots), simulation work (10,000 data sets) has demonstrated that, with a heritability of 0.1 and four parents in a half-diallel across two locations, the variance of a ratio approximation yields a variance estimate for h2 of 0.1 while the convergent value for the simulation was 0.08 (Huber unpublished data).
One should remember the dependence of the first approximation of the variance of a ratio on large sample properties when applying the technique to real data. The second caveat is that the range of estimates of the denominator of the ratio cannot pass through zero (Kendall and Stuart 1963). This constraint is of no concern for h2; however, the structure of rB and γ denominators allows unbiased minimum variance estimates of those denominators to pass through zero, which means that at one point in the distribution of the estimates of the ratios they are undefined (the distributions of these ratio estimates are not continuous). Simulation has shown that the variances of rB and γ are much greater than the approximation would indicate (Huber unpublished data). The discrepancy in variance of the estimates could be partially alleviated through using a variance component estimation technique which restricts estimates to the parameter space $0 < \sigma^2 < \infty$. Nevertheless, because of the two caveats, approximations of the variance of h2, rB and γ estimates should be viewed only on a relative basis for comparisons among designs and not on an absolute scale. Additionally, the expectation of a ratio does not equal the ratio of the expectations (Hogg and Craig 1978). If a value of genetic ratios is sought so that the value equals the ratio of the expectations, then the appropriate way to calculate the ratio would be to take the mean of variance components or linear combinations of variance components across many experiments and then take the ratio. If the value sought for h2 is the expectation of the ratio, then taking the mean of many h2 estimates is the appropriate approach.
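The two points above (the first approximation can differ from the true variance, and the expectation of a ratio differs from the ratio of the expectations) can be illustrated with a deliberately tiny, exact example. The sketch below is not the simulation referred to in the text: the function name `ratio_summary` and the two equally likely outcomes are hypothetical, and the approximation coded is the standard first-order Taylor series formula for the variance of a ratio, which is assumed here to be the form the chapter calls the first approximation.

```python
from fractions import Fraction

def ratio_summary(outcomes):
    """Exact moments of X/Y over a list of equally likely (x, y) outcomes."""
    n = len(outcomes)
    xs = [Fraction(x) for x, _ in outcomes]
    ys = [Fraction(y) for _, y in outcomes]

    def mean(values):
        return sum(values) / n

    mx, my = mean(xs), mean(ys)
    var_x = mean([(x - mx) ** 2 for x in xs])
    var_y = mean([(y - my) ** 2 for y in ys])
    cov_xy = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

    ratios = [x / y for x, y in zip(xs, ys)]
    mean_of_ratios = mean(ratios)
    exact_var = mean([(r - mean_of_ratios) ** 2 for r in ratios])

    # First-order (first-term Taylor series) approximation of Var(X/Y).
    approx_var = (mx / my) ** 2 * (
        var_x / mx ** 2 + var_y / my ** 2 - 2 * cov_xy / (mx * my)
    )
    return {
        "mean_of_ratios": mean_of_ratios,
        "ratio_of_means": mx / my,
        "exact_var": exact_var,
        "approx_var": approx_var,
    }

# Hypothetical outcomes: numerator fixed at 1, denominator 1 or 3 with equal probability.
summary = ratio_summary([(1, 1), (1, 3)])
for name, value in summary.items():
    print(name, value)
```

With only two levels of the denominator the approximation and the exact variance disagree noticeably, which is the small-sample behavior the first caveat warns about; the example says nothing about the size of the discrepancy in real progeny test data.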
Returning to the results from simulated data (10,000 data sets) where the h2 value was set at 0.1, using the ratio of the means of variance components rendered a value of 0.1 for h2, the mean of the h2 estimates returned a value of 0.08, and a Taylor series approximation of the mean of the ratio yielded 0.07 (Pederson 1972).

Conclusions

Results from this study should be interpreted as relative comparisons of the levels of the factors investigated. However, viewing the optimal design problem as illustrated in the discussion section of this chapter can provide insight to the more general problem. There is no globally most efficient number of locations, parents or mating design for the three ratios estimated even within the restricted range of this study; yet, some general conclusions can be drawn. For estimating h2 the half-sib design is always optimal or close to optimal in terms of variance of estimation and efficiency. In the estimation of rB and γ, the circular mating design is always optimal or near optimal in variance reduction and efficiency. Across numbers of parents within a mating design only the half-diallel shows optima for efficiency. The other mating designs have non-decreasing efficiency plots over the levels of number of parents, so that while there is an optimal number of locations for a level of genetic control, the number of genetic entries per location is limited more by operational than efficiency constraints. Two locations is a near global optimum over five locations for the full-sib mating designs. Within the half-sib mating design optimality depends on the levels of h2 and rB: 1) for h2 estimation the optimal number of locations is inversely related to the level of h2, i.e., at the higher level two tests were optimal and at the lower level five tests were optimal; and 2) for rB estimation for the half-sib design, the optimal number of locations was also inversely related to the level of rB.
Means of estimates from disconnected sets provide lower variance of estimation where the smaller experiments have higher efficiencies. Thus, disconnected sets are preferred according to number of locations for all mating designs and according to number of parents for the half-diallel mating design. In practical consideration of the optimal mating design problem, the results of this study indicate that if h2 estimation is the primary use of a progeny test then the half-sib mating design is the proper choice. Further, the circular mating design is an appropriate choice if the estimation of rB is more important than h2. Finally, if a full-sib design is required to furnish information about dominance variance, the circular design provides almost globally better efficiencies for h2, rB, and γ than the half-diallel.

CHAPTER 3
ORDINARY LEAST SQUARES ESTIMATION OF GENERAL AND SPECIFIC COMBINING ABILITIES FROM HALF-DIALLEL MATING DESIGNS

Introduction

The diallel mating system is an altered factorial design in which the same individuals (or lines) are used as both male and female parents. A full diallel contains all crosses, including reciprocal crosses and selfs, resulting in a total of $p^2$ combinations, where p is the number of parents. Assumptions that reciprocal effects, maternal effects, and paternal effects are negligible lead to the use of the half-diallel mating system (Griffing 1956, method 4) which has p(p-1)/2 parental combinations and is the mating system addressed in this chapter. Half-diallels have been widely used in crop and tree breeding (Sprague and Tatum 1942, Gilbert 1958, Matzinger et al. 1959, Burley et al. 1966, and Squillace 1973) and the widespread use of this mating system continues today (Weir and Zobel 1975, Wilcox et al. 1975, Snyder and Namkoong 1978, Hallauer and Miranda 1981, Singh and Singh 1984, Greenwood et al. 1986, and Weir and Goddard 1986).
Most of the statistical packages available treat fixed effect estimation as the objective of the program with random variables representing nuisance variation. Within this context a common analysis of half-diallel experiments is conducted by first treating genetic parameters as fixed effects for estimation of general (GCA) and specific (SCA) combining abilities and subsequently as random variables for variance component estimation (used for estimating heritabilities, genetic correlations, and general to specific combining ability variance ratios for determining breeding strategies). This chapter focuses on the estimation of GCA's and SCA's as fixed effects. The treatment of GCA and SCA as fixed effects in OLS (ordinary least squares) is an entirely appropriate analysis if the comparisons are among parents and crosses in a particular experiment. If, as forest geneticists often wish to do, GCA estimates from disconnected experiments are to be compared, then methods such as checklots must be used to place the estimates on a common basis. Formulae (Griffing 1956, Falconer 1981, Hallauer and Miranda 1981, and Becker 1975) for hand calculation of general and specific combining abilities are based on a solution to the OLS equations for half-diallels created by sum-to-zero restrictions, i.e., the sum of all effect estimates for an experimental factor equals zero. These formulae will yield correct OLS solutions for sum-to-zero genetic parameters provided the data have no missing cells. If cell (plot) means are used as the basis for the estimation of effects, there must be at least one observation per cell (plot) where a cell is a subclassification of the data defined by one level of every factor (Searle 1987). An example of a cell is the group of observations denoted by $AB_{ij}$ for a randomized complete block design with factor A across blocks (B).
If the above formulae are applied without accounting for missing cells, incorrect and possibly misleading solutions can result. The matrix algebra approach is described in this chapter for these reasons: 1) in forest tree breeding applications data sets with missing cells are extremely common; 2) many statistical packages do not allow direct specification of the half-diallel model; 3) the use of a linear model and matrix algebra can yield relevant OLS solutions for any degree of data imbalance; and 4) viewing the mechanics of the OLS approach is an aid to understanding the properties of the estimates. The objectives of this chapter are to (1) detail the construction of ordinary least squares (OLS) analysis of half-diallel data sets to estimate genetic parameters (GCA and SCA) as fixed effects, (2) recount the assumptions and mathematical features of this type of analysis, (3) facilitate the reader's implementation of OLS analyses for diallels of any degree of imbalance and suggest a method for combining estimates from disconnected experiments, and (4) aid the reader in ascertaining what method is an appropriate analysis for a given data set.

Methods

Linear Model

Plot means are used as the unit of observation for this analysis with unequal numbers of observations per plot. Plot (cell) means are always estimable as long as there is one observation per plot, and linear combinations of these means (least squares means) provide the most efficient way of estimating OLS fixed effects (Yates 1934). Throughout this chapter, estimates are denoted by lower case letters while the parameters are designated by upper case letters and matrices are in bold print.
Using plot means as observations, a common scalar linear model for an analysis of a half-diallel mating design with p(p-1)/2 crosses planted at a single location in a randomized complete block design with one plot per block is

$$y_{ijk} = \mu + B_i + GCA_j + GCA_k + SCA_{jk} + e_{ijk} \qquad (3\text{-}1)$$

where
$y_{ijk}$ is the mean of the i-th block for the jk-th cross;
$\mu$ is an overall mean;
$B_i$ is the fixed effect of block i for i = 1 to b;
$GCA_j$ is the fixed general combining ability effect of the j-th female parent or k-th male parent, j or k = 1, ..., p (j ≠ k);
$SCA_{jk}$ is the fixed specific combining ability effect of parents j and k; and
$e_{ijk}$ is the random error associated with the observation of the jk-th cross in the i-th block where $e_{ijk} \sim (0, \sigma^2_e)$.

Cross by block interaction as genotype by environment interaction is treated as confounded with between plot variation as for contiguous plots. The model in matrix notation is

$$y = X\beta + e \qquad (3\text{-}2)$$

where
y is the vector of observations (n x 1 = n rows and 1 column) where n equals the number of observations;
X is the design matrix (n x m) whose function is to select the appropriate parameters for each observation where m equals the number of fixed effect parameters in the model;
$\beta$ is the vector (m x 1) of fixed effect parameters ordered in a column; and
e is the vector (n x 1) of deviations (errors) from the expectation associated with each observation.

Ordinary Least Squares Solutions

The matrix representation of an OLS fixed effects solution is

$$b = (X'X)^{-1}X'y \qquad (3\text{-}3)$$

where b is the vector of estimated fixed effect parameters, i.e., an estimate of $\beta$, and X is the design matrix either made full rank by reparameterization, or a generalized inverse of X'X may be used. Inherent in this solution is the ordinary least squares assumption that the variance-covariance matrix (V) of the observations (y) is equal to $I\sigma^2_e$, where I is an n x n identity matrix. The elements of an identity matrix are 1's on the main diagonal and all other elements are 0.
Multiplying I by $\sigma^2_e$ places $\sigma^2_e$ on the main diagonal; in V the variance of the observations appears on the main diagonal and the covariance between observations appears in the off-diagonal elements. Thus, $V = I\sigma^2_e$ states that the variance of the observations is equal to $\sigma^2_e$ for each observation and there are no covariances between the observations (which is one direct result of considering genetic parameters as fixed effects).

Sum-to-Zero Restrictions

The design matrix presented in this chapter is reparameterized by sum-to-zero restrictions to (1) reduce the dimension of the matrices to a minimal size, and (2) yield estimates of fixed effects with the same solution as common formulae in the balanced case. Other restrictions such as set-to-zero could also be applied, so the discussion that follows treats sum-to-zero restrictions as a specific solution to the more general problem, which is finding an inverse for X'X. The subscripts 'o' and 's' refer to the overparameterized model and the reparameterized model with sum-to-zero restrictions, respectively. The matrix $X_o$ of Figure 3-1 is the design matrix for an overparameterized linear model (Milliken and Johnson 1984, page 96). Overparameterization means that the equations are written in more unknowns (parameters, in this case 13) than there are equations (number of observations minus degrees of freedom for error, in this case 12 - 5 = 7) with which to estimate the parameters. Reparameterization as a sum-to-zero matrix overcomes this dilemma by reducing the number of parameters through making some of the parameters linear combinations of others. Sum-to-zero restrictions make the resulting parameters and estimates sum to zero even though the unrestricted parameters (for example, the true GCA values as applied to a broader population) do not necessarily sum to zero within a diallel. This is the problem of comparability of GCA estimates from disconnected experiments.

| y | μ | B1 | B2 | GCA1 | GCA2 | GCA3 | GCA4 | SCA12 | SCA13 | SCA14 | SCA23 | SCA24 | SCA34 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| y112 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| y113 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| y114 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| y123 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| y124 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| y134 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| y212 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| y213 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| y214 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| y223 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| y224 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| y234 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |

Figure 3-1. The overparameterized linear model, $y = X_o\beta_o$, for a four-parent half-diallel planted on a single site in two blocks displayed as matrices. The design matrix ($X_o$) and parameter vector ($\beta_o$) are shown in overparameterized form. 1's and 0's denote the presence or absence of a parameter in the model for the observed means (data vector, y). The parameters displayed above the design matrix label the appropriate column for each parameter. Error vector not exhibited.

[Figure 3-2 matrix display: $y = X_s\beta_s + e$, with the columns of $X_s$ labeled μ, B1, GCA1, GCA2, GCA3, SCA12, SCA13 and the errors ordered e112, e113, e114, e123, e124, e134, e212, e213, e214, e223, e224, e234.]

Figure 3-2. The linear model for a four-parent half-diallel planted on a single site in two blocks displayed as matrices. The design matrix ($X_s$) and the parameter vector ($\beta_s$) are presented in sum-to-zero format. The parameters displayed above the design matrix label the appropriate column for each parameter.

To illustrate the concept of sum-to-zero estimates versus population parameters, we use the expectation of a common formula. Becker (1975) gives equation 3-4 (which for balanced cases is equivalent to $g_j = ((p-1)/(p-2))(\bar{Z}_{j\cdot} - \bar{Z}_{\cdot\cdot})$) as the estimate for general combining ability for the j-th line with p equalling the number of parents and $Z_{jk}$ equalling the site mean of the j x k cross. This equation yields the same solution as the matrix equations with no missing plots or crosses and with a design matrix which contains the sum-to-zero restrictions.
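As a hedged check of what the balanced form of this formula estimates, the sketch below applies it to the expectations of the cross means under equation 3-1 and collects the weight placed on each parameter. The function name `becker_gca_weights` is hypothetical, the formula is read as the scaled difference between the mean of the crosses involving a parent and the mean of all crosses, and a complete half-diallel is assumed so that every cross mean carries the same block average.

```python
from fractions import Fraction
from itertools import combinations

def becker_gca_weights(p, parent):
    """Weights on mu, each GCA, and each SCA in the expectation of g_parent.

    Assumes a complete p-parent half-diallel (no missing crosses) and the
    balanced form g_j = ((p - 1) / (p - 2)) * (mean of crosses with j - mean of all crosses).
    """
    crosses = list(combinations(range(1, p + 1), 2))
    scale = Fraction(p - 1, p - 2)
    cross_weight = {}
    for cross in crosses:
        weight = -Fraction(1, len(crosses))      # contribution through the overall mean
        if parent in cross:
            weight += Fraction(1, p - 1)         # contribution through the parent's mean
        cross_weight[cross] = scale * weight

    # Every cross mean contains mu (and the block average) with coefficient one,
    # the GCA of both of its parents, and its own SCA.
    mu_weight = sum(cross_weight.values())
    gca_weight = {
        j: sum(w for cross, w in cross_weight.items() if j in cross)
        for j in range(1, p + 1)
    }
    sca_weight = dict(cross_weight)
    return mu_weight, gca_weight, sca_weight

mu_w, gca_w, sca_w = becker_gca_weights(p=4, parent=1)
print("mu:", mu_w)
print("GCA:", gca_w)
print("SCA:", sca_w)
```

The weight on the mean (and therefore on the block average) cancels, while the weights on the other GCA's and on the SCA's do not, which is why the estimate cannot be read as the unrestricted GCA of the parent.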
An evaluation of this formula in a four-parent half-diallel planted in b blocks for the GCA of parent 1 is obtained by substituting the expectation of the linear model (equation 3-1) for each observation:

$$g_j = \frac{1}{p(p-2)}\left(pZ_{j\cdot} - 2Z_{\cdot\cdot}\right) \qquad (3\text{-}4)$$

$$E\{g_1\} = E\left\{\frac{1}{p(p-2)}\left(pZ_{1\cdot} - 2Z_{\cdot\cdot}\right)\right\}$$

$$E\{g_1\} = \tfrac{3}{4}GCA_1 - \tfrac{1}{4}(GCA_2 + GCA_3 + GCA_4) + \tfrac{1}{4}(SCA_{12} + SCA_{13} + SCA_{14}) - \tfrac{1}{4}(SCA_{23} + SCA_{24} + SCA_{34}).$$

Here a dot subscript denotes summation of the cross means over the replaced subscript. The result of equation 3-4 is obviously not $GCA_1$ from the unrestricted model (equation 3-1). Thus, $g_1$, an estimable function and an estimate of parameter $GCA_{1s}$ (the estimate of the GCA of parent 1 given the sum-to-zero restrictions), does not have the same meaning as $GCA_1$ in the unrestricted model. An estimable function is a linear combination of the observations; but in order for an individual parameter in a model to be estimable, one must devise a linear combination of the observations such that the expectation has a weight of one on the parameter one wishes to estimate while having a weight of zero on all other parameters. A solution such as this does not exist for the individual parameters in the overparameterized model (equation 3-1). So, although the sum-to-zero restricted GCA parameters and estimates are forced to sum to zero for the sample of parents in a given diallel, the unrestricted GCA parameters only sum to zero across the entire population (Falconer 1981) and an evaluation of $GCA_{1s}$ demonstrates that the estimate contains other model parameters. The result of sum-to-zero restrictions is that the degrees of freedom for a factor equals the number of columns (parameters) for that factor in $X_s$ (Figure 3-2). Thus, a generalized inverse for $X_s'X_s$ is not required since the number of columns in the sum-to-zero $X_s$ matrix for each factor equals the degrees of freedom for that factor in the model ($X_s$ is full column rank and provides a solution to equation 3-3).

Components of the Matrix Equation

The equational components of 3-2 are now considered in greater detail.
Data vector y

Observations (plot means) in the data vector are ordered in the manner demonstrated in Figure 3-1. For our example Figure 3-1 is the matrix equation of a four-parent half-diallel mating design planted in two randomized complete blocks on a single site. There are six crosses present in the two blocks for a total of 12 observations in the data vector, y. The observations are first sorted by block. Second, within each block the observations should be in the same sequence (for simplicity of presentation only). This sequence is obtained by assigning numbers 1 through p to each of the p parents and then sorting all crosses containing parent 1 (whether as male or female) as the primary index in descending numerical order by the other parent of the cross as the secondary index. Next all crosses containing parent 2 (primary index, as male or female) in which the other parent in the cross (secondary index) has a number greater than 2 are then also sorted in descending order by the secondary index. This procedure is followed through using parent p-1 as the primary index.

Design matrix and parameter vector, X and β

The design matrix for a model is conceptually a listing of the parameters present in the model for each observation (Searle 1987, page 243). In Figure 3-1, y and $\beta_o$ are exhibited and the parameters in $\beta_o$ are displayed at the tops of the columns of $X_o$ (a visually correct interpretation of the multiplication of a matrix by a vector). For each observation in y, the scalar model (equation 3-1) may be employed to obtain the listing of parameters for that observation (the row of the design matrix corresponding to the particular observation). The convention for design matrices is that the columns for the factors occur in the same order as the factors in the linear model (equation 3-1 and Figure 3-1).
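The ordering of observations and the listing of parameters per observation can be sketched in a few lines. This is a minimal illustration rather than the author's program: the function name `overparameterized_design` and the column labels are hypothetical, the block is assumed complete (no missing plots), and crosses are listed in the order shown in Figure 3-1 (1x2, 1x3, 1x4, 2x3, 2x4, 3x4 within each block).

```python
import numpy as np
from itertools import combinations

def overparameterized_design(p, b):
    """Return (labels, X_o) for a complete p-parent half-diallel in b blocks."""
    crosses = list(combinations(range(1, p + 1), 2))
    labels = (
        ["mu"]
        + [f"B{i}" for i in range(1, b + 1)]
        + [f"GCA{j}" for j in range(1, p + 1)]
        + [f"SCA{j}x{k}" for j, k in crosses]
    )
    rows = []
    for i in range(1, b + 1):              # observations are sorted by block first
        for j, k in crosses:               # then by cross within each block
            row = dict.fromkeys(labels, 0)
            # Equation 3-1 lists the parameters present in this observation.
            for name in ("mu", f"B{i}", f"GCA{j}", f"GCA{k}", f"SCA{j}x{k}"):
                row[name] = 1
            rows.append([row[name] for name in labels])
    return labels, np.array(rows)

labels, X_o = overparameterized_design(p=4, b=2)
print(labels)
print(X_o)
print("columns:", X_o.shape[1], "rank:", np.linalg.matrix_rank(X_o))
```

Comparing the number of columns with the rank makes the overparameterization concrete: there are more parameters than independent equations, so X'X for this matrix has no ordinary inverse.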
Since design matrices can be devised by first creating the columns pertinent to each factor in the model (submatrices) and then horizontally and/or vertically stacking the submatrices, the discussion of the reparameterized design matrix formulation will proceed by factor.

Mean

The first column of $X_s$ is for $\mu$ and is a vector of 1's with the number of rows equalling the number of observations (Figure 3-2). The linear model (equation 3-1) indicates that all observations contain $\mu$ and the deviation of the observations from $\mu$ is explained in terms of the factors and interactions in the model plus error.

Block

The number of columns for block is equal to the number of blocks minus one (column 2, $X_s$). Each row of a block submatrix consists of 1's and 0's or -1's according to the identity of the observation for which the row is being formed. The normal convention is that the first column represents block 1 and the second column block 2, etc. through block b-1. Since we have used a sum-to-zero solution ($\sum_i b_i = 0$), the effect due to block b is a linear combination of the other b-1 effects, i.e., $b_b = -\sum_{i=1}^{b-1} b_i$, which in our example is $0 = b_1 + b_2$ and $b_2 = -b_1$. Thus, the row of the block submatrix for an observation in block b (the last block) has a -1 in each of the b-1 columns signifying that the block b effect is indeed a linear combination of the other b-1 block effects. Columns 2 and 3 of $X_o$ (Figure 3-1) have become column 2 of $X_s$ (Figure 3-2).

General combining ability

This submatrix of $X_s$ is slightly more complex than previous factors as a result of having two levels of a main effect present per observation, i.e., the deviation of an observation from $\mu$ is modeled as the result of the GCA's of both the male and female parents (equation 3-1). Again we have imposed a restriction, $\sum_j gca_j = 0$. Since GCA has p-1 degrees of freedom, the submatrix for GCA should have p-1 columns, i.e., $gca_p = -\sum_{j=1}^{p-1} gca_j$.
The GCA submatrix for $X_s$ (columns 3 through 5 in Figure 3-2) is formed from $X_o$ (columns 4 through 7 in Figure 3-1) in the same manner as the block matrix: (1) add minus one to the elements in the other columns along each row containing a one for $gca_p$ (p = 4 in our example); and (2) delete the column from $X_o$ corresponding to $gca_p$. The GCA submatrix has p(p-1)/2 rows (the number of crosses). This, with no missing cells (plots), equals the number of observations per block. To form the GCA factor submatrix for a site, the GCA submatrix is vertically concatenated (stacked on itself) b times. This completes the portion of the $X_s$ matrix for GCA.

Specific combining ability

In order to facilitate construction of the SCA submatrix, a horizontal direct product should be defined. A horizontal direct product, as applied to two column vectors, is the element by element product between the two vectors (SAS/IML User's Guide 1985) such that the element in the i-th row of the resulting product vector is the product of the elements in the i-th rows of the two initial vectors. (SAS/IML is the registered trademark of the SAS Institute Inc., Cary, North Carolina.) The resultant product vector has dimension n x 1. A horizontal direct product is useful for the formation of interaction or nested factor submatrices where the initial matrices represent the main factors and the resulting matrix represents an interaction or a nested factor (product rule, Searle 1987). The SCA submatrix can be formulated from the horizontal direct products of the columns of the GCA submatrix in $X_s$ (Figure 3-2). The results from the GCA columns require manipulation to become the SCA submatrix (since degrees of freedom for SCA do not equal those of an interaction for a half-diallel analysis), but the GCA column products provide a convenient starting point.
The column of the SCA submatrix representing the cross between the $j$th and the $k$th parents ($SCA_{jk}$) is formed as the product between the $GCA_j$ and $GCA_k$ columns (Figure 3-3). The GCA columns in Figure 3-2 are multiplied in this order: column 1 times column 2 forming the first SCA column, column 1 times column 3 forming the second SCA column, and column 2 times column 3 forming the third SCA column (Figure 3-3). With four parents (six crosses) there are three degrees of freedom for GCA ($p-1$) and two degrees of freedom for SCA (6 crosses - 3 for GCA - 1 for the mean). Since SCA has only two degrees of freedom, a sum-to-zero design matrix can have only two columns for SCA. Imposing the restriction that the sum of the SCA's across all parents equals zero is equivalent to making the last column for the SCA submatrix (Figure 3-3) a linear combination of the others (Figure 3-2). The procedure for deleting the third column product is identical to that for the GCA submatrix: add minus one to every element in the rows of the remaining SCA columns in which a one appears in the column which is to be deleted (Figure 3-2, columns 6 and 7). The number of rows in the SCA submatrix equals the number of observations in a block, and the submatrix must be vertically concatenated $b$ times to create the SCA submatrix for a site.

An algebraic evaluation of SCA sum-to-zero restrictions requires that $\sum_j sca_{jk} = 0$ for each $k$ and that $\sum_j \sum_k sca_{jk} = 0$; thus, for observations in the $i$th block, with $i$ serving to denote the row of the SCA submatrix in block $i$, $sca_{i14} = -sca_{i12} - sca_{i13}$ and the entries in the submatrix row for $y_{i14}$ are -1's. The estimate for $sca_{i23}$ equals $sca_{i14}$ because $sca_{i23}$ is the negative of the sum of the independently estimated SCA's ($sca_{i12}$ and $sca_{i13}$) from the restriction that the sum of the SCA's across all parents equals zero. Similarly, by sum-to-zero definition $sca_{i24} = -sca_{i23} - sca_{i12}$, and by substitution $sca_{i24} = -(-sca_{i12} - sca_{i13}) - sca_{i12} = sca_{i13}$.
By the same protocol, it can be shown that $sca_{i34} = sca_{i12}$. The elements in the rows of the SCA submatrix are 1's, -1's and 0's in accordance with the algebraic evaluation. Thus, while it may seem that there should be 6 SCA values (one for each cross), only 2 can be independently estimated and the remaining 4 are linear combinations of the independently estimated SCA's. Again the SCA sum-to-zero estimates are not equal to the parametric population SCA's. An analogous illustration for SCA to that for GCA would show that the estimable function (linear combination of observations) for a given sum-to-zero SCA contains a variety of other parameters.

OBS.      GCA1 x GCA2    GCA1 x GCA3    GCA2 x GCA3    sca12  sca13  sca23
y_i12     (1)(1)=1       (1)(0)=0       (1)(0)=0         1      0      0
y_i13     (1)(0)=0       (1)(1)=1       (0)(1)=0         0      1      0
y_i14     (0)(-1)=0      (0)(-1)=0      (-1)(-1)=1       0      0      1
y_i23     (0)(1)=0       (0)(1)=0       (1)(1)=1         0      0      1
y_i24     (-1)(0)=0      (-1)(-1)=1     (0)(-1)=0        0      1      0
y_i34     (-1)(-1)=1     (-1)(0)=0      (-1)(0)=0        1      0      0

Figure 3-3. Intermediate result in SCA submatrix generation (SCA columns as horizontal direct products of the GCA1, GCA2, and GCA3 columns within a block). The $SCA_{jk}$ column is the horizontal direct product of the columns for $GCA_j$ and $GCA_k$.

Estimation of Fixed Effects

GCA parameters

The GCA parameters can be estimated (without mean, block, and SCA in the design matrix) through the use of equation 3-3, if there are no missing cell means (plots) for any cross and no missing crosses. The design matrix consists only of the GCA submatrix. This design matrix has $p-1$ columns for GCA's (the third through the fifth columns of $X_s$). The $b$ vector is an estimate of the GCA portion of $\beta_s$ in Figure 3-2, and the linear combination for the estimation of $gca_p$ is $gca_p = -\sum_{j=1}^{p-1} gca_j$. Parameters for any of the factors can be estimated independently using the pertinent submatrix as long as there are no missing cell means (plots) and no missing crosses; this uses a property known as orthogonality.
Orthogonality requires that the dot product between two vectors equals zero (Schneider 1987, page 168). The dot product (a scalar) is the sum of the values in the vector obtained from the horizontal direct product of two vectors. For two factors to be orthogonal, the dot products of all the column vectors making up the section of the design matrix for one factor with the column vectors making up the portion of the design matrix for the second must be zero. If all factors in the model are orthogonal, then the $X_s'X_s$ matrix is block diagonal. A block-diagonal $X_s'X_s$ matrix is composed of square factor submatrices (degrees of freedom x degrees of freedom) along the diagonal, with all off-diagonal elements not in one of the square factor submatrices equalling zero. A property of block-diagonal matrices is that the inverse can be calculated by inverting each block separately and replacing the original block in the full $X_s'X_s$ matrix by the inverted block. Because the blocks can be inverted separately and all other off-diagonal elements of the inverse are zero, the effects for factors which are orthogonal to all other factors may be estimated separately, i.e., there are no functions of other sum-to-zero factors in the sum-to-zero estimates.

Mean, block, GCA and SCA parameters

All parameters are estimated simultaneously by horizontally concatenating the mean, block, GCA, and SCA matrices to create $X_s$. Equation 3-3 is again utilized to solve the system of equations. The $b$ vector for the four-parent example is an estimate of $\beta_s$ of Figure 3-2. Again, one parameter is estimated for each column in the $X_s$ matrix and all parameter estimates not present are linear combinations of the parameter estimates in the $b$ vector. So $b_b$ is equal to $-\sum_{i=1}^{b-1} b_i$ and $gca_p$ is equal to $-\sum_{j=1}^{p-1} gca_j$.
The linear combinations for SCA effects can be obtained by reading along the row of the SCA submatrix associated with the observation containing the parameter, i.e., in Figure 3-2 the observation $y_{i14}$ contains the effect $sca_{14}$, which is estimated as the linear combination $-sca_{i12} - sca_{i13}$.

This completes the estimation of fixed effect parameters from a data set which is balanced on a plot-mean basis. Since field data sets with such completeness are a rarity in forestry applications, the next step is OLS analysis for various types of data imbalance. Calculations of solutions based on a complete data set and simulated data sets with common types of imbalance are demonstrated in numerical examples.

Numerical Examples

The data set analyzed in the numerical examples is from a five-year-old, six-parent half-diallel slash pine (Pinus elliottii var. elliottii Engelm.) progeny test planted on a single site in four complete blocks. Each cross is represented by a five-tree row plot within each block. Total height in meters and diameter at breast height (dbh, in centimeters) are the traits selected for analysis. The data set is presented in Table 3-1 so that the reader may reconstruct the analysis and compare answers with the examples. The numbers 1 through 6 were arbitrarily assigned to the parents for analysis. Because of unequal survival within plots, plot means are used as the unit of observation.

Balanced Data (Plot-mean Basis)

The sum-to-zero design matrix for the balanced data set has (4 blocks) x (15 crosses) = 60 rows (which equals the number of observations in $y$) and has the following columns: one column for $\mu$, three columns for blocks ($b-1$), five columns for GCA ($p-1$), and nine columns for SCA (15 crosses - 5 - 1), for a total of 18 columns.
With sixty plot means (degrees of freedom) and 18 degrees of freedom in the model, subtracting 18 from 60 yields 42 degrees of freedom for error, which matches the degrees of freedom for cross by block interaction, thus verifying that degrees of freedom concur with the number of columns in the sum-to-zero design matrix.

To illustrate the principle of orthogonality in the balanced case, the $X_s'X_s$ and $(X_s'X_s)^{-1}$ matrices may be printed to show that they are block diagonal. In further illustration, the effects within a factor may also be estimated without any other factors in the design matrix and compared to the estimates from the full design matrix.

The vectors of parameter estimates for height and dbh (Table 3-2) were calculated from the same $X_s$ matrix because height and dbh measurements were taken on the same trees. In other words, if a height measurement was taken on a tree, a dbh measurement was also taken, so the design matrices are equivalent.

Missing Plot

To illustrate the problem of a missing plot, the cross of parent two by parent three was arbitrarily deleted in block one (as if observation $y_{123}$ were missing). This deletion prompts adjustments to the factor matrices in order to analyze the new data set. The new vector of observations ($y$) now has 59 rows. This necessitates deletion of the row of the design matrix ($X_s$) in block 1 which would have been associated with cross 2 x 3. This is the only matrix alteration required for the analysis. Thus, the resultant $X_s$ matrix has 60 - 1 = 59 rows and 18 columns. With 59 means in $y$ and 18 columns in $X_s$, the degrees of freedom for error is 41.

Comparisons between results of the analyses (Table 3-2) of the full data set and the data set missing observation $y_{123}$ reveal that for this case the estimates of parameters have been relatively unaffected by the imbalance (magnitudes of GCA's changed only slightly and rankings by GCA were unaffected).

Table 3-1. Data set for numerical examples.
Five-year-old slash pine progeny test with a 6-parent half-diallel mating design present on a single site with four randomized complete blocks and a five-tree row plot per cross per block. Columns are block, female parent, male parent, plot mean height (m), plot mean DBH (cm), within-plot variance for height (m²), within-plot variance for DBH (cm²), and number of trees per plot.

Block  Female  Male   Height    DBH     Var Ht   Var DBH  Trees
  1      1      2     2.6899   3.810    0.9800   3.484      4
  1      1      3     1.9080   2.134    1.4277   3.893      5
  1      1      5     3.1242   4.445    0.4487   1.656      4
  1      1      6     2.4933   3.200    0.8488   5.664      5
  1      2      5     1.4783   1.588    0.6556   2.167      4
  1      2      6     2.7026   3.471    0.1136   0.344      3
  1      3      2     3.0480   4.699    0.2341   0.968      4
  1      3      5     3.4991   5.131    0.0945   0.271      5
  1      3      6     2.4003   2.794    0.5149   1.548      4
  1      4      1     3.3955   4.928    0.1489   0.761      5
  1      4      2     3.4290   5.144    0.7943   3.285      4
  1      4      3     2.5298   2.984    0.9557   4.188      4
  1      4      5     2.4155   3.175    0.5936   2.946      4
  1      4      6     3.2004   4.521    1.7034   7.594      5
  1      5      6     2.2403   2.794    1.0433   6.280      4
  2      1      2     3.5662   5.080    0.9560   2.903      5
  2      1      3     2.6335   3.353    0.7695   3.497      5
  2      1      5     3.6942   5.893    0.0573   0.432      5
  2      1      6     3.4808   4.928    0.9222   2.890      5
  2      2      5     3.4260   4.877    0.7017   2.432      5
  2      2      6     2.4282   3.302    0.0616   0.452      3
  2      3      2     3.0480   4.064    0.0192   0.301      4
  2      3      5     2.8895   4.013    0.1957   0.690      5
  2      3      6     1.9406   1.863    0.0560   0.408      3
  2      4      1     3.0114   3.962    1.9753   6.342      5
  2      4      2     3.6454   5.283    0.1731   0.787      5
  2      4      3     2.9566   3.861    0.0506   0.174      5
  2      4      5     2.8118   4.382    1.1336   5.435      4
  2      4      6     3.2674   4.318    1.1211   4.354      5
  2      5      6     3.7917   5.893    0.0848   0.497      5
  3      1      2     2.2961   2.625    0.3914   1.699      3
  3      1      3     2.8956   4.128    1.2926   4.532      4
  3      1      5     2.5359   3.607    0.8284   4.303      5
  3      1      6     2.9032   3.937    0.8252   4.064      4
  3      2      5     2.7737   4.064    0.9829   3.226      2
  3      2      6     1.2040   0.635    0.4464   0.806      2
  3      3      2     2.9870   4.191    0.9049   2.989      4
  3      3      5     2.8407   3.962    0.7309   3.632      5
  3      3      6     1.3564   0.000    0.1677   0.000      2
  3      4      1     2.6746   3.620    0.8463   2.984      4
  3      4      2     2.7066   3.353    0.5590   1.787      5
  3      4      3     3.4198   4.623    0.3509   0.690      5
  3      4      5     3.3299   4.953    0.4102   1.226      4
  3      4      6     3.4564   4.978    0.8369   3.503      5
  3      5      6     3.2614   4.826                         1
  4      1      2     1.8974   2.476    1.0160   3.629      4
  4      1      3     1.3005   0.508    0.2019   0.774      3
  4      1      5     2.0726   2.540    1.2235   5.097      3
  4      1      6     1.8821   1.778    0.4728   3.312      4
  4      2      5     1. 64    1.334    0.5354   2.382      4
  4      2      6     1.5392   0.635    0.0376   0.806      2
  4      3      2     1.8898   2.032    0.7364   1.892      4
  4      3      5     2.5146   3.620    0.0876   0.446      4
  4      3      6     1.8389   2.201    0.0941   0.280      3
  4      4      1     2.3348   2.591    0.3816   2.722      5
  4      4      2     1.7272   1.693    2.1640   8.602      3
  4      4      3     1.6581   1.524    0.0537   0.903      5
  4      4      5     2.1184   2.286    0.3137   2.366      4
  4      4      6     1.5545   1.422    0.4803   1.019      5
  4      5      6     1.4122   1.693    0.0338   0.150      3

Table 3-2. Numerical results for examples of data imbalance using the OLS techniques presented in the text.

Estimate     Balanced(a)        Missing Plot(b)    Missing Cross(c)   Five Missing Crosses(d)
of(e)        Height    DBH      Height    DBH      Height    DBH      Height    DBH
μ            2.5830   3.362     2.5787   3.346     2.5386   3.260     2.4980   3.149
B1           0.1203   0.292     0.1074   0.245     0.1074   0.245     0.1393   0.309
B2           0.5230   0.976     0.5274   0.992     0.5386   1.023     0.6041   1.140
B3           0.1264   0.205     0.1308   0.220     0.1180   0.187     0.0689   0.087
GCA1         0.0706   0.144     0.0760   0.163     0.1260   0.270     0.1361   0.232
GCA2         -.1077   -.180     -.1186   -.220     -.2186   -.434     -.2371   -.493
GCA3         -.1316   -.347     -.1426   -.386     -.2426   -.601     -.3972   -.952
GCA4         0.2489   0.398     0.2544   0.417     0.3044   0.524     0.4241   0.804
GCA5         0.1265   0.489     0.1320   0.509     0.1820   0.616     0.1746   0.646
SCA12        0.0665   0.172     0.0763   0.208     0.1663   0.400
SCA13        -.3374   -.628     -.3277   -.592     -.2377   -.400
SCA14        -.0484   -.128     -.0550   -.152     -.1150   -.280     -.2041   -.410
SCA15        0.0766   0.126     0.0700   0.102     0.0100   -.026     0.0480   0.094
SCA23        0.3995   0.912     0.3600   0.771
SCA24        0.1528   0.289     0.1627   0.324     0.2527   0.517     0.1920   0.408
SCA25        -.3185   -.706     -.3084   -.670     -.2187   -.478
SCA34        -.0592   0.164     -.0493   0.129     0.0406   0.064     0.1163   0.246
SCA35        0.3580   0.677     0.3679   0.712     0.4793   0.905

(a) where (numerical examples are for height) $b_4 = -\sum_{i=1}^{3} b_i = -.7697$; $gca_6 = -\sum_{j=1}^{5} gca_j = -.2067$; $sca_{p6} = -\sum sca_{jk}$ for $j$ or $k = p$ and $p = 1, 2, 3$, so that $sca_{16} = .2428$, $sca_{26} = -.3002$, and $sca_{36} = -.3608$; $sca_{45} = -\sum_e sca_e = -.2898$, where $e$ indexes the independently estimated SCA's 1, ..., 9; $sca_{46} = sca_{12} + sca_{13} + sca_{15} + sca_{23} + sca_{25} + sca_{35} = .2446$; and $sca_{56} = sca_{12} + sca_{13} + sca_{14} + sca_{23} + sca_{24} + sca_{34} = .1737$.
(b) where the linear combinations for parameter estimates are identical to the balanced example.
(c) where $sca_{p6} = -\sum sca_{jk}$ for $j$ or $k = p$ and $p = 1$ to 3; $sca_{45} = -\sum_e sca_e$, where $e$ indexes the independently estimated SCA's 1, ..., 8; $sca_{46} = sca_{12} + sca_{13} + sca_{15} + sca_{25} + sca_{35}$; and $sca_{56} = sca_{12} + sca_{13} + sca_{14} + sca_{24} + sca_{34}$.
(d) where $sca_{16} = -sca_{14} - sca_{15}$, sca?? = $-sca_{24}$, sca?? = sca??, sca?? = $sca_{15}$, sca?? = $sca_{14} + sca_{24}$ + sca??, and $sca_{25}$ = the negative of the sum of the four independently estimated SCA's.
(e) where for all cases linear combinations for block and gca are the same as in the balanced case.

Missing Cross

Another common form of imbalance in diallel data sets, the missing cross, is examined through arbitrary deletion of the 2 x 3 cross from all blocks, i.e., $y_{123}$, $y_{223}$, $y_{323}$, and $y_{423}$ are missing in the data vector. This type of imbalance is representative of a particular cross that could not be made and is therefore missing from all blocks. The matrix manipulations required for this analysis are again presented by factor.

For appropriate SCA restrictions, the data vector and design matrix should be ordered so that the $p$th parent has no missing crosses. Since the labeling of a parent as parent $p$ is entirely subjective, any parent with all crosses may be designated as parent $p$. The previous labelling directions are necessary since we generate the SCA submatrix as horizontal direct products of the columns of the GCA submatrix; to account for missing crosses, the horizontal direct product for each missing parental combination is not calculated, which sets the missing SCA's to zero. If there is a cross missing from those of the $p$th parent, we cannot account for the missing cross with this technique (Searle 1987, page 479).

For the mean, block, and GCA submatrices, the adjustment for the missing cross dictates deleting the rows in the submatrices which would have corresponded to the $y_{i23}$ observations.
The SCA submatrix must be reformed since a degree of freedom for SCA, and hence a column of the submatrix, has been lost. The SCA submatrix is rebuilt from the GCA horizontal direct products (remembering that one cross, 2 x 3, no longer exists and therefore that the product GCA2 x GCA3 is inappropriate). Dropping the column for $SCA_{23}$ is equivalent to setting $SCA_{23}$ to zero (Searle 1987) so that the remaining SCA's will sum to zero. After that, the reformation is according to the established pattern.

With one missing cross there are now 56 observations and hence 56 degrees of freedom available. The columns of the $X_s$ matrix are now: one for the mean, three for block, five for GCA, and eight for SCA, for a total of 17 columns. The remaining degrees of freedom for error is 39, matching the correct degrees of freedom ((14-1) x (4-1) = 39).

For the missing cross example $\mu$ is no longer equivalent to the mean of the plot means, since $\mu = 2.5386$ and $(\sum_{ijk} y_{ijk})/N = 2.5715$, where $N = 56$ (number of plot means). This is the result of GCA effects which are no longer orthogonal to the mean. Check the $X_s'X_s$ matrix or try estimating factors separately and compare to the estimates when all factors are included in $X_s$.

If formulae for balanced data (Becker 1975, Falconer 1981, and Hallauer and Miranda 1981) are applied to unbalanced data (plot-mean basis), estimates of parameters are no longer appropriate because factors in the model are no longer independent (orthogonal). Applying Becker's formula, which uses totals of cross means for a site ($\bar{y}_{.jk}$), to the missing cross example yields: $gca_1 = .2992$, $gca_2 = -.5649$, $gca_3 = -.5888$, $gca_4 = .4665$, $gca_5 = .3552$, and $gca_6 = .0219$. These answers are very different in magnitude from those in Table 3-2 for this example, and $gca_6$ also has a different sign. Employing these formulae in the analysis of unbalanced data is analogous to matrix estimation of GCA's without the other factors in the model, which is inappropriate.
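The loss of orthogonality between the mean and GCA can be checked directly from the dot products. The sketch below is illustrative only: the helper names `gca_rows` and `mean_dot_gca` are hypothetical, a single block is used (stacking four identical blocks only multiplies every dot product by four), and the sum-to-zero coding follows the rules given earlier for the six-parent example.

```python
from itertools import combinations

def gca_rows(p, missing=()):
    """Sum-to-zero GCA rows for one block, skipping any missing crosses."""
    rows = []
    for j, k in combinations(range(1, p + 1), 2):
        if (j, k) in missing:
            continue  # the row for a missing cross is deleted
        row = [1 if parent in (j, k) else 0 for parent in range(1, p + 1)]
        if k == p:
            row = [value - 1 for value in row]  # cross with parent p
        rows.append(row[:-1])  # drop the column for gca_p
    return rows

def mean_dot_gca(rows):
    """Dot product of the all-ones mean column with each GCA column."""
    return [sum(column) for column in zip(*rows)]

balanced = mean_dot_gca(gca_rows(6))
one_missing = mean_dot_gca(gca_rows(6, missing={(2, 3)}))
```

With all fifteen crosses present every dot product is zero, whereas deleting cross 2 x 3 leaves nonzero dot products for the GCA2 and GCA3 columns, so the mean can no longer be estimated independently of GCA.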
Several Missing Crosses

The concluding example (Table 3-2) is a drastically unbalanced data set resulting from the arbitrary deletion of five crosses (1 x 2, 1 x 3, 2 x 3, 3 x 5, and 4 x 5). The matrix manipulation for this example is an extension of the previous one-cross deletion example. Rows corresponding to $y_{i12}$, $y_{i13}$, $y_{i23}$, $y_{i35}$, and $y_{i45}$ are deleted from the mean, block and GCA submatrices for all blocks. The SCA matrix (now 4 columns: 10 crosses - 5 - 1 = 4 degrees of freedom) is again reformed with only the relevant products of the GCA columns. Counting degrees of freedom (columns of the sum-to-zero design matrix), the mean has one, block has three, GCA has five, and SCA has four degrees of freedom, for a total of 13. Error has (4-1)(10-1) = 27 degrees of freedom. Totaling degrees of freedom for modeled effects and error yields 40, which equals the number of plot means.

In increasingly unbalanced cases (Table 3-2), the spread among the GCA estimates tends to increase with increasing imbalance (loss of information). This is a general feature of OLS analyses, and the basis for the feature is that the spread among the GCA estimates is due both to the innate spread due to additive genetic effects and to the error in estimation of the GCA's. When there is less information, GCA estimates tend to be more widely spread due to the increase in the error variance associated with their estimation. This feature has been noted (White and Hodge 1989, page 54) as the tendency to pick as parental winners individuals in a breeding program which are the most poorly tested.
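The degrees-of-freedom bookkeeping used in the four numerical examples can be summarized in a short sketch. The function name `design_df` and its arguments are hypothetical; the sketch assumes a single-site half-diallel analyzed on a plot-mean basis, with whole crosses missing from every block and individual plots missing one at a time.

```python
def design_df(blocks, parents, missing_crosses=0, missing_plots=0):
    """Return (observations, model columns, error df) for X_s."""
    crosses = parents * (parents - 1) // 2 - missing_crosses
    observations = blocks * crosses - missing_plots
    columns = {
        "mean": 1,
        "block": blocks - 1,
        "gca": parents - 1,
        # SCA df: crosses present, less GCA df, less one for the mean
        "sca": crosses - (parents - 1) - 1,
    }
    model = sum(columns.values())
    return observations, model, observations - model

balanced = design_df(blocks=4, parents=6)
missing_plot = design_df(blocks=4, parents=6, missing_plots=1)
missing_cross = design_df(blocks=4, parents=6, missing_crosses=1)
five_missing = design_df(blocks=4, parents=6, missing_crosses=5)
```

The four calls reproduce the counts quoted in the text: 60, 59, 56 and 40 plot means, with 18, 18, 17 and 13 columns in the design matrix, respectively.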
Discussion

After developing the OLS analysis and describing the inherent assumptions of the analysis, there are four important factors to consider in the interpretation of sum-to-zero OLS solutions: (1) the lack of uniqueness of the parameter estimates; (2) the weights given in parameter estimation to plot means ($y_{ijk}$) and in turn site means ($\bar{y}_{.jk}$) for crosses in data sets with missing crosses; (3) the arbitrary nature of using a diallel mean (perforce a narrow genetic base) as the mean about which the GCA's sum to zero; and (4) the assumption that the covariance matrix for the observations ($V$) is $I\sigma^2_e$.

Uniqueness of Estimates

Sum-to-zero restrictions furnish what would appear to be unique estimates of the individual parameters, e.g. $GCA_1$, when, in fact, these individual parameters are not estimable (Graybill 1976, Freund and Littell 1981, and Milliken and Johnson 1984). The lack of estimability is again analogous to attempting to solve a set of equations in $n$ unknowns with $t$ equations where $n$ is greater than $t$. Therefore, an infinite number of solutions exist for $\beta$. There are quantities in this system of equations that are unique (estimable), i.e., the estimate is invariant regardless of the restriction (sum-to-zero or set-to-zero) or generalized inverse (no restrictions) used (Milliken and Johnson 1984), and the estimable functions include sum-to-zero GCA and SCA estimates since they are linear combinations of the observations; but these estimable quantities do not estimate the individual parametric GCA's and SCA's of the overparameterized model (equation 3-4), since there is no unique solution for those parameters.

Weighting of Plot Means and Cross Means in Estimating Parameters

With at least one measurement tree in each plot and with plot means as the unit of observation, use of the matrix approach produces the same results as the basic formulae.
The weight placed on each plot mean in the estimation of a parameter can be determined by calculating $(X_s'X_s)^{-1}X_s'$, which can be viewed as a matrix of weights $W$ so that equation 3-3 can be written as $b = Wy$. The matrix $W$ has these dimensions: the number of rows equals the number of parameters in $\beta_s$ and the number of columns equals the number of plot means in $y$. The $i$th row of $W$ contains the weights applied to $y$ to estimate the $i$th parameter in $b$ ($b_i$). In the discussion which follows $gca_1$ is utilized as $b_i$. If there are no missing plots, the cross mean in every block ($y_{ijk}$) has the same weighting, and weights can be combined across blocks to yield the weight on the overall cross mean ($\bar{y}_{.jk}$).

It can be shown that for the balanced numerical example $gca_1$ is calculated by weighting the overall cross means containing parent 1 by 1/6 and weighting all overall cross means not containing parent 1 by -1/12.

          GCA1       GCA2       GCA3       GCA4       GCA5       GCA6      Marginal
GCA1                 1/6        1/6        1/6        1/6        1/6         5/6
                     .16667     .16667     .16667     .16667     .16667
GCA2      .14583                -1/12      -1/12      -1/12      -1/12      -1/6
          missing               -.08333    -.08333    -.08333    -.08333
GCA3      .14583     missing               -1/12      -1/12      -1/12      -1/6
          missing    missing               -.08333    -.08333    -.08333
GCA4      .18056     -.10417    -.10417               -1/12      -1/12      -1/6
          .22549     .01961     -.11765               -.08333    -.08333
GCA5      .18056     -.10417    -.10417    -.06944               -1/12      -1/6
          .31372     -.27451    missing    missing               -.08333
GCA6      .18056     -.10417    -.10417    -.06944    -.06944               -1/6
          .29412     .08824     -.04902    -.29412    -.20588

Figure 3-4. Weights on overall cross means ($\bar{y}_{.jk}$) for the three numerical examples for estimation of $GCA_1$. The weights for the balanced example (above the diagonal) are presented in both fractional and decimal form. The weights for the one-cross missing and the five-crosses missing examples are presented as the upper number and lower number, respectively, in cells below the diagonal. The marginal weights on GCA parameters (right margin) do not change although cells are missing.
Figure 3-4 (above the diagonal) demonstrates the weightings on the overall cross means for the balanced numerical example as well as the marginal weighting on the GCA parameters. These marginal weightings are obtained by summing along a row and/or column as one would to obtain the marginal totals for a parent (Becker 1975). One feature of sum-to-zero solutions is that these marginal weightings will be maintained no matter the imbalance due to missing crosses, as will be seen by considering the numerical examples for a missing cross (Figure 3-4 below the diagonal, upper number) and five missing crosses (Figure 3-4 below the diagonal, lower number). The marginal weights have remained the same as in the balanced case while the weights on the cross means differ among the crosses containing parent 1 and also among the crosses not containing parent 1. In the five missing crosses example, crosses $\bar{y}_{.24}$ and $\bar{y}_{.26}$ even receive a positive weighting where in the prior examples they had negative weighting.

The expected value in all three examples is $GCA_{1s}$ (for sum-to-zero) despite the apparently nonsensical weightings to cross means with missing crosses; however, the evaluation of the estimates in terms of the original model changes with each new combination of missing cells, i.e., $\bar{y}_{.24}$ and $\bar{y}_{.26}$ have a positive weight in the five missing crosses example in $GCA_1$ estimation. Whether this type of estimation is desirable with missing cell (cross) means has been the subject of some discussion (Speed, Hocking and Hackney 1978, Freund 1980, and Milliken and Johnson 1984). The data analyst should be aware of the manner in which sum-to-zero treats the data with missing cell means and decide whether that particular linear combination of cross means estimating the parameter is one of interest, realizing that the meaning of the estimates in terms of the original model is changing.
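The balanced-case weights can be reproduced with exact arithmetic. The following sketch is an illustration, not the procedure used for the analyses reported here: it uses only the GCA submatrix for one block of the six-parent half-diallel (permissible in the balanced case because of orthogonality), the helper names `gca_rows` and `solve` are hypothetical, and the linear system is solved by a plain Gauss-Jordan elimination on Python fractions.

```python
from fractions import Fraction
from itertools import combinations

def gca_rows(p):
    """Crosses and sum-to-zero GCA rows for one balanced block."""
    crosses = list(combinations(range(1, p + 1), 2))
    rows = []
    for j, k in crosses:
        row = [1 if parent in (j, k) else 0 for parent in range(1, p + 1)]
        if k == p:
            row = [value - 1 for value in row]
        rows.append(row[:-1])
    return crosses, rows

def solve(a, b):
    """Solve a @ w = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in a[i]] + [Fraction(v) for v in b[i]]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

crosses, x = gca_rows(6)
xt = [list(column) for column in zip(*x)]              # X' (5 x 15)
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in xt] for ci in xt]
w = solve(xtx, xt)                                      # W = (X'X)^-1 X'
weights_gca1 = dict(zip(crosses, w[0]))                 # weights for gca_1
```

The first row of `w` gives the weight on each cross mean for $gca_1$; summing the entries of `weights_gca1` over the crosses that contain a given parent gives that parent's marginal weight.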
Diallel Mean

The use of the mean for a half-diallel as the mean around which GCA's sum to zero is not satisfactory in that the diallel mean is the mean of a rather narrow genetically based population, and in particular that the comparisons of interest are not usually confined to the specific parents in a specific diallel on a particular site. A checklot can be employed to represent a base population against which comparison of half- or full-sib families can be made to provide for comparison of GCA estimates from other tests (van Buijtenen and Bridgwater 1986).

Mathematically, when effects are forced to sum to zero around their own mean, the absolute value of the GCA's is reflective of their value relative to the mean of the group. Even if the parents involved in the particular diallel were all far superior to the population mean for GCA, GCA's calculated on an OLS basis would show that some of these GCA's were negative. If the GCA's of the diallel parents were in fact all below the population mean, the opposite and equally undesirable result ensues. For disconnected diallels together on a single site, an OLS analysis would yield GCA estimates that sum to zero within each diallel since parents are nested within diallels. Unless the comparisons of interest are only in the combination of the parents in a specific diallel on a specific site, the checklot alternative is desirable.

A method for obtaining the desired goal of comparable GCA's from disconnected experiments, disregarding the problem of heteroscedasticity, is to form a function from the data which yields GCA estimates properly located on the number scale. Such a function can be formed (using $GCA_1$ as an example) from $gca_{1s}$, the diallel mean, and the checklot mean.
From expectations of the scalar linear model (equation 3-1),

$GCA_{1s} = \frac{p-1}{p}\,GCA_1 - \frac{1}{p}\sum_{j=2}^{p} GCA_j + \frac{1}{p}\sum_{k=2}^{p} SCA_{1k} - \frac{2}{p(p-2)}\sum_{j=2}^{p-1}\sum_{k=3}^{p} SCA_{jk}$;   (3-5)

$E\{\text{diallel mean}\} = \mu + \frac{1}{b}\sum_{i=1}^{b} B_i + \frac{2}{p}\sum_{j=1}^{p} GCA_j + \frac{2}{p(p-1)}\sum_{j=1}^{p-1}\sum_{k=2}^{p} SCA_{jk}$; and

$E\{\text{checklot mean}\} = \mu + \frac{1}{b}\sum_{i=1}^{b} B_i + t$;

where $j$ for GCA is $j$ or $k$, the double sums are taken over crosses with $j < k$, and $t$ represents the fixed genetic parameter of the checklot. The function used to properly locate $GCA_{1rel}$ (the subscript rel denotes the relocated $GCA_1$) is $gca_{1rel} = gca_{1s} + (1/2)(\text{diallel mean} - \text{checklot mean})$. The expectation of $gca_{1rel}$ with negligible SCA is $GCA_{1rel} = GCA_1 - t/2$; and since breeding value equals twice GCA, $BV_{1rel} = BV_1 - t$. If SCA is non-negligible then the expectation is

$GCA_{1rel} = GCA_1 + \frac{1}{p-1}\sum_{k=2}^{p} SCA_{1k} - \frac{1}{(p-1)(p-2)}\sum_{j=2}^{p-1}\sum_{k=3}^{p} SCA_{jk} - t/2$.   (3-6)

In either case the function provides a reasonable manner by which GCA estimates from disconnected diallels are centered at the same location on a number scale and are then comparable.

Variance and Covariance of Plot Means

The variances of plot means with unequal numbers of trees per plot are by definition unequal, i.e., $Var(\bar{y}_{ijk}) = \sigma^2_p + \sigma^2_w/n_{ijk}$, where $\sigma^2_p$ is plot variance, $\sigma^2_w$ is within-plot variance, and $n_{ijk}$ is the number of observations per plot. Also, if blocks were considered random, there would be an additional source of variance for plot means due to blocks (as well as a covariance between plot means in the same block), and this could be incorporated into the $V$ matrix with $Var(\bar{y}_{ijk}) = \sigma^2_b + \sigma^2_p + \sigma^2_w/n_{ijk}$. Best linear unbiased estimates (BLUE) would be secured by weighting each mean by its true associated variance (Searle 1987, page 316). This is the generalized least squares (GLS) approach:

$b = (X_s'V^{-1}X_s)^{-1}X_s'V^{-1}y$   (3-7)

The GLS approach relaxes the OLS assumptions of equal variance of, and no covariance between, the observations (plot means) while still treating genetic parameters as fixed effects. The entries along the diagonal of the $V$ matrix are the variances of the plot means ($Var(\bar{y}_{ijk})$) in the same order as the means in the data vector.
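A minimal numerical sketch of equation 3-7 follows for the simplest possible case: a design matrix consisting only of a column of ones (a single mean) and a diagonal $V$. The variance components `var_plot` and `var_within` are hypothetical values chosen purely for illustration, and the three plot means and tree counts are the first three height records of block 1 in Table 3-1. With $X$ a column of ones and $V$ diagonal, equation 3-7 reduces to a mean weighted by the reciprocals of the plot-mean variances.

```python
def plot_mean_variance(var_plot, var_within, n_trees):
    """Var(plot mean) = plot variance + within-plot variance / n."""
    return var_plot + var_within / n_trees

def gls_mean(y, v_diag):
    """(X'V^-1 X)^-1 X'V^-1 y for X = column of ones, V diagonal."""
    inverse = [1.0 / v for v in v_diag]
    return sum(w * value for w, value in zip(inverse, y)) / sum(inverse)

heights = [2.6899, 1.9080, 3.1242]   # plot means (m), block 1
trees = [4, 5, 4]                    # trees per plot
var_plot, var_within = 0.1, 0.8      # hypothetical variance components

v_diag = [plot_mean_variance(var_plot, var_within, n) for n in trees]
gls = gls_mean(heights, v_diag)
ols = sum(heights) / len(heights)

# With equal numbers of trees every plot mean has the same variance,
# and the GLS estimate coincides with the OLS estimate.
equal_v = [plot_mean_variance(var_plot, var_within, 4)] * 3
gls_equal = gls_mean(heights, equal_v)
```

In this sketch the five-tree plot has the smallest variance and therefore the largest weight, so `gls` is pulled toward its (lower) mean relative to `ols`; with equal tree numbers the two estimates agree.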
The off-diagonal elements of $V$ would be either 0 or, for elements corresponding to observations in the same block, $\sigma^2_b$ (the variance due to the random variable block). BLUE requires exact knowledge of $V$; if estimates of $\sigma^2_p$, $\sigma^2_b$ and $\sigma^2_w$ are utilized in the $V$ matrix, estimable functions of $\beta$ approximate BLUE.

The OLS assumption that SCA and GCA are fixed effects can also be relaxed to allow for covariances due to genetic relatedness. In particular, the information that means are from the same half- or full-sib family could be included in the $V$ matrix. Relaxation of the zero covariance assumption implies that GCA and SCA are random variables. If GCA and SCA are treated as random variables, then the application of best linear prediction (BLP) or best linear unbiased prediction (BLUP) to the problem would be more appropriate (White and Hodge 1989, page 64). The treatment of the genetic parameters as random variables is consistent with that used in estimating genetic correlations and heritabilities. The $V$ matrix of such an application would include, in addition to the features of the GLS $V$ matrix, the covariance between full-sib or half-sib families added to the off-diagonal elements in $V$, i.e., if the first and second plot means in the data vector had a covariance due to relationship, then that covariance is inserted twice in the $V$ matrix. The covariance would appear as the second element in the first row and the first element in the second row of $V$ ($V$ is a symmetric matrix). Also the diagonal elements of $V$ would increase by 2 ...

Comparison of Prediction and Estimation Methodologies

Which methodology (OLS, GLS, BLP, or BLUP) to apply to individual data bases is somewhat a subjective decision. The decision can be based both on the computational or conceptual complexity of the method and the magnitude of the data base with which the analyst is working.
To aid in this decision, this discussion highlights the differences in the inherent properties and assumptions of the techniques.

For all practical purposes the answers from the four techniques will never be equal; however, there are two caveats. First, OLS estimates equal GLS estimates if all the cell means are known with the same precision (variance) (Searle 1987, page 490). Otherwise, GLS discounts the means that are known with less precision in the calculations and different estimates result. The second caveat is that if the amount of data is infinite, i.e., all cross means are known without error, then all four techniques are equivalent (White and Hodge 1989, pages 104-106). In all other cases BLP and BLUP shrink predictions toward the location parameter(s) and produce predictions which are different from OLS or GLS estimates even with balanced data. During calculations GLS, BLP, and BLUP place less weight on observations known with less precision, which is intuitively pleasing.

With OLS and GLS forest geneticists treat GCA's and SCA's as fixed effects for estimation and then as random variables for genetic correlations and heritabilities. BLP and BLUP provide a consistent treatment of GCA's and SCA's as random variables while differing in their assumptions about location parameters (fixed effects). In BLP fixed effects are assumed known without error (although they are usually estimated from the data), while with BLUP fixed effects are estimated using GLS. BLP and BLUP techniques also contain the assumption that the covariance matrix of the observations is known without error (most often variances must be estimated).

In many BLUP applications (Henderson 1974), mixed model equations are utilized iteratively to estimate fixed effects and to predict random variables from a data set. A BLUP treatment of fixed effects allows any connectedness between experiments to be utilized in the estimation of the fixed effects.
This provides an intuitive advantage of BLUP over BLP in experimentation where connectedness among genetic experiments is available, or where the data are so unbalanced that treating the fixed effects as known is less desirable than a GLS estimate of the fixed effects.

An ordering of the four methods from least to most computationally and conceptually complex is OLS, GLS, BLP and BLUP. The latter three methods require the estimation of the covariance matrix of the observations either separately (a priori) or iteratively with the fixed effects. Precise estimation of the covariance matrix for observations requires a great number of observations, and the precision of GLS, BLP and BLUP estimations or predictions is affected by the error of estimation of the components of V.

Selection of a method can then be based on weighing the computational complexity and the size of the available data base against the advantages offered by each method. Thus, if complexity of the computational problem is of paramount concern, the analyst necessarily would choose OLS. With a small data base (one that does not allow reasonable estimates of variances), the analyst would again choose OLS. With a large data base and no qualms about computational complexity, the analyst can choose between BLP and BLUP based on whether there is sufficient connectedness or imbalance among the experiments to make BLUP advantageous.

Conclusions

Methods of solving for GCA and SCA estimates for balanced (plot-mean basis) and unbalanced data have been presented along with the inherent assumptions of the analysis. The use of plot means and the matrix equations will produce sum-to-zero OLS estimates for GCA and SCA for all types of imbalance. Formulae in the literature which yield OLS solutions for balanced data can yield misleading solutions for unbalanced data, both because of the loss of orthogonality and because the weightings on site means for crosses (or totals) are constants.
GCA's and SCA's obtained through sum-to-zero restriction are not truly estimates of parametric population GCA's and SCA's. There are an infinite number of solutions for GCA's and SCA's from the system of equations as a result of the overparameterized linear model. Yet, if the only comparisons of interest are among the specific parents on a particular site, then the estimates calculated by sum-to-zero restrictions are appropriate. Checklots may be used to provide comparability among estimates derived from disconnected sets.

Knowledge of the innate mathematical features of OLS analysis discussed here should help the data analyst decide if OLS is the most desirable technique for the data at hand. It may be desirable to relax the OLS assumptions, which are in all likelihood invalid for the covariance matrix of the observations. This could lead to GLS, BLP or BLUP as better alternatives.

CHAPTER 4
VARIANCE COMPONENT ESTIMATION TECHNIQUES COMPARED FOR TWO MATING DESIGNS WITH FOREST GENETIC ARCHITECTURE THROUGH COMPUTER SIMULATION

Introduction

In many applications of quantitative genetics, geneticists are commonly faced with the analysis of data containing a multitude of flaws (e.g., non-normality, imbalance, and heteroscedasticity). Imbalance, as one of these flaws, is intrinsic to quantitative forest genetics research because of the difficulty in making crosses for full-sib tests and the biological realities of long-term field experiments. Few definitive studies have been conducted to establish optimal methods for estimation of variance components from unbalanced data. Simulation studies using simple models (one-way or two-way random models) have been conducted for certain data structures, i.e., imbalance, experimental design, and variance parameters (Corbeil and Searle 1976, Swallow 1981, Swallow and Monahan 1984, interpretations by Littell and McCutchan 1986).
The results from these studies indicate that technique optimality is a function of the data structure.

In practice, both historically and still commonly, estimation of variance components in forest genetics applications has been achieved by using sequentially adjusted sums of squares as an application of Henderson's Method 3 (HM3, Henderson 1953). Under normality and with balanced data, this technique has the desirable property of being the minimum variance unbiased estimator. If the data are unbalanced, then the only property retained by HM3 estimation is unbiasedness (Searle 1971, Searle 1987 pp. 492, 493, 498). Other estimators have been shown to be locally superior to HM3 in variance or mean square error properties in certain cases (Klotz et al. 1969, Olsen et al. 1976, Swallow 1981, Swallow and Monahan 1984).

Over the last 25 years, there has been a proliferation of variance component estimation techniques including minimum norm quadratic unbiased estimation (MINQUE, Rao 1971a), minimum variance quadratic unbiased estimation (MIVQUE, Rao 1971b), maximum likelihood (ML, Hartley and Rao 1967), and restricted maximum likelihood (REML, Patterson and Thompson 1971). The practical application of these techniques has been impeded by their computational complexity. However, with continuing advances in computer technology and the appearance of better computational algorithms, the application of these procedures continues to become more tractable (Harville 1977, Giesbrecht 1983, Meyer 1989). Whether these methods of analysis are superior to HM3 for many genetics applications remains to be shown.

With balanced data and disregarding negative estimates, all previously mentioned techniques except ML produce the same estimates (Harville 1977). With unbalanced data, each technique produces a different set of variance component estimates. Criteria must then be adopted to discriminate among techniques.
Candidate criteria for discrimination include unbiasedness (large number convergence on the parametric value), minimum variance (estimator with the smallest sampling variance), minimum mean square error (minimum of sampling variance plus squared bias, Hogg and Craig 1978), and probability of nearness (probability that sample estimates occur in a certain interval around the parametric value, Pitman 1937).

Negative estimates are also problematic in the estimation of variance components. Five alternatives for dealing with estimates less than zero (outside the natural parameter space of zero to infinity) are (Searle 1971): 1) accept and use the negative estimate, 2) set the negative estimate to zero (producing biased estimates), 3) re-solve the system with the offending component set to zero, 4) use an algorithm which does not allow negative estimates, and 5) use the negative estimate to infer that the wrong model was utilized.

The purpose of this research was to determine if the criteria of unbiasedness, minimum variance, minimum mean square error, and probability of nearness discriminated among several variance component estimation techniques while exploring various alternatives for dealing with negative variance component estimates. In order to make such comparisons, a large number of data sets were required for each experimental level. Using simulated data, this chapter compares variance component estimation techniques for plot-mean and individual observations, two mating systems (modified half-diallel and half-sib) and two sets of parametric variance components. Types of imbalance and levels of factors were chosen to reflect common situations in forest genetics.

Methods

Experimental Approach

For each experimental level 1000 data sets were generated and analyzed by various techniques (Table 4-1), producing numerous sets of variance component estimates for each data set.
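The four candidate criteria named in the Introduction (bias, sampling variance, mean square error, and probability of nearness) can all be computed from a set of replicate estimates. The sketch below is illustrative only: the function name `summarize_estimator`, the true value 0.25, the spread 0.3 and the normal sampling distribution are assumptions for demonstration and are not results from this study.

```python
import numpy as np

def summarize_estimator(estimates, true_value, interval_width):
    """Summaries of replicate estimates of one parameter.

    interval_width is the total width of the interval centered on true_value
    that is used for the probability-of-nearness criterion.
    """
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    variance = est.var(ddof=1)               # sampling variance among replicates
    mse = variance + bias ** 2               # variance plus squared bias
    inside = np.abs(est - true_value) <= interval_width / 2.0
    return {"bias": bias, "variance": variance,
            "mse": mse, "nearness": inside.mean()}

# Hypothetical replicate estimates of a variance component whose true value is 0.25.
rng = np.random.default_rng(1)
true_value = 0.25
unrestricted = rng.normal(true_value, 0.3, size=1000)  # negative estimates accepted
truncated = np.maximum(unrestricted, 0.0)              # negative estimates set to zero

summary_unrestricted = summarize_estimator(unrestricted, true_value, interval_width=0.25)
summary_truncated = summarize_estimator(truncated, true_value, interval_width=0.25)
```

Setting negative estimates to zero can only raise the mean and can only shrink the spread of the replicates, which is the trade-off between bias and variance that alternative 2 in the list above accepts.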
This workload resulted in enormous computational time being associated with each experimental level. The overall experimental design for the simulation was originally conceived as a factorial with two types of mating design (half-diallel and half-sib), two sets of true variance components (Table 4-2), two kinds of observations (individual and plot mean) and three types of imbalance: 1) survival levels (80% and 60%, with 80% representing moderate survival and 60% representing poor survival); 2) for full-sib designs, three levels of missing crosses (0, 2, and 5 out of 15 crosses); and 3) for half-sib designs, two levels of connectedness among tests (15 and 10 common families between tests out of 15 families per test). Because of the computational time constraint, the experiment could not be run as a complete factorial and the investigation continued as a partial factorial. In general, the approach was to run levels which were at opposite ends of the imbalance spectrum, i.e., 80% survival and no missing crosses versus 60% survival and 5 missing crosses, within a variance component level. If results were consistent across these treatment combinations, intermediate levels were not run.

Table 4-1. Abbreviation for and description of variance component estimation methods utilized for analyses based on individual observations (if utilized for plot-mean analysis the abbreviation is modified by prefixing a 'P').

| Abbreviation | Description | Citation |
|---|---|---|
| ML, PML | Maximum Likelihood: estimates not restricted to the parameter space (individual and plot-mean analysis). | Hartley and Rao 1967; Shaw 1987 |
| MODML | Maximum Likelihood: negative estimates set to zero after convergence (individual analysis). | Hartley and Rao 1967 |
| NNML | Maximum Likelihood: if negative estimates appeared at convergence, they were set to zero and the system re-solved (individual analysis). | Hartley and Rao 1967; Miller 1973 |
| REML, PREML | Restricted Maximum Likelihood: estimates not restricted to the parameter space (individual and plot-mean analysis). | Patterson and Thompson 1971; Shaw 1987; Harville 1977 |
| MODREML | Restricted Maximum Likelihood: negative estimates set to zero after convergence (individual analysis). | Patterson and Thompson 1971 |
| NNREML, PNNREML | Restricted Maximum Likelihood: if negative estimates appeared at convergence, they were set to zero and the system re-solved (individual and plot-mean analysis). | Patterson and Thompson 1971; Miller 1973 |
| MIVQUE, PMIVQUE | Minimum Variance Quadratic Unbiased: non-iterative with true (parametric) values of the variance components as priors (individual and plot-mean analysis). | Rao 1971b |
| MINQUE1, PMINQUE1 | Minimum Norm Quadratic Unbiased: non-iterative with ones as priors for all variance components (individual and plot-mean analysis). | Rao 1971a |
| TYPE3, PTYPE3 | Sequentially Adjusted Sums of Squares; Henderson's Method 3 (individual and plot-mean analysis). | Henderson 1953 |
| MIVPEN | MIVQUE with a penalty algorithm to prevent negative estimates (individual analysis). | Harville 1977 |

Designation of a treatment combination is by a five-character alphanumeric field. The first character is either "H" (half-sib) or "D" (half-diallel). The second character denotes the set of parametric variance components, where "1" designates the set of variance components associated with heritability of 0.1 and "2" designates the set associated with heritability of 0.25 (Table 4-2). The third character is an "S", indicating that the last two characters determine the imbalance level. The fourth character designates the survival level, either "6" for 60% or "8" for 80%. The final character specifies the number of missing crosses (half-diallel) or lack of connectedness (half-sib). For example, the treatment combination 'H1S80' is a half-sib mating design (H), the set of variance components associated with heritability equalling 0.1 (1), 80% survival (8), and 15 common parents across tests (0).

Table 4-2.
Sets of true variance components for the half-diallel and half-sib mating designs generated from specification of two levels of single-tree heritability ($h^2$), type B correlation ($r_B$), and non-additive to additive variance ratio (d/a).

| $h^2$ (a) | $r_B$ (a) | d/a (a) | Mating design | $\sigma^2_t$ | $\sigma^2_b$ | $\sigma^2_g$ | $\sigma^2_s$ | $\sigma^2_{tg}$ | $\sigma^2_{ts}$ | $\sigma^2_p$ | $\sigma^2_w$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.5 | 1.0 | full-sib | 1.0 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | .595 | 7.905 |
| | | | half-sib | 1.0 | 0.5 | 0.25 | NA | 0.25 | NA | .475 | 7.9964 |
| 0.25 | 0.8 | .25 | full-sib | 1.0 | 0.5 | 0.625 | .1562 | .1562 | .0391 | .5769 | 7.6649 |

(a) $h^2 = 4\sigma^2_g / \sigma^2_{phenotypic}$; $r_B = 4\sigma^2_g / (4\sigma^2_g + 4\sigma^2_{tg})$; d/a $= 4\sigma^2_s / 4\sigma^2_g$.

Experimental Design for Simulated Data

The mating design for the simulation was either a six-parent half-diallel (no selfs) or a fifteen-parent half-sib. The randomized complete block field design was in three locations (i.e., separate field tests) with four complete blocks per location and six trees per family in a block, where family is a full-sib family for the half-diallel or a half-sib family for the half-sib design. This field design and the mating designs reflect typical designs in forestry applications (Squillace 1973, Wilcox et al. 1975, Bridgwater et al. 1983, Weir and Goddard 1986, Loo-Dinkins et al. 1991) and are also commonly used in other disciplines (Matzinger et al. 1959, Hallauer and Miranda 1981, Singh and Singh 1984). The six trees per family could be considered as contiguous or non-contiguous plots without affecting the results or inferences.
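Full-Sib Linear Model

The scalar linear model employed for half-diallel individual observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \tag{4-1}$$

where $y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test;
$\mu$ is the population mean;
$t_i$ is the random variable test location ~ NID(0, $\sigma^2_t$);
$b_{ij}$ is the random variable block ~ NID(0, $\sigma^2_b$);
$g_k$ is the random variable female gca ~ NID(0, $\sigma^2_g$);
$g_l$ is the random variable male gca ~ NID(0, $\sigma^2_g$);
$s_{kl}$ is the random variable specific combining ability (sca) ~ NID(0, $\sigma^2_s$);
$tg_{ik}$ is the random variable test by female gca interaction ~ NID(0, $\sigma^2_{tg}$);
$tg_{il}$ is the random variable test by male gca interaction ~ NID(0, $\sigma^2_{tg}$);
$ts_{ikl}$ is the random variable test by sca interaction ~ NID(0, $\sigma^2_{ts}$);
$p_{ijkl}$ is the random variable plot ~ NID(0, $\sigma^2_p$);
$w_{ijklm}$ is the random variable within-plot ~ NID(0, $\sigma^2_w$); and
there is no covariance between random variables in the model.

This linear model in matrix notation is

$$y = \mu 1 + Z_T e_T + Z_B e_B + Z_G e_G + Z_S e_S + Z_{TG} e_{TG} + Z_{TS} e_{TS} + Z_P e_P + e_W \tag{4-2}$$

with dimensions $y$: $n \times 1$; $1$: $n \times 1$; $Z_T$: $n \times t$; $e_T$: $t \times 1$; $Z_B$: $n \times b$; $e_B$: $b \times 1$; $Z_G$: $n \times g$; $e_G$: $g \times 1$; $Z_S$: $n \times s$; $e_S$: $s \times 1$; $Z_{TG}$: $n \times tg$; $e_{TG}$: $tg \times 1$; $Z_{TS}$: $n \times ts$; $e_{TS}$: $ts \times 1$; $Z_P$: $n \times p$; $e_P$: $p \times 1$; $e_W$: $n \times 1$;

where $y$ is the observation vector;
$Z_i$ is the portion of the design matrix for the $i$th random variable;
$e_i$ is the vector of unobservable random effects for the $i$th random variable;
$1$ is a vector of 1's; and
$n$, $t$, $b$, $g$, $s$, $tg$, $ts$, and $p$ are the number of observations, tests, blocks, gca's, sca's, test by gca interactions, test by sca interactions and plots, respectively.

Utilizing customary assumptions in half-diallel mating designs (Method 4, Griffing 1956), the variance of an individual observation is

$$\mathrm{Var}(y_{ijklm}) = \sigma^2_t + \sigma^2_b + 2\sigma^2_g + \sigma^2_s + 2\sigma^2_{tg} + \sigma^2_{ts} + \sigma^2_p + \sigma^2_w \tag{4-3}$$

and in matrix notation the covariance matrix for the observations is

$$\mathrm{Var}(y) = Z_T Z_T' \sigma^2_t + Z_B Z_B' \sigma^2_b + Z_G Z_G' \sigma^2_g + Z_S Z_S' \sigma^2_s + Z_{TG} Z_{TG}' \sigma^2_{tg} + Z_{TS} Z_{TS}' \sigma^2_{ts} + Z_P Z_P' \sigma^2_p + I_n \sigma^2_w \tag{4-4}$$

where " ' " indicates the transpose operator, all matrices of the form $Z_i Z_i'$ are $n \times n$, and $I_n$ is an $n \times n$ identity matrix.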
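The covariance matrix of Eq. 4-4 can be assembled directly from incidence matrices. The following is a minimal sketch under stated assumptions: the helper names `incidence` and `half_diallel_covariance` are hypothetical, the layout is deliberately reduced (2 tests, 2 blocks, 3 parents, 2 trees per plot) so that the matrix stays small, and the variance components are the first full-sib set of Table 4-2.

```python
import itertools
import numpy as np

def incidence(labels):
    """n x q incidence matrix with one column per distinct label."""
    levels = sorted(set(labels))
    column = {label: c for c, label in enumerate(levels)}
    Z = np.zeros((len(labels), len(levels)))
    for row, label in enumerate(labels):
        Z[row, column[label]] = 1.0
    return Z

def half_diallel_covariance(n_tests, n_blocks, n_parents, n_trees, s2):
    """Var(y) as the sum of Z_i Z_i' sigma2_i plus I_n sigma2_w (Eq. 4-4)."""
    crosses = list(itertools.combinations(range(n_parents), 2))  # no selfs
    obs = [(i, j, k, l)
           for i in range(n_tests) for j in range(n_blocks)
           for (k, l) in crosses for _ in range(n_trees)]
    n = len(obs)
    Z = {
        "t": incidence([i for i, j, k, l in obs]),
        "b": incidence([(i, j) for i, j, k, l in obs]),
        "s": incidence([(k, l) for i, j, k, l in obs]),
        "ts": incidence([(i, k, l) for i, j, k, l in obs]),
        "p": incidence([(i, j, k, l) for i, j, k, l in obs]),
    }
    # Each observation has two parents, so gca rows carry two ones.
    Z["g"] = np.zeros((n, n_parents))
    Z["tg"] = np.zeros((n, n_tests * n_parents))
    for row, (i, j, k, l) in enumerate(obs):
        for parent in (k, l):
            Z["g"][row, parent] = 1.0
            Z["tg"][row, i * n_parents + parent] = 1.0
    V = s2["w"] * np.eye(n)
    for name, Zi in Z.items():
        V += s2[name] * (Zi @ Zi.T)
    return V

# First full-sib set of true variance components from Table 4-2.
s2 = {"t": 1.0, "b": 0.5, "g": 0.25, "s": 0.25,
      "tg": 0.25, "ts": 0.25, "p": 0.595, "w": 7.905}
V = half_diallel_covariance(n_tests=2, n_blocks=2, n_parents=3, n_trees=2, s2=s2)
```

Every diagonal element of `V` equals the scalar variance of Eq. 4-3 (11.5 for this set), and two trees in the same plot share every component except the within-plot variance.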
Half-sib Linear Model

The scalar linear model for half-sib individual observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + g_k + tg_{ik} + ph_{ijk} + wh_{ijkm} \tag{4-5}$$

where $y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test;
$\mu$, $t_i$, $b_{ij}$, $g_k$, and $tg_{ik}$ retain the definition in Eq. 4-1;
$ph_{ijk}$ is the random variable plot, containing different genotype by environment components than the corresponding term in Eq. 4-1 ~ NID(0, $\sigma^2_{ph}$);
$wh_{ijkm}$ is the random variable within-plot, containing different levels of genotypic and genotype by environment components than the corresponding term in Eq. 4-1 ~ NID(0, $\sigma^2_{wh}$); and
there is no covariance between random variables in the model.

The matrix notation model is

$$y = \mu 1 + Z_T e_T + Z_B e_B + Z_G e_G + Z_{TG} e_{TG} + Z_P e_P + e_W \tag{4-6}$$

with dimensions $y$: $n \times 1$; $1$: $n \times 1$; $Z_T$: $n \times t$; $e_T$: $t \times 1$; $Z_B$: $n \times b$; $e_B$: $b \times 1$; $Z_G$: $n \times g$; $e_G$: $g \times 1$; $Z_{TG}$: $n \times tg$; $e_{TG}$: $tg \times 1$; $Z_P$: $n \times p$; $e_P$: $p \times 1$; $e_W$: $n \times 1$.

The variance of an individual observation in half-sib designs is

$$\mathrm{Var}(y_{ijkm}) = \sigma^2_t + \sigma^2_b + \sigma^2_g + \sigma^2_{tg} + \sigma^2_{ph} + \sigma^2_{wh} \tag{4-7}$$

[...]

For an observational vector based on plot means, the plot and within-plot random variables were combined by taking the arithmetic mean across the observations within a plot. The resulting plot-means model has a new [...]

Three estimates of ratios among variance components were determined: 1) single tree heritability adjusted for test location and block as $\hat{h}^2 = 4\hat{\sigma}^2_g / \hat{\sigma}^2_{phenotypic}$, with the components for test location and block deleted from the phenotypic variance; 2) type B correlation as $r_B = 4\hat{\sigma}^2_g / (4\hat{\sigma}^2_g + 4\hat{\sigma}^2_{tg})$; and 3) the dominance to additive variance ratio [...]

Data generation was accomplished by using a Cholesky upper-lower decomposition of the covariance matrix for the observations (Goodnight 1979) and a vector of pseudo-random standard normal deviates generated using the Box-Muller transformation with pseudo-random uniform deviates (Knuth 1981, Press et al. 1989). The upper-lower decomposition creates a matrix (U) with the property that Var(y) = U'U. The vector of pseudo-random standard normal deviates (z) has a covariance matrix equal to an identity matrix ($I_n$), where n is the number of observations. The vector of observations is created as y = U'z.
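The generation step just described (Box-Muller deviates multiplied by the transposed Cholesky factor) can be sketched as follows. This is an illustration under stated assumptions rather than the original simulation code: NumPy's uniform generator stands in for the uniform deviates used in the study, the 2 x 2 matrix `V` is a hypothetical stand-in for the full covariance matrix, and the function names are invented for the example.

```python
import numpy as np

def box_muller(count, rng):
    """Standard normal deviates from pairs of uniform deviates."""
    pairs = (count + 1) // 2
    u1 = 1.0 - rng.random(pairs)   # shifted to (0, 1] so that log(u1) is finite
    u2 = rng.random(pairs)
    radius = np.sqrt(-2.0 * np.log(u1))
    angle = 2.0 * np.pi * u2
    z = np.concatenate([radius * np.cos(angle), radius * np.sin(angle)])
    return z[:count]

def simulate(V, n_sets, rng):
    """Rows are independent observation vectors y = U'z with Var(y) = U'U = V."""
    U = np.linalg.cholesky(V).T    # NumPy returns the lower factor; transpose gives U
    n = V.shape[0]
    Z = box_muller(n_sets * n, rng).reshape(n_sets, n)
    return Z @ U                   # row r equals (U' z_r)'

# Hypothetical covariance matrix for two correlated observations.
V = np.array([[2.0, 1.0],
              [1.0, 2.0]])
rng = np.random.default_rng(1993)
Y = simulate(V, n_sets=20000, rng=rng)
U = np.linalg.cholesky(V).T
sample_cov = np.cov(Y, rowvar=False)  # should be close to V for many data sets
```

Because the factor is computed once per covariance structure, many data sets for the same experimental level can be generated with a single matrix product.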
Then Var(y) = U'(Var(z))U and, since Var(z) = $I_n$, Var(y) = U'$I_n$U = U'U.

Analyses of survival patterns using data from the Cooperative Forest Genetic Research Program (CFGRP) at the University of Florida were used to develop survival distributions for the simulation. The data sets chosen for survival analysis were from full-sib slash pine (Pinus elliottii var. elliottii Engelm.) tests planted in randomized complete block designs with the families in row plots, and were selected because the survival levels were either approximately 60% or 80%. Survival levels for most crosses (full-sib families) clustered around the expected value, i.e., approximately 60% for an average survival level of 60%; however, there were always a few crosses that had much poorer survival than average and also a small number of crosses that had much better survival than average. This survival pattern was consistent across the 50 experiments analyzed. Thus, a lower than average survival level was arbitrarily assigned to certain crosses, a higher than average survival level was assigned to certain crosses, and the average survival level was assigned to most crosses. This modeling of survival pattern was also extended to the half-sib mating design. At 80% survival no missing plots were allowed, and at 60% survival missing plots occurred at random.

Full-sib family deletion simulated crosses which could not be made and were therefore missing from the experiment. When deleting five crosses, the deletion was restricted to a maximum of four crosses per parent to prevent loss of all the crosses in which a single parent appeared, since this would have changed a six-parent half-diallel to a five-parent half-diallel.

Tests having only subsets of the half-sib families in common are a frequent occurrence in data analysis at CFGRP.
This partial connectedness was simulated by generating data in which only 10 of the 15 families present in a test were common to either one of the other two tests comprising a data set.

Variance Component Estimation Techniques

Two algorithms were utilized for all estimation techniques: sequentially adjusted sums of squares (Milliken and Johnson 1984, p. 138) for HM3; and Giesbrecht's algorithm (Giesbrecht 1983) for REML, ML, MINQUE and MIVQUE. Giesbrecht's algorithm is primarily a gradient algorithm (the method of scoring), and as such allows negative estimates (Harville 1977, Giesbrecht 1983). Negative estimates are not a theoretical difficulty with MINQUE or MIVQUE; however, for REML and ML, estimates should be confined to the parameter space. For this reason estimators referred to as REML and ML in this chapter are not truly REML and ML when negative estimates occur; further, there is the possibility that the iterative solution stopped at a local maximum rather than the global maximum. These concerns are commonplace in REML and ML estimation (Corbeil and Searle 1976, Harville 1977, Swallow and Monahan 1984); nevertheless, these estimators are still referred to here as REML and ML.
The basic equation for variance component estimation under normality (Giesbrecht 1983) for MIVQUE, MINQUE and REML is

$$\{\mathrm{tr}(QV_iQV_j)\}\,\hat{\sigma}^2 = \{y'QV_iQy\} \tag{4-9}$$

with dimensions $r \times r$, $r \times 1$ and $r \times 1$, so that $\hat{\sigma}^2 = \{\mathrm{tr}(QV_iQV_j)\}^{-1}\{y'QV_iQy\}$; and for ML

$$\{\mathrm{tr}(V^{-1}V_iV^{-1}V_j)\}\,\hat{\sigma}^2 = \{y'QV_iQy\} \tag{4-10}$$

with the same dimensions, where $\{\mathrm{tr}(QV_iQV_j)\}$ is a matrix whose elements are $\mathrm{tr}(QV_iQV_j)$, where in the full-sib designs i = 1 to 8 and j = 1 to 8, i.e., there is a row and column for every random variable in the linear model;
tr is the trace operator, that is, the sum of the diagonal elements of a matrix;
$Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ for V as the covariance matrix of y and X as the design matrix for fixed effects;
$V_i = Z_iZ_i'$ where i = the random variables test, block, etc.;
$\hat{\sigma}^2$ is the vector of variance component estimates; and
r is the number of random variables in the model.

The MINQUE estimator used was MINQUE1, i.e., ones as priors for all variance components, calculated by applying Giesbrecht's algorithm non-iteratively. MINQUE1 was chosen because of results demonstrating MINQUE0 (prior of 1 for the error term and of 0 for all others) to be an inferior estimation technique for many cases (Swallow and Monahan 1984, R.C. Littell unpublished data).

With normally distributed uncorrelated random variables, the use of the true values of the variance components as priors in a non-iterative application of Giesbrecht's algorithm produced the MIVQUE solutions (equation 4-9). Obtaining true MIVQUE estimation is a luxury of computer simulation and would not be possible in practice since the true variance components are required (Swallow and Searle 1978). This estimator was included to provide a standard of comparison for other estimators.

An additional MIVQUE-type estimator, referred to as MIVPEN, was also included. MIVPEN was also a non-iterative application of the algorithm with the true variance components as priors; however, this estimator was conditioned on the variance component parameter space and did not allow negative estimates.
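A single non-iterative pass of equation 4-9 can be sketched as follows. This is a minimal dense-matrix illustration, not Giesbrecht's published implementation: the function name `giesbrecht_step`, the balanced one-way layout (4 groups of 3), and the simulated data are hypothetical, and the residual component is assumed to be the last element of `priors`.

```python
import numpy as np

def giesbrecht_step(y, X, Z_list, priors):
    """Solve {tr(Q V_i Q V_j)} s2 = {y'Q V_i Q y} once for the given priors."""
    n = y.shape[0]
    V_parts = [Z @ Z.T for Z in Z_list] + [np.eye(n)]  # last part is the residual
    V = sum(p * Vi for p, Vi in zip(priors, V_parts))
    V_inv = np.linalg.inv(V)
    V_inv_X = V_inv @ X
    # Q = V^-1 - V^-1 X (X'V^-1 X)^- X'V^-1, using a pseudo-inverse for the g-inverse
    Q = V_inv - V_inv_X @ np.linalg.pinv(X.T @ V_inv_X) @ V_inv_X.T
    QV = [Q @ Vi for Vi in V_parts]
    lhs = np.array([[np.trace(QVi @ QVj) for QVj in QV] for QVi in QV])
    Qy = Q @ y
    rhs = np.array([Qy @ Vi @ Qy for Vi in V_parts])   # y'Q V_i Q y, Q symmetric
    return np.linalg.solve(lhs, rhs)

# Hypothetical balanced one-way random model: 4 groups with 3 observations each.
groups, per_group = 4, 3
Z_group = np.kron(np.eye(groups), np.ones((per_group, 1)))
X = np.ones((groups * per_group, 1))
rng = np.random.default_rng(0)
y = Z_group @ rng.normal(0.0, 1.0, groups) + rng.normal(0.0, 2.0, groups * per_group)

minque1 = giesbrecht_step(y, X, [Z_group], priors=[1.0, 1.0])  # ones as priors
```

For balanced data such as this toy layout, the solution does not depend on the priors and coincides with the analysis-of-variance estimates, which is consistent with the statement in the Introduction that the techniques other than ML agree for balanced data.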
The non-negative conditioning of MIVPEN was accomplished by adding a penalty algorithm to MIVQUE such that no variance component was allowed to be less than $1 \times 10^{-7}$. Estimates from MIVPEN were equal to MIVQUE for data sets for which there were no negative MIVQUE variance component estimates. When negative MIVQUE estimates occurred, the two techniques were no longer equivalent. The penalty algorithm operated by using $\Delta = \hat{\sigma}^2 - \sigma^2$ and by choosing a scalar weight w such that no element of $\sigma^2_{new}$ is less than $1 \times 10^{-7}$. Then $\sigma^2_{new} = \sigma^2 + w\Delta$, where $\Delta$ is the vector of departure from the true values ($\sigma^2$), $1 \times 10^{-7}$ is an arbitrary constant and $\sigma^2_{new}$ is the vector of estimated variance components conditioned on non-negativity.

REML estimates were from repeated application of Giesbrecht's algorithm (equation 4-9) in which the estimates from the kth iteration become the priors for the (k+1)th iteration. The iterations were stopped when the difference between the estimates from the kth and (k+1)th iterations met the convergence criterion; then the estimates of the (k+1)th iteration became the REML estimates. The convergence criterion utilized was $\sum_{i=1}^{r} |\hat{\sigma}^2_{i(k)} - \cdots|$ [...]

Since for this experimental workload it was desired that the simulation run with little analyst intervention and in as few iterations as possible, the robustness of REML solutions obtained from Giesbrecht's algorithm to priors (or starting points) was explored. The difference in solutions starting from two distinct points (a vector of ones and the true values) was compared over 2000 data sets of different structures (imbalance, true variance components, and field design). The results (agreeing with those of Swallow and Monahan 1984) indicated that the difference between the two solutions was entirely dependent on the stringency of the convergence criterion and not on the starting point (priors). Also the number of iterations required for convergence was greatly decreased by using the true values as priors.
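The penalty step described for MIVPEN can be sketched as below. The text states only that the scalar weight w is chosen so that no element falls below $1 \times 10^{-7}$; taking the largest such weight not exceeding 1 is an assumption made here for illustration, as are the function name `penalized_estimate` and the example vectors.

```python
import numpy as np

FLOOR = 1e-7  # arbitrary lower bound quoted in the text

def penalized_estimate(estimate, true_values, floor=FLOOR):
    """Return (sigma2_new, w) with sigma2_new = true_values + w * delta.

    Assumption: w is the largest weight in (0, 1] that keeps every element
    of sigma2_new at or above the floor; true_values must exceed the floor.
    """
    estimate = np.asarray(estimate, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    delta = estimate - true_values          # departure from the true values
    w = 1.0
    below = estimate < floor
    if below.any():
        # For an offending element, true + w * delta >= floor requires
        # w <= (true - floor) / (true - estimate).
        w = np.min((true_values[below] - floor) /
                   (true_values[below] - estimate[below]))
    return true_values + w * delta, w

# Hypothetical true values and one MIVQUE-type solution with a negative element.
true_values = np.array([0.25, 0.25, 8.0])
raw = np.array([-0.25, 0.5, 7.0])
adjusted, weight = penalized_estimate(raw, true_values)

# A solution with no negative elements is returned unchanged (w = 1).
unchanged, full_weight = penalized_estimate([0.1, 0.3, 7.5], true_values)
```

Because the same weight is applied to every element, the adjusted vector lies on the line segment between the true values and the unrestricted solution, so all components move together rather than only the offending one.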
Thus, all REML estimates were calculated starting with the true values as priors.

Three alternatives for coping with negative estimates after convergence were used for REML solutions: accept and use the negative estimates (Shaw 1987), arbitrarily set negative estimates to zero, and re-solve the system setting negative estimates to zero (Miller 1973). The first two alternatives are self-explanatory; the third was accomplished by re-analyzing those data sets in which the initial unrestricted REML estimates included one or more negative estimates. During re-analysis, if a variance component became negative, it was set to zero (and could never take any other value thereafter) and the iterations continued. This procedure persisted until the convergence criterion was met with a solution in which all variance components were either positive or zero.

Harville (1977) suggested several adaptations of Henderson's mixed model equations (Henderson et al. 1959) which do not allow variance component estimates to become negative; however, the estimates can become arbitrarily close to zero. These techniques were tried against the approach of setting negative estimates to zero after convergence and re-solving the system. Comparison of results using the same data sets indicated that there is little practical advantage in using the approach suggested by Harville, although it is more desirable theoretically. The differences between sets of estimates obtained by the two methods (solving the system with a variance component set to zero versus arbitrarily close to zero) are extremely minor.

ML solutions, as iterative applications of equation 4-10, were calculated from the same starting points and with the same convergence criterion as REML solutions.
The three negative variance component alternatives explored for ML were to accept and use the negative estimates, to arbitrarily set negative estimates to zero after converging to a solution for the former, and (for half-sib data only) to re-solve the system setting negative variance components to zero.

The algorithm to calculate solutions for HM3 (sequentially adjusted sums of squares) was based on the upper triangular G2 sweep (Goodnight 1979) and Hartley's method of synthesis (Hartley 1967). The equation solved was $E\{MS\}\hat{\sigma}^2 = MS$, where MS is the vector of mean squares and E{MS} is their expectation. The alternative used for negative estimates was to accept and use the negative estimates.

Comparison Among Estimation Techniques

For the simulation MIVQUE estimates were the basis for all comparisons because MIVQUE is by definition the minimum variance quadratic unbiased estimator. The results of comparing the mean of 1000 MIVQUE estimates for an experimental level to the means for other techniques were termed "apparent bias". "Apparent bias" denotes that 1000 data sets were not sufficient to achieve complete convergence to the true values of the variance components. Sampling variances of estimation were calculated from the 1000 observations within an experimental level and estimation technique for variance components and genetic ratios (single tree heritability, type B correlation and dominance to additive variance ratio). Mean square error then equalled variance plus squared "apparent bias". While mean square error was investigated, there was never sufficient bias for mean square error to lead to a different decision concerning techniques than sampling variance of the estimates, so mean square error is omitted from the remainder of this discussion.

Probability of nearness is the probability that an estimate will lie within a certain interval around the true parameter.
The three total interval widths utilized were one-half, equal to, and twice the parameter size. The percentages of the 1000 estimates falling within these intervals were calculated for the different estimation techniques within an experimental level for variance components and ratios, and were utilized as estimates of probability of nearness.

Results are presented by variance component or genetic ratio estimated as a percentage of MIVQUE (except in the case of probability of nearness). MIVQUE estimates represent 100%, with estimates with greater variance having values larger than 100% and "apparently biased" estimates having values different from 100%. The percentages were calculated as 100 times the estimate divided by the MIVQUE value. For the criterion of variance, the lower the percentage the better the estimator performed; for bias, values equalling 100% (0 bias) are preferred; and for probability of nearness, larger percentages (probabilities) are favored since they are indicative of greater density of estimates near the parametric value.

Results and Discussion

Variance Components

Sampling variance of the estimators

For all variance components estimated, REML and ML estimation techniques were consistently equal to or less than MIVQUE for sampling variance of the estimator (Table 4-3). The variance among estimates from these techniques was further reduced by setting the negative components to zero (MODML and MODREML) or by setting negative estimates to zero and re-solving the system (NNREML, NNML, and PNNREML). Variance among MINQUE1 estimates is always equal to or greater than for MIVQUE, as one might expect, since they are, in this application, the same technique with MIVQUE having perfect priors (the true values). Variances for HM3 estimators (TYPE3 and PTYPE3) are either equal to or greater than MIVQUE; HM3 estimates have progressively larger relative variance with higher levels of imbalance.
MIVPEN, although impractical because of the need for the true priors, had much more precise estimates of variance components than other techniques, illustrating what could be accomplished given the true values as priors plus maintaining estimates within the parameter space.

In general, the spread among the percentages for variance of estimation for the estimation techniques is highly dependent on the degree of imbalance and the type of mating system. With increasing imbalance the likelihood-based estimators realized greater advantage for sampling variance of the estimates over HM3 for both mating systems. The most advantageous application of likelihood-based estimators is in the H1S65 case, where the imbalance is not only random deletions of individuals but also incomplete connectedness across locations, i.e., the same families are not present in each test (akin to incomplete blocks within a test).

Table 4-3. Sampling variance for the estimates of $\sigma^2_g$, a second variance component (symbol garbled in the source), and $h^2$ (where calculated) as a percentage of the MIVQUE value by type of estimator and treatment combination; NA is not applied. Values greater than 100 indicate larger variance among 1000 estimates.

| Estimator | Quantity | D1S80 | D1S65 | D2S65 | H1S80 | H1S65 |
|---|---|---|---|---|---|---|
| REML | $\sigma^2_g$ | 99.9 | 102.6 | 101.5 | 99.6 | 106.3 |
| REML | second | 100.2 | 100.0 | 104.1 | 99.7 | 98.0 |
| REML | $h^2$ | 100.0 | 101.0 | 101.4 | 99.6 | 105.8 |
| ML | $\sigma^2_g$ | 77.3 | 78.2 | 76.4 | 95.9 | 103.9 |
| ML | second | 106.9 | 104.8 | 110.7 | 100.8 | 99.1 |
| ML | $h^2$ | 82.5 | 82.9 | 86.4 | 96.2 | 103.8 |
| MINQUE1 | $\sigma^2_g$ | 100.0 | 104.2 | 104.0 | 104.0 | 146.7 |
| MINQUE1 | second | 101.2 | 118.8 | 123.6 | 112.5 | 139.7 |
| MINQUE1 | $h^2$ | 100.3 | 105.8 | 103.9 | 104.0 | 145.8 |
| NNREML | $\sigma^2_g$ | 80.8 | 71.6 | 95.2 | 88.0 | 68.6 |
| NNREML | second | 67.9 | 48.3 | 54.9 | 78.7 | 48.6 |
| NNREML | $h^2$ | 76.8 | 64.2 | 92.2 | 87.3 | 67.7 |
| NNML | $\sigma^2_g$ | NA | NA | NA | 83.3 | 65.3 |
| NNML | second | NA | NA | NA | 79.4 | 48.9 |
| NNML | $h^2$ | NA | NA | NA | 83.1 | 64.7 |
| MODML | $\sigma^2_g$ | 58.2 | 50.0 | 69.5 | 84.7 | 74.6 |
| MODML | second | 12.8 | 81.4 | 81.6 | 86.6 | 68.5 |
| MODML | $h^2$ | 58.1 | 46.1 | 72.0 | 83.8 | 71.4 |
| MODREML | $\sigma^2_g$ | 81.5 | 74.5 | 96.1 | 88.9 | 78.1 |
| MODREML | second | 89.1 | 74.0 | 73.7 | 85.4 | 66.9 |
| MODREML | $h^2$ | 76.4 | 63.5 | 88.9 | 87.7 | 74.3 |
| TYPE3 | $\sigma^2_g$ | 101.0 | 101.0 | 105.5 | 100.6 | 121.0 |
| TYPE3 | second | 101.1 | 101.0 | 115.5 | 100.9 | 125.6 |
| TYPE3 | $h^2$ | 100.5 | 108.4 | 102.9 | 100.4 | 121.6 |
| PREML | $\sigma^2_g$ | 100.3 | 106.3 | 101.7 | 107.5 | 146.9 |
| PREML | second | 102.7 | 113.5 | 119.8 | 122.0 | 150.7 |
| PML | $\sigma^2_g$ | 77.6 | 81.9 | 77.1 | 103.6 | 143.4 |
| PML | second | 109.7 | 117.3 | 127.2 | 123.3 | 151.9 |
| PMINQUE1 | $\sigma^2_g$ | 100.3 | 107.6 | 105.4 | 107.5 | 179.3 |
| PMINQUE1 | second | 102.7 | 129.0 | 137.3 | 122.0 | 180.6 |
| PNNREML | $\sigma^2_g$ | 80.9 | 71.1 | 93.9 | 92.7 | 86.6 |
| PNNREML | second | 69.8 | 53.2 | 60.5 | 94.0 | 68.1 |
| PTYPE3 | $\sigma^2_g$ | 100.3 | 106.6 | 105.4 | 107.5 | 168.1 |
| PTYPE3 | second | 102.7 | 124.7 | 133.3 | 122.0 | 184.9 |
| PTYPE3 | $h^2$ | 100.6 | 110.8 | 104.1 | 106.9 | 168.0 |
| MIVPEN | $\sigma^2_g$ | NA | 36.2 | 29.1 | 80.0 | 45.6 |
| MIVPEN | second | NA | 26.6 | 20.0 | 74.3 | 39.6 |
| MIVPEN | $h^2$ | NA | 34.7 | 30.2 | 79.8 | 45.4 |
| PMIVQUE | $\sigma^2_g$ | 100.3 | 104.2 | 102.4 | 107.5 | 146.9 |
| PMIVQUE | second | 102.7 | 114.4 | 117.8 | 122.0 | 150.7 |

An analysis of variance was conducted to determine the importance of the treatment of negative variance component estimates in the variance of estimation for REML and ML estimates. The model of sampling variance of the estimates as a result of mating design, imbalance level, treatment of negative estimates and size of the variance component demonstrated consistently (for all variance components except error) that treatment of negative estimates is an important component of the variance of the estimates (p < .05). The model accounted for up to 99% of the variation in the variance of the variance component estimates, with 1) accepting and using negative estimates producing the highest variance; 2) setting the negative components to zero being intermediate; and 3) re-solving the system with negative estimates set to zero providing the lowest variance.

For all estimation techniques, lower variance among estimates was obtained by using individual observations as compared to plot means. The advantage of individual over plot-mean observations increased with increasing imbalance.

Bias

The most consistent performance for bias (Table 4-4) across all variance components was that of TYPE3, known from inherent properties to be unbiased. The consistent convergence of the TYPE3 value to the MIVQUE value indicated that the number of data sets used (1000 per technique and experimental level) was suitable for the purpose of examining bias. The other two consistent performers were REML and MINQUE1. PTYPE3 (HM3 based on plot means) was unbiased when no plot means were missing, but produced "apparently biased" estimates when plot means were missing.

Table 4-4.
Bias for the estimates of $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number), and $h^2$ (third number where calculated) as a percentage of the MIVQUE estimate by type of estimator and experimental combination; NA is not applied. Values different from 100 denote "apparent" bias.

Estimator  Parameter          D1S80  D1S65  D2S65  H1S80  H1S65
REML       $\sigma^2_g$        99.9  101.5   98.7   99.9  102.8
           $\sigma^2_{tg}$     99.9  102.2   99.8   99.9   98.9
           $h^2$               99.9  101.3   98.6   99.9  102.6
ML         $\sigma^2_g$        74.6   61.6   76.0   96.2   98.2
           $\sigma^2_{tg}$    106.5  114.6  109.7  101.3  101.8
           $h^2$               75.5   61.8   77.9   96.3   98.2
MINQUE     $\sigma^2_g$        99.7   96.4   99.0   99.4  102.0
           $\sigma^2_{tg}$    100.1  100.8  101.3  100.8   98.3
           $h^2$               99.7   96.6   98.9   99.4  101.3
NNREML     $\sigma^2_g$       107.9  116.5   98.1  101.9  107.8
           $\sigma^2_{tg}$     93.1   92.9   92.9  100.5  102.3
           $h^2$              108.7  118.4   98.2  102.2  107.7
NNML       $\sigma^2_g$          NA     NA     NA  101.9  107.8
           $\sigma^2_{tg}$       NA     NA     NA  100.5  102.3
           $h^2$                 NA     NA     NA   98.2  103.8
MODML      $\sigma^2_g$        86.6   90.4   79.0   98.1  114.1
           $\sigma^2_{tg}$    109.9  129.9  127.4  101.3  122.9
           $h^2$               87.8   91.5   79.4   99.6  112.6
MODREML    $\sigma^2_g$       109.5  124.2  100.6  103.1  117.8
           $\sigma^2_{tg}$    103.7  119.8  119.2  104.6  120.6
           $h^2$              109.5  123.2   98.4  102.9  116.2
TYPE3      $\sigma^2_g$       100.1   99.4   99.6  100.2   99.6
           $\sigma^2_{tg}$    100.2  101.0  102.4  100.2  100.9
           $h^2$              100.0   99.5   99.3  100.2   99.7
PREML      $\sigma^2_g$        99.7   98.7   97.7   99.5  110.6
           $\sigma^2_{tg}$    100.1  103.6  100.2  102.4   98.3
PML        $\sigma^2_g$        74.2   58.5   73.6   95.9  105.2
           $\sigma^2_{tg}$    106.9  116.2  111.5  103.2  102.0
PMINQUE    $\sigma^2_g$        99.7   95.2   98.8   99.5  106.5
           $\sigma^2_{tg}$    100.1  102.1  102.9  102.4  114.8
PNNREML    $\sigma^2_g$       107.9  114.5   96.7  101.8  115.6
           $\sigma^2_{tg}$     92.9   94.0   95.0  104.5  110.2
PTYPE3     $\sigma^2_g$        99.7   96.8   99.0   99.5  104.5
           $\sigma^2_{tg}$    100.1   97.2   96.0  102.4  108.7
           $h^2$               99.8   98.0   98.8   99.6  104.1
MIVPEN     $\sigma^2_g$          NA  107.5   98.6  102.0  103.2
           $\sigma^2_{tg}$       NA   99.0   91.7  101.4  105.1
           $h^2$                 NA  112.6  103.9  102.1  103.4
PMIVQUE    $\sigma^2_g$        99.7   97.4   99.2   99.5  106.8
           $\sigma^2_{tg}$    100.1  101.7  100.5  102.4   98.8

Table 4-5. Probability of nearness for $\sigma^2_g$ (upper number), $\sigma^2_{tg}$ (second number), and $h^2$ (third number where calculated). The probability interval is equal to the magnitude of the parameter.

Estimator  Parameter          D1S80  D1S65  D2S65  H1S80  H1S65
REML       $\sigma^2_g$        32.8   24.3   41.8   45.3   28.6
           $\sigma^2_{tg}$     43.0   26.2   25.7   36.6   27.1
           $h^2$               34.2   25.3   45.4   45.0   28.3
ML         $\sigma^2_g$        33.6   22.3   40.7   45.4   29.2
           $\sigma^2_{tg}$     42.9   26.4   24.8   36.2   26.7
           $h^2$               34.6   22.3   45.0   45.7   28.2
MINQUE     $\sigma^2_g$        32.6   24.6   41.0   45.1   26.1
           $\sigma^2_{tg}$     43.1   24.3   25.4   34.2   23.2
           $h^2$               33.7   25.0   44.6   44.7   25.6
NNREML     $\sigma^2_g$        33.4   23.4   41.7   45.1   29.3
           $\sigma^2_{tg}$     44.9   28.1   25.6   38.0   28.9
           $h^2$               34.3   24.3   46.1   45.2   29.5
NNML       $\sigma^2_g$          NA     NA     NA   45.9   29.7
           $\sigma^2_{tg}$       NA     NA     NA   37.9   29.1
           $h^2$                 NA     NA     NA   46.0   29.0
TYPE3      $\sigma^2_g$        34.0   23.2   42.5   45.3   27.1
           $\sigma^2_{tg}$     42.6   27.1   24.8   37.3   25.0
           $h^2$               35.3   23.8   45.8   45.9   27.3
PREML      $\sigma^2_g$        32.1   20.0   41.6   43.7   24.6
           $\sigma^2_{tg}$     42.7   26.8   24.6   32.3   20.4
PML        $\sigma^2_g$        33.5   19.8   39.7   44.0   24.4
           $\sigma^2_{tg}$     41.0   26.3   23.6   31.6   21.1
PMINQUE    $\sigma^2_g$        32.1   21.4   40.4   43.7   24.5
           $\sigma^2_{tg}$     42.7   24.8   23.1   32.3   21.9
PNNREML    $\sigma^2_g$        31.9   19.2   41.0   43.4   26.0
           $\sigma^2_{tg}$     43.3   28.0   23.3   33.1   21.3
PTYPE3     $\sigma^2_g$        32.1   23.3   41.7   43.7   25.2
           $\sigma^2_{tg}$     42.7   25.4   24.1   32.3   22.4
           $h^2$               32.6   24.1   46.0   44.6   24.6
MIVQUE     $\sigma^2_g$        33.6   25.7   43.7   45.1   29.2
           $\sigma^2_{tg}$     42.9   28.6   26.4   36.9   26.3
           $h^2$               34.8   26.8   47.7   45.4   29.4
MIVPEN     $\sigma^2_g$          NA   41.1   78.5   48.4   35.6
           $\sigma^2_{tg}$       NA   47.0   60.3   39.2   31.2
           $h^2$                 NA   42.4   80.5   48.7   35.3
PMIVQUE    $\sigma^2_g$        32.1   20.0   41.8   43.7   25.9
           $\sigma^2_{tg}$     42.7   28.5   26.8   32.3   20.8

Among estimators which displayed bias, maximum likelihood estimators (ML and PML) were known to be inherently biased (Harville 1977, Searle 1987), with the amount of bias proportional to the number of degrees of freedom for a factor versus the number of levels for the factor. Other biases resulted from the method of dealing with negative estimates. Accepting negative estimates produced the estimators with the least bias. Setting negative variance components to zero resulted in the greatest bias. Intermediate in bias were the estimates resulting from re-solving the system with negative components set to zero.

Probability of nearness

Results for probability of nearness proved to be largely non-discriminatory among techniques (Table 4-5). The low levels of probability density near the parametric values are indicative of the nature of the variance component estimation problem.
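As a small illustration of the criterion, the following Python sketch computes a probability of nearness as the percentage of estimates falling inside an interval around the true value. The function name, the sample values and the assumption that the interval is centered on the parameter are illustrative only; the text states only that the interval is equal in size to the magnitude of the parameter.

```python
def probability_of_nearness(estimates, true_value, interval_width):
    """Percentage of estimates inside an interval around the true value.

    Assumption (not stated in the source): the interval is centered on
    the true value, i.e. true_value +/- interval_width / 2.
    """
    half_width = interval_width / 2.0
    inside = sum(1 for e in estimates if abs(e - true_value) <= half_width)
    return 100.0 * inside / len(estimates)


# Hypothetical estimates of a heritability whose true value is 0.1;
# the interval width is set equal to the magnitude of the parameter.
example_estimates = [0.06, 0.12, 0.30, -0.02, 0.10]
nearness = probability_of_nearness(example_estimates, 0.1, 0.1)
print(nearness)
```

With 1000 simulated estimates per technique, the same function would give one cell of a table such as Table 4-5.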
Figure 4-1 illustrates the distribution of MIVQUE variance component estimates for $h^2$ (4-1a) and $\sigma^2_g$ (4-1b). The distributions resemble a chi-square distribution: positively skewed, with the expected value (mean) occurring to the right of the peak probability density and a proportion of the estimates occurring below zero (except error). With increasing imbalance, the variance among estimates increases and the probability of nearness decreases for all interval widths.

Ratios of Variance Components

Single tree heritability

Results for estimates of single tree heritability adjusted for locations and blocks are shown in Tables 4-3 and 4-4 (third number from the top in each cell, if calculated). For these relatively low heritabilities (0.1 and 0.25), the bias and variance properties of the estimated ratio are similar to those for $\sigma^2_g$ estimates (Figure 4-1). This implies that knowing the properties of the numerator of heritability reveals the properties of the ratio (especially true of ratios with expected values of 0.1 and 0.25, Kendall and Stuart 1963, Ch. 10). Variance component estimation techniques which performed well for bias and/or variance among estimates for $\sigma^2_g$ also performed well for $h^2$.

Figure 4-1. Distribution of 1000 MIVQUE estimates of $h^2$ (4-1a) and $\sigma^2_g$ (4-1b) for experimental level D1S80, illustrating the positive skew and similarity of the distributions. The true values are 0.1 for $h^2$ and 0.25 for $\sigma^2_g$. The interval width of the bars is one-half the parametric value. (Vertical axis: percent; horizontal axis: MIVQUE estimates, 1000 data sets.)

Type B correlation and dominance to additive variance ratio

Type B correlation (Tables 4-3 and 4-4 as $\sigma^2_{tg}$) and dominance to additive variance ratio (not shown) estimates both proved to be too unstable (extremely large variance among estimates) in their original formulations to be useful in discrimination among variance component estimation techniques.
This high variance is due to the estimates of the denominators of these ratios approaching zero and to the high variance of the denominators of the ratios (Table 4-2). These ratios were reformulated with numerators of interest ($4\sigma^2_{tg}$ for additive genetic by test interaction and $4\sigma^2_s$ for dominance variance, respectively) and a denominator equal to the estimate of the phenotypic variance. With this reformulation, the variance and bias properties of estimates of the altered ratios are approximated by the properties of estimates of the numerators. With increasing imbalance, maximum-likelihood-based estimation offers an increasing advantage over HM3, and for all techniques individual observations offer an increasing advantage over plot-mean observations for variance of the estimates of these ratios. Bias, other than that of inherently biased methods (ML), is associated with the probability of negative estimates, which is increased by increasing imbalance. This assertion is supported by comparing the biases of REML, NNREML, and MODREML estimates across imbalance levels.

General Discussion

Observational Unit

Some general conclusions regarding the choice of a variance component estimation methodology can be drawn from the results of this investigation. For any degree of imbalance, the use of individual observations is superior to the use of plot means for estimation of variance components or ratios of variance components. If the data are nearly balanced (close to 100% survival with no missing plots, no missing crosses (full-sib) and no lack of connectedness (half-sib)), the properties of the estimation techniques based on individual and plot-mean observations become similar; so if departure from balance is nominal, plot means can be used effectively. However, using individual observations obviates the need for a survey of imbalance in the data, since individual observations produce better results than plot means for any of the estimation techniques examined.
Negative Estimates

Drawing on the results of this investigation, the discussion of practical solutions for the negative estimates problem will revolve around two solutions: 1) accepting and using the negative estimates; and 2) re-solving the system with negative estimates set to zero. Given that the property of interest is the true value of a variance component or genetic ratio, often estimated as a mean across data sets, negativity constraints come into play if the component of interest is small in comparison to other underlying variance components in the data, or if the variance of estimates is high due to an inadequate experimental design for variance component estimation. These factors lead to an increased number of negative estimates. If the data structure is such that negative estimates would occur frequently, then accepting negative estimates is a good alternative. If negative estimates tend to occur infrequently, or bias is of less concern than variance among estimates, then re-solving the system after convergence yields negative estimates is the preferable solution. This tactic reduces both bias and variance among estimates below that of arbitrarily setting negative estimates to zero.

Estimation Technique

The primary competitors among estimation techniques that are practically achievable are REML and TYPE3 (HM3). Both techniques produce estimates with little or no bias; however, REML estimates for the most part have slightly less sampling variance than TYPE3 estimates. If only subsets of the parents are in common across tests, as in the case H1S65, REML has a distinct advantage in variance among estimates over TYPE3.
REML has three additional advantages over TYPE3: 1) REML offers generalized least squares estimation of fixed effects while TYPE3 offers ordinary least squares estimation; 2) Best Linear Unbiased Predictions (BLUP) of random variables are inherent in REML solutions, i.e., gca predictions are available, and thus in solving for the variance components with REML, fixed effects are estimated and random variables are predicted simultaneously (Harville 1977); and 3) REML offers greater flexibility in the model specification, both in univariate and multivariate forms as well as heterogeneous or correlated error terms. Further, although the likelihood equations for common REML applications are based on normality, the technique has been shown to be robust against the underlying distribution (Westfall 1987, Banks et al. 1985).

Recommendation

If one were to choose a single variance component estimation technique from among those tested which could be applied to any data set with confidence that the estimates had desirable properties (variance, MSE, and bias), that technique would be REML and the basic unit of observation would be the individual. This combination (REML plus individual observations) performed well across mating designs and across types and levels of imbalance. Treatment of negative estimates would be determined by the proposed use of the estimates, that is, whether unbiasedness (accepting and using the negative estimates) is more important than sampling variance (re-solving the system with negative estimates set to zero). A primary disadvantage of REML and individual observations is that they are both computationally expensive (computer memory and time). HM3 estimation could replace REML on many data sets and plot means could replace individual observations on some data sets; but general application of these without regard to the data at hand does result in a loss in desirable properties of the estimates in many instances.
The computational expense of REML and individual observations ensures that estimates have desirable properties for a broad scope of applications. With the advent of bigger and faster computers and the evolution of better REML algorithms, what was not feasible in the past on most mainframe computers can now be accomplished on personal computers.

CHAPTER 5
GAREML: A COMPUTER ALGORITHM FOR ESTIMATING VARIANCE COMPONENTS AND PREDICTING GENETIC VALUES

Introduction

The computer program described in this chapter, called GAREML for Giesbrecht's algorithm of restricted maximum likelihood estimation (REML), is useful for both estimating variance components and predicting genetic values. GAREML applies the methodology of Giesbrecht (1983) to the problems of REML estimation (Patterson and Thompson 1971) and best linear unbiased prediction (BLUP, Henderson 1973) for univariate (single trait) genetics models. GAREML can be applied to half-sib (open-pollinated or polymix) and full-sib (partial diallels, factorials, half-diallels [no selfs] or disconnected sets of half-diallels) mating designs when planted in single or multiple locations with single or multiple replications per location. When used for variance component estimation, this program has been shown to provide estimates with desirable properties across types of imbalance commonly encountered in forest genetics field tests (Huber et al. in press) and with varying underlying distributions (Banks et al. 1985, Westfall 1987). GAREML is also useful for determining efficiencies of alternative field and mating designs for the estimation of variance components.

Utilizing the power of mixed-model methodology (Henderson 1984), GAREML provides BLUP of parental general (gca) and specific combining abilities (sca) as well as generalized least squares (GLS) solutions for fixed effects. The application of BLUP to forest genetics problems has been addressed by White and Hodge (1988, 1989).
With certain assumptions, the desirable properties of BLUP predictions include maximizing the probability of obtaining correct parental rankings from the data and minimizing the error associated with using the parental values obtained in future applications. GLS fixed effect estimation weights the observations comprising the estimates by their associated variances, approximating best linear unbiased estimation (BLUE) for fixed effects (Searle 1987, p. 489-490).

The purpose of this chapter is to describe the theory and use of GAREML in enough detail to facilitate use by other investigators. The program is written in FORTRAN and is not dependent on other analysis programs. An interactive version of this program can be obtained as a stand-alone executable file from the senior author; this file will run on any IBM compatible PC under DOS or WINDOWS (footnote 2) operating systems. The size of the problem an investigator can solve will depend on the amount of extended memory and hard disk space (for swap files) available for program use. In addition, the FORTRAN source code can be obtained for analysts wishing to compile the program for use on alternate systems (e.g. mainframe computers).

Footnote 2: Windows is the trademark of the Microsoft Corporation, Redmond, WA.

Algorithm

GAREML proceeds by reading the data and forming a design matrix based on the number of levels of factors in the model. Any portions of the design matrix for nested factors or interactions are formed by horizontal direct product. Columns of zeroes in the design matrix (the result of imbalance) are then deleted. The design matrix columns are in an order specified by Giesbrecht's algorithm: columns for fixed effects are first, followed by the data vector, and the last section of the matrix is for random effects. The design matrix is the only fully formed matrix in the program. All other matrices are symmetric; therefore, to save computational space
and time, only the diagonal and the above-diagonal portions of matrices are formed and utilized (i.e., half-stored). A half-stored matrix of the dot products of the design columns is formed and either kept in common memory or stored in temporary disk space so that the matrix is available for recall in the iterative solution process.

The algorithm proceeds by modifying the matrix of dot products such that the inverse of the covariance matrix for the observations ($V$) is enclosed by the column specifiers in the dot products, with $X'X$ becoming $X'V^{-1}X$. This transfer is completed without inversion of the total $V$ matrix. The identity used to accomplish this transfer is: if $V_h = \sigma_h Z_h Z_h' + V_{(h+1)}$, where $V_h$ is nonsingular, then

$$V_h^{-1} = V_{(h+1)}^{-1} - \sigma_h V_{(h+1)}^{-1} Z_h \left(I_h + \sigma_h Z_h' V_{(h+1)}^{-1} Z_h\right)^{-1} Z_h' V_{(h+1)}^{-1}. \tag{5-1}$$

A compact form of equation 5-1 is obtained by pre-multiplying by $Z_i'$ and post-multiplying by $Z_j$, where $h = 1, \dots, k-1$ ($k$ = the total number of random factors), $\sigma_h$ is the prior associated with random variable $h$, $V_k = \sigma_k I$, $V_1 = V$ and $Z_i$ is the portion of the design matrix for random variable $i$ (Giesbrecht 1983). A partitioned matrix is formed in order to update until $V_1^{-1}$ or $V^{-1}$ is obtained. This matrix is of the form:

$$\begin{bmatrix} I_h + \sigma_h Z_h' V_{(h+1)}^{-1} Z_h & \sqrt{\sigma_h}\, Z_h' V_{(h+1)}^{-1} (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1}) \\ \sqrt{\sigma_h}\, (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})' V_{(h+1)}^{-1} Z_h & T_{(h+1)} \end{bmatrix} \tag{5-2}$$

where $T_{k-1} = (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})' V_{k-1}^{-1} (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})$. The sweep operator of Goodnight (1979) is applied to the upper left partition of the matrix (equation 5-2) and the result of equation 5-1 is obtained. The matrix is sequentially updated and swept until $T_1 = (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})' V^{-1} (X \mid y \mid Z_1 \mid \dots \mid Z_{k-1})$ is obtained. $T_1$ is then swept on the columns for fixed effects ($X'V^{-1}X$).
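The sweep step can be illustrated with a minimal Python sketch. It is not GAREML's FORTRAN code: it uses full rather than half-stored matrices, one common sign convention for the sweep operator (conventions differ among authors), and an ordinary least squares example in which $V = I$, so that the swept matrix is simply the cross-products of $(X \mid y)$. The function name and data are hypothetical.

```python
def sweep(matrix, k):
    """Return a copy of a square matrix swept on pivot k.

    Sign convention assumed here: the pivot row is divided by the pivot,
    the pivot column is divided by the negative of the pivot, and all
    other elements receive the usual rank-one adjustment.
    """
    n = len(matrix)
    d = matrix[k][k]
    if abs(d) < 1e-12:
        raise ValueError("pivot is (near) zero; column is dependent")
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = 1.0 / d
            elif i == k:
                out[i][j] = matrix[k][j] / d
            elif j == k:
                out[i][j] = -matrix[i][k] / d
            else:
                out[i][j] = matrix[i][j] - matrix[i][k] * matrix[k][j] / d
    return out


# Cross-products of (X | y) for X = [[1,1],[1,2],[1,3]] and y = [1,2,2]:
#   [ X'X  X'y ]
#   [ y'X  y'y ]
cross_products = [[3.0, 6.0, 5.0],
                  [6.0, 14.0, 11.0],
                  [5.0, 11.0, 9.0]]

# Sweeping on the two X columns leaves the coefficient estimates in the
# last column and the residual sum of squares in the lower right corner.
swept = sweep(sweep(cross_products, 0), 1)
beta = [swept[0][2], swept[1][2]]
rss = swept[2][2]
print(beta, rss)
```

In this sketch the upper left block of `swept` holds the inverse of $X'X$; when the cross-products carry $V^{-1}$, the same positions hold the generalized least squares quantities.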
This sweep operation produces generalized least squares estimates for fixed effects, results which can be scaled into predictions of random variables, the residual sum of squares and all the necessary ingredients for assembling the equation to solve for the variance components. The equation to be solved for the variance components is

$$\{\operatorname{tr}(QV_iQV_j)\}\,\hat{\sigma}^2 = \{y'QV_iQy\}$$

with dimensions $r \times r$, $r \times 1$ and $r \times 1$, respectively; then

$$\hat{\sigma}^2 = \{\operatorname{tr}(QV_iQV_j)\}^{-1}\{y'QV_iQy\} \tag{5-3}$$

where $\{\operatorname{tr}(QV_iQV_j)\}$ is a matrix whose elements are $\operatorname{tr}(QV_iQV_j)$, where $i = 1$ to $r$ and $j = 1$ to $r$, i.e., there is a row and column for every random variable in the linear model; tr is the trace operator, that is, the sum of the diagonal elements of a matrix; $Q = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ for $V$ as the covariance matrix of $y$ and $X$ as the design matrix for fixed effects; $V_i = Z_iZ_i'$, where the $i$'s are the random variables; $\hat{\sigma}^2$ is the vector of variance component estimates; and $r$ is the number of random variables in the model ($k-1$).

The entire procedure from forming $T_1$ to solving for the variance components continues until the variance component estimates from the last iteration are no more different from the estimates of the previous iteration than the convergence criterion specifies. The fixed effect estimates and predictions of random variables are then those of the final iteration.

The asymptotic covariance matrix for the variance components is obtained as

$$\operatorname{Var}(\hat{\sigma}^2) = 2\{\operatorname{tr}(QV_iQV_j)\}^{-1} \tag{5-4}$$

by utilizing intermediate results from the solution for the variance components.

The coefficient matrix of Henderson's mixed model equations is formed in order to calculate the covariance matrix for fixed and random effects. The covariance matrix for observations is constructed using the variance component estimates from Giesbrecht's algorithm.
The coefficient matrix is

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + D^{-1} \end{bmatrix} \tag{5-5}$$

where $R$ is the error covariance matrix, which in this application is $I\sigma^2_w$; $Z$ is the random effects design matrix; and $D$ is the covariance matrix for the random variables which, in this application, has variance components on the diagonal and zeroes on the off-diagonal (no covariance among random variables). The generalized inverse of the matrix (equation 5-5) is the error covariance matrix of the fixed effect estimates and random predictions, assuming the covariance matrix for observations is known without error.

Operating GAREML

While GAREML will run in either batch or interactive mode, we focus on the interactive PC version, which begins by prompting the analyst to answer questions determining the factors to be read from the data. Specifically, the analyst answers these questions: 1) are there multiple locations? 2) are there multiple blocks? 3) are there disconnected sets of full-sibs (usually referring to disconnected half-diallels)? and 4) is the mating design half-sib or full-sib? The program then determines the proper variables to read from the data as well as the most complicated (number of main factors plus interactions) scalar linear model allowed.
The most complicated linear model allowed for full-sib observations is

$$y_{ijklm} = \mu + t_i + b_{ij} + set_o + g_k + g_l + s_{kl} + tg_{ik} + tg_{il} + ts_{ikl} + p_{ijkl} + w_{ijklm} \tag{5-6}$$

where
$y_{ijklm}$ is the $m$th observation of the $kl$th cross in the $j$th block of the $i$th test;
$\mu$ is the population mean;
$t_i$ is the random or fixed variable test environment;
$b_{ij}$ is the random or fixed variable block;
$set_o$ is the random or fixed variable set, i.e., a variable is created so that disconnected sets of half-diallels planted in the same experiment can be analyzed in the same run, or so that provenances and families within provenance can be analyzed where provenance equals set; sets are assumed to be across test environments and blocks with families nested within sets, and interactions with set are assumed unimportant;
$g_k$ is the random variable female general combining ability (gca);
$g_l$ is the random variable male gca;
$s_{kl}$ is the random variable specific combining ability (sca);
$tg_{ik}$ is the random variable test by female gca interaction;
$tg_{il}$ is the random variable test by male gca interaction;
$ts_{ikl}$ is the random variable test by sca interaction;
$p_{ijkl}$ is the random variable plot;
$w_{ijklm}$ is the random variable within-plot; and
there is no covariance between random variables in the model.

The assumptions utilized are that the variances for female and male random variables are equal ($\sigma^2_{g_k} = \sigma^2_{g_l} = \sigma^2_g$), and that female and male environmental interactions are the same ($\sigma^2_{tg_k} = \sigma^2_{tg_l} = \sigma^2_{tg}$).
The most complicated scalar linear model allowed for half-sib observations is

$$y_{ijkm} = \mu + t_i + b_{ij} + set_o + g_k + tg_{ik} + ph_{ijk} + wh_{ijkm} \tag{5-7}$$

where
$y_{ijkm}$ is the $m$th observation of the $k$th half-sib family in the $j$th block of the $i$th test;
$\mu$, $t_i$, $b_{ij}$, $set_o$, $g_k$, and $tg_{ik}$ retain the definitions in the full-sib equation;
$ph_{ijk}$ is the random variable plot, containing different genotype by environment components than the full-sib model;
$wh_{ijkm}$ is the random variable within-plot, containing different levels of genotypic and genotype by environment components than the full-sib model; and
there is no covariance between random variables in the model.

The analyst builds the linear model by answering further prompts. If test, block and/or set are in the model, they must be declared as fixed or random effects. When any of the three effects is declared random, the analyst must furnish prior values for the variance. If no prior value is known, values of 1.0 may be used as priors. Using 1.0 as a prior will not affect the values of the resulting variance component estimates within the constraints of the convergence criterion, but there may be a time penalty due to an increase in the number of iterations required for convergence. All remaining factors in the model are treated as random variables. To complete the definition of the model, the analyst chooses to include or exclude each possible factor by answering yes or no when prompted. After each yes answer, the program asks for a prior value for the variance. Again, if no known priors exist, values of 1.0 may be substituted. After the model has been specified, the program counts the number of fixed effects and the number of random effects and asks if the number fits the model expected. A "yes" answer proceeds through the program while a "no" returns the program to the beginning. GAREML is now ready to read the data file (which must be an ASCII data file) in this order: test, block, set, female, male, and the response variable.
The analyst is prompted to furnish a proper FORTRAN format statement for the data. Test, block, set, female and male are read as character variables (A fields) with as many as eight characters per field, while the data vector (response variable) is read as a double precision variable (F field). An example of a format statement for a full-sib mating design across locations and blocks is "(4A8,F10.5)", which reads four character variables sequentially occupying 8 columns each and the response variable beginning in column 33 and ending in column 42 with five decimal places.

After reading the data, GAREML begins to furnish information to the analyst. This information should be scanned to make sure the data read are correct. It includes the number of parents, the number of full-sib crosses, the number of observations, the maximum number of fixed effect design matrix columns, and the maximum number of random effect design matrix columns. If there is an error at this point, use CTRL-BRK to exit the program. Probable causes of errors are that the data are not in the format specified, missing values are included, blank lines or other similar errors are in the data file, or the model was not correctly specified.

At this point, there are three other prompts concerning the data analysis (number of iterations, convergence criterion and treatment of negative variance components). The number of iterations is arbitrarily set to 30 and can be changed at the analyst's discretion. No warning is issued that the maximum number of iterations has been reached; however, the current iteration number and variance component estimates are output to the screen at the beginning of each iteration. The convergence criterion used is the sum of the absolute values of the differences between variance component estimates for consecutive iterations. The criterion has been set to $1 \times 10^{-4}$, meaning that convergence is required to the fourth decimal place for all variance components.
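The stopping rule just described can be sketched in Python as follows. This is an illustration of the rule only, not GAREML's code: the `update` argument stands in for one full pass of the estimation procedure and is hypothetical, as are the function name and the toy update used to exercise it. Unlike GAREML, which issues no warning, the sketch returns a flag showing whether the criterion was met; whether GAREML compares with "less than" or "less than or equal" is not stated, so strict inequality is an assumption.

```python
def iterate_to_convergence(update, priors, criterion=1e-4, max_iterations=30):
    """Repeat `update` until the summed absolute change is below `criterion`.

    `update` is a hypothetical stand-in for one pass of the estimation
    procedure: it takes the current variance component values and returns
    new ones. Returns (estimates, iterations_used, converged).
    """
    current = list(priors)
    for iteration in range(1, max_iterations + 1):
        new = list(update(current))
        # Criterion from the text: sum of absolute differences between
        # estimates from consecutive iterations.
        change = sum(abs(a - b) for a, b in zip(new, current))
        current = new
        if change < criterion:
            return current, iteration, True
    return current, max_iterations, False


def toy_update(values):
    """Toy contraction with fixed point 2.0, used only for illustration."""
    return [0.5 * v + 1.0 for v in values]


estimates, iterations_used, converged = iterate_to_convergence(toy_update, [1.0])
print(estimates, iterations_used, converged)
```

Starting from different priors changes the number of iterations used but not the value reached, which mirrors the behavior the text describes for priors of 1.0.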
The convergence criterion should be modified to suit the magnitude of the variances under consideration as well as the practical need for enhanced resolution. Enhanced resolution is obtained at the cost of increasing the number of iterations to convergence.

The analyst must decide whether to accept and use negative estimates or to set negative estimates to zero and re-solve the system. The latter solution results in variance component estimates with lower sampling variance and slight bias. If one is interested in unbiased estimates of variance components that have a high probability of negative estimates, then accepting and using the negative estimates may be the proper course to take.

Interpreting GAREML Output

Analysis is now underway. The priors for each iteration and the iteration number are printed to the screen. GAREML continues to iterate until the convergence criterion is met or the maximum number of iterations is reached. Analyst intervention is next required to provide a name for the output file for variance component estimates. The file name follows normal DOS file naming protocol; however, alternative directories may not be specified, i.e., all outputs will be found in the same directory as the data file.

The program will now quiz the analyst to determine if additional outputs are desired. These additional outputs are gca predictions, sca predictions (if applicable), the asymptotic covariance matrix for the variance components, generalized least squares fixed effect estimates, the error covariance matrix of the gca predictions and the error covariance matrix for fixed effects. An answer of yes to the inclusion of an output will result in a prompt for a file name. In addition, for gca and sca predictions the analyst may input a different value for the gca or sca variance with which to scale predictions. The discussion which follows furnishes more detailed information concerning GAREML outputs.
Variance Component Estimates

Ignoring concerns about convergence to a global maximum and negative values, variance component estimates are the restricted maximum likelihood estimates of Patterson and Thompson (1971). The estimates are robust against starting values (priors), i.e., the same estimates, within the limits of the convergence criterion, can be obtained from diverse priors. However, priors close to the true values will, in general, reduce the number of iterations required to reach convergence. The value of the convergence criterion must be less than or equal to the desired precision for the variance components.

REML variance component estimates from this program have been shown to have more desirable properties (variance and bias) than other commonly used estimation techniques (maximum likelihood, minimum norm quadratic unbiased estimation and Henderson's Method 3) over a wide range of data imbalance. The properties of the estimates are further enhanced by using individual observations as data rather than plot means. The output is labelled by the variance component estimated.

Predictions of Random Variables

The predictions output are for general and specific combining abilities and approximate best linear unbiased predictions (BLUP) of the random variables. BLUP predictions have several optimal properties: 1) the correlation between the predicted and true values is maximized; and 2) if the distribution is multivariate normal, then BLUP maximizes the probability of obtaining the correct rankings (Henderson 1973) and so maximizes the probability of selecting the best candidate from any pair of candidates (Henderson 1977).
Predictions are of the form:

$$\hat{u} = \hat{D}Z'\hat{V}^{-1}(y - X\hat{\beta}) \tag{5-8}$$

where
$\hat{u}$ is the vector of predictions;
$\hat{D}$ is the estimated covariance matrix for random variables from the REML variance component estimates, see equation 5-5;
$Z'$ is the transpose of the design matrix for random variables;
$y$ is the data vector;
$X$ is the design matrix for fixed effects;
$\hat{\beta}$ is the vector of fixed effect estimates; and
$\hat{V}$ is the estimated covariance matrix for observations from REML variance component estimates.

NOTE: if predictions are desired based on prior values for the variance components, set the number of iterations to 1 after having input the desired values as priors. Predictions are output as a labelled vector.

Asymptotic Covariance Matrix of Variance Components

The output for the asymptotic covariance matrix (AVCM) of variance components is from equation 5-4. This output represents the variance of repeated minimum variance quadratic unbiased variance component estimates using the same experimental design if the estimates are equal to the true values. This technique has been used for simulation work to define optimal mating and field designs (McCutchan et al. 1989). The AVCM is used to create the asymptotic variance of linear combinations of estimates of variance components as

$$\operatorname{Var}(L'\hat{\sigma}^2) = L'\operatorname{Var}(\hat{\sigma}^2)L \tag{5-9}$$

where $L$ specifies the linear combination(s) of variance components; $\hat{\sigma}^2$ is the vector of variance component estimates; and $\operatorname{Var}(\hat{\sigma}^2)$ is the AVCM from equation 5-4. The diagonal elements of $L'\operatorname{Var}(\hat{\sigma}^2)L$ are the variances of the linear combinations and the off-diagonal elements are the covariances between the linear combinations. These values are then useful for Taylor series approximation of the variance of a ratio of linear combinations such as heritability. The AVCM is output as a vector (half-stored matrix) and each row of the output is labelled.

Fixed Effect Estimates

Fixed effect estimates are those of generalized least squares and are in a set to zero format.
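As an aside on the AVCM (equations 5-4 and 5-9), the Taylor series approximation for the variance of a ratio can be sketched in Python. The sketch uses the standard first-order approximation Var(N/D) ≈ (N/D)² [Var(N)/N² + Var(D)/D² - 2 Cov(N,D)/(ND)]. The function names, the two-component example, the heritability coefficients and all numerical values are hypothetical and are not taken from GAREML or from the data in this chapter.

```python
def bilinear(a, matrix, b):
    """Compute a' M b for coefficient vectors a, b and a square matrix M."""
    return sum(a[i] * matrix[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def ratio_and_variance(num_coef, den_coef, estimates, avcm):
    """First-order Taylor series variance of (num' s) / (den' s).

    The variances and covariance of the two linear combinations come
    from equation 5-9 applied with L = [num_coef, den_coef].
    """
    numerator = sum(c * s for c, s in zip(num_coef, estimates))
    denominator = sum(c * s for c, s in zip(den_coef, estimates))
    var_num = bilinear(num_coef, avcm, num_coef)
    var_den = bilinear(den_coef, avcm, den_coef)
    cov_nd = bilinear(num_coef, avcm, den_coef)
    ratio = numerator / denominator
    variance = ratio ** 2 * (var_num / numerator ** 2
                             + var_den / denominator ** 2
                             - 2.0 * cov_nd / (numerator * denominator))
    return ratio, variance


# Hypothetical two-component example: family and residual variance, with
# an illustrative heritability defined as 4 * family / (family + residual).
components = [0.25, 3.75]
avcm_example = [[0.04, -0.01],
                [-0.01, 0.09]]
h2, var_h2 = ratio_and_variance([4.0, 0.0], [1.0, 1.0], components, avcm_example)
print(h2, var_h2)
```

The square root of the returned variance gives an approximate standard error for the ratio, subject to the usual caveats of first-order approximations when the denominator is poorly estimated.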
Set to zero format (commonly seen in SAS (footnote 3) output) is characterized by the last level of a main effect or nested effect being set to zero. These estimates are approximately best linear unbiased estimates (BLUE) of the fixed effects because the covariance matrix for observations was estimated and not known without error. Kackar and Harville (1981) have shown, for a broad class of variance estimators, that the fixed effect estimates are still unbiased. The word "Best" in BLUE refers to the property of minimum variance within the class of unbiased estimators. Generalized least squares estimates, in set to zero format, for fixed effects are of the form:

$$\hat{\beta} = (X'\hat{V}^{-1}X)^{-}X'\hat{V}^{-1}y \tag{5-10}$$

where $\hat{\beta}$, $X$, $\hat{V}$ and $y$ are as defined in equation 5-8. Fixed effect estimates are output as a labelled vector.

Footnote 3: SAS is the registered trademark of SAS Institute Inc., Cary, North Carolina.

Error Covariance Matrices

The error covariance matrices for predictions and fixed effect estimates are obtained by producing a generalized inverse of equation 5-5 (Henderson 1984, McLean 1989). Since all covariance matrices are symmetric, the output is in the form of a vector which is equivalent to a half-stored matrix. Output for error of gca predictions is labelled while the error of fixed effects is not. The labelling on gca errors makes the unlabelled output for fixed effect variance self-explanatory. The error covariance matrix for gca predictions can be converted to the covariance matrix for gca predictions by forming the covariance matrix for the gca random variables and subtracting the error covariance matrix. The covariance matrix for predictions has been denoted as $\operatorname{Var}(\hat{g})$ by White and Hodge (1989).

Example

The following discussion involves the analysis of a simulated data set in order to further demonstrate the outputs of GAREML.

Data

The data (Table 5-1) were generated using a six-parent half-diallel mating design and a randomized complete block field design.
The field design is in two locations with four complete blocks per location and two trees per family per block. The underlying genetic parameters for the data are an individual tree heritability of 0.25, a Type B correlation of 0.8, a dominance to additive variance ratio of 0.25 and a population mean of 15.0. After a balanced data set was generated, the observations were subjected to 40% random deletion (simulating 60% survival). The data set comprises a small number of observations and, while not an optimal application of GAREML, serves well as an illustration.

Analysis

The analysis was carried out with two different linear models using individual observations as the data. The model contained eight sources of variation and was from equation 5-6 without the variable set. In model 1, test environment and blocks within test are declared fixed. The subsequent model (model 2) has all random effects except the mean. Variance components are estimated with model 1 receiving two different treatments of negative estimates, i.e., live with the negative estimates (model 1A) or re-solve the system setting negative estimates to zero (model 1B). The different models and methods for dealing with negative estimates are demonstrated so that the reader can see a range of outputs from GAREML.

Table 5-1. Data for example of GAREML operation. L, Bl, F, M, T and RV stand for location, block, female, male, tree and response variable, respectively. A proper FORTRAN read format would be (A2,T5,A2,T9,A2,T13,A2,T22,F10.5).

L Bl F M T RV
1 1 1 2 1 19.07165
1 1 1 3 1 13.17908
1 1 1 6 1 14.33610
1 1 1 6 2 12.48194
1 1 2 3 1 7.57821
1 1 2 3 2 12.73262
1 1 2 5 1 18.38451
1 1 2 5 2 9.84538
1 1 2 6 1 15.60306
1 1 2 6 2 17.44872
1 1 3 4 1 14.59613
1 1 3 5 1 16.95861
1 1 3 5 2 15.02863
1 1 3 6 1 15.95634
1 1 4 5 1 19.13362
1 1 4 5 2 12.08240
1 1 4 6 1 5.37647
1 1 5 6 1 18.87956
1 2 1 3 2 16.79470
1 2 1 5 1 15.81553
1 2 1 5 2 19.77063
1 2 1 6 1 17.49746
1 2 1 6 2 18.81207
1 2 2 3 1 15.03569
1 2 2 5 1 11.68149
1 2 2 6 2 12.78227
1 2 3 4 1 13.39599
1 2 3 5 1 13.54873
1 2 3 5 2 12.00935
1 2 3 6 1 16.89523
1 2 3 6 2 20.48223
1 2 4 5 1 15.21563
1 2 4 6 1 14.21138
1 2 4 6 2 15.65649
1 2 5 6 1 21.36959
1 2 5 6 2 16.39244
1 3 1 3 1 18.83196
1 3 1 3 2 20.45754
1 3 1 4 1 14.10900
1 3 1 4 2 16.49369
1 3 1 6 2 14.25154
1 3 2 3 1 19.57695
1 3 2 5 2 12.38303
1 3 2 6 2 17.12110
1 3 3 4 1 13.03351
1 3 3 4 2 13.20463
1 3 3 5 2 12.44908
1 3 4 5 1 14.28528
1 3 5 6 1 17.57996
1 3 5 6 2 16.57026
1 4 1 3 1 16.91731
1 4 1 3 2 18.36209
1 4 1 4 2 16.70828
1 4 1 5 2 21.29535
1 4 1 6 1 15.23314
1 4 2 3 1 12.14596
1 4 2 3 2 12.20679
1 4 2 4 1 11.83520
1 4 2 6 1 14.27080
1 4 3 4 1 14.34923
1 4 3 4 2 16.39791
1 4 3 5 1 12.17513
1 4 3 5 2 14.95300
1 4 4 5 2 11.63311
1 4 4 6 1 13.29654
1 4 4 6 2 15.90303
1 4 5 6 1 17.22657
1 4 5 6 2 10.04577
2 1 1 2 2 9.80034
2 1 1 3 1 12.12891
2 1 1 3 2 18.00497
2 1 1 4 1 12.68041
2 1 1 4 2 13.14452
2 1 1 6 1 19.19915
2 1 2 3 1 5.36263
2 1 2 3 2 13.39351
2 1 2 5 2 11.13499
2 1 2 6 1 13.46429
2 1 2 6 2 16.87729
2 1 3 4 2 9.24115
2 1 3 6 2 13.49004
2 1 4 5 1 11.88620
2 1 4 5 2 9.83032
2 1 4 6 1 11.46474
2 1 4 6 2 12.68435
2 1 5 6 1 16.66260
2 1 5 6 2 14.14226
2 2 1 2 1 15.77378
2 2 1 3 1 13.28328
2 2 1 4 1 11.22915
2 2 1 4 2 9.94041
2 2 1 5 2 14.03251
2 2 1 6 2 20.41990
2 2 2 3 1 10.74312
2 2 2 4 1 6.72215
2 2 2 5 1 12.77779
2 2 2 5 2 11.10388
2 2 3 4 1 12.52286
2 2 3 5 1 8.02745
2 2 4 5 1 14.14567
2 2 4 5 2 11.85937
2 2 4 6 2 14.61252
2 2 5 6 1 10.56892
2 2 5 6 2 14.13368
2 3 1 2 1 21.17819
2 3 1 3 1 13.56761
2 3 1 4 1 9.35457
2 3 1 5 1 13.78936
2 3 1 6 1 11.12412
2 3 2 3 1 9.41810
2 3 2 3 2 12.77555
2 3 2 4 1 15.38449
2 3 2 4 2 9.64170
2 3 2 5 2 11.64608
2 3 2 6 1 11.79241
2 3 2 6 2 9.14105
2 3 3 4 1 8.92909
2 3 3 6 1 8.08095
2 3 4 5 1 10.13996
2 3 4 5 2 10.30808
2 3 4 6 1 9.88286
2 3 4 6 2 8.80803
2 3 5 6 1 11.65281
2 3 5 6 2 7.90006
2 4 1 3 1 12.72744
2 4 1 3 2 14.44072
2 4 1 4 1 14.67983
2 4 1 5 1 9.27305
2 4 1 5 2 16.99880
2 4 1 6 1 14.17835
2 4 2 3 1 14.14628
2 4 2 3 2 10.64403
2 4 2 4 1 16.55552
2 4 2 5 1 10.30221
2 4 2 5 2 13.24760
2 4 3 4 2 8.44671
2 4 3 5 1 14.12292
2 4 3 5 2 14.17583
2 4 3 6 1 13.92882
2 4 3 6 2 16.18924
2 4 4 5 1 8.89750
2 4 4 5 2 9.79576
2 4 4 6 1 12.29319
2 4 4 6 2 9.16987
2 4 5 6 1 14.85018
2 4 5 6 2 16.69414

Output

Variance component estimates

The variance component estimates are

Model 1A
SIGMA-SQUARED GCA          1.221435
SIGMA-SQUARED SCA          0.233278
SIGMA-SQUARED LOCxGCA     -0.096850
SIGMA-SQUARED LOCxSCA     -0.548142
SIGMA-SQUARED BLOCKxFAM    1.242110
SIGMA-SQUARED ERROR        7.285051;

Model 1B
SIGMA-SQUARED GCA          1.160636
SIGMA-SQUARED SCA          0.003190
SIGMA-SQUARED LOCxGCA      0.000000
SIGMA-SQUARED LOCxSCA      0.000000
SIGMA-SQUARED BLOCKxFAM    0.753049
SIGMA-SQUARED ERROR        7.375388;

and Model 2
SIGMA-SQUARED LOCATION     3.430921
SIGMA-SQUARED BLOCK(LOC)   0.000000
SIGMA-SQUARED GCA          1.233609
SIGMA-SQUARED SCA          0.000000
SIGMA-SQUARED LOCxGCA      0.000000
SIGMA-SQUARED LOCxSCA      0.000000
SIGMA-SQUARED BLOCKxFAM    0.960168
SIGMA-SQUARED ERROR        7.197284.
These variance component estimates illustrate outputs for the random model, the mixed model and the alternatives for dealing with negative estimates.

Fixed effect estimates

Fixed effect estimates are

Model 1B
MU             13.085052
LOCATION 1      1.805455
LOCATION 2      0.000000
BLOCK(LOC) 1   -0.475396
BLOCK(LOC) 2    0.856959
BLOCK(LOC) 3    0.844716
BLOCK(LOC) 4    0.000000
BLOCK(LOC) 5   -0.219529
BLOCK(LOC) 6   -0.526635
BLOCK(LOC) 7   -1.682449
BLOCK(LOC) 8    0.000000;

and Model 2
MU             13.809567.

The interpretation of fixed effect estimates for model 1B is that blocks 1 through 4 belong with location 1 and the fourth block is set to zero. Blocks 5 through 8 are those of location 2, and the eighth block is set to zero, as is location 2. Sets of blocks within location can always be determined by the last block within a location being set to zero. The interpretation of set to zero is that MU is the mean of the fourth block (labelled block 8) in location two, and any estimable function of the fixed effects can be generated from these estimates. An example of an estimable function would be the site mean of location 1. This mean would be estimated as MU + LOCATION 1 + 1/4 (BLOCK(LOC) 1 + BLOCK(LOC) 2 + BLOCK(LOC) 3 + BLOCK(LOC) 4). MU of model 2 is the estimate of the general mean across sites if all other factors are random. All of these estimates are the result of generalized least squares estimation.
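The site-mean calculation described above can be worked through numerically. The following is a minimal Python sketch, not part of GAREML; the names `mu`, `location`, `block`, `blocks_in_location` and `site_mean` are illustrative assumptions, and the values are the model 1B set to zero estimates listed above.

```python
# Model 1B fixed effect estimates in set to zero format (copied from the listing).
mu = 13.085052
location = {1: 1.805455, 2: 0.000000}
block = {1: -0.475396, 2: 0.856959, 3: 0.844716, 4: 0.000000,
         5: -0.219529, 6: -0.526635, 7: -1.682449, 8: 0.000000}

# Blocks 1-4 belong to location 1 and blocks 5-8 to location 2.
blocks_in_location = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8]}


def site_mean(loc):
    """Estimable function: MU + LOCATION + average of that location's block effects."""
    blocks = blocks_in_location[loc]
    return mu + location[loc] + sum(block[b] for b in blocks) / len(blocks)


if __name__ == "__main__":
    for loc in (1, 2):
        print("site mean of location", loc, "=", round(site_mean(loc), 6))
```

Under these estimates the site mean of location 1 works out to approximately 15.197. The same pattern applies to any other estimable function: only the weights placed on MU, the location levels and the block levels change.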
Asymptotic covariance matrix for the variance components

The asymptotic covariance matrix for the variance components in model 1B would appear as

ASYMPTOTIC VARIANCE COVARIANCE MATRIX

GCA         GCA          0.7902569240
GCA         SCA         -0.0490465017
GCA         LOCxGCA      0.0000000000
GCA         LOCxSCA      0.0000000000
GCA         BLOCKxFAM    0.0003970615
GCA         ERROR       -0.0001155675
SCA         SCA          0.2047376344
SCA         LOCxGCA      0.0000000000
SCA         LOCxSCA      0.0000000000
SCA         BLOCKxFAM   -0.1319741909
SCA         ERROR       -0.0020057997
LOCxGCA     LOCxGCA      0.0000000000
LOCxGCA     LOCxSCA      0.0000000000
LOCxGCA     BLOCKxFAM    0.0000000000
LOCxGCA     ERROR        0.0000000000
LOCxSCA     LOCxSCA      0.0000000000
LOCxSCA     BLOCKxFAM    0.0000000000
LOCxSCA     ERROR        0.0000000000
BLOCKxFAM   BLOCKxFAM    1.6336304265
BLOCKxFAM   ERROR       -1.2680804956
ERROR       ERROR        2.0069152440

This matrix, as are all other matrices output, is half-stored. The output is read as follows: "GCA GCA" is the asymptotic variance of the gca variance component. The next row, labelled "GCA SCA", is the asymptotic covariance between the estimates of the gca variance component and the sca variance component. Thus the next four rows are asymptotic covariances of gca variance estimates with the other random variables in the model. The other rows are read in a like manner, and if the analyst wished to array the output as a matrix, all necessary components are at hand.

Predictions of random variables

All predictions of random variables are appropriately labelled according to the character name read from the data and for model 1B would appear as

(from the gca output)
GCA 1     1.573253
GCA 2    -0.356262
GCA 3    -0.423469
GCA 4    -1.310747
GCA 5    -0.054977
GCA 6     0.572202;

(from the sca output)
SCA 1 2   0.003806
SCA 1 3   0.002662
SCA 1 4  -0.002028
SCA 1 5   0.001562
SCA 1 6  -0.001678
SCA 2 3  -0.003976
SCA 2 4   0.001827
SCA 2 5  -0.003550
SCA 2 6   0.000914
SCA 3 4  -0.000036
SCA 3 5  -0.002495
SCA 3 6   0.002681
SCA 4 5   0.000656
SCA 4 6  -0.004021
SCA 5 6   0.003676.
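The half-stored layout of the asymptotic covariance matrix shown above can be unpacked into a full symmetric matrix and then used for the quadratic form of equation 5-9. The following is a hedged Python sketch rather than GAREML code; the function names `unpack_half_stored` and `quadratic_form` are assumptions made for illustration, and the linear combination chosen (the sum of the gca and sca variance components) is only an example.

```python
# Row order of the half-stored output: GCA, SCA, LOCxGCA, LOCxSCA, BLOCKxFAM, ERROR.
half = [
    0.7902569240, -0.0490465017, 0.0, 0.0, 0.0003970615, -0.0001155675,
    0.2047376344, 0.0, 0.0, -0.1319741909, -0.0020057997,
    0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 0.0,
    1.6336304265, -1.2680804956,
    2.0069152440,
]


def unpack_half_stored(values, n):
    """Rebuild an n x n symmetric matrix from its upper triangle stored row by row."""
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            full[i][j] = values[k]
            full[j][i] = values[k]
            k += 1
    return full


def quadratic_form(l, v):
    """Return l' V l for a single linear combination l."""
    n = len(l)
    return sum(l[i] * v[i][j] * l[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    avcm = unpack_half_stored(half, 6)
    # Illustrative combination only: sigma^2(gca) + sigma^2(sca).
    l = [1, 1, 0, 0, 0, 0]
    print("asymptotic variance of the combination:", quadratic_form(l, avcm))
```

For a single combination the quadratic form reduces to the two variances plus twice the covariance, which provides a quick hand check of the unpacking order.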
All these predictions are approximately best linear unbiased predictions; they are approximate because the variance components were estimated from the same data.

Error covariance matrix of the predictions

The error covariance matrix of the predictions is output as a half-stored matrix with each row appropriately labelled. This matrix for model 1B appears as

THE ERROR VARIANCE COVARIANCE MATRIX FOR GCA ARRAYED AS A VECTOR

0.3618685934   1   1
0.1692300980   1   2
0.1465129987   1   3
0.1583039830   1   4
0.1713608386   1   5
0.1533590404   1   6
0.3687218966   2   2
0.1382132356   2   3
0.1730487382   2   4
0.1543784409   2   5
0.1570431430   2   6
0.3545855963   3   3
0.1622943256   3   4
0.1744667783   3   5
0.1845626177   3   6
0.3518724881   4   4
0.1567087948   4   5
0.1584072224   4   6
0.3466599143   5   5
0.1570607852   5   6
0.3502027434   6   6.

The labelling of the output is interpreted identically to that for the asymptotic variance covariance matrix for the variance components. Those rows which contain a parental name twice are the error variance for that parental prediction, and those rows containing two different parental names are the error covariance for the two parental predictions. In this unbalanced case the reader will see that some parents have more error associated with their predictions than others, i.e., compare the error for parent 2 with parent 5. This is true because of the varying number of observations associated with the prediction for each parent and also the varying distribution of those observations across tests and blocks. If one assumes that the estimate for gca variance from the data equals the true variance for gca, then the correlation of the prediction with the true value (Corr($g$,$\hat{g}$), White and Hodge 1989) for parent 5 is equal to $\sqrt{1 - (0.347/1.161)}$ or 0.84.

Error covariance matrix for the fixed effects

The error covariance matrix for the fixed effects is output as a half-stored matrix. The output is not labelled; however, one only has to know the total number of levels for all fixed effects to assign labels if needed. The primary use of this matrix is to estimate the variance of estimable functions of the fixed effects. If $l$ denotes the vector containing the specification of an estimable function and $V_b$ denotes the error covariance matrix for fixed effects, then the variance of an estimable function is equal to $l'V_b l$. For the mean of test 1, $l'$ equals [1 1 0 1/4 1/4 1/4 1/4 0 0 0 0].

Conclusions

GAREML is an analytical tool for use with models common to forest genetics. The properties of the variance component estimation algorithm have been documented by simulation studies and the algorithm presents solutions as restricted maximum likelihood estimates. Many other outputs are available from the program including best linear unbiased predictions, generalized least squares estimates of fixed effects, error covariance matrices of predictions and estimates, and the asymptotic covariance matrix for variance component estimates.

GAREML is not intended to be used as a black box. The program has many potential uses: variance component estimation, parental evaluation, progeny evaluation and simulated evaluation of mating and field design. However, thoughtful interpretation of the outputs is needed in order to realize the power and utility of the program.

CHAPTER 6
CONCLUSIONS

Optimal mating design for the determination of genetic architecture was explored. General conclusions were reached through comparison of the half-diallel, half-sib and circular mating designs. In particular, the comparison of the half-diallel and circular designs is pertinent to the establishment of future progeny tests in which full-sib families are desired. Across the experimental levels examined, the circular mating design provides more efficient estimates of parameters for genetic architecture than the half-diallel design.
If an estimate of the variance in general combining abilities is required, the half-sib design is more efficient than the circular mating design over most of the experimental levels examined. This pattern of efficiency argues for complementary mating designs involving half-sib designs (open-pollinated or polycross) to estimate general combining ability and a second design (full-sib mating) to generate crosses from which to make selections. Complementary mating designs do require a greater monetary and temporal commitment. If this type of commitment is not justified or possible, then the circular mating design should be used to generate full-sib families and estimate genetic parameters simultaneously.

Considering field design in combination with mating design, full-sib designs reach maximum efficiency for genetic parameter estimation in fewer replicates across locations than half-sib designs. For any specific case of field design and the half-sib mating design, a priori knowledge of the genetic architecture is required to choose the optimal field design for number of locations.

In cases where maximum efficiency of an experimental design is obtained and the precision of genetic parameter estimates is still less than desired, the optimal use of experimental units would be disconnected sets of experiments at maximum efficiency, with the parameter estimate then being a mean of the estimates from the disconnected experiments. Of the three mating designs, only the half-diallel exhibits efficiency optima for number of parents. The optimum for number of parents in half-diallels is always close to and never larger than six parents, with the fluctuation resulting from the genetic architecture. Thus, for maximum efficiency in genetic parameter estimation with half-diallels, the number of parents should not exceed six, and the desired parameter precision should be obtained by using disconnected sets of six parents. Optima for number of locations exist for all mating designs, and maximum efficiency would again be obtained by replicating an experiment only for the optimal number of locations. A parameter estimate of the desired precision would be calculated as a mean of disconnected experiments.

Optimal analysis was dealt with at two stages (estimation of parental worth and estimation of variance components or genetic architecture). The estimation of parental worth was examined for the half-diallel mating design. It is argued, on theoretical grounds and in generality, that best linear unbiased prediction and best linear prediction are more suited to the problem of parental evaluation than ordinary least squares.

Using simulated data for two mating designs (half-diallel and half-sib), variance component estimation techniques were compared with varying levels of data imbalance and two levels of genetic control. In estimating variance components (or genetic ratios such as heritability), four criteria were adopted for discrimination among estimation techniques (probability of nearness, bias, mean square error and variance of estimation). Of the four, only bias and variance of estimation proved informative. Bias proved useful in discriminating among treatments of negative estimates, with accepting and living with the negative estimates having the least bias, re-solving the system with negative estimates set to zero intermediate in bias and setting negative estimates to zero producing the most bias. Variance of estimation was also discriminatory among treatments of negative estimates, with accepting and living with negative estimates having the highest variance, setting negative estimates to zero intermediate in variance and re-solving the system setting negative estimates to zero having the lowest variance.

Variance of estimation was also discriminatory among units of observation and variance component estimation techniques. Of the two units of observation used (individuals and plot means), individual observations produced estimates with better properties across all levels of imbalance, mating designs and variance component estimation techniques. Of the variance component estimation techniques contrasted, restricted maximum likelihood produced estimates with the best properties (bias and variance of estimation) across all mating designs, levels of genetic control and levels of imbalance. Therefore it is proposed that restricted maximum likelihood estimation with individual observations as data should be utilized.

With the recommendation to use restricted maximum likelihood, the program used to analyze the simulated data was rewritten into a user-friendly format able to analyze both full-sib and half-sib data. Additional outputs (other than variance components) were also added as options. These outputs include general and specific combining ability predictions, the asymptotic covariance matrix for variance components, generalized least squares estimates of fixed effects and the covariance matrices for predictions and estimates.
APPENDIX
FORTRAN SOURCE CODE FOR GAREML

C******THIS PROGRAM PRODUCES REML AND MIVQUE VARIANCE*************
C****COMPONENT ESTIMATES BY STARTING ITERATION FROM THE***********
C****TRUE VALUES OF THE PARAMETERS THROUGH THE USE OF*************
C***************[illegible]
C PARAMETER SIZE DECLARATION SHOULD BE GLOBAL SINCE THEY ARE
C ALSO SPECIFIED IN THE SUBROUTINES
      PROGRAM MAIN
      PARAMETER (
C NOBSER IS THE MAXIMUM NUMBER OF OBSERVATIONS
     N NOBSER = 5000,
C NOBL IS THE MAXIMUM NUMBER OF BLOCKS PER LOCATION
     N NOBL = 36,
C NOCR IS THE MAXIMUM NUMBER OF FULL-SIB CROSSES
     N NOCR = 75,
C NOBH IS THE MAXIMUM NUMBER OF FIXED EFFECT LEVELS INCLUDING THE
C MEAN
     N NOBH = 200,
C NVARBH DIMENSIONS THE VARIANCE COVARIANCE MATRIX FOR FIXED
C EFFECTS
     N NVARBH = (NOBH*(NOBH-1))/2 + NOBH,
C NOGCA IS THE MAXIMUM NUMBER OF PARENTS
     N NOGCA = 50,
C NOVARG DIMENSIONS THE VARIANCE COVARIANCE MATRIX FOR GCA
     N NOVARG = (NOGCA*(NOGCA-1))/2 + NOGCA,
C NOX IS THE MAXIMUM NUMBER OF COLUMNS FOR FIXED EFFECTS PLUS
C RANDOM EFFECTS
C PLUS ONE FOR THE DATA
     N NOX = 1400,
C NOCBS IS THE MAXIMUM NUMBER OF LEVELS FOR THE RANDOM EFFECT
C HAVING THE GREATEST NUMBER, USUALLY CROSS BY BLOCK OR PLOT
C COMBINATIONS
     N NOCBS = 1000,
C NTOT IS THE TOTAL NUMBER OF COLUMNS OF NOX PLUS NOCBS
     N NTOT = NOX + NOCBS,
C OTHER PARAMETERS USE THE PREVIOUS DECLARATIONS TO ALLOCATE
C SUFFICIENT SIZE TO SYMMETRIC MATRICES STORED AS VECTORS
     N NIZED = NOX*NOCBS,
     N NIXPX = ((NOX*(NOX-1))/2) + NOX,
     N NSIP = NOX + NOCBS,
     N NIZEP = ((NSIP*(NSIP-1))/2) + NSIP)
      COMMON/CMN1/ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
     N NCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX,
     N NCLRAN,NCOLSE,NRAN(9)
      COMMON/CMN2/
     N YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA),
     N BHAT(NOBH),SCA(NOCR)
      COMMON/CMN3/ DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR),
     N PARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10)
      DIMENSION TEST(NOBSER),BLOCK(NOBSER),F(NOBSER),M(NOBSER),
     N FM(NOBSER),REML(9),VARHAT(9),SOL(9,10),DUM(9),PRI(9),
     N SET(NOBSER),NAME(9),NUMMY(9),VARG(NOVARG),VARBH(NVARBH)
      INTEGER NCOLT,NCOLB,NCOLG,NCOLS,NCOLGT,NCOLST,NCOLCB,NOBS,
     N NCOLTB,NCOLX,NCOLSE,NCL,NCLFIX,NCLRAN,NORAN,NOFIX,
     N NUMMY,NRAN,NOITS,LEP
      DOUBLE PRECISION YQVQY,SIG,REML,ZAG,VARHAT,SOL,MEAN,DUM,VQVQ,
     N GCA,BHAT,PRI,SCA,SCALES,SCALEG,VARG,VARBH
      REAL CVERG
      CHARACTER*1 DTERM,DUMDUM,DUM2,DUMB
      CHARACTER*80 FMAT
      CHARACTER*16 FLNAME,FM,FMVEC,NT,KICK,LICK
      CHARACTER*11 NAME,RANNAM
      CHARACTER*13 SIGMA
      CHARACTER*8 TEST,LOCO,F,M,PARENT,BLOCK,SET,DISSET,REP
      SIGMA = 'SIGMA-SQUARED'
      NAME(1) = 'LOCATION'
      NAME(2) = 'BLOCK(LOC)'
      NAME(3) = 'SET'
      NAME(4) = 'GCA'
      NAME(5) = 'SCA'
      NAME(6) = 'LOCxGCA'
      NAME(7) = 'LOCxSCA'
      NAME(8) = 'BLOCKxFAM'
      NAME(9) = 'ERROR'
      OPEN(UNIT=13,STATUS='SCRATCH',FORM='UNFORMATTED')
      DO 2031 I=1,8
      DO 2032 J=1,2
      DTERM(I,J) = ' '
 2032 CONTINUE
 2031 CONTINUE
      PRINT *, ' REML VARIANCE COMPONENTS ESTIMATED BY THE METHOD OF
     NSCORING'
      PRINT *, ' THROUGH THE USE OF GIESBRECHTS ALGORITHM'
      PRINT *, ' WRITTEN BY DUDLEY HUBER UNIVERSITY OF FLORIDA'
      PRINT *
      PRINT *, 'WARNING YOU HAVE JUST ENTERED THE TWILIGHT ZONE OF
     NVARIANCE COMPONENTS'
      PRINT *, 'ANSWER Y FOR YES OR N FOR NO TO THE FOLLOWING QUESTIONS'
      WRITE(6,2012)
 2012 FORMAT(/,' **********************************************',/)
      WRITE(6,2012)
 1010 I=0
      J=0
 2500 FORMAT(' PLEASE TRY AGAIN')
      PRINT*,' FIRST THE FACTORS TO BE READ FROM THE DATA WILL BE DETE
     NRMINED'
 2501 PRINT *, ' DOES THE DATA HAVE MULTIPLE LOCATIONS? '
      READ(6,1501) DTERM(1,1)
      IF((DTERM(1,1).NE.'Y').AND.(DTERM(1,1).NE.'N')) THEN
      WRITE(6,2500)
      GO TO 2501
      ENDIF
 2502 PRINT *, ' ARE THERE MULTIPLE BLOCKS(LOCATION) IN THE DATA? '
      READ(6,1501) DTERM(2,1)
      IF((DTERM(2,1).NE.'Y').AND.(DTERM(2,1).NE.'N')) THEN
      WRITE(6,2500)
      GO TO 2502
      ENDIF
 2503 PRINT *, ' ARE THERE DISCONNECTED SETS OF GENETIC ENTRIES IN THE
     NDATA?
â€™ READ(6,1501) DTERM(3,1) IF((DTERM(3,1).NE.â€™Yâ€™).AND.(DTERM(3,1).NE.â€™Nâ€™)) THEN WRITE(6,2500) GO TO 2503 ENDIF WRITE(6,2012) 7001 PRINT *,â€™ IS THE ANALYSIS BASED ON HALF-SIB (H) OR FULL-SIB FAMILI NES (F)? (H OR F) â€™ READ(6,1501) DUM2 IF((DUM2.NE.â€™Hâ€™).AND.(DUM2.NE.â€™Fâ€™)) THEN WRITE(6,2500) GO TO 7001 ENDIF PRINT *, â€™ NOW TO DETERMINE FIXED OR RANDOM FACTORS AND PRIORSâ€™ PRINT *, â€™ ANSWER F FOR FIXED OR R FOR RANDOM TO DETERMINE STATUSâ€™ IF (DTERM(1,1).EQ.â€™Nâ€™) GO TO 1001 2504 PRINT *, â€™ LOCATION IS FIXED OR RANDOM? â€™ READ(6,1501) DTERM(1,2) IF((DTERM(1,2).NE.â€™Fâ€™).AND.(DTERM(1,2).NE.â€™Râ€™)) THEN WRJTE(6,2500) GO TO 2504 ENDIF 110 IF (DTERM(1,2).EQ.â€™Fâ€™) THEN J=J+1 GO TO 1001 ENDIF DTERM(1,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR LOCATION? 1 = 1+1 READ(6,1502) PRI(I) 1502 FORMAT(F20.6) 1001 IF (DTERM(2,1).EQ.â€™Nâ€™) GO TO 1002 2505 PRINT *, â€™ BLOCK IS FIXED OR RANDOM? â€™ READ(6,1501) DTERM(2,2) IF((DTERM(2,2).NE.â€™Fâ€™).AND.(DTERM(2,2).NE.â€™Râ€™)) THEN WRITE(6,2500) GO TO 2505 ENDIF IF (DTERM(2,2).EQ.â€™Fâ€™) THEN J=J+1 GO TO 1002 ENDIF DTERM(2,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR BLOCK? 1 = 1+1 READ(6,1502) PRI(I) 1002 IF (DTERM(3,1).EQ.â€™Nâ€™) GO TO 1003 2506 PRINT *, â€™ SETS ARE FIXED OR RANDOM? READ(6,1501) DTERM(3,2) IF((DTERM(3,2).NE.â€™Fâ€™).AND.(DTERM(3,2).NE.â€™Râ€™)) THEN WRITE(6,2500) GO TO 2506 ENDIF IF (DTERM(3,2).EQ.â€™Fâ€™) THEN J=J+1 GO TO 1003 ENDIF DTERM(3,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR SETS? 1 = 1+1 READ(6,1502) PRI(I) 1003 PRINT *, â€™ ALL OTHER FACTORS ARE CONSIDERED RANDOMâ€™ PRINT *, â€™ ANSWER Y FOR YES OR N FOR NO FOR INCLUSION OF THE FACTO NR IN THE MODELâ€™ WRITE(6,2012) 2507 PRINT *, â€™ IS GCA IN THE MODEL? â€™ READ(6,1501) DTERM(4,1) IF((DTERM(4,1).NE.â€™Yâ€™).AND.(DTERM(4,1).NE.â€™Nâ€™)) THEN WRITE(6,2500) GO TO 2507 ENDIF IF (DTERM(4,1).EQ.â€™Nâ€™) GO TO 1004 C PRINT *, â€™ GCA IS FIXED OR RANDOM? 
C INPUT *, DTERM(4,2) C IF (DTERM(4,2).EQ.â€™Fâ€™) THEN C J=J+1 C GO TO 1004 C ENDIF DTERM(4,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR GCA? 1 = 1+1 READ(6,1502) PRI(I) IF(DUM2.EQ.â€™Hâ€™) THEN DTERM(5,1) = â€™Nâ€™ GO TO 1005 ENDIF 1004 PRINT *, â€™ IS SC A IN THE MODEL? READ(6,1501) DTERM(5,1) IF((DTERM(5,1).NE.â€™Yâ€™).AND.(DTERM(5,1).NE.â€™Nâ€™)) THEN WRITE(6,2500) GO TO 1004 ENDIF IF ((DTERM(5,l).EQ.â€™Nâ€™).OR.(DUM2.EQ.â€™Hâ€™)) GO TO 1005 C PRINT *, â€™ SCA IS FIXED OR RANDOM? â€™ C INPUT *, DTERM(5,2) C IF (DTERM(5,2).EQ.â€™Fâ€™) THEN C J=J+1 C GO TO 1005 C ENDIF DTERM(5,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR SCA? 1 = 1+1 READ(6,1502) PRI(I) 1005 IF(DTERM(1,1).EQ.â€™Nâ€™) GO TO 1007 PRINT *, â€™ IS LOCATIONxGCA INTERACTION IN THE MODEL? READ(6,1501) DTERM(6,1) IF((DTERM(6,1).NE.â€™Y,).AND.(DTERM(6,1).NE.,Nâ€™)) THEN WRITE(6,2500) GO TO 1005 ENDIF IF (DTERM(6,1).EQ.â€™Nâ€™) GO TO 1006 C PRINT *, â€™ LOCATIONxGCA IS FIXED OR RANDOM? â€™ C INPUT *, DTERM(6,2) C IF (DTERM(6,2).EQ.â€™Fâ€™) THEN C J=J + 1 C GO TO 1006 C ENDIF DTERM(6,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR LOCATIONxGCA? 1 = 1+1 READ(6,1502) PRI(I) 1006 IF(DUM2.EQ.â€™Hâ€™) THEN DTERM(7,1) = â€™Nâ€™ GO TO 1007 ENDIF PRINT *, â€™ IS LOCATIONxSCA IN THE MODEL? READ(6,1501) DTERM(7,1) IF((DTERM(7,1).NE.â€™Yâ€™).AND.(DTERM(7,1).NE.â€™Nâ€™)) THEN WRITE(6,2500) GO TO 1006 ENDIF IF ((DTERM(7,l).EQ.â€™Nâ€™).OR.(DUM2.EQ.â€™Hâ€™)) GO TO 1007 C PRINT *, â€™ LOCATIONxSCA IS FIXED OR RANDOM? â€™ C INPUT *, DTERM(7,2) C IF (DTERM(7,2).EQ.â€™Fâ€™) THEN C J=J+1 C GO TO 1007 C ENDIF DTERM(7,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR LOCATIONxSCA? 1 = 1+1 READ(6,1502) PRI(I) 1007 PRINT *, â€™ IS PLOT OR FAMILYxBLOCK IN THE MODEL? READ(6,1501) DTERM(8,1) IF((DTERM(8,1).NE.â€™Yâ€™).AND.(DTERM(8,1).NE.â€™Nâ€™)) THEN WRITE(6,2500) GO TO 1007 ENDIF IF (DTERM(8,1).EQ.â€™Nâ€™) GO TO 1008 C PRINT *, â€™ PLOT OR FAMILYxBLOCK IS FIXED OR RANDOM? C INPUT *, DTERM(8,2) C IF (DTERM(8,2).EQ. 
â€™Fâ€™) THEN C J=J+1 C GO TO 1008 C ENDIF DTERM(8,2) = â€™Râ€™ PRINT *, â€™ WHAT IS THE PRIOR FOR PLOT OR FAMILYxBLOCK? 1 = 1+1 READ(6,1502) PRI(I) 1008 PRINT *, â€™ WHAT IS THE PRIOR FOR ERROR? 1 = 1+1 READ(6,1502) PRI(I) J=J+1 NOFIX=J NORAN=I WRITE(6,1009) NOFIX,NORAN 113 1009 FORMATO THE NUMBER OF FIXED FACTORS PLUS THE MEAN = â€™,12,/, Nâ€™ THE NUMBER OF RANDOM FACTORS PLUS ERROR = â€™,12) PRINT *, â€™ DO THESE LEVELS MATCH YOUR INTENDED MODEL? Y OR N â€™ READ(6,1501) DUMDUM IF (DUMDUM.EQ.â€™Nâ€™) THEN PRINT *, â€™ RETURNING TO INITIALIZATION OF MODELâ€™ PRINT *, â€™ TO EXIT PROGRAM USE CONTROL-BREAKâ€™ GO TO 1010 ENDIF PRINT *, â€™ THE INPUT DATA SET NAME IS: READ(6,1503) FLNAME 1503 FORMAT(A16) WRITE(6,1011) 1011 FORMATO THE FORMAT OF THE DATA IS: REMEMBERING PARENTHESESâ€™,/) READ(6,10I2) FMAT 1012 FORMAT(A80) OPEN (1 ,FILE = FLNAME,STATUS = â€™OLDâ€™) NOBS=1 1 IF(DUM2.EQ.â€™Hâ€™) GO TO 2 IF((DTERM(1,1).EQ.â€™Nâ€™). AND. (DTERM(2,1). EQ.â€™Nâ€™). AND. (DTERM(3,1). NEQ.â€™Nâ€™)) GO TO 1013 IF((DTERM(1,1).EQ.â€™Nâ€™).AND.(DTERM(2,1).EQ.â€™Nâ€™)) GO TO 1014 IF((DTERM(1,1).EQ.â€™Nâ€™).AND.(DTERM(3,1).EQ.â€™Nâ€™)) GO TO 1015 IF(DTERM(1,1).EQ.â€™Nâ€™) GO TO 1000 IF((DTERM(2,1).EQ.â€™Nâ€™).AND.(DTERM(3,1).EQ.â€™Nâ€™)) GO TO 1016 IF(DTERM(2,1).EQ.â€™Nâ€™) GO TO 1017 IF(DTERM(3,1).EQ.â€™Nâ€™) GO TO 1018 READ( 1 ,FMT = FMAT,END = 3) TEST(NOBS),BLOCK(NOBS),SET(NOBS), N F(NOBS),M(NOBS),MEAN(NOBS) GO TO 1019 1018 READ (1,FMT = FMAT,END = 3) TEST(NOBS),BLOCK(NOBS),F(NOBS),M(NOBS), N MEAN(NOBS) GO TO 1019 1000 READ (1,FMT = FMAT,END = 3) BLOCK(NOBS),SET(NOBS),F(NOBS),M(NOBS), N MEAN(NOBS) GO TO 1019 1013 READ (1 ,FMT = FMAT,END = 3) F(NOBS),M(NOBS),MEAN(NOBS) GO TO 1019 1014 READ (1 ,FMT=FMAT,END = 3) SET(NOBS),F(NOBS),M(NOBS),MEAN(NOBS) GO TO 1019 1015 READ (1,FMT=FMAT,END = 3) BLOCK(NOBS),F(NOBS),M(NOBS),MEAN(NOBS) GO TO 1019 1016 READ (1,FMT = FMAT,END = 3) TEST(NOBS),F(NOBS),M(NOBS),MEAN(NOBS) GO TO 1019 1017 READ (1,FMT = FMAT,END = 3) 
TEST(NOBS),SET(NOBS),F(NOBS),M(NOBS), N MEAN(NOBS) GO TO 1019 2 IF((DTERM(1,1). EQ.â€™Nâ€™). AND. (DTERM(2,1). EQ.â€™Nâ€™). AND. (DTERM(3,1). 114 NEQ.â€™Nâ€™)) GO TO 7013 IF((DTERM(1,1).EQ.â€™Nâ€™).AND.(DTERM(2,1).EQ.â€™Nâ€™)) GO TO 7014 IF((DTERM(1,1).EQ.â€™Nâ€™).AND.(DTERM(3,1).EQ.â€™Nâ€™)) GO TO 7015 IF((DTERM(2,1).EQ.â€™Nâ€™).AND.(DTERM(3,1).EQ.,Nâ€™)) GO TO 7016 IF(DTERM(2,1).EQ.â€™Nâ€™) GO TO 7017 IF(DTERM(3,1).EQ.â€™Nâ€™) GO TO 7018 READ( 1 ,FMT=FMAT,END = 3) TEST(NOBS),BLOCK(NOBS),SET(NOBS), N F(NOBS),MEAN(NOBS) GO TO 1019 7018 READ (1 ,FMT=FMAT,END = 3) TEST(NOBS),BLOCK(NOBS),F(NOBS), N MEAN(NOBS) GO TO 1019 7013 READ (1,FMT = FMAT,END = 3) F(NOBS),MEAN(NOBS) GO TO 1019 7014 READ (1,FMT = FMAT,END = 3) SET(NOBS),F(NOBS),MEAN(NOBS) GO TO 1019 7015 READ (1,FMT=FMAT,END = 3) BLOCK(NOBS),F(NOBS),MEAN(NOBS) GO TO 1019 7016 READ (1,FMT=FMAT,END = 3) TEST(NOBS),F(NOBS),MEAN(NOBS) GO TO 1019 7017 READ (1,FMT=FMAT,END = 3) TEST(NOBS),SET(NOBS),F(NOBS), N MEAN(NOBS) 1019 NOBS=NOBS+1 GO TO 1 3 NOBS = NOBS-1 CLOSE(l) WRITE(6,2015) NOBS 2015 FORMAT(â€™ THE NUMBER OF OBSERVATIONS IS â€™,14) IF(DUM2.EQ.â€™Hâ€™) GO TO 7019 DO 4 1=1,NOBS FM(I) = F(I)//M(I) 4 CONTINUE 7019 K=0 DO 5010 1=1,8 IF(DTERM(I,1).EQ.â€™Nâ€™) GO TO 5010 IF(DTERM(I,2).EQ.â€™Râ€™) THEN K = K+ 1 RANNAM(K) = NAME(I) ENDIF 5010 CONTINUE RANNAM(K+ 1) = NAME(9) DO 72 1= l,NOCR FMVEC(I) = â€™ 72 CONTINUE J=0 DO 162 1=1,9 IF(PRI(I).GT.0.0) THEN J=J+1 115 SIG(J) = PRI(I) ENDIF 162 CONTINUE NCOLT=0 NCOLB=0 NCOLSE = 0 NCOLTB=0 NCOLG=0 NCOLS=0 NCOLGT=0 NCOLST=0 NCOLCB=0 IF(DTERM(1,1).EQ.â€™Nâ€™) GO TO 1020 CALL NOCOL(TEST,NOBS,LOCO,NCOLT) 1020 NCL(1) = NCOLT IF(DTERM(2,1).EQ.â€™Nâ€™) GO TO 1021 CALL NOCOL(BLOCK,NOBS,REP,NCOLB) 1021 IF(DTERM(3,1).EQ.â€™Nâ€™) GO TO 1022 CALL NOCOL(SET,NOBS,DISSET,NCOLSE) 1022 NCL(3) = NCOLSE IF((DTERM(1,1).EQ.,Nâ€™).AND.(DTERM(2,1).EQ.â€™Yâ€™)) THEN NCOLTB = NCOLB GO TO 1023 ENDIF NCOLTB = NCOLT*NCOLB 1023 IF(DUM2.EQ.â€™Hâ€™) THEN CALL NOCOL(F,NOBS,PARENT,NCOLG) GO TO 7022 ENDIF 
      CALL NOPAR(F,M,NOBS,PARENT,NCOLG)
 7022 NCL(2) = NCOLTB
      NCL(4) = NCOLG
      IF((DUM2.EQ.'H').OR.(DTERM(5,1).EQ.'N')) GO TO 7021
      DO 32 I=1,NOBS
      IF(I.EQ.1) THEN
      FMVEC(I) = FM(I)
      NCOLS=1
      GO TO 32
      ENDIF
      DO 33 J = 1,NCOLS
      KICK=FM(I)
      LICK = FMVEC(J)
      IF(KICK.EQ.LICK) GO TO 32
   33 CONTINUE
      NCOLS = NCOLS + 1
      FMVEC(NCOLS) = FM(I)
   32 CONTINUE
      DO 159 K=1,NCOLS-1
      N = K+1
      DO 159 J = N,NCOLS
      IF(FMVEC(K).LT.FMVEC(J)) GO TO 159
      NT=FMVEC(K)
      FMVEC(K) = FMVEC(J)
      FMVEC(J) = NT
  159 CONTINUE
 7021 IF(DUM2.EQ.'H') NCOLS = 0
      NCL(5) = NCOLS
      NCOLST=NCOLS*NCOLT
      NCOLGT=NCOLG*NCOLT
      NCOLCB = NCOLS*NCOLTB
      IF(DUM2.EQ.'H') NCOLCB = NCOLG*NCOLTB
      IF(DTERM(6,1).EQ.'N') NCOLGT=0
      IF(DTERM(7,1).EQ.'N') NCOLST = 0
      IF(DTERM(8,1).EQ.'N') NCOLCB = 0
      NCL(6) = NCOLGT
      NCL(7) = NCOLST
      NCL(8) = NCOLCB
      WRITE(6,5005) NCOLG
 5005 FORMAT(' NUMBER OF PARENTS IS ',I4)
      WRITE(6,5006) NCOLS
 5006 FORMAT(' NUMBER OF FULL-SIB CROSSES IS ',I4)
      NCLFIX = 1
      NCLRAN=0
      DO 1024 I=1,8
      IF(DTERM(I,2).EQ.'F') THEN
      NCLFIX = NCLFIX + NCL(I)
      GO TO 1024
      ENDIF
      NCLRAN = NCLRAN + NCL(I)
 1024 CONTINUE
      WRITE(6,6001) NCLFIX,NCLRAN
 6001 FORMAT(' FIXED EFFECT COLUMNS = ',I8,
     N' RANDOM EFFECT COLUMNS = ',I8)
      CVERG = .0001
      PRINT *,' THE CONVERGENCE CRITERION FOR VARIANCE COMPONENTS WHICH
     NEQUALS'
      PRINT *,' THE SUM OF THE ABSOLUTE DEVIATIONS IS SET TO .0001.'
      PRINT *,' IF YOU WISH TO CHANGE TYPE Y IF NOT TYPE N. '
      READ(6,1501) DUMDUM
 1501 FORMAT(A1)
      IF(DUMDUM.EQ.'N') GO TO 9021
      PRINT*,' THE CONVERGENCE CRITERION IS: '
      READ(6,1502) CVERG
 9021 NCOLX = NCLFIX + NCLRAN + 1
      NOITS = 30
      PRINT*,' THE NUMBER OF ITERATIONS ALLOWED IS SET TO 30'
      PRINT*,' DO YOU WISH TO CHANGE THIS? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'Y') THEN
      PRINT*,' THE NUMBER OF ITERATIONS DESIRED IS: '
      READ*, NOITS
      ENDIF
      PRINT *, ' IF THE SOLUTION AFTER ITERATING TO CONVERGENCE CONTAINS
     N ONE OR MORE'
      PRINT *, ' NEGATIVE VARIANCE COMPONENT ESTIMATES!!!!'
      PRINT *, ' DO YOU WISH TO RE-SOLVE THE SYSTEM SETTING NEGATIVE EST
     NIMATES TO ZERO?'
      PRINT *, ' TYPE Y OR N '
      READ(6,1501) DUMB
      CALL XPRIMX(TEST,BLOCK,SET,F,M,FM)
      REWIND(13)
      DO 801 I = 1,NORAN
      NUMMY(I) = 0
  801 CONTINUE
  803 DO 50 L=1,NOITS
      DO 71 I=1,NORAN
      DUM(I) = SIG(I)
   71 CONTINUE
      WRITE(6,5001) L
 5001 FORMAT(' THIS IS ITERATION NUMBER ',I3)
      DO 8001 I=1,NORAN
      WRITE(6,154) SIGMA,RANNAM(I),SIG(I)
 8001 CONTINUE
      DO 21 I=1,NORAN
      DO 22 J = 1,NORAN
      VQVQ(I,J)=0.0
   22 CONTINUE
      YQVQY(I) = 0.0
   21 CONTINUE
      DO 51 I=1,NORAN
      IF(SIG(I).LT.0.0) SIG(I) = 0.0
   51 CONTINUE
      CALL DESIGN
      REWIND(13)
      DO 5 I=1,NORAN
      SOL(I,NORAN+1) = YQVQY(I)
      REML(I) = 0.0
      IF(NUMMY(I).EQ.1) YQVQY(I) = 0.0
      DO 6 J = 1,NORAN
      SOL(I,J) = VQVQ(I,J)
      IF(NUMMY(I).EQ.1) SOL(I,J) = 0.0
    6 CONTINUE
    5 CONTINUE
      CALL L2SWP(SOL,NORAN,NORAN+1,1,NORAN)
      DO 7 I=1,NORAN
      REML(I) = SOL(I,NORAN+1)
    7 CONTINUE
      ZAG = 0.0
      DO 8 I=1,NORAN
      ZAG = ZAG + DABS(REML(I)-DUM(I))
    8 CONTINUE
      DO 9 I=1,NORAN
      SIG(I) = REML(I)
    9 CONTINUE
      IF(ZAG.LT.CVERG) GO TO 11
   50 CONTINUE
   11 IF(DUMB.EQ.'N') GO TO 8025
      IF(DUMB.EQ.'Y') THEN
      LEP=0
      DO 851 I=1,NORAN
      IF(SIG(I).LT.0.0) LEP=1
      IF(SIG(I).LE.0.0) THEN
      SIG(I) = 0.0
      NUMMY(I)=1
      ENDIF
  851 CONTINUE
      ENDIF
      IF(LEP.EQ.1) GO TO 803
 8025 DO 10 I=1,NORAN
      VARHAT(I) = SIG(I)
   10 CONTINUE
      PRINT *, ' WHAT IS THE FILENAME FOR THE VARIANCE COMPONENT OUTPUT
     N? '
      READ(6,1503) FLNAME
      OPEN (2,FILE = FLNAME,STATUS ='UNKNOWN')
      DO 155 J = 1,NORAN
      WRITE(2,FMT = 154) SIGMA,RANNAM(J),VARHAT(J)
  155 CONTINUE
  154 FORMAT(1X,A13,A12,F20.6)
      CLOSE(2)
      DO 156 I=1,9
      IF(RANNAM(I).EQ.'GCA') SCALEG = VARHAT(I)
      IF(RANNAM(I).EQ.'SCA') SCALES = VARHAT(I)
  156 CONTINUE
      PRINT *, ' DO YOU DESIRE GCA PREDICTIONS? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'N') GO TO 704
      PRINT *, ' DO YOU HAVE A PRIOR ESTIMATE OF GCA VARIANCE TO USE INS
     NTEAD'
      PRINT *, ' OF THE DATA ESTIMATE? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'Y') THEN
      PRINT *, ' WHAT IS THE GCA VARIANCE ESTIMATE YOU WISH TO USE? '
      READ(6,1502) SCALEG
      ENDIF
      DO 157 I=1,NCOLG
      GCA(I) = SCALEG*GCA(I)
  157 CONTINUE
      PRINT *, ' WHAT IS THE FILENAME FOR THE GCA PREDICTION OUTPUT? '
      READ(6,1503) FLNAME
      OPEN(4,FILE=FLNAME,STATUS = 'UNKNOWN')
      DO 178 I=1,NCOLG
      WRITE(4,FMT=703) PARENT(I),GCA(I)
  178 CONTINUE
  703 FORMAT(' GCA',1X,A8,F20.6)
      CLOSE(4)
  704 IF(DUM2.EQ.'H') GO TO 705
      IF(DTERM(5,1).EQ.'N') GO TO 705
      PRINT *, ' DO YOU DESIRE SCA PREDICTIONS? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'N') GO TO 705
      PRINT *, ' DO YOU HAVE A PRIOR ESTIMATE OF SCA VARIANCE TO USE INS
     NTEAD'
      PRINT *, ' OF THE DATA ESTIMATE? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'Y') THEN
      PRINT *, ' WHAT IS THE SCA VARIANCE ESTIMATE YOU WISH TO USE? '
      READ(6,1502) SCALES
      ENDIF
      DO 169 I=1,NCOLS
      SCA(I)=SCALES*SCA(I)
  169 CONTINUE
      PRINT *, ' WHAT IS THE FILENAME FOR THE SCA PREDICTION OUTPUT? '
      READ(6,1503) FLNAME
      OPEN(8,FILE = FLNAME,STATUS = 'UNKNOWN')
      DO 171 I=1,NCOLS
      WRITE(8,FMT=707) FMVEC(I),SCA(I)
  171 CONTINUE
  707 FORMAT(' SCA',1X,A16,F20.6)
      CLOSE(8)
  705 PRINT *, ' DO YOU DESIRE FIXED EFFECT ESTIMATES? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'N') GO TO 706
      PRINT *, ' WHAT IS THE FILENAME FOR FIXED EFFECTS ESTIMATES? '
      READ(6,1503) FLNAME
      OPEN(9,FILE = FLNAME,STATUS = 'UNKNOWN')
      WRITE(9,FMT=708) BHAT(1)
  708 FORMAT(' MU',T15,F20.6)
      J=1
      DO 172 I=1,3
      IF(DTERM(I,2).EQ.'F') THEN
      DO 173 K=1,NCL(I)
      J=J+1
      IF(I.EQ.1) THEN
      WRITE(9,FMT=711) LOCO(K),BHAT(J)
      ENDIF
      IF(I.EQ.3) THEN
      WRITE(9,FMT=711) DISSET(K),BHAT(J)
      ENDIF
      WRITE(9,FMT=709) NAME(I),K,BHAT(J)
  173 CONTINUE
      ENDIF
  172 CONTINUE
  711 FORMAT(A8,T15,F20.6)
  709 FORMAT(A11,I3,F20.6)
      CLOSE(9)
      DO 726 I=1,NOVARG
      VARG(I) = 0.D0
  726 CONTINUE
      DO 727 I=1,NVARBH
      VARBH(I) = 0.D0
  727 CONTINUE
  706 PRINT *,' DO YOU DESIRE THE ASYMPTOTIC VARIANCE COVARIANCE'
      PRINT *,' MATRIX FOR VARIANCE COMPONENTS? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'N') GO TO 751
      PRINT *,' WHAT IS THE FILENAME FOR VAR(VC)? '
      READ(6,1503) FLNAME
      OPEN(12,FILE = FLNAME,STATUS ='UNKNOWN')
      WRITE(12,755)
  755 FORMAT(' ASYMPTOTIC VARIANCE COVARIANCE MATRIX',/)
      DO 752 I=1,NORAN
      DO 753 J = I,NORAN
      SOL(I,J) = SOL(I,J)*2.0
      WRITE(12,754) RANNAM(I),RANNAM(J),SOL(I,J)
  753 CONTINUE
  752 CONTINUE
  754 FORMAT(A11,T15,A11,T30,F20.10)
  751 PRINT*,'DO YOU DESIRE THE ERROR VARIANCE COVARIANCE MATRIX FOR
     NGCA? (Y OR N) '
      READ(6,1501) DUMDUM
      IF(DUMDUM.EQ.'N') GO TO 715
      PRINT *,' WHAT IS THE FILENAME FOR EVAR(GHAT)? '
      READ(6,1503) FLNAME
      OPEN(10,FILE = FLNAME,STATUS ='UNKNOWN')
      CALL VARX(VARG,VARBH)
      WRITE(10,721)
      K=0
      DO 716 I=1,NCOLG
      DO 717 J=1,NCOLG
      K=K+1
      WRITE(10,718) VARG(K),PARENT(I),PARENT(J)
  717 CONTINUE
  716 CONTINUE
  721 FORMAT('THE ERROR VARIANCE COVARIANCE MATRIX FOR GCA ARRAYED
     NAS A VECTOR',/)
  718 FORMAT(F20.10,T25,A8,T35,A8)
      CLOSE(10)
  715 PRINT *, ' DO YOU DESIRE THE VARIANCE COVARIANCE MATRIX FOR FIXED
     N EFFECTS? (Y OR N) '
      READ(6,1501) DUMB
      IF(DUMB.EQ.'N') GO TO 719
      IF(DUMDUM.EQ.'N') CALL VARX(VARG,VARBH)
      PRINT *, ' WHAT IS THE FILENAME FOR VAR(BETAHAT)? '
      READ(6,1503) FLNAME
      OPEN(11,FILE = FLNAME,STATUS ='UNKNOWN')
      K = 0
      DO 723 I=1,NCLFIX
      DO 724 J = I,NCLFIX
      K = K+1
      WRITE(11,722) VARBH(K)
  724 CONTINUE
  723 CONTINUE
  722 FORMAT(F20.10)
      CLOSE(11)
  719 STOP
      END
C****************************************************************
C SUBROUTINE L2SWP SWPS THE DESIGNATED COLUMNS OF A MATRIX X AND
C RETURNS THE SWEPT MATRIX AS X
      SUBROUTINE L2SWP(X,NROWX,NCOLXX,NSTA,NEND)
      INTEGER NROWX,NCOLXX,NSTA,NEND,NTOT
      DOUBLE PRECISION X(9,10), DMIN, D, B, BB(10)
C NSWP DEFINES THE PIVOT COLUMNS FOR SWP
      DMIN= 1E-8
C IF LESS THAN FULL RANK MATRICES ARE ENCOUNTERED, DMIN MUST BE
C EMPLOYED
C TO ZERO THE ROW AND COLUMN ASSOCIATED WITH THE DEPENDENCY TO
C PRODUCE A GENERALIZED INVERSE
      DO 10 K = NSTA,NEND
      D = X(K,K)
      IF (D.LE.DMIN) THEN
      DO 21 I=1,NROWX
      DO 22 J=1,NCOLXX
      X(I,K) = 0.0
      X(K,J) = 0.0
   22 CONTINUE
   21 CONTINUE
      GO TO 10
      ENDIF
      DO 20 J = 1,NCOLXX
      X(K,J) = X(K,J)/D
   20 CONTINUE
      DO 30 I = K+1,NROWX
C I SHOULD BE INCREMENTED SO THAT I IS NOT EQUAL TO K
      B = X(I,K)
      DO 40 L=1,NCOLXX
      X(I,L) = X(I,L)-B*X(K,L)
   40 CONTINUE
      X(I,K) = -B/D
   30 CONTINUE
      X(K,K)= 1/D
C BACKWARD ELIMINATION
      NTOT = NSTA + NEND
      IF(NTOT.EQ.2) GO TO 61
C SAVING ABOVE DIAGONAL ENTRIES FOR MULTIPLICATION WEIGHTS
      KK= 1
      DO 12 J = 1,K-1
      BB(KK) = X(J,K)
      KK=KK+1
   12 CONTINUE
C ZEROING ABOVE DIAGONAL ENTRIES FOR INSERTION OF INVERSE VALUES
      DO 13 I=1,K-1
      X(I,K) = 0.0
   13 CONTINUE
C DOING ROW OPERATIONS TO CREATE ABOVE DIAGONAL ENTRIES FOR INVERSE
      N= 1
      DO 70 M = 1,K-1
      B = BB(N)
      N = N+1
      DO 80 J = 1,NCOLXX
      X(M,J) = X(M,J)-B*X(K,J)
   80 CONTINUE
   70 CONTINUE
   10 CONTINUE
   61 RETURN
      END
C DESIGN CREATES DESIGN MATRICES FOR MAIN EFFECTS AND INTERACTIONS
C AND FORMS THE NORMAL EQUATIONS
      SUBROUTINE DESIGN
      PARAMETER (
     N NOBSER = 5000,
     N NOBL = 36,
     N NOCR=75,
     N NOBH = 200,
     N NOGCA = 50,
     N NOX= 1400,
     N NOCBS= 1000,
     N NTOT = NOX + NOCBS,
     N NIZED = NOX*NOCBS,
     N NIXPX = ((NOX*(NOX-1))/2) + NOX,
     N NSIP = NOX + NOCBS,
     N NIZEP=((NSIP*(NSIP-1))/2) + NSIP)
      COMMON/CMN1/ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
     N NCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX,
     N NCLRAN,NCOLSE,NRAN(9)
      COMMON/CMN2/
     N YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA),
     N BHAT(NOBH),SCA(NOCR)
      COMMON/CMN3/ DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR),
     N PARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10)
      DIMENSION TK(:),D(:),P(:),TRACER(9)
      ALLOCATABLE :: TK,D,P
      INTEGER NCOLT,NCOLTB,NCOLG,NCOLS,NOBS,NCOLGT,NCOLST,NVEC,
     N NCOLCB,NCOLX,NSTA,NEND,NCOLRD,NSTAK,NENDK,NWNUM1,NWNUM2,
     N INUM,
     N NWNUM3,NCL,NBIG,NORAN,NOFIX,NRAN,NCLFIX,NCLRAN,NDFIX,NCOLSE,
     N NODUM
      DOUBLE PRECISION TK,MEAN,D,TR,TRACER,YQVQY,VQVQ,SIG,GCA,
     N BHAT,P,SUB,SCA
      CHARACTER*1 DTERM,DUM2
      CHARACTER*8 PARENT,LOCO,REP,DISSET
      CHARACTER*16 FMVEC
      CHARACTER*11 RANNAM
      NBIG = NRAN(NORAN)
      DO 1012 I=1,NORAN-1
      IF(NRAN(I).GT.NBIG) NBIG = NRAN(I)
 1012 CONTINUE
      NWNUM1 = (NCOLX*(NCOLX-1))/2 + NCOLX
      NWNUM2 = NCOLX*NBIG
      NWNUM3 = ((NCOLX + NBIG)*(NCOLX + NBIG-1))/2 + NCOLX + NBIG
      ALLOCATE (TK(NWNUM1),D(NWNUM2),P(NWNUM3))
      READ(13) TK
      DO 10 I=1,NWNUM1
      TK(I)=TK(I)/SIG(NORAN)
   10 CONTINUE
      DO 11 I=1,NWNUM2
      D(I) = 0.0
   11 CONTINUE
      DO 12 I=1,NWNUM3
      P(I)=0.0
   12 CONTINUE
C****************************************************************
C FORMING THE MATRIX TO BE SWP TO PRODUCE YQVQY AND VQVQ
C****************************************************************
C****************************************************************
C TK = X'*INV(VK)*X COMPLETED
C****************************************************************
      NSTAK = 2 + NCLFIX + NCLRAN
      DO 1300 INUM = 1,NORAN-1
      NODUM = NORAN-INUM
      NCOLRD = NCOLX + NRAN(NODUM)
      NSTAK = NSTAK-NRAN(NODUM)
      NENDK = NSTAK + NRAN(NODUM)-1
      DO 251 I = NSTAK,NENDK
      M = NVEC(I,NCOLX)
      II = I-NSTAK+1
      N = NVEC(II,NCOLRD)
      NN = N
      DO 252 J = I,NENDK
      M = M+1
      NN=NN+1
      P(NN)=TK(M)*SIG(NODUM)
  252 CONTINUE
      P(N+1) = P(N+1) + 1.0
  251 CONTINUE
C R = I + SIG(I)*(Zi'*INV(VK)*Zi) HAS BEEN FORMED
      K = 0
      DO 254 J = NSTAK,NENDK
      DO 255 I=1,NCOLX
      K = K+1
      IF(J.LT.I) THEN
      D(K)=0.0
      GO TO 255
      ENDIF
      M = NVEC(I,NCOLX)
      M = M + J-I+1
      D(K)=TK(M)*SQRT(SIG(NODUM))
  255 CONTINUE
  254 CONTINUE
      DO 222 I = NSTAK,NENDK
      N = NVEC(I,NCOLX)
      II = I-NSTAK+1
      NN = NCOLX*(II-1)
      DO 223 J = I,NCOLX
      M = N+J-I+1
      K = NN+J
      D(K)=TK(M)*SQRT(SIG(NODUM))
  223 CONTINUE
  222 CONTINUE
C****************************************************************
C D=Zi'*INV(VK)*X*SQRT(SIG(I)) HAS BEEN FORMED
C****************************************************************
C**************************** TD = D' ***************************
      K=0
      NEND = NRAN(NODUM)
      DO 22 I=1,NRAN(NODUM)
      N = NVEC(I,NCOLRD)
      DO 23 J = NRAN(NODUM)+1,NCOLRD
      K = K+1
      M = N+J-I+1
      P(M) = D(K)
   23 CONTINUE
   22 CONTINUE
      DO 25 I = NRAN(NODUM)+1,NCOLRD
      K = NVEC(I,NCOLRD)
      II = I-NRAN(NODUM)
      M = NVEC(II,NCOLX)
      DO 26 J = I,NCOLRD
      K = K+1
      M = M+1
      P(K)=TK(M)
   26 CONTINUE
   25 CONTINUE
C P = (R||D)//(TD||TK)
      CALL VECSWP(P,NCOLRD,NCOLRD,1,NRAN(NODUM))
      K = 0
      DO 226 I=1,NCOLX
      II = I + NRAN(NODUM)
      M = NVEC(II,NCOLRD)
      DO 227 J = I,NCOLX
      K = K+1
      M = M+1
      TK(K) = P(M)
  227 CONTINUE
  226 CONTINUE
 1300 CONTINUE
      K=0
      DO 826 I=1,NCOLX
      II = I + NRAN(1)
      M = NVEC(II,NCOLRD)
      DO 827 J = I,NCOLX
      K = K+1
      M = M+1
      TK(K) = P(M)
  827 CONTINUE
  826 CONTINUE
      NDFIX = NCLFIX
      CALL VC2SWP(TK,NCOLX,NCOLX,1,NCLFIX,NDFIX)
C****************************************************************
C PORTIONS OF TK ARE SELECTED AND MULTIPLIED AND THE TRACE
C CALCULATED TO FORM VQVQ
C****************************************************************
C*****************COLUMNS 1 TO NORAN-1 OF VQVQ*******************
      NEND=1+NCLFIX
      DO 841 J = 1,NORAN-1
      NSTA = NEND+1
      NEND = NSTA + NRAN(J)-1
      TR = 0.0
      NSTAK=NEND+1
      DO 838 I=J,NORAN-1
      IF(I.EQ.J) THEN
      DO 828 II = NSTA,NEND
      N = NVEC(II,NCOLX)
      DO 830 K = II,NEND
      M = N + K-II+1
      IF(II.EQ.K) THEN
      TR=TR + TK(M)*TK(M)
      GO TO 830
      ENDIF
      TR = TR + 2*TK(M)*TK(M)
  830 CONTINUE
  828 CONTINUE
      VQVQ(J,I)=TR
      GO TO 838
      ENDIF
      NENDK = NSTAK + NRAN(I)-1
      TR = 0.0
      DO 833 L = NSTA,NEND
      N = NVEC(L,NCOLX)
      DO 835 K = NSTAK,NENDK
      M = N + K-L+1
      TR=TR + TK(M)*TK(M)
  835 CONTINUE
  833 CONTINUE
      NSTAK=NENDK+1
      VQVQ(J,I)=TR
  838 CONTINUE
  841 CONTINUE
C*********************COLUMN NORAN OF VQVQ***********************
      DO 932 I=1,NORAN-1
      TRACER(I) = 0.0
      DO 933 J = I,NORAN-1
      VQVQ(J,I) = VQVQ(I,J)
  933 CONTINUE
  932 CONTINUE
      NSTA = 2 + NCLFIX
      DO 935 J = 1,NORAN-1
      NEND = NSTA + NRAN(J)-1
      DO 934 I = NSTA,NEND
      N = NVEC(I,NCOLX)
      N = N+1
      TRACER(J)=TRACER(J) + TK(N)
  934 CONTINUE
      NSTA = NEND+1
  935 CONTINUE
      DO 938 I=1,NORAN-1
      VQVQ(I,NORAN)=TRACER(I)
  938 CONTINUE
      SUB = 0.0
      DO 936 I=1,NORAN-1
      SUB = SUB + TRACER(I)*SIG(I)
      DO 937 J = 1,NORAN-1
      VQVQ(I,NORAN) = VQVQ(I,NORAN)-(SIG(J)*VQVQ(I,J))
  937 CONTINUE
      VQVQ(I,NORAN) = VQVQ(I,NORAN)/SIG(NORAN)
  936 CONTINUE
      NSTAK=NOBS-NDFIX
      TR = FLOAT(NSTAK)
      VQVQ(NORAN,NORAN) = (TR-SUB)/SIG(NORAN)
      DO 940 I=1,NORAN-1
      VQVQ(NORAN,NORAN) = VQVQ(NORAN,NORAN)-(SIG(I)*VQVQ(I,NORAN))
  940 CONTINUE
      VQVQ(NORAN,NORAN) = VQVQ(NORAN,NORAN)/SIG(NORAN)
      DO 941 I=1,NORAN-1
      VQVQ(NORAN,I) = VQVQ(I,NORAN)
  941 CONTINUE
C*************FORMING VECTOR OF FIXED EFFECTS ESTIMATES**********
      DO 951 I=1,NCLFIX
      N = NVEC(I,NCOLX)
      N = N + NCLFIX-I+2
      BHAT(I)=TK(N)
  951 CONTINUE
C*****************FORMING VECTORS OF PREDICTIONS*****************
      DO 952 I = 1,9
      IF(RANNAM(I).EQ.'GCA') THEN
      NSTA = I
      GO TO 953
      ENDIF
  952 CONTINUE
      GO TO 955
  953 NEND = 0
      DO 954 I=1,NSTA-1
      NEND = NEND + NRAN(I)
  954 CONTINUE
      L=NEND+1
      N = NVEC(NCLFIX + 1,NCOLX)
      L = L + N
      DO 955 I=1,NCOLG
      L = L+1
      GCA(I) = TK(L)
  955 CONTINUE
      DO 962 I=1,9
      IF(RANNAM(I).EQ.'SCA') THEN
      NSTA = I
      GO TO 963
      ENDIF
  962 CONTINUE
      GO TO 965
  963 NEND=0
      DO 964 I=1,NSTA-1
      NEND = NEND + NRAN(I)
  964 CONTINUE
      L=NEND+1
      N = NVEC(NCLFIX + 1,NCOLX)
      L=L + N
      DO 965 I = 1,NCOLS
      L = L+1
      SCA(I) = TK(L)
  965 CONTINUE
C*************************FORMING YQVQY**************************
      NSTA = NCLFIX + 2
      NEND = NSTA + NRAN(1)-1
      N = NVEC(NCLFIX + 1,NCOLX)
      DO 926 J=1,NORAN-1
      DO 925 I = NSTA,NEND
      M = N + I-NCLFIX
      YQVQY(J) = YQVQY(J) + TK(M)*TK(M)
  925 CONTINUE
      NSTA = NEND+1
      NEND = NSTA + NRAN(J+1)-1
  926 CONTINUE
      NSTA = NVEC(NCLFIX + 1,NCOLX)+1
      YQVQY(NORAN)=TK(NSTA)
      DO 927 I=1,NORAN-1
      YQVQY(NORAN) = YQVQY(NORAN)-(SIG(I)*YQVQY(I))
  927 CONTINUE
      YQVQY(NORAN) = YQVQY(NORAN)/SIG(NORAN)
      DEALLOCATE (TK,D,P)
      RETURN
      END
C****************************************************************
C THIS FUNCTION COUNTS THE NUMBER OF ENTRIES FOR AN EFFECT
      SUBROUTINE NOCOL(VEC,OBS,VEC1,NCOL)
      PARAMETER (
     N NOBSER = 5000)
      INTEGER OBS,NCOL
      CHARACTER*8 VEC(NOBSER),VEC1(*),Z,X,NT
      DO 11 I=1,OBS
      IF(I.EQ.1) THEN
      VEC1(I)=VEC(I)
      NCOL=1
      GO TO 11
      ENDIF
      DO 12 J = 1,NCOL
      X = VEC(I)
      Z = VEC1(J)
      IF(X.EQ.Z) GO TO 11
   12 CONTINUE
      NCOL=NCOL+1
      VEC1(NCOL) = VEC(I)
   11 CONTINUE
      DO 159 K=1,NCOL-1
      N = K+1
      DO 159 J = N,NCOL
      IF(VEC1(K).LT.VEC1(J)) GO TO 159
      NT=VEC1(K)
      VEC1(K) = VEC1(J)
      VEC1(J) = NT
  159 CONTINUE
      RETURN
      END
C****************************************************************
C THIS FUNCTION COUNTS THE NUMBER OF ENTRIES FOR PARENTS
      SUBROUTINE NOPAR(VEC1,VEC2,OBS,VEC3,NPAR)
      PARAMETER (
     N NOBSER = 5000,
     N NOGCA = 50)
      INTEGER OBS,NPAR
      CHARACTER*8 VEC1(NOBSER),VEC2(NOBSER),VEC3(NOGCA),Y,Z,X,NT
      DO 11 I=1,OBS
      IF(I.EQ.1) THEN
      VEC3(I) = VEC1(I)
      VEC3(I+1) = VEC2(I)
      NPAR=2
      GO TO 11
      ENDIF
      DO 12 J = 1,NPAR
      X = VEC1(I)
      Z = VEC3(J)
      IF(X.EQ.Z) GO TO 15
   12 CONTINUE
      NPAR = NPAR + 1
      VEC3(NPAR) = VEC1(I)
   15 DO 13 K=1,NPAR
      Y = VEC2(I)
      Z = VEC3(K)
      IF(Y.EQ.Z) GO TO 11
   13 CONTINUE
      NPAR = NPAR + 1
      VEC3(NPAR) = VEC2(I)
   11 CONTINUE
      DO 159 K=1,NPAR-1
      N = K+1
      DO 159 J = N,NPAR
      IF(VEC3(K).LT.VEC3(J)) GO TO 159
      NT=VEC3(K)
      VEC3(K) = VEC3(J)
      VEC3(J) = NT
  159 CONTINUE
      RETURN
      END
C****************************************************************
C**VECSWP PRODUCES A G2 INVERSE OF A SYMMETRIC MATRIX STORED AS**
C***************************A VECTOR*****************************
C****************************************************************
      SUBROUTINE VECSWP(VEC,NROWX,NCOLXX,NSTA,NEND)
      PARAMETER (
     N NOBSER = 5000,
     N NOBL = 36,
     N NOCR = 75,
     N NOBH = 200,
     N NOGCA = 50,
     N NOX= 1400,
     N NOCBS = 1000,
     N NTOT=NOX + NOCBS,
     N NIZED = NOX*NOCBS,
     N NIXPX = ((NOX*(NOX-1))/2) + NOX,
     N NSIP = NOX + NOCBS,
     N NIZEP=((NSIP*(NSIP-1))/2) + NSIP)
      DIMENSION VEC(*),V(:)
      ALLOCATABLE V
      INTEGER NROWX,NCOLXX,NSTA,NEND,NUMB,NVEC,V,NUM1,NUM2
      DOUBLE PRECISION VEC,DMIN,D,B,C
      ALLOCATE (V(NCOLXX))
      DMIN= 1.0D-8
      DO 9 I=1,NCOLXX
      V(I)=1
    9 CONTINUE
      DO 10 K = NSTA,NEND
      NUM2 = -(K*(K-3))/2 + NCOLXX*(K-1)
      NUMB = NUM2-1
      D = VEC(NUM2)
      IF (DABS(D).LE.DMIN) THEN
      DO 22 I=1,K
      IF(I.EQ.K) THEN
      NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)
      GO TO 53
      ENDIF
      NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
   53 VEC(NUM2) = 0.0
   22 CONTINUE
      NUM2 = NUMB+1
      DO 21 J = K+1,NCOLXX
      NUM2 = NUM2 + 1
      VEC(NUM2) = 0.0
   21 CONTINUE
      GO TO 10
      ENDIF
      DO 23 I=1,NROWX
      IF(I.EQ.K) GO TO 23
      NUM1 = NVEC(I,NCOLXX)
      IF(I.LT.K) THEN
      NUM2 = NUM1+K-I+1
      B = VEC(NUM2)/D
      GO TO 27
      ENDIF
      NUM2 = NUMB + I-K+1
      B = (FLOAT(V(I))*FLOAT(V(K))*VEC(NUM2))/D
   27 IF(DABS(B).LT.(1.0D-20)) GO TO 23
      DO 24 J = I,NCOLXX
      IF(J.EQ.K) GO TO 24
      IF(K.LT.J) THEN
      NUM2 = NUMB + J-K+1
      C = VEC(NUM2)
      GO TO 28
      ENDIF
      NUM2 = -(J*(J-1))/2 + K + NCOLXX*(J-1)
      C = FLOAT(V(J))*FLOAT(V(K))*VEC(NUM2)
   28 IF(DABS(C).LT.(1.0D-20)) GO TO 24
      NUM2 = NUM1 + J-I+1
      VEC(NUM2) = VEC(NUM2)-(B*C)
   24 CONTINUE
   23 CONTINUE
      DO 26 J = K,NCOLXX
      NUM2=NUMB+J-K+1
      VEC(NUM2) = VEC(NUM2)/D
   26 CONTINUE
      DO 25 I=1,K
      IF(I.EQ.K) THEN
      NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)
      GO TO 54
      ENDIF
      NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
   54 VEC(NUM2) = -VEC(NUM2)/D
   25 CONTINUE
      VEC(NUMB+1)= 1/D
      V(K) = -V(K)
   10 CONTINUE
      DEALLOCATE (V)
      RETURN
      END
C**VC2SWP PRODUCES A G2 INVERSE OF A SYMMETRIC MATRIX STORED AS**
C***************************A VECTOR.****************************
C****************************************************************
      SUBROUTINE VC2SWP(VEC,NROWX,NCOLXX,NSTA,NEND,NDF)
      PARAMETER (
     N NOBSER = 5000,
     N NOBL = 36,
     N NOCR = 75,
     N NOBH = 200,
     N NOGCA = 50,
     N NOX= 1400,
     N NOCBS= 1000,
     N NTOT=NOX + NOCBS,
     N NIZED = NOX*NOCBS,
     N NIXPX = ((NOX*(NOX-1))/2) + NOX,
     N NSIP = NOX + NOCBS,
     N NIZEP = ((NSIP*(NSIP-1))/2) + NSIP)
      DIMENSION VEC(*),V(:)
      ALLOCATABLE V
      INTEGER NROWX,NCOLXX,NSTA,NEND,NUMB,NVEC,V,NUM1,NUM2,NDF
      DOUBLE PRECISION VEC,DMIN,D,B,C
      DMIN= 1.0D-8
      ALLOCATE (V(NCOLXX))
      DO 9 I=1,NCOLXX
      V(I)=1
    9 CONTINUE
      DO 10 K = NSTA,NEND
      NUM2 = -(K*(K-3))/2 + NCOLXX*(K-1)
      NUMB = NUM2-1
      D = VEC(NUM2)
      IF (DABS(D).LE.DMIN) THEN
      NDF = NDF-1
      DO 22 I=1,K
      IF(I.EQ.K) THEN
      NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)
      GO TO 53
      ENDIF
      NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
   53 VEC(NUM2) = 0.0
   22 CONTINUE
      NUM2 = NUMB+1
      DO 21 J = K+1,NCOLXX
      NUM2 = NUM2 + 1
      VEC(NUM2) = 0.0
   21 CONTINUE
      GO TO 10
      ENDIF
      DO 23 I=1,NROWX
      IF(I.EQ.K) GO TO 23
      NUM1 = NVEC(I,NCOLXX)
      IF(I.LT.K) THEN
      NUM2 = NUM1+K-I+1
      B = VEC(NUM2)/D
      GO TO 27
      ENDIF
      NUM2 = NUMB + I-K+1
      B = (FLOAT(V(I))*FLOAT(V(K))*VEC(NUM2))/D
   27 IF(DABS(B).LT.(1.0D-20)) GO TO 23
      DO 24 J = I,NCOLXX
      IF(J.EQ.K) GO TO 24
      IF(K.LT.J) THEN
      NUM2 = NUMB+J-K+1
      C = VEC(NUM2)
      GO TO 28
      ENDIF
      NUM2 = -(J*(J-1))/2 + K + NCOLXX*(J-1)
      C = FLOAT(V(J))*FLOAT(V(K))*VEC(NUM2)
   28 IF(DABS(C).LT.(1.0D-20)) GO TO 24
      NUM2 = NUM1 + J-I+1
      VEC(NUM2) = VEC(NUM2)-(B*C)
   24 CONTINUE
   23 CONTINUE
      DO 26 J = K,NCOLXX
      NUM2 = NUMB+J-K+1
      VEC(NUM2) = VEC(NUM2)/D
   26 CONTINUE
      DO 25 I = 1,K
      IF(I.EQ.K) THEN
      NUM2 = -(I*(I-3))/2 + NCOLXX*(I-1)
      GO TO 54
      ENDIF
      NUM2 = -(I*(I-1))/2 + K + NCOLXX*(I-1)
   54 VEC(NUM2) = -VEC(NUM2)/D
   25 CONTINUE
      VEC(NUMB+1) = 1/D
      V(K) = -V(K)
   10 CONTINUE
      DEALLOCATE (V)
      RETURN
      END
C****************************************************************
C******NVEC COUNTS THE PROPER POSITION OF AN ELEMENT*******
C*********IN THE HALF STORED MATRIX (AS A VECTOR)**********
C*******ACCORDING TO ITS NORMAL ROW COLUMN POSITION********
C*****************IN THE ORIGINAL MATRIX*******************
      FUNCTION NVEC(NROWS,NCOLXX)
      INTEGER NROWS,NCOLXX,NVEC
      M = 0
      DO 3 I=1,NROWS
      IF(I.EQ.1) GO TO 3
      M = M + NCOLXX - (I-2)
    3 CONTINUE
      NVEC = M
      RETURN
      END
      SUBROUTINE XPRIMX(TEST,BLOCK,SET,F,M,FM)
      PARAMETER (
     N NOBSER = 5000,
     N NOBL = 36,
     N NOCR=75,
     N NOBH = 200,
     N NOGCA = 50,
     N NOX= 1400,
     N NOCBS= 1000,
     N NTOT=NOX + NOCBS,
     N NIZED = NOX*NOCBS,
     N NIXPX = ((NOX*(NOX-1))/2) + NOX,
     N NSIP = NOX + NOCBS,
     N NIZEP = ((NSIP*(NSIP-1))/2) + NSIP)
      COMMON/CMN1/ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
     N NCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX,
     N NCLRAN,NCOLSE,NRAN(9)
      COMMON/CMN2/
     N YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA),
     N BHAT(NOBH),SCA(NOCR)
      COMMON/CMN3/ DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR),
     N PARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10)
      DIMENSION X(:,:),DBLOCK(:,:),LOC(5,2),
     N NULVEC(NOBSER),XPX(:)
      ALLOCATABLE :: DBLOCK,XPX,X
      INTEGER X,DBLOCK,NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,
     N NOBS,NCOLB,NCOLX,NCOLCB,NUM1,NCL,NORAN,NOFIX,
     N NCLFIX,NCLRAN,NCOLSE,MLV,LOC,NRAN,NMISS,NULVEC
      DOUBLE PRECISION XPX,YQVQY,VQVQ,MEAN,SIG,ZIP,ZAP,GCA,BHAT,SCA
      CHARACTER*1 DTERM,DUM2
      CHARACTER*8 PARENT,LOCO,REP,DISSET,TEST(NOBSER),BLOCK(NOBSER),
     N SET(NOBSER),F(NOBSER),M(NOBSER)
      CHARACTER*16 FMVEC,FM(NOBSER)
      CHARACTER*11 RANNAM
      ALLOCATE (X(NOBS,NCOLX),DBLOCK(NOBS,NCOLB))
      PRINT*, ' ********FORMING THE DESIGN MATRIX**********'
      J = 0
      DO 1200 I=1,8
      IF((NCL(I).GT.0).AND.(DTERM(I,2).EQ.'R')) THEN
      J=J+1
      NRAN(J) = NCL(I)
      ENDIF
 1200 CONTINUE
      DO 47 I=1,NOBS
      DO 127 K=1,NCOLB
      DBLOCK(I,K) = 0
  127 CONTINUE
      DO 48 J = 1,NCOLX
      X(I,J)=0
   48 CONTINUE
   47 CONTINUE
      DO 31 I=1,NOBS
      X(I,1)=1
   31 CONTINUE
      MLV = 1
      IF((DTERM(1,1).EQ.'N').OR.(DTERM(1,2).EQ.'R')) GO TO 1101
      DO 1001 I=1,NOBS
C FORMING DESIGN MATRIX FOR TEST
      DO 5504 J = 1,NCOLT
      IF(TEST(I).EQ.LOCO(J)) THEN
      NJ=J + MLV
      GO TO 5505
      ENDIF
 5504 CONTINUE
 5505 X(I,NJ) = 1
 1001 CONTINUE
      LOC(1,1) = MLV+1
      MLV = MLV + NCOLT
      LOC(1,2) = MLV
C FORMING DESIGN MATRIX FOR BLOCK
 1101 IF((DTERM(2,1).EQ.'N').OR.(DTERM(2,2).EQ.'R')) GO TO 1102
      DO 1002 I=1,NOBS
      DO 5501 J = 1,NCOLB
      IF(BLOCK(I).EQ.REP(J)) THEN
      NK=J
      GO TO 5502
      ENDIF
 5501 CONTINUE
 5502 DBLOCK(I,NK) = 1
 1002 CONTINUE
      NSTA = LOC(1,1)
      NEND=LOC(1,2)
      IF(DTERM(1,1).EQ.'N') THEN
      NSTA= 1
      NEND=1
      ENDIF
      DO 136 I=1,NOBS
      L = MLV+1
      DO 137 J = NSTA,NEND
      DO 138 K=1,NCOLB
      X(I,L) = X(I,J)*DBLOCK(I,K)
      L = L+1
  138 CONTINUE
  137 CONTINUE
  136 CONTINUE
      LOC(2,1) = MLV+1
      MLV = MLV + NCOLTB
      LOC(2,2) = MLV
 1102 IF((DTERM(3,1).EQ.'N').OR.(DTERM(3,2).EQ.'R')) GO TO 1103
      DO 1003 I = 1,NOBS
      DO 5506 J = 1,NCOLSE
      IF(SET(I).EQ.DISSET(J)) THEN
      NK=J + MLV
      GO TO 5507
      ENDIF
 5506 CONTINUE
 5507 X(I,NK)=1
 1003 CONTINUE
      LOC(3,1) = MLV+1
      MLV = MLV + NCOLSE
      LOC(3,2) = MLV
 1103 MLV = MLV+1
      IF((DTERM(1,1).EQ.'N').OR.(DTERM(1,2).EQ.'F')) GO TO 2101
      DO 2001 I=1,NOBS
C FORMING DESIGN MATRIX FOR TEST
      DO 5508 J = 1,NCOLT
      IF(TEST(I).EQ.LOCO(J)) THEN
      NJ=J + MLV
      GO TO 5509
      ENDIF
 5508 CONTINUE
 5509 X(I,NJ) = 1
 2001 CONTINUE
      LOC(1,1) = MLV+1
      MLV = MLV + NCOLT
      LOC(1,2) = MLV
C FORMING DESIGN MATRIX FOR BLOCK
 2101 IF((DTERM(2,1).EQ.'N').OR.(DTERM(2,2).EQ.'F')) GO TO 2102
      DO 2002 I=1,NOBS
      DO 5510 J=1,NCOLB
      IF(BLOCK(I).EQ.REP(J)) THEN
      NK=J
      GO TO 5511
      ENDIF
 5510 CONTINUE
 5511 DBLOCK(I,NK) = 1
 2002 CONTINUE
      NSTA = LOC(1,1)
      NEND = LOC(1,2)
      IF(DTERM(1,1).EQ.'N') THEN
      NEND=1
      NSTA= 1
      ENDIF
      DO 36 I=1,NOBS
      L = MLV + 1
      DO 37 J = NSTA,NEND
      DO 38 K=1,NCOLB
      X(I,L) = X(I,J)*DBLOCK(I,K)
      L=L+1
   38 CONTINUE
   37 CONTINUE
   36 CONTINUE
      LOC(2,1) = MLV+1
      MLV = MLV + NCOLTB
      LOC(2,2) = MLV
 2102 IF((DTERM(3,1).EQ.'N').OR.(DTERM(3,2).EQ.'F')) GO TO 2103
      DO 2003 I=1,NOBS
      DO 5512 J=1,NCOLSE
      IF(SET(I).EQ.DISSET(J)) THEN
      NK=J + MLV
      GO TO 5513
      ENDIF
 5512 CONTINUE
 5513 X(I,NK)=1
 2003 CONTINUE
      LOC(3,1) = MLV + 1
      MLV = MLV + NCOLSE
      LOC(3,2) = MLV
C FORMING DESIGN MATRIX FOR GCA
 2103 IF(DTERM(4,1).EQ.'N') GO TO 2104
      DO 2004 I=1,NOBS
      DO 5514 J=1,NCOLG
      IF(F(I).EQ.PARENT(J)) THEN
      NL=J + MLV
      GO TO 5515
      ENDIF
 5514 CONTINUE
 5515 X(I,NL) = 1
      IF(DUM2.EQ.'H') GO TO 2004
      DO 5516 K=1,NCOLG
      IF(M(I).EQ.PARENT(K)) THEN
      NN=K+MLV
      GO TO 5517
      ENDIF
 5516 CONTINUE
 5517 X(I,NN) = 1
 2004 CONTINUE
      LOC(4,1) = MLV+1
      MLV = MLV + NCOLG
      LOC(4,2) = MLV
 2104 IF(DTERM(5,1).EQ.'N') GO TO 2105
      NSTA = MLV
      DO 34 I=1,NOBS
      DO 35 J = 1,NCOLS
      IF(FM(I).EQ.FMVEC(J)) THEN
      X(I,J + NSTA)= 1
      GO TO 34
      ENDIF
   35 CONTINUE
   34 CONTINUE
      LOC(5,1) = MLV+1
      MLV = MLV + NCOLS
      LOC(5,2) = MLV
 2105 IF((DTERM(6,1).EQ.'N').OR.(DTERM(1,1).EQ.'N')) GO TO 2106
      NSTA= LOC(1,1)
      NEND=LOC(1,2)
      NSTAK= LOC(4,1)
      NENDK = LOC(4,2)
      DO 49 I=1,NOBS
      L = MLV+1
      DO 39 J = NSTA,NEND
      DO 40 K = NSTAK,NENDK
      X(I,L) = X(I,J)*X(I,K)
      L = L+1
   40 CONTINUE
   39 CONTINUE
   49 CONTINUE
      MLV = MLV + NCOLGT
 2106 IF((DTERM(7,1).EQ.'N').OR.(DTERM(1,1).EQ.'N')) GO TO 2107
      NSTAK = LOC(5,1)
      NENDK = LOC(5,2)
      DO 41 I=1,NOBS
      L = MLV + 1
      DO 42 J = NSTA,NEND
      DO 43 K = NSTAK,NENDK
      X(I,L) = X(I,J)*X(I,K)
      L=L+1
   43 CONTINUE
   42 CONTINUE
   41 CONTINUE
      MLV = MLV + NCOLST
 2107 IF((DTERM(8,1).EQ.'N').OR.(DTERM(2,1).EQ.'N')) GO TO 2108
      NSTA = LOC(2,1)
      NEND = LOC(2,2)
      NSTAK = LOC(5,1)
      NENDK = LOC(5,2)
      IF(DUM2.EQ.'H') THEN
      NSTAK = LOC(4,1)
      NENDK = LOC(4,2)
      ENDIF
      DO 44 I=1,NOBS
      L = MLV+1
      DO 45 J = NSTA,NEND
      DO 46 K = NSTAK,NENDK
      X(I,L)=X(I,J)*X(I,K)
      L = L+1
   46 CONTINUE
   45 CONTINUE
   44 CONTINUE
C****************************************************************
C X = MU||HT||TJ||TB||G||S||GT||ST||CB COMPLETED
C****************************************************************
      DEALLOCATE (DBLOCK)
      PRINT*, '*******FINISHED FORMING THE DESIGN MATRIX**********'
      PRINT*, '*******NOW CHECKING FOR NULL COLUMNS***************'
 2108 NEND = NCLFIX + 1
      NMISS=0
      DO 3001 K=1,NORAN-1
      NSTA = NEND+1
      NEND = NSTA + NRAN(K)-1
      DO 3002 J = NSTA,NEND
      DO 3003 I=1,NOBS
      IF(X(I,J).NE.0) GO TO 3002
 3003 CONTINUE
      NRAN(K) = NRAN(K)-1
      NMISS = NMISS+1
      NULVEC(NMISS)=J
 3002 CONTINUE
 3001 CONTINUE
      PRINT*,'***********FINISHED CHECKING FOR NULL COLUMNS*********'
      WRITE(6,3006) NMISS
 3006 FORMAT(' THERE WERE ',I4,' NULL COLUMNS')
      IF(NMISS.EQ.0) GO TO 3011
      PRINT *,'***********NOW DELETING NULL COLUMNS****************'
      NULVEC(NMISS+1) = NCOLX+1
      L = NULVEC(1)
      DO 3021 I=1,NMISS
      IF((NULVEC(I+1)-NULVEC(I)).EQ.1) GO TO 3021
      DO 3022 J = NULVEC(I)+1,NULVEC(I+1)-1
      DO 3023 K=1,NOBS
      X(K,L)=X(K,J)
 3023 CONTINUE
      L = L+1
 3022 CONTINUE
 3021 CONTINUE
 3011 NCLRAN = NCLRAN-NMISS
      NCOLX = NCOLX-NMISS
      NUM1 =(NCOLX*(NCOLX-1))/2 + NCOLX
      ALLOCATE (XPX(NUM1))
      DO 10 I=1,NUM1
      XPX(I) = 0.0
   10 CONTINUE
      PRINT*,'**********FORMING DOT PRODUCTS OF DESIGN COLUMNS*******'
      DO 15 I=1,NCOLX
      N = NVEC(I,NCOLX)
      DO 16 J = I,NCOLX
      N = N+1
      DO 17 K=1,NOBS
      XPX(N) = XPX(N) + (FLOAT(X(K,I))*FLOAT(X(K,J)))
   17 CONTINUE
   16 CONTINUE
   15 CONTINUE
      PRINT*,'********FORMING DOT PRODUCTS OF DESIGN COLUMNS AND THE D
     NATA VECTOR********'
      L=NCLFIX+1
      DO 6 J=1,NCOLX
      IF(J.LE.NCLFIX) THEN
      N = NVEC(J,NCOLX)
      N = N + NCLFIX + 2-J
      ENDIF
      IF (J.GT.NCLFIX) THEN
      N = NVEC(L,NCOLX)
      N = N+J-NCLFIX
      ENDIF
      DO 7 K=1,NOBS
      ZAP=FLOAT(X(K,J))
      ZIP = MEAN(K)
      IF(J.EQ.L) ZAP = MEAN(K)
      XPX(N) = XPX(N) + (ZIP*ZAP)
    7 CONTINUE
    6 CONTINUE
      PRINT*,'*******ALL DOT PRODUCTS HAVE NOW BEEN FORMED********'
      PRINT*,'***SAVING X PRIME X MATRIX FOR FUTURE ITERATIONS****'
      WRITE(13) XPX
      PRINT*,'*********X PRIME X IS STORED*********'
      DEALLOCATE (X,XPX)
      RETURN
      END
C********************** [illegible] *****************************
C***********MODIFIED TO OUTPUT VARIANCE COVARIANCE***************
C*****************MATRIX OF PREDICTIONS**************************
      SUBROUTINE VARX(VARG,VARBH)
      PARAMETER (
     N NOBSER = 5000,
     N NOBL=36,
     N NOCR = 75,
     N NOBH = 200,
     N NVARBH = (NOBH*(NOBH-1))/2 + NOBH,
     N NOGCA = 50,
     N NOVARG = (NOGCA*(NOGCA-1))/2 + NOGCA,
     N NOX= 1400,
     N NIXPX = (NOX*(NOX-1))/2 + NOX,
     N NOCBS = 1000,
     N NTOT=NOX + NOCBS,
     N NIZED = NOX*NOCBS,
     N NSIP = NOX + NOCBS,
     N NIZEP = ((NSIP*(NSIP-1))/2) + NSIP)
      COMMON/CMN1/ NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
     N NCOLB,NCOLX,NCOLCB,NCL(9),NORAN,NOFIX,NCLFIX,
     N NCLRAN,NCOLSE,NRAN(9)
      COMMON/CMN2/
     N YQVQY(9),VQVQ(9,9),MEAN(NOBSER),SIG(9),GCA(NOGCA),
     N BHAT(NOBH),SCA(NOCR)
      COMMON/CMN3/ DTERM(8,2),RANNAM(9),DUM2,FMVEC(NOCR),
     N PARENT(NOGCA),LOCO(10),REP(NOBL),DISSET(10)
      DIMENSION TK(:),D(:),VARG(NOVARG),VARBH(NVARBH),
     N NSIG(9,2),XPX(:)
      ALLOCATABLE :: TK,D,XPX
      INTEGER NCOLT,NCOLTB,NCOLG,NCOLS,NCOLGT,NCOLST,NOBS,
     N NCOLB,NCOLX,NCOLCB,NCL,NORAN,NOFIX,NCLFIX,NSIG,NCOLTK,
     N NCLRAN,NCOLSE,NRAN,NSTA,NEND,NSTAK,NENDK,NCOLD,NOZERO,
     N NUM1
      DOUBLE PRECISION YQVQY,VQVQ,MEAN,SIG,GCA,BHAT,SCA,TK,D,
     N VARG,VARBH,XPX
      CHARACTER*1 DTERM,DUM2
      CHARACTER*16 FMVEC
      CHARACTER*11 RANNAM
      CHARACTER*8 LOCO,PARENT,DISSET,REP
      NUM1 =(NCOLX*(NCOLX-1))/2 + NCOLX
      ALLOCATE (XPX(NUM1),D(NCOLX))
      READ(13) XPX
      K = 0
      NOZERO=0
      NCOLTK = NCLFIX
      NCOLD = NCLFIX+1
      DO 22 I=1,NORAN-1
      NCOLD = NCOLD + NRAN(I)
      IF(SIG(I).EQ.0.0) THEN
      NOZERO = NOZERO+1
      NSIG(NOZERO,1) = NCOLD + 1-NRAN(I)
      NSIG(NOZERO,2) = NCOLD
      GO TO 22
      ENDIF
      NCOLTK = NCOLTK + NRAN(I)
      DO 21 J=1,NRAN(I)
      K = K+1
      D(K) = SIG(I)
   21 CONTINUE
   22 CONTINUE
      ALLOCATE (TK(NUM1))
      K = 0
      DO 302 I=1,NCOLX
      IF(I.EQ.(NCLFIX+1)) GO TO 302
      DO 23 L=1,NOZERO
      IF((I.GE.NSIG(L,1)).AND.(I.LE.NSIG(L,2))) GO TO 302
   23 CONTINUE
      N = NVEC(I,NCOLX)
      DO 301 J = I,NCOLX
      IF(J.EQ.(NCLFIX+1)) GO TO 301
      DO 24 L=1,NOZERO
      IF((J.GE.NSIG(L,1)).AND.(J.LE.NSIG(L,2))) GO TO 301
   24 CONTINUE
      NN = N+J-I+1
      K = K+1
      TK(K) = XPX(NN)/SIG(NORAN)
  301 CONTINUE
  302 CONTINUE
      K=0
      DO 28 I = NCLFIX + 1,NCOLTK
      J = NVEC(I,NCOLTK)
      N=J + 1
      K = K+1
      TK(N)=TK(N)+ (1.D0/(D(K)))
   28 CONTINUE
      DEALLOCATE (D,XPX)
C**************EQUATIONS HAVE NOW BEEN FORMED********************
      CALL VECSWP(TK,NCOLTK,NCOLTK,1,NCOLTK)
      DO 952 I=1,9
      IF(RANNAM(I).EQ.'GCA') THEN
      NSTA=I
      GO TO 953
      ENDIF
  952 CONTINUE
  953 NEND = 0
      DO 954 I=1,NSTA-1
      IF(SIG(I).EQ.0.0) GO TO 954
      NEND = NEND + NRAN(I)
  954 CONTINUE
      NSTAK = NEND + NCLFIX+1
      NENDK = NSTAK + NRAN(NSTA)-1
      N=0
      DO 955 I = NSTAK,NENDK
      K = NVEC(I,NCOLTK)
      DO 956 J = I,NENDK
      KK = K+J-I+1
      N = N+1
      VARG(N)=TK(KK)
  956 CONTINUE
  955 CONTINUE
      N = 0
      DO 957 I=1,NCLFIX
      K = NVEC(I,NCOLTK)
      DO 958 J = I,NCLFIX
      KK = K+J-I+1
      N = N+1
      VARBH(N)=TK(KK)
  958 CONTINUE
  957 CONTINUE
      DEALLOCATE (TK)
      RETURN
      END

REFERENCE LIST

Banks, B.D., Mao, I.L. & Walter, J.P. 1985. Robustness of the restricted maximum likelihood estimator derived under normality as applied to data with skewed distributions. J. Dairy Sci. 68:1785-1792.
Becker, W.A. 1975. Manual of Quantitative Genetics. Washington State Univ. Press, Pullman, WA. 170 pp.
Braaten, M.O. 1965. The union of partial diallel mating designs and incomplete block environmental designs. North Carolina State Univ. Inst. of Stat. Mimeo. Series No. 432, 77 pp.
Bridgwater, F.E., Talbert, J.T. & Jahromi, S. 1983. Index selection for increased dry weight in a young loblolly pine population. Silvae Genet. 32:157-161.
Burdon, R.D. 1977. Genetic correlation as a concept for studying genotype-environment interaction in forest tree breeding. Silvae Genet. 26:168-175.
Burdon, R.D. & Shelbourne, C.J.A. 1971. Breeding populations for recurrent selection: Conflicts and possible solutions. N. Z. J. For. Sci. 1:174-193.
Burley, J., Burrows, P.M., Armitage, F.B. & Barnes, R.D. 1966. Progeny test designs for Pinus patula in Rhodesia. Silvae Genet. 15:166-173.
Campbell, K. 1972. Genetic variability in juvenile height-growth of Douglas-fir. Silvae Genet. 21:126-129.
Corbeil, R.R. & Searle, S.R. 1976. A comparison of variance component estimators. Biometrics 32:779-791.
Falconer, D.S. 1981. Introduction to Quantitative Genetics. Longman & Co., New York, NY. 340 pp.
Foster, G.S. 1986. Trends in genetic parameters with stand development and their influence on early selection for volume growth in loblolly pine. For. Sci. 32:944-959.
Foster, G.S. & Bridgwater, F.E. 1986. Genetic analysis of fifth-year data from a seventeen parent partial diallel of loblolly pine. Silvae Genet. 35:118-122.
Freund, R.J. 1980. The case of the missing cell. Amer.
Stat. 34:94-98. 145 146 Freund, R.J. & Littell, R.C. 1981. SAS for Linear Models. SAS Institute,Inc., Cary,NC. 231 pp. Giesbrecht, F.G. 1983. Efficient procedure for computing minque of variance components and generalized least squares estimates of fixed effects. Commun. Statist. -Theor. Meth. 12:2169-2177. Gilbert, N.E.G. 1958. Diallel cross in plant breeding. Heredity 12:477-498. Goodnight, J.H. 1979. A tutorial on the sweep operator. Amer. Stat. 33(3): 149-158. Graybill, F.A. 1976. Theory and Application of the Linear Model. Duxbury Press, North Scituate,MA. 704 pp. Greenwood, M.S., Lambeth, C.C. & Hunt, J.L. 1986. Accelerated breeding and potential impact upon breeding programs. In: Southern Cooperative Series Bulletin No. 309. Louisiana Ag. Experiment Station, Baton Rouge,LA. pp. 39-41. Griffing, B. 1956. Concept of general and specific combining ability in relation to diallel crossing systems. Aust. J. Biol. Sci. 9:463-493. Hallauer, A.R. & Miranda, J.B. 1981. Quantitative Genetics in Maize Breeding. Iowa State Univ.Press, Ames,10. 468 pp. Hartley, H.O. 1967. Expectations, variances and covariances of ANOVA mean squares by "synthesis". Biometrics 21:467-480. Hartley, H.O. & Rao, J.N.K. 1967. Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54:93-108. Harville, D.A. 1977. Maximum likelihood approaches to variance component estimation and to related problems. J. Amer. Stat. Assoc. 72:320-338. Henderson, C.R. 1953. Estimation of variance and covariance components. Biometrics 9:226-252. Henderson, C.R. 1973. Sire evaluation and expected genetic advance. In: Animal Breeding and Genetics Symposium in Honor of J. Lush, Animal Sci. Assoc. Amer., Champaign, Ill. pp 10-41. Henderson, C.R. 1974. General flexibility of linear model techniques for sire evaluation. J. Dairy Sci. 57:963-972. Henderson, C.R. 1977. Best linear unbiased prediction of breeding values no in the model for records. J. Dairy Sci. 60:783-787. Henderson, C.R. 1984. 
Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Ontario, CAN. 462 p. 147 Henderson, C.R., Kempthome, O., Searle, S.R. & Von Krosigk, C.N. 1959. Estimation of environmental and genetic trends from records subject to culling. Biometrics 30:583- 588. Hodge, G.R. & White, T.L. (in press). Genetic parameter estimates for growth traits at different ages in slash pine. Silvae Genet. Hogg, R.V. & Craig, A.T. 1978. Introduction to Mathematical Statistics. Fourth edition. Macmillan Publ. Co. New York, NY. 438 pp. Kackar, R.N. & Harville, D.A. 1981. Unbiasedness of two-stage estimation and prediction procedures for mixed linear models. Comm. Stat. A. Theory and Methods 10:1249-1261. Kendall, M.G. & Stuart, A. 1963. The Advanced Theory of Statistics. Vol. 1. Hafner Publ. Co., New York. 433 pp. Klotz, J.H., Milton, R.C. & Zacks, S. 1969. Mean square efficiency of estimators of variance components. J. Amer. Stat. Assoc. 64:1383-1402. Knuth, D.E. 1981. Seminumerical Algorithms, 2nd ed., vol. 2 of The art of computer programming. Addison-Wesley Reading, MA. Littell, R.C. & McCutchan, B.G. 1986. Use of SAS for variance component estimation. In: Statistical considerations in genetic testing of forest trees. South. Coop. Series Bull. No. 324. pp 75-86. Loo-Dinkins, J.A., Tauer, C.G. & Lambeth, C.C. 1990. Selection system efficiencies for computer simulated progeny test field designs in loblolly pine. Theor. Appl. Genet. 79:89-96. Matzinger, D.F., Sprague, G.F. & Cockerham, C.C. 1959. Diallel crosses of maize in experiments repeated over locations and years. Crop Sci. 51:346-350. McCutchan, B.G., Ou, J.X. & Namkoong, G. 1985. A comparison of planned unbalanced design for estimating heritability in perennial tree crops. Theor. Appl. Genet. 71:536-544. McCutchan, B.G., Namkoong, G. & Giesbrecht, F.G. 1989. Design efficiencies with planned and unplanned unbalance for estimating heritability in forestry. For. Sci. 35:801-815. McLean, R.A. 1989. 
An introduction to general linear models. In: Applications of Mixed Models in Agriculture and Related Disciplines, South. Coop. Ser. Bull. No. 343. pp. 23-30. Louisiana Agricultural Experiment Station, Baton Rouge.
Meyer, K. 1989. Restricted maximum likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm. Genet. Sel. Evol. 21:317-340.
Miller, J.J. 1973. Asymptotic properties and computation of maximum likelihood estimates in the mixed model of the analysis of variance. Tech. Rep. No. 12, Department of Statistics, Stanford Univ., Stanford, CA.
Milliken, G.A. & Johnson, D.E. 1984. Analysis of Messy Data I, Designed Experiments. Lifetime Learning Pub., Belmont, CA. 473 pp.
Namkoong, G., Snyder, E.B. & Stonecypher, R.W. 1966. Heritability and gain concepts for evaluating breeding systems such as seedling orchards. Silvae Genet. 15:76-84.
Namkoong, G. & Roberds, J.H. 1974. Choosing mating designs to efficiently estimate genetic variance components for trees. Silvae Genet. 23:43-53.
Olsen, A., Seely, J. & Birkes, D. 1976. Invariant quadratic unbiased estimation of two variance components. Ann. Stat. 4:878-890.
Patterson, H.D. & Thompson, R. 1971. Recovery of interblock information when block sizes are unequal. Biometrika 58:545-554.
Pederson, D.G. 1972. A comparison of four experimental designs for the estimation of heritability. Theor. Appl. Genet. 42:371-377.
Pepper, W.D. 1983. Choosing plant-mating design allocations to estimate genetic variance components in the absence of prior knowledge of the relative magnitudes. Biometrics 39:511-521.
Pepper, W.D. & Namkoong, G. 1978. Comparing efficiency of balanced mating design for progeny testing. Silvae Genet. 27:161-169.
Pitman, E.J.G. 1937. The "closest" estimates of statistical parameters. Proc. Camb. Philos. Soc. 33:212-222.
Press, W.H., Flannery, B.P., Teukolsky, S.A. & Vetterling, W.T. 1989. Numerical Recipes.
The Art of Scientific Computing (Fortran version). Cambridge Univ. Press, New York, NY. 702 pp.
Rao, C.R. 1971a. Estimation of variance and covariance components-MINQUE theory. J. Multivar. Anal. 1:257-275.
Rao, C.R. 1971b. Minimum variance quadratic unbiased estimation of variance components. J. Multivar. Anal. 1:445-456.
Rao, C.R. 1972. Estimation of variance and covariance components in linear models. J. Amer. Stat. Assoc. 67:112-115.
SAS Institute, Inc. 1985. SAS Interactive Matrix Language Guide for Personal Computers. SAS Institute, Inc., Cary, NC. 429 pp.
Schneider, D.M. 1987. Linear Algebra, A Concrete Introduction. Macmillan Pub. Co., New York, NY. 506 pp.
Searle, S.R. 1971. Topics in variance component estimation. Biometrics 27:1-76.
Searle, S.R. 1987. Linear Models for Unbalanced Data. John Wiley and Sons, New York, NY. 536 pp.
Shaw, R.G. 1987. Maximum-likelihood approaches applied to quantitative genetics of natural populations. Evolution 41(4):812-826.
Singh, M. & Singh, R.K. 1984. A comparison of different methods of half-diallel analysis. Theor. Appl. Genet. 67:323-326.
Snyder, E.B. & Namkoong, G. 1978. Inheritance in a diallel crossing experiment with longleaf pine. In: USDA For. Serv. Res. Pap. SO-140. South. For. Exp. Stn., New Orleans, LA. 31 pp.
Speed, F.M., Hocking, R.R. & Hackney, O.P. 1978. Methods of analysis of linear models with unbalanced data. J. Amer. Stat. Assoc. 73:105-112.
Sprague, G.F. & Tatum, L.A. 1942. General vs. specific combining ability in single crosses of corn. J. Amer. Soc. Agron. 34:923-932.
Squillace, A.E. 1973. Comparison of some alternative second-generation breeding plans for slash pine. In: South. For. Tree Improve. Conf. June 12-13, 1973, Baton Rouge, LA. pp. 2-13.
Stonecypher, R.W., Zobel, B.J. & Blair, R. 1973. Inheritance patterns of loblolly pines from a nonselected natural population. Technical Bulletin No. 224, North Carolina Ag. Exp. Stn.
Swallow, W.H. 1981.
Variances of locally minimum variance quadratic unbiased estimators ('MIVQUE's') of variance components. Technometrics 23:271-283.
Swallow, W.H. & Monahan, J.F. 1984. Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components. Technometrics 26(1):47-57.
van Buijtenen, J.P. 1972. Efficiency of mating designs for second-generation selection. In: Proceedings, IUFRO Working Party Meeting on Progeny Testing, 25-27 Oct. 1972, Macon, GA. Edited by John F. Kraus, Ga. For. Res. Council, Macon. pp. 103-126.
van Buijtenen, J.P. & Bridgwater, F. 1986. Mating and genetic test designs. In: Advanced Generation Breeding of Forest Trees. Southern Coop. Series Bull. 309. Louisiana Ag. Exp. Stn., Baton Rouge, LA. pp. 5-10.
van Buijtenen, J.P. & Burdon, R.D. 1990. Expected efficiencies of mating designs for advanced generation selection. Can. J. For. Res. 20:1648-1663.
Weir, R.J. & Goddard, R.E. 1986. Advanced generation operational breeding programs for loblolly and slash pine. In: Southern Coop. Series Bull. 309. Louisiana Agric. Exp. Stn., Baton Rouge, LA. pp. 21-26.
Weir, R.J. & Zobel, B.J. 1975. Managing genetic resources for the future, a plan for the N.C. State Industry Cooperative Tree Improvement Program. In: Proc. 13th South. For. Tree Improve. Conf. June 10-11, Raleigh, NC. pp. 73-82.
Westfall, P.H. 1987. A comparison of variance component estimates for arbitrary underlying distributions. J. Amer. Stat. Assoc. 82:866-874.
White, T.L. 1987. A conceptual framework for tree improvement programs. New Forests 4:325-342.
White, T.L. & Hodge, G.R. 1987. Practical uses of breeding values in tree improvement programs and their prediction from progeny test data. In: Proc. 19th South. For. Tree Improve. Conf. Texas A & M Univ., College Station, TX. pp. 276-283.
White, T.L. & Hodge, G.R. 1988. Best linear prediction of breeding values in a forest tree improvement program. Theor. Appl. Genet. 76:719-727.
White, T.L. & Hodge, G.R. 1989.
Predicting Breeding Values with Applications in Forest Tree Improvement. Kluwer Academic Pub., Dordrecht, The Netherlands. 367 pp.
Wilcox, M.D., Shelbourne, C.J.A. & Firth, A. 1975. General and specific combining ability in eight selected clones of radiata pine. N. Z. J. For. Sci. 5:219-225.
Yates, F. 1934. The analysis of multiple classifications with unequal numbers in the different classes. J. Amer. Stat. Assoc. 29:51-66.
Zobel, B.J. & Talbert, J. 1984. Applied Forest Tree Improvement. John Wiley and Sons, New York, NY. 505 pp.

BIOGRAPHICAL SKETCH

Dudley Arvle Huber was born December 13, 1948, in Fulton County, Georgia, to Dudley and Dorothy Huber. His basic education was in the Stephens County school system. He entered Georgia Institute of Technology to study chemical engineering and later transferred to the forestry program at the University of Georgia. In 1970, he received a Bachelor of Science degree. From 1971 to 1977, he served in the U.S. Navy and, after his service, re-entered the University of Georgia, receiving a Master of Science degree in 1981. After several years of self-employment and employment at the University of Georgia, he began a Doctor of Philosophy program in 1988. He is currently employed as operations geneticist for Southern Forest Tree Improvement by Weyerhaeuser Company.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Timothy L. White, Chairman, Associate Professor of Forest Resources and Conservation

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Michael A.
DeLorenzo, Associate Professor of Dairy Science

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Assistant Research Scientist of Forest Resources and Conservation

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Ramon C. Littell, Professor of Statistics

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Donald L. Rockwood, Professor of Forest Resources and Conservation

This dissertation was submitted to the Graduate Faculty of the School of Forest Resources and Conservation in the College of Agriculture and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.
May 1993
Director, Forest Resources and Conservation
Dean, Graduate School