COMPARISON OF LATENT GROWTH MODELS WITH DIFFERENT TIME CODING STRATEGIES IN THE PRESENCE OF INTER-INDIVIDUALLY VARYING TIME POINTS OF MEASUREMENT

By BURAK AYDIN

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS IN EDUCATION

UNIVERSITY OF FLORIDA 2010
© 2010 Burak Aydin
To Mehmetali, K ymet and Bur ak Ayd n
ACKNOWLEDGMENTS

First of all, I would like to thank Dr. Walter Leite and Dr. James Algina for guiding me in my thesis. I thank the faculty and students of the Research and Evaluation Methodology program in the Educational Psychology Department. I would also like to thank the Turkish Government for its financial support.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1 INTRODUCTION
2 EMPIRICAL ILLUSTRATION
    Sample
    Measures
        Mathematics Scores
        Reading Scores
        Learning Related Behaviors
        Socio Economic Status (SES)
        Age, Gender, Race, Kindergarten Retention, Disability Status
    The Final Data
    Statistical Models
        Hierarchical Linear Modeling
        Latent Growth Curve Model
            Time points
            Four different time coding strategies
    Results
    Discussion
3 SIMULATION STUDY
    Design
    Data Generation
    Data Analysis
    Results
        Convergence Rates and Improper Solutions
        Relative Bias of Parameter Estimates and Standard Errors
        Model Fit
    Discussion
4 CONCLUSION

APPENDIX: CALCULATING THE NUMBER OF DAYS WITH SPSS
LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table

2-1 Demographic Characteristics for the ECLS-K Full and Analytical Samples
2-2 SES and Math IRT Scores for the ECLS-K Full and Analytical Samples
2-3 Descriptive Statistics for Number of Days between the Fall Kindergarten Assessment and the Other Assessments
2-4 Number of days between assessments divided by 100
2-5 Number of days between assessments divided by 358
2-6 Descriptive Statistics for Mathematics IRT Scores First through Fifth Grade
2-7 Parameter estimates of RRC model and the unconditional LGM model with different time coding combinations
2-8 Variances of Parameter Estimates
2-9 Estimated Mean of IRT Based Math Scores ( )
2-10 Model Fit Information for Unconditional LGM
2-11 Comparison of parameter estimates for the intercept between Morgan's ISAO model and the conditional LGM model with different time coding combinations
2-12 Comparison of parameter estimates for the linear slope between Morgan's ISAO model and the conditional LGM model with different time coding combinations
2-13 Model Fit Indices for Conditional LGM
3-1 Characteristics of assessment time points distributions
3-2 Population Parameters
3-3 Example of Mplus LGM Program
3-4 Percentages of convergence and improper solutions for each condition
3-5 Comparison of relative parameter bias estimates across conditions
3-6 Relative bias estimates for standard errors
3-7 Chi-square statistic results across conditions
3-8 Comparison of model fit information across conditions
LIST OF FIGURES

Figure

2-1 Path model for an unconditional quadratic latent growth model
2-2 Mathematics Growth Trajectory
3-1 Distributions for the third assessment wave time points in narrow range
Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Arts in Education

COMPARISON OF LATENT GROWTH MODELS WITH DIFFERENT TIME CODING STRATEGIES IN THE PRESENCE OF INTER-INDIVIDUALLY VARYING TIME POINTS OF MEASUREMENT

By Burak Aydin

August 2010

Chair: Walter L. Leite
Major: R

Large and complex longitudinal data sets have become much more widely available, especially in the last two decades. As a structural equation modeling (SEM) approach, latent growth modeling (LGM) is one of the most widely used methods to analyze longitudinal data sets. In a LGM, individual growth is defined as a function of time. Therefore, it is important to correctly define time indicators and adequately implement them in a LGM. Some studies have guided researchers on how to code time adequately in a LGM using participants' ages as the time indicators. However, in the education discipline it is also common to use measurement waves to define time in a growth model. Furthermore, few studies have investigated the heterogeneity in assessment dates, and researchers have tended to assume that each child was measured at the same time within a measurement wave. Thus, it is not clear how these procedures perform in the presence of inter-individually varying time points of measurement. Consequently, the aim of this thesis was to determine whether the results of a LGM are valid when heterogeneity within a measurement wave is omitted.
An empirical study comparing different time coding strategies in a LGM is presented in this thesis. A conditional quadratic growth model was constructed based on a published growth model in which mathematical development was investigated for a subsample taken from the Early Childhood Longitudinal Study Kindergarten Cohort (ECLS-K). Following the empirical study, a simulation study was built on an unconditional linear growth model with three measurement occasions. The simulation study examined the heterogeneity effect through the manipulation of three factors: the sample size had three levels (200, 2000 and 8000); the range of the data collection period for each measurement had three levels (narrow, moderate and wide); and the distribution of measurement occasions had three levels (uniform, moderately skewed and extremely skewed). The results of the empirical study showed that omitting the heterogeneity in assessment dates for the ECLS-K did not cause considerable differences. However, the simulation study revealed that residual variances were affected by the range factor: residual variances increased as the range of the measurement occasions increased. An interaction between the range and distribution factors was associated with substantial negative bias in the slope variance estimates, and the largest absolute bias was found for the condition with a wide range and an extremely skewed distribution of measurement occasions. A sample size of 200 was associated with an increased number of improper solutions when heterogeneity was omitted. The chi-square test statistic indicated lack of exact fit for many conditions. The fit indices indicated that the models fit the generated data sets adequately, even though some parameters were estimated with substantial bias.
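The three manipulated factors can be enumerated as a grid of simulation conditions. The sketch below is illustrative only: it assumes the factors are fully crossed, and the variable names are hypothetical rather than taken from the thesis.

```python
from itertools import product

# Factor levels as listed in the abstract; the names are illustrative.
sample_sizes = [200, 2000, 8000]
ranges = ["narrow", "moderate", "wide"]
distributions = ["uniform", "moderately skewed", "extremely skewed"]

# Assumption: the three factors are fully crossed.
conditions = list(product(sample_sizes, ranges, distributions))

for n, time_range, dist in conditions[:3]:
    print(f"N={n}, range={time_range}, distribution={dist}")
print(len(conditions), "conditions in total")
```

Under the full-crossing assumption, each condition is one combination of sample size, range and distribution of measurement occasions.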
CHAPTER 1
INTRODUCTION

Most researchers in the social science disciplines are interested in understanding the sources of stability and change in variables. A psychologist might investigate a student population by collecting data in several time periods in order to see if the students develop a particular skill. A sociologist studying archives might want to understand the crime trend in a community. An economist might want to develop a model to examine changes in the amounts of a country's imported and exported products over the years. The common point in these examples is that in each case change occurs as a function of time. The time unit can be days, weeks, months, years or even decades. However, in the education discipline, time units are generally specified as months or years, and the unit of analysis is generally students. For example, an educational researcher might try to map the advancement of math aptitude of individuals over the first six years of school. In this case the researcher needs to collect data at several grades. Data collected in this fashion can be called a longitudinal data set. Because large and complex longitudinal data sets have become much more widely available in the past decades, researchers have been drawn to develop methods to analyze how change comes about and how much change occurs. Analysis of variance (ANOVA), multivariate analysis of variance (MANOVA), analysis of covariance (ANCOVA), multivariate analysis of covariance (MANCOVA), and autoregressive and cross-lagged multiple regression are some of the traditional methods to analyze longitudinal data sets. Each of these methods has pros and cons (see, e.g., Collins & Sayer, 2001; Gottman, 1995). Because researchers also tried to define how the change process differs among individuals, and because the above-mentioned methods have assumptions which are rarely met in practice
in the social sciences, such as sphericity, new methods have been developed to assess individual change. Indeed, recent methodological research on the measurement of individual change has been reoriented from 'change' to 'growth' and has begun to focus directly on individual trajectories. Hierarchical linear modeling (HLM) (see Raudenbush & Bryk, 2002) and a structural equation modeling (SEM) approach called latent growth modeling (LGM) have become the most frequently used methods to assess growth over time. Excellent introductions to LGM can be found in Meredith & Tisak, 1990; B. O. Muthén & Khoo, 1998; T. E. Duncan, Duncan, Strycker, Li, & Alpert, 1999; Willett & Keiley, 2000; and Bollen & Curran, 2006. Both LGM and HLM use the individual data approach; that is, the expectations regarding means and covariances are modeled at the level of individuals. Because the two approaches are similar in terms of data structure and model structure, they are equivalent models in most conditions. Nevertheless, Mehta and West (2000) pointed out four reasons to use the SEM approach over HLM. They stated that formulating a structural model involving growth factors as predictor or criterion variables is easier with SEM. The second reason is that it is possible to specify a growth model with a measurement structure in which latent variables are measured by several indicators at each time point. It is also easy to create models with different forms of growth and to investigate cohort effects when SEM is used. LGM methods simultaneously focus on changes in covariances, variances, and mean values over time (Dunn, Everitt, & Pickles, 1993), and these methods use initial status and developmental trajectories (e.g., linear, quadratic) to describe change in an individual. LGM can estimate the variability across individuals in both initial status and trajectories, as well as specify coefficients for testing the effect of other
variables or constructs to explain variations in the initial status and trajectories (Hancock & Lawrence, 2006). In the LGM approach, the intercept and slope terms are treated as latent variables and they are allowed to vary between individuals. As an oversimplified example, assume a theory dictates that reading scores for a first grade student population should follow a generally linear growth trajectory over 3 years. The initial point of measurement (time 1) can serve as a reference point for the development. Therefore, each individual's score can be written as

$$y_{it} = \alpha_i + \lambda_t \beta_i + \varepsilon_{it} \qquad (1\text{-}1)$$

where $y_{it}$ is the reading score for the $i$th student at time $t$, $\alpha_i$ defines the intercept for case $i$, and $\beta_i$ the linear slope. The $\lambda_t$ is a constant where $\lambda_1 = 0$ and $\lambda_2 = 1$, which scales the intercept to be interpreted as the initial status, and $\varepsilon_{it}$ is the error term. Note that the interpretation of $\alpha_i$ depends on the coding of time; Equation 1-1 implies that individual deviations in growth trajectories are affected by the coding of time. In a LGM it is therefore important to carefully specify how time is measured and defined.

The databases of Academic Search Premier and PsycINFO were searched in order to find applied LGM studies. Published journal articles using LGM showed that researchers tend to assume that each individual was assessed at the same time, and so they use fixed time points for measurement occasions (e.g., Morgan, Farkas and Wu, 2009; Perez, 2008). Generally speaking, assessing every individual at the same time may be considered ideal. However, in longitudinal studies with a large sample of students (e.g., ECLS-K, LSAY) it is frequently impractical to conduct assessments on exactly the same occasions for all students. These studies yield data which are partially homogeneous in the times of measurement, in that the reports cover the same period of
time (e.g., fall kindergarten, spring first grade). Considering that these 'partially homogeneous' data are dependent on the particular times of measurement, taking into account inter-individually varying times of measurement could be essential to have valid interpretations of the data (Blozis and Cho, 2008). Omitting the heterogeneity in the times of assessment may cause biased estimates of change.

My literature review showed that relatively few studies have been conducted on time coding issues in a LGM. One of the time coding issues is deciding whether the age of the participants or the measurement dates will be used to scale the time variable. Mehta and West (2000) published a methodological guideline in which they provided different strategies for scaling time in linear growth curve models. They used participants' ages to create time indicators and investigated three different time-structured measurement designs. In the first design, they built a LGM for participants who were identically aged at the beginning of the study. Occasions of measurement were assumed to be equally spaced, with participants measured at ages 12, 13, 14 and 15. The authors investigated two different time scaling schemes under the first design. The difference between these schemes was the origin of time, which is often chosen by the researcher to meet a particular theoretical interest. Time points are assigned as 0, 1, 2 and 3 when the origin is chosen to be age 12; the average intercept then represents the mean score in the population at age 12. Time points are assigned as -3, -2, -1 and 0 when the origin is set to age 15; the intercept then represents the mean score in the population at age 15. As expected, the estimates differed between the two schemes because of the change in time scale.
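The effect of moving the origin of time can be sketched numerically. The example below uses the ages from the first design described above (12 through 15); the mean intercept and mean slope are hypothetical values chosen only for illustration, and the helper names are not from the cited work. Under a linear trajectory, shifting the origin changes the mean intercept by the slope times the shift and leaves the mean slope unchanged.

```python
def time_codes(ages, origin):
    """Code time as years elapsed since the chosen origin age."""
    return [age - origin for age in ages]


def shift_intercept(mean_intercept, mean_slope, old_origin, new_origin):
    """Mean intercept implied at a new origin under a linear trajectory."""
    return mean_intercept + (new_origin - old_origin) * mean_slope


ages = [12, 13, 14, 15]
codes_12 = time_codes(ages, origin=12)  # origin at the first occasion
codes_15 = time_codes(ages, origin=15)  # origin at the last occasion

# Hypothetical population values, not estimates from any study.
mean_at_12, mean_slope = 50.0, 4.0
mean_at_15 = shift_intercept(mean_at_12, mean_slope, 12, 15)

print(codes_12, codes_15, mean_at_15)
```

With these illustrative values the two schemes describe the same trajectory; only the point at which the intercept is read off differs.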
The second design was a cohort-sequential design and is not related to the focus of the current research. The third design was a more realistic case in which the age of the subjects varied between 12 and 14 at the beginning of the study. The authors suggested different scaling strategies in the presence of age heterogeneity in order to obtain more accurate estimates, because the scaling used for the first design fails to capture time dependence. They investigated two time scaling schemes in the presence of age heterogeneity. For the first model, they scaled time with respect to a common origin across all individuals, meaning the ages of participants were centered at a common age. For example, with 12 as the common age and 4 repeated measures, the time points were 0, 1, 2 and 3 if the participant was 12 years old at the beginning of the study, and 2, 3, 4 and 5 if the participant was 14 years old at the beginning of the study. With this scheme, the average intercept is interpreted as the mean value at age 12. Secondly, they scaled time with respect to an individual origin, meaning centering was unique to the individual: each participant's age was centered at his or her age at the beginning of the study. Doing so, each participant had the time points 0, 1, 2 and 3 even though participants varied in starting age. In order to capture the heterogeneity effect, the authors used the participant's age at the beginning of the study as a predictor of the intercept. The authors showed that these two schemes produced essentially the same results. However, for ease of interpretation, they recommended centering at a common age. Even though their work provided different designs and different schemes for time coding, all analyses were conducted under the assumption that occasions of measurement were equally spaced.
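The two schemes for age-heterogeneous samples can be contrasted in a few lines. This is a minimal sketch assuming annual, equally spaced waves as in the description above; the function names are hypothetical.

```python
def common_origin_codes(start_age, n_waves, common_age=12):
    """Time codes centered at an age shared by all participants."""
    return [start_age - common_age + wave for wave in range(n_waves)]


def individual_origin_codes(start_age, n_waves):
    """Time codes centered at each participant's own starting age.

    The starting age is returned alongside the codes so that it can
    serve as a predictor of the intercept.
    """
    return list(range(n_waves)), start_age


print(common_origin_codes(12, 4))      # participant aged 12 at the start
print(common_origin_codes(14, 4))      # participant aged 14 at the start
print(individual_origin_codes(14, 4))  # same codes for everyone, plus start age
```

In the common-origin scheme the heterogeneity is carried by the time codes themselves, whereas in the individual-origin scheme it is carried by the starting-age predictor.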
Biesanz, Deeb-Sossa, Papadakis, Bollen & Curran (2004) conducted research to understand the role of time coding in estimating and interpreting growth curve models. For expository purposes, they investigated an unconditional linear growth model with three repeated measures in which participants were measured at ages 5, 7 and 9. They changed the origin of time and reported results for 3 different time coding schemes. Then, they examined an unconditional quadratic growth model with five repeated measures, modeling changes in children's weight. Data were collected for each child at ages 5, 7, 9, 11 and 13. They used four different time coding strategies under the assumption that individuals were assessed at the same time points. The first time coding scheme set the origin of time at age 5, assigning 0, 2, 4, 6 and 8 to the ages 5, 7, 9, 11 and 13, respectively. For the second time coding scheme, they coded time by setting the last assessment period to zero; therefore, they assigned the codes -8, -6, -4, -2 and 0 to the assessment periods. For the third time coding scheme, they centered time at the midpoint of the assessments and assigned -4, -2, 0, 2 and 4 to the measurement periods. These three different placements of time's origin produce information about individual differences in weight and individual differences in the rate of change in weight at specific ages. Changing the origin of time caused differences in the estimates of some parameters (e.g., intercept, slope). The authors provided an analytical explanation to show how these differences in parameter estimates occurred and how they can be calculated from the first scheme they used. This analytical explanation was based on the relationship between the three schemes. The authors created a transformation matrix whose size was determined by the shape of the estimated growth function. For example, by
creating the appropriate transformation matrix, the authors were able to change the time points from 0, 2, 4 to -4, -2, 0 within the same unconditional linear growth model. They used this transformation matrix to calculate differences in parameter estimates. Furthermore, the authors used orthogonal polynomial codes as the fourth time coding scheme for an unconditional quadratic growth model. On the one hand, the coefficients obtained with this scheme caused interpretation difficulties. On the other hand, it was stated that with orthogonal polynomial codes, estimation of the overall growth curve model might be more readily achieved. The authors also examined a conditional quadratic growth model with different time coding schemes. They included a predictor for the intercept, slope and quadratic term. Their results showed that models with different schemes fit the data equivalently, but the parameter estimates were different. It was shown that differences in these estimates can also be calculated using appropriate transformation matrices.

In Mehta & West (2000) and Biesanz, Deeb-Sossa, Papadakis, Bollen & Curran (2004), the coding of time points was based on the age of the participants. This approach is reasonable when the response is most sensitive to changes in age. Another time coding strategy is to set time points based on measurement waves. In the education discipline, researchers generally tend to use measurement waves as the basis for coding time. For example, they set time points based on semesters (e.g., fall and spring). This preference is the more logical approach if we assume that growth in a behavior for school children is related to the time they spend in classes rather than to their ages. In the work of Blozis and Cho (2008), both measurement waves and individual-specific measures of time were used to create time points. Their work built on two representative longitudinal
studies. For the first study, data were gathered from the National Longitudinal Survey of Youth (NLSY). Children aged between 6 and 14 were the subjects, and their antisocial behavior scores, collected in four different assessment waves, were the dependent variables. The authors created seven different time coding schemes to estimate a latent growth model. Participants differed in age at the beginning of the study and were assessed on different calendar dates within a measurement occasion. Thus, complete inter-individual time heterogeneity was one of the characteristics of the data. Even though they tried different time coding strategies (e.g., person-mean centered, group-mean centered), they reached the same parameter estimates and overall model fit. The second longitudinal study came from a slightly different population. The incapacity of participants was assessed nearly annually over a five-year period. Years after first diagnosis and assessment waves were used to create time points. Using the relative standard deviation approach, they stated that the measures of time since diagnosis in the second data set were 12 times more heterogeneous than the ages in the NLSY data. The relative standard deviation was calculated as the absolute value of the ratio of the standard deviation to the mean, multiplied by 100. For example, the mean age of participants in the NLSY data was 10.12 with a standard deviation of .563; these values yield a relative standard deviation of 5.56. Large heterogeneity caused remarkable differences in some of the parameter estimates between the person-mean-centered and grand-mean-centered time coding strategies when years after the first diagnosis were used as time points. The authors also used measurement waves to create time points. First, they assigned 0, 1, 2, 3 and 4 to the five annual assessments. Second, they assigned -2,
-1, 0, 1 and 2 to set the origin at the midpoint of the assessments. With these two coding strategies each participant had the same time points in each scheme. These two strategies produced exactly the same model fit. In order to capture the effect of heterogeneity, the authors created a third coding scheme in which they added the mean years since diagnosis to the second scheme as a predictor of the intercept and of the time effect. Doing so, they achieved a more adequate model fit.

In this study I attempted to investigate whether taking into account heterogeneity in the time of assessment would improve the fit of latent growth models. I am also interested in whether the parameter estimates would change due to differences in time coding. I first decided to perform an analysis of real data using the Early Childhood Longitudinal Study Kindergarten Cohort (ECLS-K), because this data set provides the exact day of assessment for each individual. To provide a realistic context for my study, I chose to build my research on a published model. After a careful search of the applied literature, I selected Morgan, Farkas and Wu's (2009) work. These authors examined whether and to what extent the timing and persistence of mathematics difficulties (MD) in kindergarten predicted children's first through fifth grade math growth trajectories. They employed HLM with fixed time points to analyze math growth in a subsample taken from the ECLS-K. In order to demonstrate the effects of ignoring the variability in measurement occasions, I tried to replicate their model. Mplus software (Muthén & Muthén, 2008) was used because of its flexibility. It has been shown that, in most cases, HLM produces estimates that are equivalent to the estimates produced by a LGM (Raudenbush, 2001; Rovine & Molenaar, 2000; Mehta & West, 2000;
Hertzog and Nesselroade, 2003). In order to conduct the replication study, I created a latent curve model and investigated four different time coding strategies. In two of these strategies the coding of time was based on measurement waves and resulted in fixed slope. In the two other strategies, individual-specific time points were used. Unlike in previous research, I investigated the differences between time coding strategies using the number of days between assessments (in other words, the heterogeneity in assessment dates within the waves) while including age as a predictor of the intercept and the slope. It was hypothesized that models with individual-specific time points would provide a more adequate fit. After the empirical illustration, a simulation study was conducted to explore differences between the fixed time points approach and the individual-specific time points approach for an unconditional linear latent growth model. It was hypothesized that using fixed time points produces biased estimates if assessment dates vary within a measurement occasion.
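Individual-specific time points of this kind can be derived from assessment dates. The sketch below is a hypothetical illustration, not the procedure used in the thesis (whose appendix computes the number of days with SPSS): the dates are invented, the function names are assumptions, and the rescaling by 100 days simply mirrors the title of Table 2-4.

```python
from datetime import date


def days_since_baseline(assessment_dates):
    """Days elapsed between the first assessment and each assessment."""
    baseline = assessment_dates[0]
    return [(d - baseline).days for d in assessment_dates]


def individual_time_scores(assessment_dates, unit_days=100):
    """Rescale elapsed days so that one time unit equals unit_days days."""
    return [days / unit_days for days in days_since_baseline(assessment_dates)]


# Hypothetical assessment dates for two children in the same three waves.
child_a = [date(1998, 10, 1), date(1999, 5, 1), date(2000, 5, 1)]
child_b = [date(1998, 11, 15), date(1999, 4, 10), date(2000, 4, 20)]

for child in (child_a, child_b):
    print(days_since_baseline(child), individual_time_scores(child))
```

A fixed time points approach would give both children identical codes for the three waves, whereas the individual-specific scores differ because the children were assessed on different days within each wave.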
CHAPTER 2
EMPIRICAL ILLUSTRATION

Methodology

The aim of this study is to examine the effect of different time coding strategies for both unconditional and conditional nonlinear latent growth models including a quadratic (time squared) term. The level 1 model equation is:

$$y_{it} = \alpha_i + \lambda_t \beta_{1i} + \lambda_t^2 \beta_{2i} + \varepsilon_{it} \qquad (2\text{-}1)$$

where $y_{it}$ is the value of the trajectory variable $y$ for the $i$th case at time $t$, $\alpha_i$ defines the intercept for case $i$, $\beta_{1i}$ the linear slope, and $\beta_{2i}$ is the curvature. The $\lambda_t$ is a constant where a common coding convention is to have $\lambda_1 = 0$ and $\lambda_2 = 1$, which scales the intercept to be interpreted as the initial status. The variable $\lambda_t^2$ is simply the squared value of time at assessment $t$, and $\varepsilon_{it}$ is the error term. These components combine additively to reproduce the value of $y$ for individual $i$ at time $t$. The LGM shown above treats the trajectory components $\alpha_i$, $\beta_{1i}$ and $\beta_{2i}$ as random variables, and the level 2 equations can be written as:

$$\alpha_i = \mu_\alpha + \zeta_{\alpha i} \qquad (2\text{-}2)$$

$$\beta_{1i} = \mu_{\beta_1} + \zeta_{\beta_1 i} \qquad (2\text{-}3)$$

$$\beta_{2i} = \mu_{\beta_2} + \zeta_{\beta_2 i} \qquad (2\text{-}4)$$

where $\mu_\alpha$ is the mean intercept across all cases, $\mu_{\beta_1}$ is the mean slope across all cases and $\mu_{\beta_2}$ is the mean curvature across all cases. Equation 2-2 represents the individual intercept as a function of the mean of the intercepts for all cases ($\mu_\alpha$) and an error term. Equation 2-3 represents the individual linear growth component as
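a function of the mean of the slopes for all cases ($\mu_{\beta_1}$) and an error term. Equation 2-4 represents the individual quadratic growth component as a function of the mean of the curvatures for all cases ($\mu_{\beta_2}$) and an error term. The $\zeta_{\alpha i}$, $\zeta_{\beta_1 i}$ and $\zeta_{\beta_2 i}$ are error terms with means of zero and variances of $\psi_{\alpha\alpha}$, $\psi_{\beta_1\beta_1}$ and $\psi_{\beta_2\beta_2}$. These error terms are assumed to be uncorrelated with $\varepsilon_{it}$. In the unconditional model the variance of $\alpha_i$ is equal to the variance of $\zeta_{\alpha i}$, the variance of $\beta_{1i}$ is equal to the variance of $\zeta_{\beta_1 i}$, and the variance of $\beta_{2i}$ is equal to the variance of $\zeta_{\beta_2 i}$.

Within the structural equation framework, if we consider a $T \times 1$ vector $\mathbf{y}_i$ that includes the set of $T$ repeated measures of $y$ for individual $i$, the level 1 equation can be written in matrix terms:

$$\mathbf{y}_i = \boldsymbol{\Lambda} \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i \qquad (2\text{-}5)$$

where $\mathbf{y}_i$ is a $T \times 1$ vector of repeated measures, $\boldsymbol{\Lambda}$ is a $T \times m$ matrix of factor loadings, $\boldsymbol{\eta}_i$ is an $m \times 1$ vector of $m$ latent factors, and $\boldsymbol{\varepsilon}_i$ is a $T \times 1$ vector of residuals. For three repeated measures, such as the 1st, 3rd and 5th grades, the elements of Equation 2-5 are:

$$\begin{bmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \end{bmatrix} = \begin{bmatrix} 1 & \lambda_1 & \lambda_1^2 \\ 1 & \lambda_2 & \lambda_2^2 \\ 1 & \lambda_3 & \lambda_3^2 \end{bmatrix} \begin{bmatrix} \alpha_i \\ \beta_{1i} \\ \beta_{2i} \end{bmatrix} + \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon_{i3} \end{bmatrix} \qquad (2\text{-}6)$$

Each observation of $y$ for individual $i$ at time $t$ is a weighted combination of a random intercept, a random linear trajectory component, a random quadratic trajectory component, and an individual- and time-specific error. The level 2 equations can be written as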
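$$\boldsymbol{\eta}_i = \boldsymbol{\mu}_\eta + \boldsymbol{\zeta}_i \qquad (2\text{-}7)$$

where $\boldsymbol{\mu}_\eta$ is an $m \times 1$ vector of factor means and $\boldsymbol{\zeta}_i$ is an $m \times 1$ vector of errors. For the quadratic model, the matrix form of Equation 2-7 is

$$\begin{bmatrix} \alpha_i \\ \beta_{1i} \\ \beta_{2i} \end{bmatrix} = \begin{bmatrix} \mu_\alpha \\ \mu_{\beta_1} \\ \mu_{\beta_2} \end{bmatrix} + \begin{bmatrix} \zeta_{\alpha i} \\ \zeta_{\beta_1 i} \\ \zeta_{\beta_2 i} \end{bmatrix} \qquad (2\text{-}8)$$

If we substitute Equation 2-7 into Equation 2-5, we get the reduced form expression of $\mathbf{y}_i$:

$$\mathbf{y}_i = \boldsymbol{\Lambda} (\boldsymbol{\mu}_\eta + \boldsymbol{\zeta}_i) + \boldsymbol{\varepsilon}_i \qquad (2\text{-}9)$$

The model-implied variance of the reduced form is

$$\boldsymbol{\Sigma} = \boldsymbol{\Lambda} \boldsymbol{\Psi} \boldsymbol{\Lambda}' + \boldsymbol{\Theta}_\varepsilon \qquad (2\text{-}10)$$

where $\boldsymbol{\Theta}_\varepsilon$ represents the covariance structure of the residuals for the $T$ repeated measures of $y$, and $\boldsymbol{\Psi}$ is the covariance matrix of the equation errors, $\boldsymbol{\zeta}_i$, among the latent trajectory factors. The elements of $\boldsymbol{\Psi}$ for the quadratic model are:

$$\boldsymbol{\Psi} = \begin{bmatrix} \psi_{\alpha\alpha} & & \\ \psi_{\beta_1\alpha} & \psi_{\beta_1\beta_1} & \\ \psi_{\beta_2\alpha} & \psi_{\beta_2\beta_1} & \psi_{\beta_2\beta_2} \end{bmatrix} \qquad (2\text{-}11)$$

For an unconditional model the variance of $\boldsymbol{\eta}_i$ is equal to the variance of $\boldsymbol{\zeta}_i$. In the case of three waves of data, the quadratic model has 12 parameters to estimate from 9 means, variances and covariances of the observed variables. Without constraints, three waves of data are not sufficient to identify the model. In order to identify the model, I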
assumed that the error variances of the observed variables are equal. With this assumption there were still 10 unknown parameters to estimate. Then I fixed the variance of the quadratic term at zero. The zero variance of the quadratic term reduced the number of unknown parameters to 7, since the covariances of the quadratic term with the slope and with the intercept were automatically set to zero.

Figure 2-1. Path model for an unconditional quadratic latent growth model.

The conditional model differs from the unconditional model in that the level 2 equations include predictors. With no predictors for the quadratic term, the level 2 equations are:

$$\alpha_i = \mu_\alpha + \gamma_{\alpha 1} x_{1i} + \gamma_{\alpha 2} x_{2i} + \zeta_{\alpha i} \tag{2-12}$$

$$\beta_{1i} = \mu_{\beta_1} + \gamma_{\beta_1 1} x_{1i} + \gamma_{\beta_1 2} x_{2i} + \zeta_{\beta_1 i} \tag{2-13}$$

$$\beta_{2i} = \mu_{\beta_2} + \zeta_{\beta_2 i} \tag{2-14}$$

Here $\mu_\alpha$ and $\mu_{\beta_1}$ are the intercepts for the equations that predict the intercepts and linear slopes across all cases. Note that $\mu_\alpha$ and $\mu_{\beta_1}$ are the mean intercepts and mean
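linear slopes when $x_{1i}$ and $x_{2i}$ are zero. The $x_{1i}$ and $x_{2i}$ are two predictors of the intercepts and linear slopes, and $\gamma_{\alpha 1}$ and $\gamma_{\alpha 2}$ are the coefficients for $x_{1i}$ and $x_{2i}$ in the random intercept equation. The $\gamma_{\beta_1 1}$ and $\gamma_{\beta_1 2}$ are coefficients in the linear slope equation. These coefficients have the same interpretation as they have in a regression model. They provide the expected difference in the outcome for a 1-unit difference in the explanatory variable net of the other predictors. The conditional model in this study included eleven time-invariant predictors. Within the structural equation framework, a quadratic conditional model with three time points can be written in matrix form:

$$\begin{pmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \end{pmatrix} = \begin{pmatrix} 1 & \lambda_1 & \lambda_1^2 \\ 1 & \lambda_2 & \lambda_2^2 \\ 1 & \lambda_3 & \lambda_3^2 \end{pmatrix} \begin{pmatrix} \alpha_i \\ \beta_{1i} \\ \beta_{2i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon_{i3} \end{pmatrix} \tag{2-15}$$

$$\begin{pmatrix} \alpha_i \\ \beta_{1i} \\ \beta_{2i} \end{pmatrix} = \begin{pmatrix} \mu_\alpha \\ \mu_{\beta_1} \\ \mu_{\beta_2} \end{pmatrix} + \begin{pmatrix} \gamma_{\alpha 1} & \gamma_{\alpha 2} \\ \gamma_{\beta_1 1} & \gamma_{\beta_1 2} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_{1i} \\ x_{2i} \end{pmatrix} + \begin{pmatrix} \zeta_{\alpha i} \\ \zeta_{\beta_1 i} \\ \zeta_{\beta_2 i} \end{pmatrix} \tag{2-16}$$

Sample

The ECLS-K is a nationwide longitudinal study designed to gather extensive data about children's cognitive and behavioral skill development and their learning experiences, starting in kindergarten and ending in fifth grade. Students in this data set attended both public and private schools. The ECLS-K study was conducted by the National Center for Education Statistics (NCES) and used a nationally representative sample. Consistent with the purpose of this study, mathematics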
assessment scores and exact assessment dates were used. These mathematics scores were obtained at five different time periods: fall of kindergarten, spring of kindergarten, spring of first grade, spring of third grade and spring of fifth grade. The original data included 17,565 cases coming from approximately 3,500 classrooms in 1,280 schools. In Morgan, Farkas and Wu's (2009) study, children of (a) White, non-Hispanic or (b) Black/African American, non-Hispanic families were chosen. This initially defined a sample consisting of 12,385 children.

Measures

Mathematics Scores

The mathematics assessment for the ECLS-K study was designed to measure conceptual and procedural knowledge and also problem solving within particular content strands. For the five different math assessments, the content of the questions can be categorized as number sense, properties, operations, measurement, geometry, spatial sense, data analysis, statistics, probability, algebra and functions. The first three categories contained the largest number of items for all grades. Mathematics scores were scaled using IRT procedures. The advantage of IRT-based scores is that they are comparable across assessments in many cases, so IRT scoring makes longitudinal measurement of gain in achievement over time possible. IRT methods estimate the difficulty, discriminating ability and guessing values for the items. Using these values, individual item responses are used to estimate children's abilities. The National Center for Education Statistics (NCES) chooses the IRT-based score as the most appropriate metric for growth modeling. The reliabilities of IRT scores ranged from .89 to .94 (NCES). The correlations between IRT scaled scores and scores from the Woodcock-McGrew-Werder Mini-Battery of Achievement (Woodcock, McGrew, &
Werder, 1994) were compared to assess concurrent validity. High correlations were found (i.e., fifth grade = .80 and third grade = .84) (NCES).

Reading Scores

Reading difficulties of children were one of the predictors of mathematical growth. ECLS-K assessed children's reading skills at several time points. The content of the questions in this test can be categorized as print familiarity, letter recognition, decoding, sight word recognition, receptive vocabulary, and comprehension. The field test of the instrument showed no differential item functioning and adequate item-level statistics. The reliability coefficient of the fall kindergarten reading IRT scores was .91 (NCES, 2006). Fall kindergarten reading test scores were used to identify children with reading difficulties. Consistent with the literature, a 10% cut-off was used. Scores in the lowest 10% were coded as 1, indicating reading difficulty; the rest of the scores (90%) were coded as 0, indicating no reading difficulty.

Learning Related Behaviors

An instrument was created for ECLS-K based on the Social Skills Rating System (Gresham & Elliott, 1990) to measure a student's attentiveness, task persistence, eagerness to learn, learning independence, adaptability to changes in routine, and organization. Teachers rated children's learning-related behaviors during the fall of kindergarten. The scores for the fall of kindergarten yielded a split-half reliability coefficient of .89 (NCES). Scores in the lowest 10% were coded as 1, indicating behavior problems; the rest of the scores (90%) were coded as 0, indicating no behavior problems. This dummy-coded variable was used as one of the predictors of children's initial knowledge of mathematics and mathematical growth.
Socio Economic Status (SES)

Individual SES values were used as one of the predictors. NCES assessed SES from parents' education level, occupation and household income. The continuous scale of SES (WKSESL) was chosen as the variable. The scores were gathered during the spring of kindergarten.

Age, Gender, Race, Kindergarten Retention, Disability Status

The children's age in months at the beginning of the fall of kindergarten (September 1998) was used to create one of the independent variables. The dichotomous variables of race, gender and retention status were also independent variables, where 1 indicated White, female and retention respectively, and 0 indicated Black/African American, male and no kindergarten retention respectively. The Individualized Education Plan (IEP) was used to identify students with disabilities. Records for the spring of kindergarten were used to create a dichotomous variable where 1 indicates that the child had an IEP.

The Final Data

Morgan, Farkas and Wu (2009) excluded those children who had missing data on any child-level predictor (e.g., race, gender, retention) or on the mathematics test at the kindergarten time points. Their final analytical sample consisted of 7,892 children, and Morgan, Farkas and Wu claim that this sample was representative of the full sample. I followed the same exclusion strategy and came up with a sample of 7,935. The small difference (43 cases) might be due to rounding in the cut-off procedures. However, when the Mplus program estimated the LGM with random time points, the robust maximum likelihood (MLR) estimation procedure deleted cases with missing values on time scores but non-missing values on the corresponding dependent variables. Overall, my final analytical sample included 7,809 cases. Tables 2-1 and 2-2 show descriptive statistics for Morgan, Farkas and Wu's (2009) sample and my replication sample.

In the work of Morgan, Farkas and Wu (2009), one of the main purposes was to estimate the growth curves of children with learning difficulties in mathematics (MD) during kindergarten. They created four different dummy codes to categorize children with learning difficulties. The same strategy was followed in this study and, consistent with Morgan's work, a 10% cut-off was used. Fall and spring kindergarten math test scores were used, and students who scored in the lowest 10% were labeled as 'Difficulties' or 'D'. The non-MD group represented students who were above the 10% cut-off in both the fall and spring kindergarten tests. D10 was set to 1 for a student who had MD in the fall semester but not in the spring semester. D01 was set to 1 if a child had MD in the spring semester but not in the fall. D11 was set to 1 if a child had MD in both semesters.

Table 2-1. Demographic Characteristics for the ECLS-K Full and Analytical Samples
______________________________________________________________________________
Characteristic                 Full Sample    Replication Study Sample    Morgan's Sample
                               (N=12,385)     (N=7,809)                   (N=7,892)
______________________________________________________________________________
Gender
  Male                         51.2%          50.9%                       50.9%
  Female                       48.8%          49.1%                       49.1%
Race
  White                        79.9%          82.9%                       82.7%
  Black or African American    20.1%          17.1%                       17.3%
IEP
  Yes                           7.0%           6.3%                        6.4%
  No                           93.0%          93.7%                       93.6%
______________________________________________________________________________
Note: IEP = Individualized Education Plan.
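The dummy coding of the MD groups described before Table 2-1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the syntax used in the study: the function names are hypothetical, the cut-off is taken as the score of the case that closes the lowest 10% of the sorted scores, and ties at the cut-off are counted as MD.

```python
def percentile_cutoff(scores, fraction=0.10):
    """Score at or below which roughly `fraction` of the cases fall (assumed rule)."""
    ordered = sorted(scores)
    k = max(1, int(len(ordered) * fraction))  # number of cases in the lowest group
    return ordered[k - 1]


def md_dummies(fall, spring, fraction=0.10):
    """Return D10, D01 and D11 dummy codes for each child.

    D10: MD in fall only, D01: MD in spring only, D11: MD in both semesters.
    Children with all three codes equal to 0 form the non-MD group.
    """
    fall_cut = percentile_cutoff(fall, fraction)
    spring_cut = percentile_cutoff(spring, fraction)
    codes = []
    for f, s in zip(fall, spring):
        md_fall = f <= fall_cut
        md_spring = s <= spring_cut
        codes.append({
            "D10": int(md_fall and not md_spring),
            "D01": int(md_spring and not md_fall),
            "D11": int(md_fall and md_spring),
        })
    return codes


# Hypothetical scores for 20 children (not ECLS-K data).
fall_scores = list(range(1, 21))
spring_scores = [5, 1, 2] + list(range(6, 23))
example_codes = md_dummies(fall_scores, spring_scores)
```

With these made-up scores, the first child is coded D10, the second D11, the third D01, and the remaining children fall in the non-MD group.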
Table 2-2. SES and Math IRT Scores for the ECLS-K Full and Analytical Samples
______________________________________________________________________________
Characteristic                   Full Sample    Replication Study Sample    Morgan's Sample
                                 (N=12,385)     (N=7,809)                   (N=7,892)
______________________________________________________________________________
SES*                             0.12 (.78)     0.14 (.77)                  0.14 (.77)
Fall Kindergarten Mathematics    23.88 (8.98)   24.33 (9.02)                24.31 (9.00)
IRT scores
______________________________________________________________________________
Note: Standard deviations are in parentheses; IRT = item response theory.
*Using the WKSESL variable.

Statistical Models

Hierarchical Linear Modeling

Morgan, Farkas and Wu (2009) used HLM 6 (Raudenbush, Bryk, Cheong, & Congdon, 2004) to analyze the data. They used a slopes-and-intercepts-as-outcomes model. In my study I tried to replicate their most complex model. The level 1 and level 2 equations for their most complex model are as follows.

Level 1:

$$Y_{ti} = \pi_{0i} + \pi_{1i}\,t + \pi_{2i}\,t^2 + e_{ti} \tag{2-17}$$

where i = 1, 2, 3, ..., n is the index for subjects, t = 0, 2, 4 are the time points, $\pi_{0i}$ is the initial status of the student at time zero, $\pi_{1i}$ is the linear slope, and $\pi_{2i}$ is the quadratic term. The growth rate of person i at any specific time point is the first derivative of the growth model at that point (Raudenbush & Bryk, 2002). If $\pi_{2i}$ is positive, it can be said that the student is growing at an accelerating rate; if it is negative, growing at a
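decelerating rate. The $e_{ti}$ is the measurement error, and it is assumed to have a mean of 0 and a constant variance and to be normally distributed.

Level 2:

$$\pi_{0i} = \beta_{00} + \sum_{q} \beta_{0q} X_{qi} + r_{0i} \tag{2-18}$$

$$\pi_{1i} = \beta_{10} + \sum_{q} \beta_{1q} X_{qi} + r_{1i} \tag{2-19}$$

$$\pi_{2i} = \beta_{20} \tag{2-20}$$

where $\beta_{0q}$ and $\beta_{1q}$ are the coefficients for a particular predictor $X_{qi}$ on initial math status and math growth, and $r_{0i}$ and $r_{1i}$ are random errors. Due to the lack of sufficient degrees of freedom, the variance of the random error for the quadratic term was not estimated.

Latent Growth Curve Model

We can rewrite Equation 2-15 for the unconditional model as:

$$\begin{pmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \end{pmatrix} = \begin{pmatrix} 1 & \lambda_1 & \lambda_1^2 \\ 1 & \lambda_2 & \lambda_2^2 \\ 1 & \lambda_3 & \lambda_3^2 \end{pmatrix} \begin{pmatrix} \alpha_i \\ \beta_{1i} \\ \beta_{2i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon_{i3} \end{pmatrix} \tag{2-21}$$

$$\begin{pmatrix} \alpha_i \\ \beta_{1i} \\ \beta_{2i} \end{pmatrix} = \begin{pmatrix} \mu_\alpha \\ \mu_{\beta_1} \\ \mu_{\beta_2} \end{pmatrix} + \begin{pmatrix} \zeta_{\alpha i} \\ \zeta_{\beta_1 i} \\ 0 \end{pmatrix} \tag{2-22}$$

The level 2 matrices for a conditional LGM based on Morgan's HLM can be written as: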
$$\begin{pmatrix} \alpha_i \\ \beta_{1i} \\ \beta_{2i} \end{pmatrix} = \begin{pmatrix} \mu_\alpha \\ \mu_{\beta_1} \\ \mu_{\beta_2} \end{pmatrix} + \begin{pmatrix} \gamma_{\alpha 1} & \cdots & \gamma_{\alpha 11} \\ \gamma_{\beta_1 1} & \cdots & \gamma_{\beta_1 11} \\ 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} x_{1i} \\ \vdots \\ x_{11i} \end{pmatrix} + \begin{pmatrix} \zeta_{\alpha i} \\ \zeta_{\beta_1 i} \\ 0 \end{pmatrix} \tag{2-23}$$

Time Points

The ECLS-K provides exact measurement dates for each subject. Given this information, it is possible to calculate the number of days between assessments for each individual. The first mathematics assessment was conducted at the end of the fall kindergarten semester. The date of this assessment was assigned as the starting point and given the value of zero. For example, the student with the ID 0001002C took the first assessment on 30 November 1998 (fall kindergarten) and the second assessment on 26 May 1999 (spring kindergarten). Using the COMPUTE elapse = (date2 - date1)/86400 command in SPSS, it was found that there were 177 days between these two assessment points. The same process was employed for all students and all other time points. Table 2-3 shows descriptive statistics for the number of days after the first assessment.

Morgan, Farkas and Wu (2009) used fall and spring of kindergarten math IRT scores to identify children with MDs. In order to analyze mathematics skills growth, they
used spring of first grade, spring of third grade and spring of fifth grade math IRT scores. Using fixed time points, they assigned time points of 0, 2 and 4 for the grades.

Table 2-3. Descriptive Statistics for Number of Days between the Fall Kindergarten Assessment and the Other Assessments
_______________________________________________________________________
Semester      Number of Students    Minimum    Maximum    Mean      SD
_______________________________________________________________________
Spring KG     7809                  120        261        185.6     20.4
Spring 1st    7809                  477        644        550.6     23.0
Spring 3rd    6548                  1199       1366       1275.3    25.8
Spring 5th    5174                  1904       2091       1983.3    29.0
_______________________________________________________________________
Note: SD = Standard deviation.

Four Different Time Coding Strategies

In order to generate results parallel to Morgan's study, the first model was estimated using the fixed time points of 0, 2 and 4 to represent spring of first grade, spring of third grade and spring of fifth grade, respectively. This procedure assigns 1 point for each education year.

The second model was also estimated with fixed time points, but the average number of days between assessments was used. Because the growth analysis started from spring of first grade, I treated the spring first grade assessment dates as the beginning point. Thus, I was able to find how many days passed before the third grade and fifth grade assessments. Calculations showed that on average there were 725 days between the first grade and third grade assessments, and 1433 days between the first grade and fifth grade assessments (the differences between the corresponding means in Table 2-3). For ease of interpretation, each average was divided by the constant 100. The time points 0, 7.25 and 14.33 represent spring of first grade, spring of third grade and spring of fifth grade, respectively.
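The elapsed-days calculation and the rescaling of time scores described above can be illustrated with a short Python sketch. It is not the SPSS syntax used in the study; the function names are hypothetical, and the mean days are taken from Table 2-3 with spring of first grade as the origin.

```python
from datetime import date


def days_between(first, later):
    """Number of calendar days from the first assessment to a later one."""
    return (later - first).days


def time_score(days, divisor):
    """Rescale elapsed days into a time score (e.g., divisor 100 or 358)."""
    return days / divisor


# Example from the text: student 0001002C, fall and spring of kindergarten.
elapsed = days_between(date(1998, 11, 30), date(1999, 5, 26))

# Mean days since the spring first grade assessment, from the means in Table 2-3.
mean_days_third = 1275.3 - 550.6
mean_days_fifth = 1983.3 - 550.6

# Time scores on the "days / 100" scale and on the "days / 358" scale.
scale_100 = [round(time_score(d, 100), 2) for d in (0.0, mean_days_third, mean_days_fifth)]
scale_358 = [round(time_score(d, 358), 2) for d in (0.0, mean_days_third, mean_days_fifth)]
```

Under these inputs the first calculation reproduces the 177 days reported for the example student, and the two scales give mean time scores of about 0, 7.25, 14.33 and 0, 2.02, 4.00 respectively.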
The third model was estimated without fixed time points. By adding the command 'type=random' to the analysis line in the Mplus program, inter-individually varying time points of assessment were taken into account. The averages of the time points were 0, 7.25 and 14.33. Table 2-4 shows descriptive statistics for the time points used.

Table 2-4. Number of Days between Assessments Divided by 100
___________________________________________________________________________
Term                N       Min      Max      Mean     SD
___________________________________________________________________________
Spring 1st grade    7809    0        0        0        0
Spring 3rd grade    6548    6.42     8.08     7.25     0.23
Spring 5th grade    5174    13.48    15.13    14.33    0.29
___________________________________________________________________________
Note: N = Sample size, SD = Standard deviation.

The fourth and last model was also estimated with heterogeneous time points. However, the number of days was divided by 358 instead of 100 in order to set the means of the time points to 0, 2 and 4. This time coding scheme provides estimates directly comparable to the Model 1 results. Table 2-5 shows descriptive statistics for the time points used in the fourth model.

Using unconditional and conditional LGMs, I analyzed first, third and fifth grade math scores. Average math IRT scores increased across grades, from 59.98 in first grade to 95.28 in third grade and 116.79 in fifth grade. Table 2-6 shows the descriptive statistics for the IRT scores. Figure 2-2 shows the trajectory of the IRT scores and indicates that the increase in scores is not linear. In order to get results parallel to Morgan's analyses, I also mean-centered the continuous level 2 variables in the conditional model (i.e., SES and age).

LGM results can be obtained with different estimation procedures. The maximum likelihood (ML) procedure is the most widely used among these. All analyses were
performed with Mplus (Muthen & Muthen, 2008). The ML estimation method could not be employed for all four models. Instead, robust maximum likelihood (MLR) was used for all models. It is known that parameter estimates are the same in MLR and ML, but standard errors might be different.

Table 2-5. Number of Days between Assessments Divided by 358
___________________________________________________________________________
Term                N       Min     Max     Mean    SD
___________________________________________________________________________
Spring 1st grade    7809    0       0       0       0
Spring 3rd grade    6548    1.79    2.26    2.02    0.63
Spring 5th grade    5174    3.77    4.23    4.00    0.81
___________________________________________________________________________
Note: N = Sample size, SD = Standard deviation. (Please see Appendix A for the SPSS syntax that executes this scaling.)

Table 2-6. Descriptive Statistics for Mathematics IRT Scores, First through Fifth Grade
___________________________________________________________________________
Assessment          N       Min      Max       Mean      SD
___________________________________________________________________________
Spring 1st grade    7809    11.17    120.50    59.98     17.03
Spring 3rd grade    6548    33.60    146.59    95.28     20.96
Spring 5th grade    5174    47.08    150.94    116.79    20.26
___________________________________________________________________________
Note: N = Sample size, SD = Standard deviation.

Figure 2-2. Mathematics Growth Trajectory
Results

In this section, the results of applying four different LGMs are reported. In the first step, the unconditional LGM and RRC model results are reported. These two models are statistically equivalent in that they have the same level 1 and level 2 equations. Moreover, when an RRC model was estimated using the analytical sample of 7,809 cases, exactly the same parameter estimates and standard errors were obtained as with an LGM estimated using Mplus. Intercept values for all five models were roughly the same. In terms of slope and quadratic term means, Model 1 and Model 4 produced estimates essentially identical to those of Morgan's model. However, Model 2 and Model 3 produced smaller estimated slopes. These differences occurred due to scale differences in time coding. The effect of scaling on parameter estimates is explained mathematically below.

Table 2-7. Parameter Estimates of the RRC Model and the Unconditional LGM with Different Time Coding Combinations
___________________________________________________________________________
Means           RRC       Model 1    Model 2    Model 3    Model 4
___________________________________________________________________________
Intercept       59.63*    59.99*     59.99*     60.00*     60.01*
Linear Slope    20.82*    20.82*     5.72*      5.70*      20.40*
Curvature       1.77*     1.71*      0.127*     0.125*     1.60*
___________________________________________________________________________
Note: # = average of individually varied time points; *p < .05.

As shown earlier in Equation 2-1, the model for the score for person i at time t is
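$$y_{it} = \alpha_i + \beta_{1i}\lambda_t + \beta_{2i}\lambda_t^2 + \varepsilon_{it}$$

where $\lambda_t$ consists of the values used as time indicators. If we set $\lambda_t^* = c\lambda_t$ as an alternative scale for time, then

$$y_{it} = \alpha_i^* + \beta_{1i}^*\lambda_t^* + \beta_{2i}^*\lambda_t^{*2} + \varepsilon_{it} \tag{2-24}$$

Because

$$\alpha_i^* + \beta_{1i}^*\lambda_t^* + \beta_{2i}^*\lambda_t^{*2} = \alpha_i^* + c\,\beta_{1i}^*\lambda_t + c^2\beta_{2i}^*\lambda_t^2 \tag{2-25}$$

it follows that $\alpha_i = \alpha_i^*$, $\beta_{1i} = c\,\beta_{1i}^*$ and $\beta_{2i} = c^2\beta_{2i}^*$. In LGM the means and the variances of $\alpha_i$, $\beta_{1i}$ and $\beta_{2i}$ are estimated. Let the means be $\mu_\alpha$, $\mu_{\beta_1}$ and $\mu_{\beta_2}$. From the expressions above, $\mu_\alpha = \mu_\alpha^*$, $\mu_{\beta_1} = c\,\mu_{\beta_1}^*$ and $\mu_{\beta_2} = c^2\mu_{\beta_2}^*$. When the time scores for occasions 2 and 3 are fixed at 7.25 and 14.33, these values are approximately 3.6 times as large as 2 and 4 respectively. Thus, from the preceding development we would expect $\mu_{\beta_1} \approx 3.6\,\mu_{\beta_1}^*$ and $\mu_{\beta_2} \approx 3.6^2\mu_{\beta_2}^*$, where the $\mu^*$ are the coefficients for the scale 0, 7.25 and 14.33. The results in Table 2-7 agree with the development. For example, the mean slope for Model 2, which was based on the scale points 0, 7.25 and 14.33, is 5.72, whereas the mean slope for Model 1 is 20.82. The ratio is 3.6.

Let the variances be $\psi_{\alpha\alpha}$, $\psi_{\beta_1\beta_1}$ and $\psi_{\beta_2\beta_2}$. From the expressions above, $\psi_{\alpha\alpha} = \psi_{\alpha\alpha}^*$ and $\psi_{\beta_1\beta_1} = c^2\psi_{\beta_1\beta_1}^*$. The results in Table 2-8 agree with this development. For example, the variance of the slope for Model 2 is 0.306 and the variance of the slope for Model 1 is 4.091; the ratio is 13.37, which is approximately $3.6^2 = 12.96$.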
For each method of coding time, the model-implied achievement mean was computed for each occasion. In the 3rd and 4th methods, individually varying times of measurement were used. To calculate model-implied means under these methods, the means of the time scores were used (i.e., 7.25 and 14.33 for the third method, and the corresponding means in Table 2-5 for the fourth). These results are reported in Table 2-9 and indicate that the unconditional LGM with four different time coding strategies produced essentially the same model-implied means at the three time points.

I further investigated differences in model fit information. The MLR estimation procedure produces only the Akaike (AIC) and Bayesian (BIC) fit indices. Loglikelihood, AIC and BIC values for all four models are presented in Table 2-10. The Akaike Information Criterion (AIC) is

$$\mathrm{AIC} = 2k - 2\ln(L)$$

where k represents the number of parameters in the model and L is the maximized value of the likelihood function for the estimated model. The value of AIC provided by a model is not directly interpretable. Instead of focusing on the magnitude of AIC, the model with the smallest AIC can be selected as the most adequate model among the compared models. In my study, Model 4 provided the smallest AIC value. The Bayesian Information Criterion (BIC) is

$$\mathrm{BIC} = k\ln(n) - 2\ln(L)$$

where n is the sample size. BIC measures are used to compare the fit of models estimated from the same sample. Results showed that Model 4 has slightly smaller BIC values. However, all four models can be considered essentially equivalent due to the very small differences among the loglikelihood, AIC and BIC values for the four coding methods.
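The fit indices discussed above can be checked with a few lines of Python. This is a sketch under stated assumptions rather than a reproduction of Mplus output: it takes the Model 1 loglikelihood reported in Table 2-10 as negative (the minus sign appears to have been lost in the table), uses k = 7 free parameters as counted in the identification discussion, and uses n = 7,809 cases. The function names are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * loglik


# Model 1 of the unconditional LGM (assumed inputs, see the lead-in).
loglik_model1 = -93567.73
k_params = 7
n_cases = 7809

aic_model1 = aic(loglik_model1, k_params)
bic_model1 = bic(loglik_model1, k_params, n_cases)
```

Under these assumptions the computed values agree with the reported AIC (187149.45) and BIC (187198.19) for Model 1 to within rounding of the loglikelihood.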
Table 2-8. Variances of Parameter Estimates
______________________________________________________________________________
Parameter       Model 1            Model 2            Model 3            Model 4
______________________________________________________________________________
Intercept       233.020 (17.21)    232.360 (17.19)    232.810 (17.26)    233.07 (17.24)
Linear Slope    4.091 (0.87)       0.306 (0.07)       0.306 (0.7)        4.15 (0.86)
Residual        63.500 (3.80)      63.370 (3.80)      62.850 (3.77)      62.70 (3.72)
______________________________________________________________________________
Note: Standard errors are in parentheses.

Table 2-9. Estimated Mean of IRT Based Math Scores
______________________________________________________________________________
Term        Observed    Model 1    Model 2    Model 3    Model 4
______________________________________________________________________________
Spr. 1st    59.98       59.99      59.99      60.00      60.01
Spr. 3rd    95.28       94.79      94.79      94.75      94.82
Spr. 5th    116.79      115.91     115.88     116.01     116.01
______________________________________________________________________________

Table 2-10. Model Fit Information for Unconditional LGM
______________________________________________________________________________
Estimations      Model 1      Model 2      Model 3      Model 4
______________________________________________________________________________
Loglikelihood    93567.73     93574.20     93574.20     93501.92
AIC              187149.45    187162.40    187051.71    187017.83
BIC              187198.19    187211.14    187100.45    187066.57
Adjusted BIC     187175.95    187188.90    187078.20    187044.33
______________________________________________________________________________

For the second step, I examined the conditional LGMs and Morgan's intercepts and slopes as outcomes (ISAO) model. Table 2-11 shows the parameter estimates for the equation for the intercept term. Significance levels of predictors at $\alpha = .05$ were the same for all five models. With two exceptions, repeat kindergarten and reading difficulty, estimates obtained by using ISAO and LGM were essentially the same. Differences in these two parameter estimates might be due to the slight difference between samples. Note
that, because the intercept is associated with time point 0 under all four time coding strategies, parameter estimates for the LGMs were roughly identical. Table 2-12 shows parameter estimates for the slope term. There are some differences between ISAO and its equivalent LGM, Model 1. In order to see if these differences occurred due to different software programs or estimation procedures, the same sample used for Model 1 was analyzed in the HLM6 software. HLM6 produced exactly the same results as LGM Model 1. Again, I suspect that the differences between Morgan's model and the LGM might be due to different samples.

Table 2-11. Comparison of Parameter Estimates for the Intercept between Morgan's ISAO Model and the Conditional LGM with Different Time Coding Combinations
______________________________________________________________________________
Intercept                ISAO      Model 1    Model 2    Model 3    Model 4
______________________________________________________________________________
Mean                     60.80*    59.72*     59.71*     59.71*     59.71*
D10                      11.74*    9.50*      9.48*      9.46*      9.47*
D01                      17.51*    18.07      18.05*     18.04*     18.05*
D11                      19.69*    18.14*     17.98      17.84*     17.85*
Age in months            0.72*     0.41*      0.42*      0.42*      0.42*
SES                      5.46*     5.06*      5.05*      5.08*      5.09*
White                    6.24*     6.87*      6.86*      6.86*      6.86*
Female                   4.19*     4.01       4.08*      4.09       4.09*
Repeat kindergarten      3.81      4.63       4.66       4.65       4.64
Reading difficulty       0.12      1.68       1.68       1.73       1.73
Approaches difficulty    6.44*     7.01*      7.06*      7.01       7.01
IEP                      6.70*     7.70       7.69*      7.72*      7.71*
______________________________________________________________________________
*p < .05

The main purpose of the study is to examine the effect of different time coding strategies on LGM estimates. Similar to the unconditional LGM results, the pairs Model 1/Model 4 and Model 2/Model 3 produced roughly the same estimates. Differences between these two pairs can also be explained based on Equations 2-24 and 2-25. Coefficients for Models 1 and 4 are approximately 3.6 times larger than coefficients for Models 2 and 3.
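The scale relationship invoked above can be made concrete with a small Python sketch. It converts estimates obtained on the 0, 7.25, 14.33 time scale to the 0, 2, 4 scale using the approximate ratio c = 3.6 from the text; slopes are multiplied by c, and curvatures and slope variances by c squared. The function name is hypothetical, and the inputs are the magnitudes printed for Model 2 in Tables 2-7 and 2-8.

```python
def rescale_growth_parameters(slope, curvature, slope_variance, c):
    """Convert estimates from time scale lambda* = c * lambda back to scale lambda."""
    return {
        "slope": c * slope,                        # linear terms scale with c
        "curvature": c ** 2 * curvature,           # quadratic terms scale with c^2
        "slope_variance": c ** 2 * slope_variance, # variances of slopes scale with c^2
    }


# Model 2 magnitudes as printed in Tables 2-7 and 2-8.
model2_on_model1_scale = rescale_growth_parameters(
    slope=5.72, curvature=0.127, slope_variance=0.306, c=3.6
)
```

The rescaled values (about 20.6, 1.65 and 3.97) are close to the Model 1 estimates of 20.82, 1.71 and 4.091; exact agreement is not expected because 3.6 is only an approximate ratio between the two time scales.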
Fit information for the conditional LGMs is presented in Table 2-13. Consistent with the unconditional fit information, Model 4 has slightly smaller values, which indicate better fit. However, all four models can be considered essentially the same due to the small differences.

Table 2-12. Comparison of Parameter Estimates for the Linear Slope between Morgan's ISAO Model and the Conditional LGM with Different Time Coding Combinations
______________________________________________________________________________
Linear Slope             ISAO      Model 1    Model 2    Model 3    Model 4
______________________________________________________________________________
Mean                     20.55*    21.21*     5.83       5.81*      20.80*
D10                      1.30*     1.30*      0.36*      0.35*      1.26*
D01                      1.29*     0.62       0.17       0.19       0.67
D11                      1.96*     1.96*      0.55*      0.56*      1.99*
Age in months            0.20*     0.17*      0.05*      0.05*      0.20*
SES                      0.53*     0.46*      0.13*      0.14*      0.48*
White                    1.35*     0.77*      0.21*      0.21*      0.73*
Female                   0.75*     1.34*      0.37       0.37*      1.33*
Repeat kindergarten      0.62      2.15*      0.60*      0.60*      2.13*
Reading difficulty       0.50      0.48       0.13       0.14       0.48
Approaches difficulty    0.55      0.06       0.02       0.03       0.08
IEP                      0.06      0.84       0.23       0.23       0.82
Mean of Curvature        1.77*     1.71*      0.13*      0.13*      1.60*
______________________________________________________________________________
*p < .05

Table 2-13. Model Fit Indices for Conditional LGM
______________________________________________________________________________
Estimations      Model 1      Model 2      Model 3      Model 4
______________________________________________________________________________
Loglikelihood    112158.20    112147.77    112093.66    112077.20
AIC              224392.39    224371.54    224263.33    224230.40
BIC              224656.99    224636.14    224527.92    224495.00
Adjusted BIC     224536.23    224515.38    224407.17    224374.25
______________________________________________________________________________

Discussion

This comparison-based study examined the impact of omitting the inter-individual time heterogeneity of measurement occasions in LGM. All models resulted in essentially the same fit, which indicates that using heterogeneous time points (i.e., the exact distance between assessment dates for each subject) instead of fixed time points could not provide a
more adequate fit for the data. Differences in parameter estimates were explained by a mathematical expression, which indicated that these differences occurred due to the time point scales. However, it is still an open question whether using inter-individually varying time points would change the model fit and parameter estimates for datasets which have relatively more varied measurement dates. In order to see if increased heterogeneity in time points would change the estimates in an unconditional linear LGM, a simulation study was conducted.
CHAPTER 3
SIMULATION STUDY

In this chapter, the design of the simulation study and the procedures followed are described. The factors of interest are the range of assessment time points, the distribution of assessment time points, and sample size. A Monte Carlo simulation technique was used to investigate the effects of these factors on the parameter estimates and model fit indices. With this simulation study, I aimed to understand under what conditions researchers should take into account the exact number of days between assessments for each individual, rather than using fixed time points to represent assessment waves, when they analyze an LGM. A total of nine different conditions were created to represent assessment time points. However, the mean assessment time did not differ, and each condition has mean values of 0, 2 and 4. Hence, in a real-life situation, a researcher could decide to ignore the variation in measurement occasions and fit an LGM using the mean assessment times.

Design

The design of the simulation had three between-subject factors. These factors were range of assessment time points, distribution of assessment time points, and sample size.

The range of the assessment time points had three levels: narrow, moderate and wide. Narrow-range assessment time points were created based on the ECLS-K calendar dates of each assessment. This condition included zeros for all cases as the first assessment wave time points, values between 1.79 and 2.26 for all cases as the second assessment wave time points, and values between 3.77 and 4.23 for all cases as the third assessment wave time points. As explained in Chapter 2, these
values were obtained by dividing the number of days between assessments by the constant 358. The moderate-range time point condition also included zeros for all cases as the first assessment wave, but twice the range of the narrow-range condition for the other assessment points; so, the second assessment period ranged between 1.58 and 2.52 and the third period ranged between 3.54 and 4.46. The wide-range time points included zeros for initial status, values between 1 and 3 for the second assessment, and values between 3.01 and 5 for the third assessment period. Increasing the range allowed me to increase the heterogeneity in the time points assigned for each assessment wave. Standard deviations of the time scores for both the second and third assessment waves were 0.13, 0.26 and 0.58 for the narrow, moderate and wide ranges respectively. Table 3-1 shows the characteristics of the distributions of measurement occasions.

Table 3-1. Characteristics of Assessment Time Point Distributions
_____________________________________________________________
Range       Measurement    Min.    Max.    Mean    SD
_____________________________________________________________
Narrow      Occasion 2     1.79    2.26    2       0.13
            Occasion 3     3.77    4.23    4       0.13
Moderate    Occasion 2     1.58    2.42    2       0.26
            Occasion 3     3.54    4.46    4       0.26
Wide        Occasion 2     1       3       2       0.58
            Occasion 3     3.01    5       4       0.58
_____________________________________________________________
Note: SD = Standard deviation.

The second between-subject factor was the distribution of the time points. Three levels were created for this factor: uniform, moderately skewed and extremely skewed. The range, mean and standard deviations of the time points were the same across all distributions. A uniform distribution is the simplest continuous distribution in probability. It has constant probability density on an interval (a, b) and zero probability
density elsewhere. The distribution is specified by a lower limit (a) and an upper limit (b). Its probability density function is:

$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases} \tag{3-1}$$

In a real-life situation, this distribution would arise, for example, if an assessment team collected data for approximately the same number of children in each week during an assessment occasion. Moderately skewed and extremely skewed distributions would represent an attempt to collect as many measurements as possible at once, with only a few measurements being collected later. In order to create skewed distributions, the Fleishman (1978) power transformation was applied to the normal distribution. Using the polynomial in Equation 3-2, a normally distributed X variable can be transformed into a skewed Y variable.

$$Y = a + bX + cX^2 + dX^3 \tag{3-2}$$

The mean and variance of the X variable are known a priori, but in order to obtain the desired skewness, kurtosis, mean and variance for the Y variable, one needs to solve the equation with a different set of constants. Each set of constants creates a different distribution, and thus different skewness and kurtosis values. This transformation is the standard in simulation studies for simulating skewed data, because it allows precise control of the skewness and kurtosis values. In my study, the moderately skewed distribution had a skewness value of 1.25 and a kurtosis value of 2.75, and the extremely skewed distribution had a skewness value of 1.75 and a kurtosis value of 3.75. In order to get these specific skewness and kurtosis values, the set of
constants in Equation 3-2 was obtained from the table provided by Fleishman (1978). Histograms for the distributions are shown in Figure 3-1.

The third between-subject factor was sample size. This factor also included three levels: 200, 2000 and 8000. A sample size of 100 has been recommended as a minimum sample size for LGM by Hamilton, Gagne and Hancock (2003). I decided to use a sample size of 200 to represent small sample sizes. Larger sample sizes of 2000 and 8000 were included in my simulation study because large longitudinal data sets have become more widely available in the past decades (e.g., ECLS-K, NLSY). My replication study was also built on a large sample (N = 7,809).

Overall, the simulation study contained 3 x 3 x 3 = 27 conditions, and one thousand datasets were generated for each condition. Population parameters for the simulated datasets were obtained from the LGM study published by Biesanz, Deeb-Sossa, Papadakis, Bollen & Curran (2004). The population parameters are shown in Table 3-2.

Figure 3-1. Distributions for the third assessment wave time points in the narrow range.
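The Fleishman power transformation in Equation 3-2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the constants b, c and d below are assumed values for skewness 1.75 and kurtosis 3.75 (with a = -c so that Y has mean zero), and the moment equations in the block are included so that any candidate set of constants can be checked before use. All function names are hypothetical.

```python
import random


def fleishman_moments(b, c, d):
    """Variance, skewness and excess kurtosis implied by Y = -c + bX + cX^2 + dX^3
    for standard normal X (skewness and kurtosis formulas assume unit variance)."""
    variance = b**2 + 6*b*d + 2*c**2 + 15*d**2
    skewness = 2*c * (b**2 + 24*b*d + 105*d**2 + 2)
    kurtosis = 24 * (b*d + c**2 * (1 + b**2 + 28*b*d)
                     + d**2 * (12 + 48*b*d + 141*c**2 + 225*d**2))
    return variance, skewness, kurtosis


def fleishman_transform(x, b, c, d):
    """Apply the polynomial of Equation 3-2 with a = -c."""
    return -c + b*x + c*x**2 + d*x**3


def skewed_time_scores(n, mean, sd, b, c, d, seed=0):
    """Draw n skewed scores and shift/scale them to the requested mean and SD."""
    rng = random.Random(seed)
    return [mean + sd * fleishman_transform(rng.gauss(0.0, 1.0), b, c, d)
            for _ in range(n)]


# Assumed constants for skewness 1.75 and kurtosis 3.75 (check with the moments).
B, C, D = 0.92966052, 0.39949343, -0.03646699
implied_variance, implied_skewness, implied_kurtosis = fleishman_moments(B, C, D)

# Hypothetical use: extremely skewed third-wave time scores around a mean of 4.
wave3_scores = skewed_time_scores(1000, mean=4.0, sd=0.13, b=B, c=C, d=D)
```

Checking the implied moments before simulating is a cheap safeguard: if the variance is not close to 1 or the skewness and kurtosis do not match the targets, the constants were mistyped. Note that the sketch does not truncate scores to the minimum and maximum of a range condition.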
Table 3-2. Population parameters
_____________________________________________
Parameter Name                Parameter Value
Intercept                     39.46
Slope                         8.06
Intercept Variance            28.78
Slope Variance                8.20
Intercept Slope Covariance    12.44
Residual Variance 1           8.32
Residual Variance 2           12.01
Residual Variance 3           57.17
_____________________________________________

Data Generation

The model used to generate the data was

$$y_{ti} = \eta_{0i} + \eta_{1i}\lambda_{ti} + \varepsilon_{ti} \tag{3-3}$$

with

$$\eta_{0i} = \mu_{0} + \zeta_{0i}, \qquad \eta_{1i} = \mu_{1} + \zeta_{1i}$$

Or, in matrix terms,

$$\mathbf{y}_i = \boldsymbol{\Lambda}_i \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i \tag{3-4}$$

where $\mathbf{y}_i$ is a 3 x 1 vector containing the scores on the three assessment waves for individual i, $\boldsymbol{\Lambda}_i$ is the 3 x 2 matrix containing a column of ones and the individual's time points, $\boldsymbol{\eta}_i$ is the 2 x 1 vector containing an intercept and a linear slope, and $\boldsymbol{\varepsilon}_i$ is a 3 x 1 vector of disturbances. All data simulations were conducted using the R statistical software (R Development Core Team, 2009). A thousand datasets were simulated for each combination of conditions, resulting in 27,000 datasets. The following steps were used to simulate the data:

1. Simulate A, an n x 3 matrix of measurement times. The A matrix contained zeros in the first column. The second and third columns contained different time point values for the various conditions, but their mean values of 2 and 4, respectively, were constant across conditions.
2. Create B, a 2 x 2 covariance matrix for the intercept and slope, based on the population parameters.
3. Create C, an n x 2 matrix based on B, containing intercepts in the first column and slopes in the second column, which were sampled from a normal distribution with the means, variances and covariance shown in Table 3-2.
4. Create D, an n x 3 matrix in which each column contains the intercepts (i.e., the first column of C).
5. Calculate E, an n x 3 matrix of growth values, by multiplying each row of A by the corresponding slope (the second column of C). The E matrix included zeros in the first column and growth values in the second and third columns.
6. Create F, a 3 x 3 diagonal covariance matrix of disturbances based on the population parameters.
7. Create G, an n x 3 matrix of disturbances for each measurement occasion. The rows of G were drawn from a multivariate normal distribution with a mean of zero and the covariance matrix F.
8. Calculate H = D + E + G, where H is an n x 3 matrix with one individual's y scores in each row: first assessment scores in the first column, second assessment scores in the second column and third assessment scores in the third column.

Data Analysis

An LGM with fixed measurement occasions was fit to the simulated datasets using the Mplus 5.2 software (Muthén & Muthén, 2008) to determine how much the estimates differed from the population parameters. Table 3-3 shows an example of the Mplus code. Mplus produced 1000 sets of results for each condition. Each set included estimates of the residual variances, the intercept mean and variance, the slope mean and variance, and standard errors for each estimate.
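The data generation steps listed above (matrices A through H) can be sketched as follows. This is a minimal illustration in standard-library Python, not the R code used for the study. The function name, the `half_width` argument and the choice of a uniform spread of one year around the mean time points are assumptions made for the example; the population values are those of Table 3-2.

```python
import math
import random

# Population parameters from Table 3-2
MEAN_INT, MEAN_SLOPE = 39.46, 8.06
VAR_INT, VAR_SLOPE, COV_INT_SLOPE = 28.78, 8.20, 12.44
RESIDUAL_VARS = (8.32, 12.01, 57.17)

def simulate_dataset(n, half_width=1.0, seed=0):
    """Return n rows of (y1, y2, y3) built as H = D + E + G."""
    rng = random.Random(seed)
    # Cholesky factor of B, the 2 x 2 covariance matrix of intercept and slope
    l11 = math.sqrt(VAR_INT)
    l21 = COV_INT_SLOPE / l11
    l22 = math.sqrt(VAR_SLOPE - l21 ** 2)
    rows = []
    for _ in range(n):
        # A: measurement times (uniform spread around 2 and 4 is an assumption)
        times = (0.0,
                 rng.uniform(2 - half_width, 2 + half_width),
                 rng.uniform(4 - half_width, 4 + half_width))
        # C: correlated intercept and slope for this individual
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        intercept = MEAN_INT + l11 * z1
        slope = MEAN_SLOPE + l21 * z1 + l22 * z2
        # G: independent disturbances with the variances on the diagonal of F
        errors = [rng.gauss(0, math.sqrt(v)) for v in RESIDUAL_VARS]
        # H = D + E + G: intercept + slope * time + disturbance
        rows.append(tuple(intercept + slope * t + e
                          for t, e in zip(times, errors)))
    return rows

data = simulate_dataset(200)
implied_correlation = COV_INT_SLOPE / math.sqrt(VAR_INT * VAR_SLOPE)
print(len(data), round(implied_correlation, 2))  # prints: 200 0.81
```

Fitting the fixed-time-point LGM to `data` would then ignore the second and third columns of A and use the codes 0, 2 and 4 for everyone.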
The results obtained also included a variety of model fit information: the chi-square fit statistic, the Akaike information criterion (AIC), the comparative fit index (CFI), the Tucker-Lewis index (TLI; Tucker & Lewis, 1973), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR). The mean of the parameter estimates from the 1000 replications of each condition was calculated using SPSS software. Relative parameter biases for the 27,000 estimates of the intercept mean, slope mean, residual variances, intercept variance, slope variance, and intercept-slope covariance were calculated using the following formula (Hoogland & Boomsma, 1998):

$$B(\hat{\theta}) = \frac{\bar{\hat{\theta}} - \theta}{\theta} \tag{3-5}$$

where $\bar{\hat{\theta}}$ is the mean of the parameter estimates across the replications of a condition and $\theta$ is the population parameter. The relative standard error bias was calculated using the following formula:

$$B(SE_{\hat{\theta}}) = \frac{\overline{SE}_{\hat{\theta}} - SD_{\hat{\theta}}}{SD_{\hat{\theta}}} \tag{3-6}$$

where $\overline{SE}_{\hat{\theta}}$ is the mean of the estimated standard errors for $\hat{\theta}$ and $SD_{\hat{\theta}}$ is the empirical standard error, calculated as the standard deviation of the estimates of $\theta$. Hoogland and Boomsma (1998) stated that relative biases between -.05 and .05 and relative standard error biases between -.10 and .10 are considered acceptable. I calculated the relative parameter bias for each parameter under each condition. Some of the relative biases were unacceptable. For these parameters I conducted analyses of variance (ANOVAs) to determine which factors
affected the relative bias. In the ANOVAs, the factors were range, distribution and sample size. The dependent variables were the 27,000 deviations of the estimates from the population parameters, divided by the population parameters (i.e., relative deviations). In addition, effect sizes (eta squared, $\eta^2$) were calculated to compare the effects of the factors.

Table 3-3. Example of Mplus LGM program
_______________________________________
Variable: Names are y1-y3;
Analysis: type=general;
Model: Int slope | y1@0 y2@2 y3@4;
_______________________________________

Results

Convergence Rates and Improper Solutions

Convergence rates were at least 99.9% for all conditions. Improper solutions occurred because of non-positive definite residual covariance (theta) and latent factor covariance (PSI) matrices. The percentage of non-positive definite solutions was near zero for the sample size of 8,000. For the sample size of 2,000, only the conditions with a wide range of measurement occasions were associated with approximately 10% improper solutions. The conditions with a sample size of 200 had the highest percentage of improper solutions. In the conditions with a wide range of measurement occasions the improper solution rate reached 40%; in the other range conditions the rate was approximately 25%. Table 3-4 shows the percentages of convergence and improper solutions for each condition.

Relative Bias of Parameter Estimates and Standard Errors

Table 3-5 presents the relative bias of the estimates of the means of the intercept and slope, the covariance between intercept and slope, the residual variances, the
slope variance and the intercept variance for all 27 conditions. The relative biases of the estimates of the intercept mean, the slope mean, and the covariance between the intercept and slope were acceptable in all conditions. Except for the residual variance for the first assessment, all relative biases for the residual variances were positive, which means these variances were overestimated. The substantial relative biases for the slope and intercept variances were all negative, which means these variances were underestimated. Because these parameters showed unacceptable bias in some conditions, I conducted analyses of variance (ANOVAs). Effect sizes ($\eta^2$) were calculated from the ANOVA results, but values below .02 are not presented.

The relative biases for the residual variance of the first assessment and for the intercept variance were all negative; however, the ANOVA results showed that none of the $\eta^2$ values for the factors was larger than .02. These results imply that the substantial relative biases for the intercept variance and the residual variance of the first assessment cannot be clearly explained by differences in range, distribution or sample size.

The residual variance for the second assessment was affected by range ($\eta^2$ = .603), by distribution ($\eta^2$ = .147) and by the range x distribution interaction ($\eta^2$ = .092). The residual variance for the third assessment was affected by range ($\eta^2$ = .412) and by the range x distribution interaction ($\eta^2$ = .053). The ANOVA results show that larger differences in the residual variance estimates for the second and third assessments were associated mostly with the range factor. Increasing the range of the measurement occasions caused an increase in bias. In other words, the level of overestimation increased when the range was wider. Distribution was the second most influential factor, and the results showed that the uniform distribution of the measurement
occasions was associated with an increase in the bias of the estimates. An extremely skewed distribution of the measurement occasions moderated the effect of range. The interaction between range and distribution also affected the relative bias. The residual variance estimates for the uniformly distributed conditions with a wide range of measurement occasions had the largest bias.

The ANOVA for the bias in the slope variance estimates indicated that range ($\eta^2$ = .061) and distribution ($\eta^2$ = .088) were associated with differences in relative bias. There was also a significant interaction between the range and distribution factors ($\eta^2$ = .001). The extremely skewed distribution of measurement occasions caused an increase in the absolute value of the bias, and the effect was larger when the range was wide. The range factor also affected the relative bias for the slope variance: the absolute value of the bias increased with an increase in the range of the measurement occasions. The sample size factor was not associated with relative bias.

The relative biases of the standard errors for all parameter estimates were acceptable in the simulated conditions; they are presented in Table 3-6.

Model Fit

Mplus provided several kinds of model fit information, including the chi-square statistic ($\chi^2$). In my simulation study, the LGMs were built on nine known sample moments, and eight unknown parameters were estimated with the model. With one degree of freedom, the LGMs were over-identified. The chi-square statistic provides fit information for over-identified models, and a statistically significant test statistic implies that the model specification does not exactly reproduce the means or the covariance matrix of the
observed variables. An alpha level of .05 for the chi-square test statistic was used as the significance criterion. The chi-square value for testing the simultaneous null hypotheses is calculated as

$$\chi^2 = (N - 1) F_{ML} \tag{3-7}$$

$$F_{ML} = \ln\left|\Sigma(\theta)\right| - \ln\left|S\right| + \mathrm{tr}\left[S\,\Sigma(\theta)^{-1}\right] - q + \left[\bar{y} - \mu(\theta)\right]' \Sigma(\theta)^{-1} \left[\bar{y} - \mu(\theta)\right] \tag{3-8}$$

where N is the sample size, $F_{ML}$ is the value of the maximum likelihood fit function at the solution, $S$ is the observed covariance matrix, $\Sigma(\theta)$ is the model-implied covariance matrix, $\bar{y}$ represents the mean structure of the observed values, $\mu(\theta)$ represents the model-implied mean structure, and q is the number of observed variables. The simultaneous null hypotheses being tested are

$$H_0: \Sigma = \Sigma(\theta) \quad \text{and} \quad \mu = \mu(\theta) \tag{3-9}$$

The average chi-square statistic and the percentage of significant chi-square tests for each condition are presented in Table 3-7. The percentage of significant chi-square statistics was below 5% in 4 of the 27 conditions, indicating that the models fit well under these 4 conditions. On average, 8.9% of the chi-square test statistics were significant under the conditions with a sample size of 200, whereas 41% were significant with a sample size of 2,000 and 59.6% were significant with a sample size of 8,000. These results indicate that an increase in sample size was associated with an increase in the percentage of significant chi-square test statistics. On average, 21.2% of the chi-square test statistics were significant under the conditions with a narrow range of measurement occasions, whereas 39.9% were significant for the moderate range conditions and 48.4% for the wide range conditions, indicating that an increase in range of measurement occasions
was associated with an increase in the percentage of significant chi-square test statistics. The percentage of significant chi-square test statistics was not affected by the distribution. However, because the chi-square test statistic becomes very sensitive to small misspecifications as the sample size increases, researchers do not rely solely on this statistic to assess model fit. Sample sizes in the simulation study were 200, 2000 and 8000. The larger the sample size, the more likely the model is to be rejected by the chi-square test. With very large samples, even small differences between the observed means and covariance matrix and the implied means and covariance matrix may be found significant.

Other fit information, including TLI, CFI, RMSEA, SRMR and AIC values, is presented in Table 3-8. The TLI and CFI are two of the fit indices commonly reported in SEM studies (Bollen & Curran, 2006). The equations for the TLI and CFI, which include the test statistic for a baseline model, are

$$TLI = \frac{\chi^2_b / df_b - \chi^2_h / df_h}{\chi^2_b / df_b - 1} \tag{3-10}$$

$$CFI = 1 - \frac{\max(\chi^2_h - df_h,\, 0)}{\max(\chi^2_h - df_h,\, \chi^2_b - df_b,\, 0)} \tag{3-11}$$

where $\chi^2_b$ is the test statistic for the baseline model, $df_b$ is the degrees of freedom for the baseline model, $\chi^2_h$ is the test statistic for the hypothesized model and $df_h$ is the degrees of freedom for the hypothesized model. The TLI and CFI values generally range from zero to 1, and a value of 1 suggests an ideal fit. Values lower than 0.95 raise concerns about the adequacy of a model (Hu & Bentler, 1999). For the
simulated conditions, TLI values varied between .991 and 1, indicating that the models fit well. CFI values varied between .997 and 1, which also indicates that the models fit well.

Other fit indices provided by Mplus are the RMSEA and SRMR. Hu and Bentler (1999) suggest that values smaller than .06 for RMSEA and smaller than .08 for SRMR indicate a good fit. The formulas for the RMSEA and SRMR are

$$RMSEA = \sqrt{\frac{\max(\chi^2_h - df_h,\, 0)}{df_h\,(N - 1)}} \tag{3-12}$$

where N is the sample size, $\chi^2_h$ is the test statistic for the hypothesized model and $df_h$ is the degrees of freedom for the hypothesized model, and

$$SRMR = \sqrt{\frac{2 \sum_{i=1}^{p} \sum_{j=1}^{i} \left[ \left(s_{ij} - \hat{\sigma}_{ij}\right) / \left(s_{ii}\, s_{jj}\right) \right]^2}{p\,(p + 1)}} \tag{3-13}$$

where p is the number of observed variables, $s_{ij}$ are the observed covariances, $\hat{\sigma}_{ij}$ denotes the reproduced covariances, and $s_{ii}$ and $s_{jj}$ are the observed standard deviations. For the simulated conditions, SRMR values did not exceed .016, indicating that the models fit well. RMSEA is more conservative than CFI and TLI (Leite, 2007). Under 4 of the 27 conditions, the models did not fit well according to RMSEA. The conditions with sample sizes of 200, 2000 and 8000, a wide range and an extremely skewed distribution had RMSEA values of .069, .063 and .076, respectively. The condition with a sample size of 2000, a wide range and a moderately skewed distribution had an RMSEA value of .068.

The value of the Akaike Information Criterion (AIC) for a single model is not directly interpretable. Instead of focusing on the magnitude of the AIC, the model with the smallest AIC
can be selected as the most adequate model among the compared models. AIC values estimated from the same sample sizes indicated that a decrease in the range of the measurement occasions was associated with a decrease in AIC values. For example, for the sample size of 200, average AIC values were 4091, 4114 and 4206 for the conditions with narrow, moderate and wide range, respectively. It is also noticeable that within a sample size and range level, an increase in skewness was associated with a decrease in AIC values, indicating that conditions with extremely skewed distributions had smaller AIC values than conditions with moderately skewed and uniform distributions.

Discussion

My simulation study focused on whether an unconditional LGM can produce accurate results when it is estimated with fixed time points for the measurement waves although there are inter-individually varying assessment dates. Using known population parameters taken from a published study, three waves of dependent scores were generated based on three different factors. The first of these three factors was sample size, with three levels. Small, moderate and relatively large sample sizes of 200, 2000 and 8000 were evaluated to determine whether they produced differences with respect to the overall fit of the model or bias in parameter estimates and standard errors. The results showed that sample size differences did not appreciably affect model fit, parameter estimates or standard errors. However, a small sample size of 200 was associated with non-positive definite solutions. Almost 35% of the Mplus analyses of simulated data sets produced improper solutions with a small sample size. When the sample size is not large enough, improper solutions may occur due to mere sampling fluctuation. Anderson and Gerbing (1984) explained how parameter matrices (Theta
Delta, Theta Epsilon, PSI and PHI) may be non-positive definite through mere sampling fluctuation.

The second factor of interest was the range in individual assessment dates. This factor had three levels: narrow, moderate and wide. Even though the range was varied over levels, each range condition had the same mean values for the time points: 0, 2 and 4 for the first, second and third assessments, respectively. These values were chosen to represent assessment waves every two years. Narrowly ranging time points were simulated based on the ECLS-K. Widely ranging time points captured assessment dates from right after the first year to just before the third year for the second assessment. For the third assessment, dates between the third year and the fifth year were captured. Wider ranges for the assessment dates than the ones considered in this study would be unrealistic. Even though there was a significant interaction between the range and distribution factors, the large eta squared values for the range effect deserve discussion. The range factor substantially affected the residual variance estimates. Increased heterogeneity in individual assessment dates caused larger residual variances for the second and third assessments. Large residual variances indicate an increase in the unexplained variation of the dependent variable scores. These results showed that a larger portion of the dependent variable variation will remain unexplained if fixed time points are used when heterogeneity exists in assessment dates. In other words, there will be a larger amount of error variance when heterogeneity in measurement occasions is omitted from the analyses.

The interaction between the range and distribution factors affected the residual variance estimates, and the results showed that the absolute values of relative bias were larger for the conditions with widely ranging
assessment dates with a uniform distribution. However, the bias of the residual variances with a uniform distribution decreased as the range changed from wide to narrow. In other words, skewness moderated the range effect. The smallest absolute value of relative bias for the second residual variance was obtained under the condition that had narrowly ranging assessment dates with an extremely skewed distribution. Moreover, the smallest absolute value of relative bias for the third residual variance was obtained under the condition that had narrowly ranging assessment dates with a moderately skewed distribution. Under these two conditions, residual variances were estimated with a high degree of accuracy.

Negatively biased slope variance estimates were found in the simulation study. The interaction between the range and distribution factors affected the slope variance estimates. The largest negative bias was found for the condition with a sample size of 200, a wide range of assessment dates and an extremely skewed distribution of assessment dates. The second largest negative bias was also found for a condition with a wide range of assessment dates and an extremely skewed distribution. The absolute value of the bias for the slope variance in the wide range conditions decreased as the distribution changed from extremely skewed to uniform. Relative bias estimates were not negative for the uniform distribution but with a sample size of 8000. This finding is consistent with the results of an empirical example provided by Singer and Willett (2003). They did not use individually varying assessment dates as the time indicator but used the children's age, which at the beginning of the study varied between 72 and 84 months and caused heterogeneity in time points. They concluded that estimated variance components would be larger if
heterogeneity in time points was ignored and fixed time points were used. The reason is that an LGM with fixed time points fits worse, and fixed time points introduce error into the analysis. There is more unexplained variation in individual slopes. However, the distribution of age was not reported in their study.

In my study the distribution factor had three levels: uniform, moderately skewed and extremely skewed. Moderately skewed and extremely skewed distributions were weakly associated with the slope variance estimates and were associated with underestimation of these values. The bias of the slope variance was larger when there was extreme skewness. In other words, estimated slope variances were smaller than the population parameter when there was skewness in the distribution of measurement occasions. For the simulated conditions, these findings are reasonable, because with fixed time points the mean of the slope will be to the right of most of the skewed distribution, and individuals to the left of the distribution will have similar slopes. This similarity might reduce the variance of the slope.

As reported in the results section, a few of the relative bias estimates for the intercept variance were unacceptable; however, the calculated eta squared values were smaller than .02, indicating that the effects of the simulated factors were very small. The mean intercept values were estimated without bias under all 27 conditions. These results were expected because in each condition intercepts were estimated independently of the manipulations of the measurement occasions. Mean slope values were also estimated without bias, indicating that omitting inter-individually varying assessment points did not affect the mean slope estimates in this simulation study. Correct estimates for the mean slope might have occurred due to same mean values of
time points. Even with the range and distribution manipulations, the mean values of the measurement time points were the same across all conditions: 0, 2 and 4 for the first, second and third measurement occasions, respectively. The covariance between slope and intercept was also estimated without bias across all conditions. In all conditions, data sets were generated with a correlation coefficient of .81 between intercept and slope.

Chi-square tests of model fit showed that only four of the simulated conditions produced an adequate fit. Conditions with a small sample size and a narrow range of measurement occasions were more likely to provide acceptable model fit based on the chi-square test statistic. It is known that the chi-square statistic is conservative, and researchers do not rely solely on this statistic to decide on model fit in SEM studies. RMSEA fit indices showed that conditions with an extremely skewed distribution of measurement occasions did not provide an acceptable model fit. However, these unacceptable RMSEA indices were close to the cutoff criterion of .06. Furthermore, there was an inconsistency between RMSEA and AIC values: AIC values were smaller for the extremely skewed distributions for all sample size and range conditions. CFI, TLI and SRMR values were all acceptable. Researchers generally make their final decision about model fit based on multiple sources. My overall conclusion about model fit for the simulated conditions is that omitting the inter-individual heterogeneity in measurement occasions does not cause serious problems.
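The link drawn in the discussion between heterogeneity in assessment dates and inflated residual variances can be made concrete with a rough calculation. The sketch below rests on assumptions that are not stated in the text: individual time points are independent of the growth parameters and of each other, the wide-range uniform condition spreads the second and third assessments over two years (1 to 3 and 3 to 5), and the function names are hypothetical. Under those assumptions, fixing the time code at its mean moves the term slope x (individual time - fixed time) into the disturbance, which adds (slope mean squared + slope variance) x (variance of the time points) to the residual variance.

```python
# Population parameters from Table 3-2
MEAN_SLOPE, VAR_SLOPE = 8.06, 8.20
RES_VAR_2, RES_VAR_3 = 12.01, 57.17

def uniform_variance(lower, upper):
    """Variance of a uniform distribution on [lower, upper]."""
    return (upper - lower) ** 2 / 12

def extra_residual_variance(time_variance):
    """Variance of slope_i * (t_i - fixed time) when t_i is independent
    of the slope and centered on the fixed time code."""
    return (MEAN_SLOPE ** 2 + VAR_SLOPE) * time_variance

# Assumed wide-range uniform condition: second assessment spread over years 1 to 3
extra = extra_residual_variance(uniform_variance(1.0, 3.0))
approx_bias_2 = extra / RES_VAR_2   # approximate relative bias, residual variance 2
approx_bias_3 = extra / RES_VAR_3   # same spread assumed for the third assessment
print(round(approx_bias_2, 2), round(approx_bias_3, 2))  # prints: 2.03 0.43
```

The first value is close to the relative biases of about 2.0 that Table 3-5 reports for the uniform, wide-range conditions, which is in line with the interpretation that the overestimated residual variances absorb the ignored variation in assessment dates.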
Table 3-4. Percentages of convergence and improper solutions for each condition
Sample Size  Range     Distribution  Convergence  Imp. Solutions
200          Narrow    Uniform       100%         24.2%
                       M. Skewed     100%         24.3%
                       E. Skewed     100%         27.8%
             Moderate  Uniform       100%         24.7%
                       M. Skewed     100%         33.5%
                       E. Skewed     100%         28.6%
             Wide      Uniform       99.9%        43.1%
                       M. Skewed     100%         36.2%
                       E. Skewed     100%         44.5%
2000         Narrow    Uniform       100%         .003%
                       M. Skewed     100%         .011%
                       E. Skewed     100%         .006%
             Moderate  Uniform       100%         .003%
                       M. Skewed     100%         .013%
                       E. Skewed     100%         .020%
             Wide      Uniform       100%         .070%
                       M. Skewed     100%         .127%
                       E. Skewed     99.9%        .094%
8000         Narrow    Uniform       100%         0%
                       M. Skewed     100%         0%
                       E. Skewed     100%         0%
             Moderate  Uniform       100%         0%
                       M. Skewed     100%         0%
                       E. Skewed     100%         0%
             Wide      Uniform       100%         0%
                       M. Skewed     100%         0%
                       E. Skewed     100%         16%
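The improper solutions counted in Table 3-4 are those in which the estimated theta or PSI matrix is not positive definite. A minimal check of this kind is sketched below; it assumes a diagonal theta matrix (uncorrelated residuals, as in the generating model) and uses a hypothetical function name, so it illustrates the criterion rather than the procedure Mplus applies.

```python
def is_improper(var_int, var_slope, cov_int_slope, residual_vars):
    """True when PSI (2 x 2) or the diagonal theta matrix is not positive definite."""
    # A 2 x 2 symmetric matrix is positive definite when its first
    # diagonal element and its determinant are both positive.
    psi_ok = var_int > 0 and var_int * var_slope - cov_int_slope ** 2 > 0
    # A diagonal matrix is positive definite when every diagonal element is positive.
    theta_ok = all(v > 0 for v in residual_vars)
    return not (psi_ok and theta_ok)

# Population values from Table 3-2 form a proper solution
print(is_improper(28.78, 8.20, 12.44, (8.32, 12.01, 57.17)))   # prints: False
# Hypothetical estimates with a negative residual variance are improper
print(is_improper(28.78, 8.20, 12.44, (-0.50, 12.01, 57.17)))  # prints: True
```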
Table 3-5. Comparison of relative parameter bias estimates across conditions.
Sample Size  Range     Distribution  RV1    RV2    RV3    SV     IV     IM     SM     COV
200          Narrow    Uniform       .065   .153   .025   .035   .016   .001   .011   .004
                       M. Skewed     .018   .083   .001   .021   .005   .001   .008   .012
                       E. Skewed     .038   .028   .027   .021   .010   .001   .008   .005
             Moderate  Uniform       .187   .600   .015   .017   .044   .001   .001   .044
                       M. Skewed     .111   .157   .129   .062   .039   .002   .016   .005
                       E. Skewed     .029   .165   .066   .046   .015   .001   .015   .017
             Wide      Uniform       .023   2.088  .589   .017   .013   .001   .018   .019
                       M. Skewed     .021   1.080  .251   .077   .019   .001   .027   .031
                       E. Skewed     .236   .711   .305   .131   .070   .002   .042   .011
2000         Narrow    Uniform       .072   .170   .018   .022   .021   .001   .005   .011
                       M. Skewed     .034   .036   .027   .021   .011   .001   .007   .001
                       E. Skewed     .032   .031   .026   .021   .010   .002   .008   .001
             Moderate  Uniform       .106   .543   .028   .039   .031   .001   .009   .014
                       M. Skewed     .068   .210   .086   .040   .021   .001   .014   .002
                       E. Skewed     .048   .178   .070   .045   .017   .001   .018   .001
             Wide      Uniform       .095   2.011  .454   .015   .027   .001   .001   .023
                       M. Skewed     .176   .981   .347   .096   .052   .001   .033   .007
                       E. Skewed     .163   .828   .292   .106   .047   .001   .039   .002
8000         Narrow    Uniform       .079   .179   .018   .024   .023   .001   .005   .012
                       M. Skewed     .015   .050   .023   .015   .004   .001   .006   .002
                       E. Skewed     .019   .036   .022   .020   .005   .001   .008   .004
             Moderate  Uniform       .124   .554   .024   .036   .036   .001   .008   .020
                       M. Skewed     .026   .250   .066   .029   .008   .001   .012   .007
                       E. Skewed     .075   .170   .080   .047   .022   .001   .018   .001
             Wide      Uniform       .011   1.997  .436   .002   .004   .001   .003   .005
                       M. Skewed     .115   1.114  .296   .078   .033   .001   .031   .003
                       E. Skewed     .191   .799   .301   .116   .055   .002   .041   .003
RV1-RV3: Residual variances 1-3, SV: Slope variance, IV: Intercept variance, IM: Intercept mean, SM: Slope mean, COV: Covariance between intercept and slope
Table 3-6. Relative bias estimates for standard errors.
Sample Size  Range     Distribution  RV1    RV2    RV3    SV     IV     IM     SM     COV
200          Narrow    Uniform       .027   .009   .009   .022   .034   .007   .012   .004
                       M. Skewed     .025   .006   .000   .003   .020   .034   .032   .004
                       E. Skewed     .003   .008   .007   .007   .000   .041   .014   .003
             Moderate  Uniform       .030   .027   .014   .022   .028   .028   .030   .010
                       M. Skewed     .034   .026   .020   .004   .029   .033   .011   .002
                       E. Skewed     .023   .020   .037   .031   .026   .032   .033   .025
             Wide      Uniform       .016   .004   .001   .011   .020   .007   .042   .022
                       M. Skewed     .007   .018   .030   .045   .019   .011   .026   .008
                       E. Skewed     .011   .056   .042   .017   .052   .019   .017   .001
2000         Narrow    Uniform       .009   .046   .020   .046   .000   .033   .009   .009
                       M. Skewed     .009   .007   .007   .006   .002   .036   .056   .006
                       E. Skewed     .010   .002   .014   .053   .002   .035   .078   .022
             Moderate  Uniform       .021   .023   .001   .007   .010   .031   .025   .000
                       M. Skewed     .006   .012   .023   .017   .014   .039   .057   .014
                       E. Skewed     .002   .006   .031   .006   .052   .038   .052   .007
             Wide      Uniform       .004   .019   .031   .020   .003   .032   .079   .030
                       M. Skewed     .039   .017   .035   .003   .014   .032   .076   .049
                       E. Skewed     .000   .053   .007   .004   .012   .032   .064   .022
8000         Narrow    Uniform       .018   .057   .046   .002   .032   .033   .039   .042
                       M. Skewed     .033   .041   .018   .021   .011   .035   .076   .008
                       E. Skewed     .045   .035   .018   .017   .022   .035   .078   .006
             Moderate  Uniform       .033   .038   .024   .039   .030   .031   .053   .014
                       M. Skewed     .036   .001   .020   .003   .031   .035   .074   .002
                       E. Skewed     .020   .010   .001   .016   .042   .035   .078   .024
             Wide      Uniform       .013   .014   .021   .023   .015   .032   .037   .026
                       M. Skewed     .008   .003   .026   .027   .018   .032   .058   .005
                       E. Skewed     .053   .063   .042   .030   .016   .032   .070   .055
RV1-RV3: Residual variances 1-3, SV: Slope variance, IV: Intercept variance, IM: Intercept mean, SM: Slope mean, COV: Covariance between intercept and slope
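The entries of Tables 3-5 and 3-6 are relative biases of the kind defined by Hoogland and Boomsma (1998). As a minimal sketch of the two computations, the Python below uses hypothetical function names and four made-up replications (the study used 1000 per condition, summarized in SPSS); the use of the sample standard deviation with n - 1 in the denominator is an assumption.

```python
import statistics

def relative_bias(estimates, population_value):
    """(mean estimate - population value) / population value."""
    return (statistics.mean(estimates) - population_value) / population_value

def relative_se_bias(estimates, standard_errors):
    """(mean estimated SE - empirical SE) / empirical SE, where the empirical
    SE is the standard deviation of the estimates across replications."""
    empirical_se = statistics.stdev(estimates)
    return (statistics.mean(standard_errors) - empirical_se) / empirical_se

# Hypothetical slope-mean estimates and standard errors from four replications
slope_estimates = [8.00, 8.10, 8.20, 7.90]
slope_ses = [0.12, 0.13, 0.14, 0.13]

bias = relative_bias(slope_estimates, 8.06)        # population slope mean, Table 3-2
se_bias = relative_se_bias(slope_estimates, slope_ses)
print(abs(bias) < 0.05, abs(se_bias) < 0.10)       # both within the acceptable limits
```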
Table 3-7. Chi-square statistic results across conditions
Sample Size  Range     Distribution  Average Chi-Sq.  % significant (p < .05)
200          Narrow    Uniform       1.345            9.2%
                       M. Skewed     1.005            3.9%
                       E. Skewed     1.059            5.5%
             Moderate  Uniform       2.051            17.0%
                       M. Skewed     1.409            10.0%
                       E. Skewed     1.115            6.1%
             Wide      Uniform       .515             .5%
                       M. Skewed     .698             2.0%
                       E. Skewed     2.575            25.6%
2000         Narrow    Uniform       3.370            33.7%
                       M. Skewed     1.291            7.8%
                       E. Skewed     1.411            9.7%
             Moderate  Uniform       6.462            67.9%
                       M. Skewed     2.596            26.0%
                       E. Skewed     2.054            17.2%
             Wide      Uniform       2.419            21.9%
                       M. Skewed     10.908           94.4%
                       E. Skewed     9.737            90.1%
8000         Narrow    Uniform       13.113           94.5%
                       M. Skewed     1.398            9.8%
                       E. Skewed     1.962            16.6%
             Moderate  Uniform       29.195           100%
                       M. Skewed     2.554            24.6%
                       E. Skewed     10.914           90.2%
             Wide      Uniform       .751             2.5%
                       M. Skewed     14.462           98.5%
                       E. Skewed     47.293           100%
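The single degree of freedom behind the chi-square tests in Table 3-7 follows from counting moments and parameters: three observed variables supply 3 means and 6 variances and covariances, and the linear LGM estimates 8 parameters. The sketch below reproduces that count and the p-value of a chi-square statistic with one degree of freedom, using the identity that such a statistic is a squared standard normal variable. The function names are hypothetical.

```python
import math

def lgm_degrees_of_freedom(n_waves, n_free_parameters):
    """Known moments (means plus unique covariance elements) minus free parameters."""
    known_moments = n_waves + n_waves * (n_waves + 1) // 2
    return known_moments - n_free_parameters

def chi_square_p_value_1df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))

# 2 factor means, 2 factor variances, 1 covariance and 3 residual variances = 8
df = lgm_degrees_of_freedom(3, 8)
p_at_critical_value = chi_square_p_value_1df(3.841)
print(df, round(p_at_critical_value, 3))  # prints: 1 0.05
```

A replication is therefore counted as significant at the .05 level when its chi-square value exceeds about 3.84.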
Table 3-8. Comparison of model fit information across conditions.
Sample Size  Range     Distribution  CFI    TLI    RMSEA  SRMR   AIC
200          Narrow    Uniform       .998   .998   .034   .009   4098
                       M. Skewed     .999   .999   .025   .008   4087
                       E. Skewed     .999   .999   .026   .008   4088
             Moderate  Uniform       .997   .993   .056   .012   4130
                       M. Skewed     .998   .997   .036   .010   4108
                       E. Skewed     .999   .999   .027   .009   4103
             Wide      Uniform       .999   1.000  .010   .007   4272
                       M. Skewed     .999   1.000  .016   .007   4183
                       E. Skewed     .995   .989   .069   .016   4164
2000         Narrow    Uniform       .999   .998   .027   .005   40899
                       M. Skewed     .999   .999   .011   .003   40806
                       E. Skewed     .999   .999   .011   .003   40799
             Moderate  Uniform       .999   .996   .047   .007   41278
                       M. Skewed     .999   .999   .022   .004   41017
                       E. Skewed     1.000  .999   .017   .004   40956
             Wide      Uniform       .999   .999   .022   .005   42523
                       M. Skewed     .997   .992   .068   .011   41805
                       E. Skewed     .998   .993   .063   .010   41641
8000         Narrow    Uniform       .999   .998   .037   .005   163602
                       M. Skewed     .999   .999   .006   .001   163231
                       E. Skewed     .999   .999   .008   .002   163153
             Moderate  Uniform       .998   .995   .058   .008   165073
                       M. Skewed     .999   .999   .011   .002   164099
                       E. Skewed     .999   .999   .033   .005   163852
             Wide      Uniform       .999   1.000  .003   .001   170004
                       M. Skewed     .999   .997   .040   .006   167363
                       E. Skewed     .997   .991   .076   .012   166474
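The chi-square based indices in Table 3-8 can be computed from the test statistics with a few lines of code. The sketch below is an illustration under assumptions: the function names are hypothetical, the RMSEA uses an N - 1 divisor (some software uses N), the CFI is truncated at zero, and the baseline chi-square in the usage lines is a made-up value. Applying the RMSEA formula to an average chi-square from Table 3-7 only approximates the average RMSEA over replications.

```python
import math

def rmsea(chi_square, df, n):
    """Root mean square error of approximation (N - 1 divisor assumed)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

def cfi(chi_h, df_h, chi_b, df_b):
    """Comparative fit index from hypothesized (h) and baseline (b) models."""
    denominator = max(chi_h - df_h, chi_b - df_b, 0.0)
    if denominator == 0.0:
        return 1.0
    return 1.0 - max(chi_h - df_h, 0.0) / denominator

def tli(chi_h, df_h, chi_b, df_b):
    """Tucker-Lewis index from hypothesized (h) and baseline (b) models."""
    return (chi_b / df_b - chi_h / df_h) / (chi_b / df_b - 1.0)

# Average chi-square for N = 8000, wide range, extremely skewed (Table 3-7)
print(round(rmsea(47.293, 1, 8000), 3))  # prints: 0.076

# Hypothetical baseline chi-square of 500 on 3 degrees of freedom
print(round(cfi(2.0, 1, 500.0, 3), 3), round(tli(2.0, 1, 500.0, 3), 3))
```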
CHAPTER 4
CONCLUSION

In educational research, it is important to accurately assess individual changes in a behavior or skill. It is also valuable to understand which factors are related to these changes. To examine individual growth in detail, large and complex longitudinal studies have been conducted in recent decades. One of the popular methods for analyzing longitudinal data sets is LGM. In an LGM, individual observations are defined as a function of time. Hence, it is important to define the time indicators accurately. There are two commonly used approaches to creating time indicators: using the ages of the participants or using the measurement waves. The literature indicates that time coding should be based on the participants' ages at the measurement occasions. However, in educational research it is common to use measurement waves as the time points (e.g., spring assessment, fall assessment). When the measurement waves are used as the time indicators, it is ideal to assess each participant at the same time; unfortunately, this is impractical in large studies. The literature indicates that in applied LGM studies, researchers tend to ignore heterogeneity in measurement dates and assume that each child was assessed at the same time. In this thesis, I examined the effects of omitting inter-individually varying measurement dates in LGM.

The ECLS-K dataset provides exact dates of measurement for each student. Morgan, Farkas and Wu (2009) published a study in which they examined mathematical growth for a subsample taken from the ECLS-K. They assumed that each individual was assessed at the same time within a measurement wave. Based on their study, a conditional quadratic growth model was constructed and examined with
different time coding strategies in Chapter 2. I used the exact number of days between assessments to create time points for each child in order to take into account the heterogeneity in measurement dates. With this particular dataset, the results showed no appreciable differences among the compared models. Model fit was essentially the same across models, and differences in the parameter estimates occurred because of the different time scales but not because of heterogeneity in assessment dates. A mathematical explanation was provided for the differences in parameter estimates. These results indicate that researchers can omit the heterogeneity in assessment dates when working with the ECLS-K data set, because the heterogeneity in these dates is not large enough to affect the estimates.

In Chapter 3, a simulation study was conducted to see whether different ranges of the measurement occasions would cause biased estimates in an unconditional linear growth model. The results of this thesis support the conclusion that there will be a larger amount of error variance when heterogeneity in measurement occasions is omitted. In other words, a larger portion of the dependent variable variation will remain unexplained, especially if the measurement occasions have a wide range. Based on these results, I recommend that researchers avoid the fixed time points approach if there is a large amount of heterogeneity in assessment dates.

In Chapter 3, the effects of differences in the distribution of the measurement occasions were also examined. Uniform distributions of measurement occasions represented a situation in which the same number of participants is assessed in each attempt within an assessment wave. Skewed distributions represented a situation in which most data are collected during a short interval of time. These are realistic situations, and results
69 showed that omitting the heterogeneity might cause incorrec t estimates for the slope variance if occasions have a skewed distribution. Another conclusion that can be drawn from this thesis is that a small sample size resulted in approximately 35% improper solutions in simulated conditions. It is also noticeable that, even though biased estimates were produced with fixed time points approach, all models provided acceptable model fit indices for the simulated data sets. However the chi square statistic indicated lack of exact fit in most conditions. Results showed that having acceptable model fit values do not justify using fixed time points approach. Moreover, consistent with the literature, my empirical study based on Morgan, Farkas and Wu (2009), support the conclusion that models constructed on person specific time points of measurement yield more adequate fit to the data. The overall conclusion of this thesis is that, researchers should estimate their growth models with person specific time points when the variability of time points is large. If the heterogenei ty in assessment dates is not large, there should be no substantial differences in estimates and model fit information between the models estimated with fixed time points and person specific time points. At this point, fixed time points approach could be p referred when using Mplus software, because Mplus does not provide some of the fit information for the models estimated with person specific time points. One limitation of this thesis is that my simulation study focused only on unconditional linear growth models. Future research is needed to investigate the effects of omitting the heterogeneity in assessment dates in conditional growth models. Both time invariant and time varying predictors should be included in these conditional
70 models. It is also necessa ry to investigate non linear growth models. Another limitation is that, I used only 3 waves of measurement for the dependent variable. This is the minimum number that a LGM requires. Future research might focus on four or more waves.
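The claim that ignoring heterogeneity in measurement occasions inflates error variance can be illustrated with a small simulation. The Python sketch below is not the Mplus analysis reported in Chapter 3. It is a simplified stand-in that fits a separate least-squares line to each simulated person and pools the residuals. The function names, the parameter values (intercept and slope means and standard deviations, error standard deviation of 1, three waves coded 0, 1, 2), and the uniform spread of assessment times around each wave are all assumptions chosen for illustration.

```python
import random
import statistics


def ols_sse(times, ys):
    """Sum of squared residuals from a straight-line least-squares fit."""
    t_bar = statistics.fmean(times)
    y_bar = statistics.fmean(ys)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, ys))


def residual_variances(n_people=2000, spread=0.5, seed=1):
    """Return (fixed, actual): pooled residual variance when each person's
    line is fitted on the wave codes versus on the actual assessment times.
    All parameter values are hypothetical."""
    rng = random.Random(seed)
    waves = [0.0, 1.0, 2.0]
    sse_fixed = 0.0
    sse_actual = 0.0
    for _ in range(n_people):
        b0 = rng.gauss(50.0, 5.0)   # person-specific intercept
        b1 = rng.gauss(10.0, 2.0)   # person-specific slope
        # Each person is assessed within +/- spread of the nominal wave.
        actual = [w + rng.uniform(-spread, spread) for w in waves]
        ys = [b0 + b1 * t + rng.gauss(0.0, 1.0) for t in actual]
        sse_fixed += ols_sse(waves, ys)
        sse_actual += ols_sse(actual, ys)
    # A two-parameter line fitted to three points leaves 1 residual
    # degree of freedom per person.
    df = n_people * (len(waves) - 2)
    return sse_fixed / df, sse_actual / df


if __name__ == "__main__":
    fixed, actual = residual_variances()
    print("residual variance, fixed wave codes:", round(fixed, 2))
    print("residual variance, actual times:   ", round(actual, 2))
```

Under these assumed values, the residual variance obtained with the actual times should stay close to the simulated error variance, while the value obtained with the wave codes should be larger, and it should grow as `spread` increases. Per-person least squares was chosen only to keep the sketch free of external dependencies; it does not reproduce the latent growth model estimates.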
APPENDIX
CALCULATING THE NUMBER OF DAYS WITH SPSS

compute date1=DATE.DMY(c3asmtdd,c3asmtmm,c3asmtyy).
formats date1(DATE11).
variable width date1(11).
compute date2=DATE.DMY(c4asmtdd,c4asmtmm,c4asmtyy).
formats date2(DATE12).
variable width date2(12).
* SPSS stores dates in seconds, so dividing by 86400 gives days.
compute elapse12=(date2 - date1)/86400.
execute.

* To create the time points for the four time codings.
* elapse14 is assumed to have been computed in the same way as elapse12.
compute fix = 0.
compute fix1 = 2.
compute fix2 = 4.
compute fixnday = 0.
compute fixnday1 = 7.25.
compute fixnday2 = 13.44.
compute randomnday = 0.
compute randomnday1 = elapse12/100.
compute randomnday2 = elapse14/100.
compute random = 0.
compute random1 = elapse12/358.
compute random2 = elapse14/358.
execute.
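For readers without SPSS, the same date arithmetic can be sketched in Python. This is only an illustration of the steps in the syntax above, not part of the original analysis. The function names and the example dates are hypothetical, while the divisors 100 and 358 and the fixed codes are copied from the SPSS commands.

```python
from datetime import date


def days_between(first, later):
    """Number of days elapsed between two assessment dates."""
    return (later - first).days


def time_codes(elapse12, elapse14):
    """Build the four sets of time scores used in the SPSS syntax.
    elapse12 and elapse14 are days elapsed since the first assessment."""
    return {
        # Same time points for everyone, in wave units.
        "fix": (0, 2, 4),
        # Same time points for everyone, in hundreds of days.
        "fixnday": (0, 7.25, 13.44),
        # Person-specific time points, in hundreds of days.
        "randomnday": (0, elapse12 / 100, elapse14 / 100),
        # Person-specific time points, rescaled with the divisor 358.
        "random": (0, elapse12 / 358, elapse14 / 358),
    }


if __name__ == "__main__":
    # Hypothetical assessment dates for one child.
    first = date(2000, 4, 10)
    second = date(2002, 4, 5)
    third = date(2003, 12, 15)
    codes = time_codes(days_between(first, second), days_between(first, third))
    for name, values in codes.items():
        print(name, values)
```

Subtracting two `date` objects yields a `timedelta`, so no division by 86400 is needed here; that division in the SPSS syntax exists because SPSS stores dates as seconds.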
LIST OF REFERENCES

Baer, J., & Schmitz, M. (2000). Latent growth curve modeling with a cohort sequential design. Social Work Research, 24, 243.

Biesanz, J., Deeb-Sossa, N., Papadakis, A., Bollen, K., & Curran, P. (2004). The role of coding time in estimating and interpreting growth curve models. Psychological Methods, 9, 30-52. doi:10.1037/1082-989X.9.1.30

Blozis, S., & Cho, Y. (2008). Coding and centering of time in latent curve models in the presence of interindividual time heterogeneity. Structural Equation Modeling, 15, 413-433. doi:10.1080/10705510802154299

Bodovski, K., & Farkas, G. (2007). Mathematics growth in early elementary school: The roles of beginning knowledge, student engagement, and instruction. The Elementary School Journal, 108, 115-130. doi:10.1086/525550

Bollen, K., & Curran, P. (2004). Autoregressive latent trajectory (ALT) models: A synthesis of two traditions. Sociological Methods & Research, 32, 336-383. doi:10.1177/0049124103260222

Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation approach. Hoboken, NJ: Wiley. ISBN: 047145592X

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Cheadle, J. (2008). Educational investment, family context, and children's math and reading growth from kindergarten through the third grade. Sociology of Education, 1-31. Retrieved from PsycINFO database.

Collins, L., & Sayer, A. (2001). New methods for the analysis of change. Washington, DC: American Psychological Association. doi:10.1037/10409-000

Davison, M. (2008). Hypothesis generation in latent growth curve modeling using principal components. Educational Research & Evaluation, 14, 321-334. doi:10.1080/13803610802249498

DiPerna, J., Pui-Wa, L., & Reid, E. (2007). Kindergarten predictors of mathematical growth in the primary grades: An investigation using the Early Childhood Longitudinal Study-Kindergarten cohort. Journal of Educational Psychology, 99, 369-379.

Duncan, T., & Duncan, S. (2004). An introduction to latent growth curve modeling. Behavior Therapy, 35, 333-363.
Duncan, T., Duncan, S., Strycker, L., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Lawrence Erlbaum Associates Publishers.

Dunn, G., Everitt, B., & Pickles, A. (1993). Modeling covariances and latent variables using EQS. London: Chapman & Hall.

Farrell, A., Sullivan, T., Esposito, L., Meyer, A., & Valois, R. (2005). A latent growth curve analysis of the structure of aggression, drug use, and delinquent behaviors and their interrelations over time in urban and rural adolescents. Journal of Research on Adolescence (Blackwell Publishing Limited), 15, 179-203. doi:10.1111/j.1532-7795.2005.00091.x

Gottman, J., & Rushe, R. (1993). The analysis of change: Issues, fallacies, and new ideas. Journal of Consulting & Clinical Psychology, 61, 907.

Gresham, F., & Elliott, S. (1990). Social skills rating system. Circle Pines, MN: American Guidance Service.

Hancock, G., Kuo, W., & Lawrence, F. (2001). An illustration of second-order latent growth models. Structural Equation Modeling, 8, 470-489.

Hancock, G. R., & Lawrence, F. R. (2006). Using latent growth models to evaluate longitudinal change. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course. Greenwich, CT: Information Age Publishing, Inc.

Hertzog, C., & Nesselroade, J. (2003). Assessing psychological change in adulthood: An overview of methodological issues. Psychology & Aging, 18, 639-657. doi:10.1037/0882-7974.18.4.639

Hong, G., & Raudenbush, S. (2005). Effects of kindergarten retention policy on children's cognitive growth in reading and mathematics. Educational Evaluation and Policy Analysis, 27, 205-224. doi:10.3102/01623737027003205

Kaplan, D. (2002). Methodological advances in the analysis of individual growth with relevance to education policy. PJE. Peabody Journal of Education, 77, 189-215. http://search.ebscohost.com.lp.hscl.ufl.edu

Kim, H. (2008). An analysis of developmentally appropriate and culturally responsive practices and the learning trajectories of kindergarten, first grade, and third grade children from ECLS-K: Teachers' beliefs and practices as mediators. Dissertation Abstracts International Section A, 68, 4965.

Lawrence, F., & Hancock, G. (1998). Assessing change over time using latent growth modeling. Measurement & Evaluation in Counseling & Development (American Counseling Association), 30, 211.
Lee, E. (2008). A latent growth curve analysis of the impact of school mobility on the reading scores of poor and non-poor children in the U.S. Dissertation Abstracts International Section A, 69, 874.

Leite, W. (2007). A comparison of latent growth models for constructs measured by multiple items. Structural Equation Modeling, 14, 581-610. doi:10.1080/10705510701575438

Luo, Z. (2003). Children's mathematical development from kindergarten to first grade: Identifying early predictors for success. Dissertation Abstracts International, 64, 443.

Marks, A., & Coil, C. (2007). Psychological and demographic correlates of early academic skill development among American Indian and Alaska Native youth: A growth modeling study. Developmental Psychology, 43, 663-674. doi:10.1037/0012-1649.43.3.663

McArdle, J., & Epstein, D. (1987). Latent growth curves within developmental structural equation models. Child Development, 58, 110. doi:10.1111/1467-8624.ep7264163

Mckinnon, M. (2002, January). Latent growth curve and multilevel analysis of reading acquisition. Dissertation Abstracts International Section A, 63, 2133.

Mehta, P., & West, S. (2000). Putting the individual back into individual growth curves. Psychological Methods, 5, 23-43. doi:10.1037/1082-989X.5.1.23

Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122. doi:10.1007/BF02294746

Morgan, P., Farkas, G., & Wu, Q. (2009). Five-year growth trajectories of kindergarten children with learning difficulties in mathematics. Journal of Learning Disabilities, 42, 306-321.

Muthen, B., & Siek-toon, K. (1998). Longitudinal studies of achievement growth using latent variable modeling. Learning & Individual Differences, 10, 73.

Perez-Johnson, I. (2008). Parsing Hispanic-White achievement gaps: The influence of individual, family, and school factors on mathematics achievement differences in the elementary grades. Dissertation Abstracts International Section A, 69, 1281.

... growth curve modeling: Concepts, issues, and a ... Structural Equation Modeling, 16, 186-190.

Raudenbush, S. W. (2001). Toward a coherent framework for comparing trajectories of change. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 33-64). Washington, DC: APA.
Rovine, M., & Molenaar, P. (2001). A structural equations modeling approach to the general linear mixed model. New methods for the analysis of change (pp. 67-98). Washington, DC: American Psychological Association. doi:10.1037/10409-003

Singer, J., & Willett, J. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press.

Sukkyung, Y., & Sharkey, J. (2009). Testing a developmental-ecological model of student engagement: A multilevel latent growth curve analysis. Educational Psychology, 29, 659-684. doi:10.1080/01443410903206815

Wen-Jui, H. (2008). The academic trajectories of children of immigrants and their school environments. Developmental Psychology, 44, 1572-1590. http://search.ebscohost.com.lp.hscl.ufl.edu, doi:10.1037/a0013886

Willett, J. (1997). Measuring change: What individual growth modeling buys you. Change and development: Issues of theory, method, and application (pp. 213-243). Mahwah, NJ: Lawrence Erlbaum Associates Publishers. http://search.ebscohost.com.lp.hscl.ufl.edu

Willett, J. (2004). Investigating individual change and development: The multilevel model for change and the method of latent growth modeling. Research in Human Development, 1, 31-57.

Willett, J., & Keiley, M. (2000). Using covariance structure analysis to model change over time. Handbook of applied multivariate statistics and mathematical modeling (pp. 665-694). San Diego, CA: Academic Press. doi:10.1016/B978-012691360-6/50024-0

Willett, J., & Sayer, A. (1994). Using covariance and structure analysis to detect correlates and predictors of individual change. Psychological Bulletin, 116, 363.

Woodcock, R., McGrew, K., & Werder, J. (1994). Woodcock-McGrew-Werder Mini-Battery of Achievement.

Xitao, F., & Xiaotao, F. (2005). Power of latent growth modeling for detecting linear growth: Number of measurements and comparison with other analytic approaches. Journal of Experimental Education, 73, 121-139.

Vagi, S. (2008). Socioeconomic status and achievement in math and reading in kindergarten through elementary school: The role of social capital. Dissertation Abstracts International, 68, 7005.

Voelkle, M., Wittmann, W., & Ackerman, P. (2006). Abilities and skill acquisition: A latent growth curve approach. Learning & Individual Differences, 16, 303-319. doi:10.1016/j.lindif.2006.01.001
BIOGRAPHICAL SKETCH

Burak Aydin was born in Istanbul, Turkey. He received his Bachelor of Arts degree in mathematics education from Kocaeli University, Turkey. He served the Turkish Government as a mathematics teacher for seven months. He later qualified for a scholarship to study abroad and, in the fall of 2008, enrolled for graduate studies in the Department of Educational Psychology at the University of Florida. He will receive his Master of Arts in Education in research and evaluation methodology from the Department of Educational Psychology in August 2010.