A MULTIVARIATE GENERALIZABILITY ANALYSIS OF STUDENT STYLE QUESTIONNAIRE

By
YOUZHEN ZUO

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS IN EDUCATION

UNIVERSITY OF FLORIDA
2007

© 2007 Youzhen Zuo

To my Mom

ACKNOWLEDGMENTS

First, I thank my advisor and committee chair, Dr. David Miller, for his invaluable guidance, feedback, and encouragement throughout each semester. I also thank my committee member, Dr. Thomas Oakland, for providing the data source and for his quick feedback on my questions. Without their assistance, this document would not have been possible. Next, I give special thanks to Elaine Green and Linda Parsons, who helped me keep track of my graduate study and whose smiling faces made me feel at home in this department. Last, I deeply thank my husband, who always believes in me and supports my graduate career, and my two children (Yuanze and Bill), who bring a lot of fun to my life.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    Classical Test Theory
    Generalizability Theory
    Student Style Questionnaire

2 LITERATURE REVIEW
    Generalizability Theory Overview
        Variance Components
        Universe of Admissible Observations and Facets
        Generalizability (G) Studies and Decision (D) Studies
        Generalizability Coefficient and Dependability Index
        Random and Fixed Facets
        Crossed and Nested Facets
    Multivariate Generalizability Theory Overview

3 RESEARCH DESIGN AND METHODOLOGY
    Student Styles Questionnaire Data
    Multivariate Design and Research Questions
    Multivariate Generalizability Studies
        Estimating Variance Components
        Estimating Covariance Components
        Variance and Covariance Matrix
        Disattenuated Correlations
    Multivariate Decision Study
    SSQ Profiles
        Composite Universe Score Variance, Relative and Absolute Score Variance
        G-Coefficient and Dependability Index for SSQ Profiles
    Data Analysis Software for This Study

4 RESULTS
    Multivariate Generalizability Study Results
    Multivariate Decision Study Results
    Multivariate G and D Study Results for SSQ Profiles

5 DISCUSSION AND CONCLUSION
    Discussion of the Results
    Limitations of This Study
    Closing Remarks

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
4-1 ANOVA for G study p × i design for EI in case 1
4-2 ANOVA for G study p × i design for PM in case 1
4-3 ANOVA for G study p × i design for TF in case 1
4-4 ANOVA for G study p × i design for OL in case 1
4-5 Variance and covariance components for universe scores of SSQ data in case 1
4-6 The G study variance and covariance components for person in SSQ
4-7 The G study variance components for item and person-by-item in SSQ
4-8 Reliability / dependability estimates for SSQ subscale D-studies

LIST OF FIGURES

Figure
2-1 Relative and absolute error for a random p × i design
3-1 Representation for a multivariate p × i design

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Arts in Education

A MULTIVARIATE GENERALIZABILITY ANALYSIS OF STUDENT STYLE QUESTIONNAIRE

By
Youzhen Zuo
December 2007

Chair: David Miller
Major: Research and Evaluation Methodology

The Student Style Questionnaire (SSQ) is used to measure a student's temperament.
In this study, multivariate generalizability theory is applied to assess the reliability of the SSQ. In particular, random-effect variance and covariance components were estimated in the Generalizability (G) study, and generalizability coefficients for the four subscales and the total scale were estimated in the subsequent Decision (D) study. The results showed that the generalizability coefficients (reliability) were acceptable for the total scale and two of the subscales.

CHAPTER 1
INTRODUCTION

In education and psychology, tests are used to measure examinees' abilities and attitudes. One issue that concerns test developers and users is the reliability of the test. Reliability refers to the consistent replication of a measurement procedure across conditions; that is, a consistent score for an examinee would be obtained over different test occasions, over parallel test forms, or over a set of raters. Reliability is so important that the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1985) required test developers and users to obtain and report evidence concerning reliability and errors of measurement as a standard practice. Accordingly, investigators have developed methods that allow us to look into the measurement and target the error(s). Traditionally, reliability is investigated with Classical Test Theory (CTT), in which an observed score is decomposed into a true score and an error score. However, one limitation of CTT is that only one undifferentiated error term can be specified at a time. Generalizability Theory (GT) was developed later to overcome this limitation and to account simultaneously for multiple sources of error in the measurement.
In this thesis, CTT is briefly described for comparison; our focus, however, is on methodology using Generalizability Theory, especially Multivariate Generalizability Theory (MGT), in which not only can multiple sources of error be specified, but the reliability of a test (or profile) comprising two or more subtests can also be estimated. As a substantive illustration, we apply MGT to data from the Student Style Questionnaire (SSQ) in later chapters.

Classical Test Theory

Classical test theory (CTT) is the most popular and simplest method for assessing the reliability of a measurement because it is easy to understand and conceptualize. Under CTT, an observed score is conceptually decomposed into a true score and an error score; the agreement between the true score and the observed score depends on the reliability of the measurement. Intuitively, the higher the reliability of a measure, the closer the observed score is to the true score. Although we can never obtain a true score for an examinee, using so-called parallel tests we can obtain the true score and observed score variances (Allen & Yen, 2002). As a result, reliability is defined statistically as the ratio of true score variance to observed score variance. In addition, reliability estimation in CTT addresses one source of error at a time, such as test-retest, alternate forms, interrater, or internal consistency (Crocker & Algina, 1986), which leads to different reliability coefficients for different research designs. At the same time, comparing the different sources of error in a measure and identifying the largest one is difficult under the framework of CTT. In sum, with one and only one undifferentiated error term in the model, CTT is limited in its ability to provide insightful information on measurement error(s) that could help test developers and users adopt a better measurement design.
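The CTT definition of reliability described above can be written compactly. The notation here is an assumption for illustration and is not used elsewhere in this thesis: X is the observed score, T the true score, and E the error score, with T and E taken to be uncorrelated.

```latex
X = T + E, \qquad
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}
           = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

Under these assumptions, the single term \sigma^2_E is the one undifferentiated error discussed above: whatever its sources (occasions, forms, raters, items), they are not separated in the model.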
Generalizability Theory

To address the challenges posed by CTT, Cronbach, Gleser, Nanda, and Rajaratnam (1972) first introduced Generalizability Theory (GT) as a statistical theory for evaluating the reliability of measurements. Later, Shavelson and Webb developed GT further and made the theory more accessible with their book Generalizability Theory: A Primer (1991). The theory matured and gained wide acceptance among researchers after R. L. Brennan published his book Generalizability Theory (2001a). Compared with CTT, GT is more flexible and powerful. In particular, instead of decomposing an observed score into a true score and an error score, GT considers both systematic and unsystematic sources of error variation and disentangles them simultaneously, so the observed score can be decomposed into as many effects as the measurement design specifies. For example, in a writing test where raters and prompts must be considered, an examinee's score can be decomposed into a grand mean in the population and universe and seven other effects: person, rater, prompt, person-rater interaction, person-prompt interaction, rater-prompt interaction, and person-rater-prompt interaction. By examining all possible sources of error, a researcher can easily identify where large error sources come from and make appropriate decisions to decrease the error variance.

As an extension of classical test theory, GT shares some concepts and assumptions with it. For instance, the universe score in GT has the same meaning as the true score in classical test theory; errors are assumed to be uncorrelated and independent of true scores; and the samples used to estimate the error variances are assumed to be randomly selected from the population.
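Returning to the writing-test example, the seven effects of a fully crossed person × rater × prompt design are simply every non-empty combination of the three labels. The short sketch below enumerates them; the function name `crossed_effects` is hypothetical and is used only for illustration.

```python
from itertools import combinations


def crossed_effects(facets):
    """List every main effect and interaction of a fully crossed design.

    `facets` names the object of measurement and the facets, for example
    ["person", "rater", "prompt"]. With one observation per cell, the
    highest-order interaction is confounded with residual error.
    """
    effects = []
    for order in range(1, len(facets) + 1):
        for combo in combinations(facets, order):
            effects.append(" x ".join(combo))
    return effects


effects = crossed_effects(["person", "rater", "prompt"])
for effect in effects:
    print(effect)
print(len(effects), "effects besides the grand mean")
```

For k crossed labels the count is 2^k - 1, which is why adding facets to a design quickly increases the number of variance components to estimate.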
Many reliability studies utilizing Generalizability Theory (GT) fall under Univariate Generalizability Theory (UGT), because only one universe score is associated with the object of measurement. For example, in a math achievement test, each examinee has only one math score; that is, only one universe math score is associated with each person. Increasingly, however, datasets in the form of multiple subtests are common for test developers and users. For example, in the SAT, each examinee has two scores representing verbal and math abilities, and a total score for the whole test. Of course, we could analyze the SAT data using UGT, where the universe score of each examinee is regressed on his or her total score. However, we would lose information about the two specific subtests. In addition, if a different set of items is used to measure each ability and the numbers of items are not equal, then we have an unbalanced data problem, which makes variance components difficult to estimate. Even worse, some measurements may not have a composite score; that is, they yield only profiles of subscores. All of these issues motivated the development of Multivariate Generalizability Theory (MGT), in which two or more universe scores are associated with the object of measurement and covariance components are taken into account in addition to variance components. In sum, under certain circumstances both UGT and MGT can analyze the same data, and the results of the univariate and multivariate analyses can be similar; however, multivariate analysis provides more information that test developers and users can use. The advantages of multivariate GT over univariate GT are described in detail in a later chapter.

Student Style Questionnaire

In this study, we apply Multivariate Generalizability Theory (MGT) to evaluate the reliability of the Student Style Questionnaire (SSQ) scale.
The SSQ (Oakland, Glutting, & Horton, 1996) includes 69 items designed to measure students' temperament on four bipolar subscales: Extroversion/Introversion, Thinking/Feeling, Practical/Imaginative, and Organized/Flexible. Students respond to 69 forced-choice questions related to real-life situations to express their individual style preferences. Each item is a brief description of an everyday event, followed by two mutually exclusive alternative responses that indicate the student's preferred style. The SSQ measures preferences, and students answer the questions in terms of what they like to do. The results can provide information for teachers, parents, and counselors to assist with students' learning, vocational decisions, and emotional counseling. The SSQ also can help in parent training, suggesting communication strategies for both parents and teachers.

CHAPTER 2
LITERATURE REVIEW

Generalizability Theory Overview

There has been an extensive literature on Generalizability Theory (GT) and its applications since 1972. Recent literature (Gao & Brennan, 2001; Yin, 2005; Solano-Flores & Li, 2006; Clauser, Harik, & Margolis, 2006) has referred most often to two books: Generalizability Theory: A Primer (Shavelson & Webb, 1991) and Generalizability Theory (Brennan, 2001a). Both books introduce the basic concepts and statistical models underlying GT. However, Brennan's (2001a) Generalizability Theory has the most comprehensive description of GT. At the beginning of the book, Brennan describes GT as a combination of CTT and ANOVA (analysis of variance), which helps us conceptualize the relationships among the three. The first chapter of the book introduces the framework of GT and defines its concepts.
In the later chapters, Brennan demonstrates and compares GT analyses in the context of different research designs and addresses methodological challenges such as unbalanced random-effects designs and the variability of statistics in GT. Topics are developed from simple to complex along two main lines: Univariate Generalizability Theory (UGT) and Multivariate Generalizability Theory (MGT). In general, both UGT and MGT focus on fundamental concepts, different research designs, and the statistical computation of variance components. However, MGT gives additional attention to estimating covariance components and composite variance components. The book limits complicated mathematical and statistical formulas and uses language that can be understood by readers without much statistical background. Most important to researchers, the book describes the computer programs that Brennan (2001b) developed to simplify GT calculations.

The importance of GT rests on three reasons (Shavelson, 1991). First, GT allows researchers to estimate simultaneously the magnitude of multiple independent sources of error variance in a measure. Second, the estimated variance components for a measure can be used to carry out Decision Studies, which means we can control the degree of error variation according to our desired accuracy of measurement. Third, GT allows the estimation of test score reliability based on whether the scores will be used to make relative (norm-referenced test) or absolute (criterion-referenced test) decisions. In this sense, GT expands CTT in that the reliability of scores depends on how we use them. In sum, GT provides a flexible and powerful tool to assess the reliability of a measure.
It allows us not only to investigate but also to design reliable observations, using the decomposed sources of variation in the measurement and then minimizing the measurement error(s) to reach an optimal design. However, the gains from GT do not come without pain. The challenge in GT is to distinguish among different measurement designs, which involve many conceptual and statistical issues. In this thesis, the notation from Brennan's book Generalizability Theory is used to illustrate the conceptual and statistical issues.

Variance Components

Variance components refer to the variation in a measure; they are the most fundamental concept of GT. Usually, we notice that persons' scores on a measure differ; some are higher, some lower. Intuitively, we can think of at least one reason why the scores differ: different people have different abilities or attitudes on the tested content, which is the variation due to persons. In fact, observed scores for examinees can differ for reasons other than person variation. For example, in an achievement test, an individual's score on a particular item is affected by a person effect (p; systematic differences among people's temperament, or the object of measurement), an item effect (i; variability due to items), and a residual that includes the person-item interaction (pi) and other unspecified effects. As a result, an observed score for one individual on one item can be stated as

$$X_{pi} = \mu + (\mu_p - \mu) + (\mu_i - \mu) + (X_{pi} - \mu_p - \mu_i + \mu) \qquad (2\text{-}1)$$

where $X_{pi}$ refers to the score for any person in the population on any item in the universe, $\mu$ refers to the grand mean score over all persons and items, $\mu_p$ refers to the average score for person p over all items, and $\mu_i$ refers to the average score for item i over all persons. The grand mean is constant for all people and has no variance.
$(\mu_p - \mu)$ designates a person effect, $(\mu_i - \mu)$ an item effect, and the last term the residual effect involving the interaction and all other sources of error not identified in this design. The last three terms are random effects. Accordingly, the variance of the $X_{pi}$ scores can be decomposed as

$$\sigma^2(X_{pi}) = \sigma^2(p) + \sigma^2(i) + \sigma^2(pi,e) \qquad (2\text{-}2)$$

where $\sigma^2(X_{pi})$ refers to the observed score variance, $\sigma^2(p)$ refers to the person variance, $\sigma^2(i)$ refers to the item variance, and $\sigma^2(pi,e)$ refers to the variance due to residuals. Thus, the variance of observed scores can be partitioned into independent sources of variation due to differences among persons, items, and the residual term, which includes the person-item interaction.

Universe of Admissible Observations and Facets

From the perspective of GT, a measurement is considered a sample from a universe of admissible observations. All admissible observations are regarded as equivalent in quality, so that they are interchangeable for the purpose of making a decision about the measurement (Shavelson & Webb, 1991). The universe of admissible observations is made up of facets, where each facet represents one source of variation in the measurement. The universe can have one or more facets, depending on how we conceptualize the measurement. In addition, each facet includes one or more similar conditions. For example, in a writing test, prompts are a possible facet. Each prompt in the test can be considered a condition of this facet. An individual's score on a prompt is an observation. If each prompt is treated as a sample from a universe of prompts that are acceptable for testing writing ability, then variation in the writing scores could come from the prompt facet as well as from differences among persons, which leads to the estimation of a variance component for the prompt facet.

Generalizability (G) Studies and Decision (D) Studies

GT distinguishes the G study and the D study as two different but related procedures for investigating and designing reliable observations.
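As a concrete sketch of the G-study side, the variance components in equation 2-2 can be estimated from a complete persons-by-items score table using the expected-mean-square equations of a two-way random-effects ANOVA. The function name `g_study_p_by_i` and the toy scores are hypothetical; they are not SSQ data and not the software used in this thesis. The sketch assumes a balanced, fully crossed p × i design with one observation per cell.

```python
def g_study_p_by_i(scores):
    """Estimate variance components for a crossed p x i design.

    `scores` is a list of rows, one row per person and one column per
    item, with no missing cells. Returns estimates of sigma^2(p),
    sigma^2(i), and sigma^2(pi,e).
    """
    n_p = len(scores)
    n_i = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    # Sums of squares for persons, items, and the residual (pi,e).
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for the components.
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "pi,e": ms_res,
    }


# Hypothetical scores: 4 persons (rows) by 3 items (columns).
toy_scores = [
    [3, 2, 3],
    [1, 1, 2],
    [3, 3, 3],
    [0, 1, 2],
]
components = g_study_p_by_i(toy_scores)
for name, value in components.items():
    print(f"sigma^2({name}) = {value:.4f}")
```

With small samples an estimate can come out negative; this sketch returns such a value unchanged rather than setting it to zero.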
The purpose of a G study is to identify and decompose the different sources of variation in the observed scores, while the purpose of a D study is to use the variance components obtained from the G study to reduce error by collecting a new sample from the prespecified universe of generalization and to design the measurement for a particular purpose. Although both studies involve the estimation of variance components, only the D study allows us to manipulate the estimated variance components and change the measurement error(s). Therefore, the G study is associated with the development of a measurement procedure, while the D study is concerned with the application of the measurement procedure.

Generalizability Coefficient and Dependability Index

The Generalizability (G) coefficient and the Dependability Index (phi coefficient) both represent the degree of agreement between universe score variance and observed score variance. However, they differ in that the G coefficient is associated with a relative decision, where a decision about a score is based on the relative standing of a person in the population (e.g., a norm-referenced test), while the Dependability Index is associated with an absolute decision, where a decision about a score is based on whether a person passes a standard or cutoff score (e.g., a criterion-referenced test).

The generalizability coefficient in GT is analogous to the reliability coefficient in CTT. It is defined as the ratio of universe score variance to observed score variance, where the latter includes both the universe score variance and the relative error variance. For example, for an achievement test with items as the only facet, the relative error is

$$\sigma^2_{rel} = \frac{\sigma^2_{pi,e}}{n_i} \qquad (2\text{-}3)$$

where $\sigma^2_{pi,e}$ is the variance due to the person-by-item interaction and unspecified error, and $n_i$ is the number of items in the measurement. The relative error ($\sigma^2_{rel}$) for an instrument is inversely proportional to its number of items.
Obviously, when the number of items increases, the relative error becomes smaller; when the number of items decreases, it becomes larger. The formula for calculating the generalizability coefficient is

$$E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{rel}} \qquad (2\text{-}4)$$

where $E\rho^2$ refers to the generalizability coefficient, $\sigma^2_p$ refers to the universe score variance, and $\sigma^2_{rel}$ refers to the relative error in the measurement. The generalizability coefficient shows how accurately a person's observed score can be generalized to his or her universe score (Shavelson & Webb, 1991). It reflects how much of the observed score variance is due to universe score variance.

Similarly, the Dependability index is defined as the ratio of universe score variance to observed score variance, where the latter includes universe score variance and absolute error variance. In the same achievement test scenario, the absolute error is

$$\sigma^2_{abs} = \frac{\sigma^2_i + \sigma^2_{pi,e}}{n_i} \qquad (2\text{-}5)$$

where $\sigma^2_{abs}$ refers to the absolute error, $\sigma^2_i$ refers to the variance due to items, $\sigma^2_{pi,e}$ refers to the variance due to the person-by-item interaction and unspecified error, and $n_i$ refers to the number of items in the measurement. Notice that not only the residual variance (interaction and unidentified error) but also the item variance contributes to the absolute error. When the number of items increases, the absolute error decreases, and vice versa. The formula for calculating the Dependability index is

$$\Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{abs}} \qquad (2\text{-}6)$$

where $\Phi$ refers to the Dependability index, $\sigma^2_p$ refers to the universe score variance, and $\sigma^2_{abs}$ refers to the absolute error. From equations 2-4 and 2-6, we notice that both the G coefficient and the Dependability index involve the number of items in the measurement; therefore, we can determine how many items are necessary in a measurement in order to reach a specific $E\rho^2$ or $\Phi$. Correctly specifying the relative and absolute error in the measurement is important for calculating the right G coefficient and dependability index.
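The relative error, absolute error, generalizability coefficient, and Dependability index defined above translate directly into a small D-study calculation for the single-facet p × i design. The sketch below is only an illustration: the function names and the three variance components are hypothetical values chosen for the example, not estimates from the SSQ.

```python
def relative_error(var_pie, n_i):
    """Relative error variance: residual variance divided by n_i."""
    return var_pie / n_i


def absolute_error(var_i, var_pie, n_i):
    """Absolute error variance: item plus residual variance, over n_i."""
    return (var_i + var_pie) / n_i


def g_coefficient(var_p, var_pie, n_i):
    """Generalizability coefficient for relative decisions."""
    return var_p / (var_p + relative_error(var_pie, n_i))


def dependability_index(var_p, var_i, var_pie, n_i):
    """Dependability index (phi) for absolute decisions."""
    return var_p / (var_p + absolute_error(var_i, var_pie, n_i))


def items_needed(var_p, var_pie, target):
    """Smallest number of items whose G coefficient reaches `target`."""
    if not 0 < target < 1 or var_p <= 0:
        raise ValueError("need 0 < target < 1 and var_p > 0")
    n_i = 1
    while g_coefficient(var_p, var_pie, n_i) < target:
        n_i += 1
    return n_i


# Hypothetical G-study estimates, chosen only for illustration.
VAR_P, VAR_I, VAR_PIE = 0.05, 0.02, 0.20

for n_i in (10, 20, 40):
    g = g_coefficient(VAR_P, VAR_PIE, n_i)
    phi = dependability_index(VAR_P, VAR_I, VAR_PIE, n_i)
    print(f"n_i={n_i:2d}  G={g:.3f}  Phi={phi:.3f}")

print("items needed for G >= 0.85:", items_needed(VAR_P, VAR_PIE, 0.85))
```

Because the absolute error adds the item variance to the relative error, the Dependability index can never exceed the generalizability coefficient for the same number of items.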
Usually, relative error involves only the error variances that interact with the person (the object of measurement). In contrast, absolute error involves all variances except the universe score variance. This difference is illustrated with the Venn diagrams in Figure 2-1.

Random and Fixed Facets

Whether a facet is random or fixed depends on how we define the conditions of the facet. Suppose the conditions of a facet in a measurement are randomly sampled from the conditions in the universe and results will be generalized to other conditions not included in the sample. In this case, the conditions in the measurement represent a sample from all possible conditions, and the facet is random. In contrast, a facet is fixed when the conditions of the facet in a measurement are the only conditions in the universe and results are not generalized to other conditions. Recognizing whether a facet is fixed or random is important because the variance associated with a fixed facet is not regarded as an error variance component. Therefore, when we calculate the reliability coefficients for the measurement, variance due to a fixed facet should not be included in the relative or absolute error but treated as part of the universe score variance. However, a fixed facet presents a problem because GT treats it by averaging over the conditions of the facet, which does not always make conceptual sense. The solution that Shavelson and Webb (1991) suggested is to conduct separate G studies within each condition of the fixed facet. This suggestion aligns exactly with MGT, which we present in a later section.

Crossed and Nested Facets

Facets are crossed when every condition of one facet is combined with every condition of the other facet. For example, suppose each individual responds to all the items in an achievement test. The design is crossed and can be denoted by p × i, where p represents persons and i represents items.
In contrast, a facet is nested when different sets of its conditions are associated with one and only one condition of another facet. For example, in the SAT, different sets of items are associated with either the verbal subtest or the math subtest but not both; therefore, the items are nested within the subtest facet. We can denote this nested design as i:t, where t represents the subtest facet. In addition, the design of a measure may include both crossed and nested facets at the same time. For example, each individual responds to all the items, and two or more items appear with one and only one condition (e.g., subtest) of an achievement test. This design may be written as p × (i:t). Recognizing the difference between crossed and nested facets is important because the estimation of variance components for a nested design differs from that for a crossed design, which leads directly to different reliability estimates.

Multivariate Generalizability Theory Overview

For the last few decades, test developers and users have attempted to investigate the reliability of measurements in which responses to multiple subtests (or profiles) are obtained for each object of measurement. Such data have the following characteristics: (1) each examinee (object of measurement) has two or more universe scores representing subtests or profiles; (2) the conditions of the subtests (or profiles) are fixed, that is, the selected conditions are the ones of interest and results will not be generalized to other conditions; (3) the number of items in each condition of the subtests (or profiles) is not the same, which means the data are unbalanced. Furthermore, researchers are concerned not only with each universe score but also with the composite (or profile) of universe scores for the whole test.
Multivariate generalizability theory (MGT), in contrast with univariate generalizability theory (UGT), was developed to meet this challenge (Rajaratnam, 1965; Shavelson & Webb, 1991; Brennan, 2001a).

A description of MGT is not complete without a comparison with UGT. The difference can be described as follows: MGT involves two or more universe scores for the object of measurement at the same time, while UGT involves only one universe score for the object of measurement at a time. In this sense, a multivariate analysis of a specific dataset can be built from multiple univariate analyses in a row. More importantly, multivariate analysis accounts not only for variance components, as univariate analysis does, but also for covariance components between the universe scores, which univariate analysis cannot do. This powerful feature of multivariate analysis allows us to investigate and design reliable observations at the level of each universe score and at the level of the composite score or profile.

However, each multivariate design has a counterpart univariate design, which means, logically, that any data can be analyzed with univariate GT. The choice of multivariate GT over univariate GT depends on the complexity of the data and on the kind of information we want to derive. Brennan (2001) recommended performing a full multivariate analysis if there is a fixed facet in the research design. In his book Generalizability Theory, Brennan discussed the problems of analyzing unbalanced data, where the sample sizes in each condition of a facet are not equal. Unbalanced data create complexity when we want to decompose the variance components. One way to reduce the complexity of unbalanced data in univariate analysis is to analyze the data under the framework of a multivariate design, if possible.
Take the SSQ data as an example: it is reasonable that different numbers of items are allocated to the four temperament scales of the SSQ, with more items in some scales and fewer in others. A multivariate design avoids the problem of unbalanced data by analyzing four parallel univariate designs. In the end, each univariate design has balanced data under one of the four levels of the fixed facet, temperament.

In addition, Haertel (2006) pointed out two disadvantages of using a univariate analysis for data containing a fixed facet. First, variance components are forced to be the same for observed scores across the levels of the fixed facet. Second, the universe score represents an equally weighted composite of scores on the levels of the fixed facet, which is not always appropriate. Consequently, some information about the scores cannot be derived from a univariate analysis.

In sum, multivariate GT has the following advantages over univariate GT: (a) it reduces the complexities and ambiguities that unequal numbers of items within fixed facets create in a univariate analysis; (b) it allows variance and covariance components to be separated, which a univariate analysis does not (Brennan, 2001a); (c) it estimates observable correlations, or universe-score and error correlations, for various D study designs (Brennan, 2001a); (d) it estimates the reliability of profiles of scores using multiple regression of universe scores on the observed scores in the profile (Brennan, 2001a; Cronbach et al., 1972); and (e) it can produce a composite of scores with maximum generalizability (Shavelson & Webb, 1981). For all of these purposes, multivariate GT can be a better choice. A later chapter presents the methods for computing variance and covariance components, disattenuated correlations, the generalizability coefficient, and the dependability index under the framework of multivariate GT, using the actual SSQ data as an illustration.
Figure 2-1. Relative and absolute error for a random p × i design. [Venn diagrams: relative error (pi,e); absolute error (pi,e + i).]

CHAPTER 3
RESEARCH DESIGN AND METHODOLOGY

Student Styles Questionnaire Data

The Student Styles Questionnaire (SSQ; Oakland, Glutting, & Horton, 1996) is an instrument designed to measure children's temperament. Temperament refers to people's consistent attitudes, preferences, affect, and styles of behaving. In education, a student's temperament influences the ways in which the student learns and associates with others in school. It is important for teachers and parents to understand students' temperament so that effective and appropriate teaching methods can be applied. There are four pairs of temperament styles: extroversion-introversion (EI), practical-imaginative (PM), thinking-feeling (TF), and organized-flexible (OL) (Oakland et al., 1996). Each of the four bipolar styles contrasts aspects of a student's preferences.

The SSQ includes 69 self-report temperament items; each item has two options in a forced-choice response format. It was designed to be used with children ages 8 through 17. Each temperament style is measured with a different set of items: 23 items measure the EI style, 16 items the PM style, 10 items the TF style, and 26 items the OL style. One feature of the response coding is that if the student's response belongs to the extrovert, thinking, organized, or practical style, the score is positive; otherwise, the score is negative. One example item is "After school I like to do something (alone/with friends)." If the student's response is "with friends" (extrovert type), the score for the item is 3; if "alone" (introvert type), the score is negative. (Note that 6 of the 69 items measure two different styles simultaneously; for example, item 41 measures OL as well as PM. For convenience of illustration, we ignored the complexity of these 6 items being crossed with two levels of temperament.)
The SSQ data used in this study were acquired through a random sample representative of the 1990 U.S. Bureau of the Census data. A total of 7,902 children from public and private schools, ages 8 through 17, completed the questionnaire (6.7% age 8, 8.2% age 9, 11.8% age 10, 14.9% age 11, 14.2% age 12, 14.7% age 13, 11.7% age 14, 8.1% age 15, 5.2% age 16, and 4.6% age 17). There were 3,952 (50%) females and 3,950 (50%) males in the sample. The data were collected from four regions: 1,976 (25%) students were from the North Central region, 1,692 (21.4%) from the West, 2,785 (35.2%) from the South, and 1,449 (18.3%) from the Northeast. There were 5,547 (70.2%) white students, 868 (11%) Hispanic students, 1,194 (15.1%) black students, and 293 (3.7%) students of other ethnicities.

Multivariate Design and Research Questions

The SSQ has four temperament scales: extroversion-introversion (EI), practical-imaginative (PM), thinking-feeling (TF), and organized-flexible (OL). Each set of items is associated with one and only one content area, which means that the items are nested within the fixed facet. Temperament is a fixed facet here because we have no intention of generalizing the four types of temperament to other conditions of content. When students take the SSQ, four subtest scores are reported to them. Each subtest score ($X_{pi}$) for an examinee can be decomposed into effects due to persons (p), items (i), person-by-item interaction (pi) or residual effects, and a grand mean ($\mu$) for the content. This is analogous to a univariate analysis associated with each level of the fixed content facet. Taken side by side, the four univariate designs form the multivariate design for the SSQ. The notational form of the multivariate design of the SSQ is $p^{\bullet} \times i^{\circ}$, where p stands for students and i stands for items.
The solid circle means that students are crossed with the multivariate fixed facet, while the empty circle means that items are nested within each condition of the multivariate fixed facet. In addition, the existence of a covariance component depends on whether a random facet is linked with the fixed facet; "linked" has the same meaning as "crossed." As a result, the expected values of the covariance components for students (p) are not zero, while the expected values of the covariance components for items (i) are zero.

Applying MGT to the SSQ data is important because different aspects or facets of individual score differences can be quantified and used in future decision-making. In particular, the following questions are examined in our study:

1. What is the magnitude of error variances in the SSQ?
2. Are the four temperaments in the SSQ correlated?
3. What are the generalizability and dependability coefficients for each subscale in the SSQ?
4. What are the generalizability and dependability coefficients for SSQ profiles?

Multivariate Generalizability Studies

The multivariate analysis of the SSQ data consists of two steps: multivariate generalizability (G) studies and multivariate decision (D) studies. In the first step, the multivariate G studies focus on estimating variance and covariance components and disattenuated correlations between temperament styles. In the second step, the multivariate D studies (described in the next section) focus on minimizing measurement error given the information obtained in the first step, and on calculating the appropriate G or phi coefficients.

Estimating Variance Components

The multivariate generalizability study $p^{\bullet} \times i^{\circ}$ design is used to estimate the variance and covariance components for the SSQ data.
Letting $v_1$ to $v_4$ stand for the four temperament styles of the SSQ (EI, PM, TF, and OL), p stand for persons or students, and i stand for items or questions, a linear model can be used to describe the observed scores for each of the temperament scales. Taking $v_1$ (EI) as an example:

$$X_{piv_1} = \mu_{v_1} + \nu_{pv_1} + \nu_{iv_1} + \nu_{piv_1}, \tag{3-1}$$

in which the $\nu$ terms represent effects for $v_1$, and $\mu_{v_1}$ is the grand mean for the universe score for EI. An examinee's observed score $X_{piv_1}$ for EI can be decomposed into effects due to persons ($\nu_{pv_1}$), items ($\nu_{iv_1}$), person-by-item interaction or residual effects ($\nu_{piv_1}$), and a grand mean for the level of temperament ($\mu_{v_1}$). Similar equations describe the observed scores for PM, TF, and OL. That is, there is a univariate p × i design associated with each level of temperament. The superscript filled circle for p ($\bullet$) indicates that persons have scores on all four temperament scales. The superscript empty circle for i ($\circ$) indicates that items are nested within each level of temperament; that is, each set of items belongs to only one level of temperament. The fixed levels of v are linked in the sense that the same group of students responds to all items for all levels. The Venn diagram for the multivariate $p^{\bullet} \times i^{\circ}$ design is shown in Figure 3-1, where v is the fixed facet temperament.

Similarly, the variance of the scores for one scale (e.g., EI) over the population of students and the items in the universe of admissible observations can be expressed as

$$\sigma^2(X_{pi}) = \sigma^2(p) + \sigma^2(i) + \sigma^2(pi). \tag{3-2}$$

Variance components cannot be observed directly. To estimate them, we first use an ANOVA procedure to obtain the sums of squares and mean squares, together with their expected mean squares (EMS). We then obtain the variance component estimators from the following equations:

$$\hat{\sigma}^2(p) = \frac{MS(p) - MS(pi)}{n_i} \tag{3-3}$$

$$\hat{\sigma}^2(i) = \frac{MS(i) - MS(pi)}{n_p} \tag{3-4}$$

$$\hat{\sigma}^2(pi) = MS(pi) \tag{3-5}$$

where $n_i$ is the number of items and $n_p$ is the number of students.
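Equations 3-3 through 3-5 can be applied directly to the mean squares of a single scale. The following is a minimal Python sketch, not the SPSS and EXCEL workflow used in this study; the function name is a hypothetical choice, and the inputs are the EI mean squares reported in Table 4-1 for Case 1.

```python
def g_study_variance_components(ms_p, ms_i, ms_pi, n_p, n_i):
    """Estimate variance components for a univariate p x i design.

    Implements Equations 3-3 to 3-5: each component is recovered from
    the ANOVA mean squares (hypothetical helper, for illustration only).
    """
    var_p = (ms_p - ms_pi) / n_i   # persons, Eq. 3-3
    var_i = (ms_i - ms_pi) / n_p   # items, Eq. 3-4
    var_pi = ms_pi                 # person-by-item interaction / residual, Eq. 3-5
    return var_p, var_i, var_pi


# EI scale, Case 1 (mean squares as reported in Table 4-1)
var_p, var_i, var_pi = g_study_variance_components(
    ms_p=16.76, ms_i=567.77, ms_pi=4.76, n_p=1000, n_i=23
)
print(round(var_p, 2), round(var_i, 2), round(var_pi, 2))  # 0.52 0.56 4.76
```

The same call with the mean squares and item counts of the other three scales gives the remaining diagonal entries of the G study matrices.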
Estimating Covariance Components

An unbiased estimator of the covariance between universe scores for the linked facets is calculated according to the following equation:

$$\hat{\sigma}_{vv'}(p) = S_{vv'}(p) = \frac{n_p}{n_p - 1}\left(\frac{1}{n_p}\sum_{p} \bar{X}_{pv}\,\bar{X}_{pv'} - \bar{X}_{v}\,\bar{X}_{v'}\right) \tag{3-6}$$

where $S_{vv'}(p)$ is the observed covariance between two universe scores, $\bar{X}_{pv}$ is the mean score of a specific person on one type of temperament, averaged across items, $\bar{X}_{v}$ is the mean score on one type of temperament, averaged across all persons and items, and $n_p$ is the number of students.

Variance and Covariance Matrix

Variance and covariance components for the SSQ $p^{\bullet} \times i^{\circ}$ design are presented here using matrix conventions:

$$\hat{\Sigma}_p = \begin{bmatrix} \hat{\sigma}_1^2(p) & \hat{\sigma}_{12}(p) & \hat{\sigma}_{13}(p) & \hat{\sigma}_{14}(p) \\ \hat{\sigma}_{21}(p) & \hat{\sigma}_2^2(p) & \hat{\sigma}_{23}(p) & \hat{\sigma}_{24}(p) \\ \hat{\sigma}_{31}(p) & \hat{\sigma}_{32}(p) & \hat{\sigma}_3^2(p) & \hat{\sigma}_{34}(p) \\ \hat{\sigma}_{41}(p) & \hat{\sigma}_{42}(p) & \hat{\sigma}_{43}(p) & \hat{\sigma}_4^2(p) \end{bmatrix} \tag{3-7}$$

$$\hat{\Sigma}_i = \begin{bmatrix} \hat{\sigma}_1^2(i) & & & \\ & \hat{\sigma}_2^2(i) & & \\ & & \hat{\sigma}_3^2(i) & \\ & & & \hat{\sigma}_4^2(i) \end{bmatrix} \tag{3-8}$$

$$\hat{\Sigma}_{pi} = \begin{bmatrix} \hat{\sigma}_1^2(pi) & & & \\ & \hat{\sigma}_2^2(pi) & & \\ & & \hat{\sigma}_3^2(pi) & \\ & & & \hat{\sigma}_4^2(pi) \end{bmatrix} \tag{3-9}$$

In these matrices, each column represents one level of the fixed facet of the SSQ in the following order: EI, PM, TF, and OL. Elements on the diagonal are estimated variance components, and elements off the diagonal are estimated covariance components. Because persons are linked, covariance components can be estimated between pairs of the four levels for the object of measurement. Items and person-by-item interaction terms, in contrast, are not linked. Thus their covariance components between pairs of the four levels are zero and are not listed.

Disattenuated Correlations

In measurement, a disattenuated correlation is the correlation between two measures after accounting for measurement error, in contrast with the attenuated (weakened) correlation observed in the presence of measurement error. Disattenuated correlations can be estimated for pairs of universe scores of the four levels of the SSQ through the object of measurement (students).
For example, the disattenuated correlation coefficient between two temperament scales (v and v') for the universe score (person) is

$$\rho_{vv'}(p) = \frac{\sigma_{vv'}(p)}{\sqrt{\sigma_v^2(p)\,\sigma_{v'}^2(p)}} \tag{3-10}$$

where $\sigma_{vv'}(p)$ is the covariance of the two universe scores, and $\sigma_v^2(p)$ and $\sigma_{v'}^2(p)$ are the two universe score variances.

Multivariate Decision Study

Estimates of variance and covariance components for the decision (D) study follow directly from the G study results. $\Sigma_p$ is unchanged, and $\Sigma_I$ and $\Sigma_{pI}$ are obtained by dividing the diagonal elements of $\Sigma_i$ and $\Sigma_{pi}$ by the number of D study items ($n'_i$) within each level. The three D study variance-covariance matrices are as follows:

$$\Sigma_p = \begin{bmatrix} \sigma_1^2(p) & \sigma_{12}(p) & \sigma_{13}(p) & \sigma_{14}(p) \\ \sigma_{21}(p) & \sigma_2^2(p) & \sigma_{23}(p) & \sigma_{24}(p) \\ \sigma_{31}(p) & \sigma_{32}(p) & \sigma_3^2(p) & \sigma_{34}(p) \\ \sigma_{41}(p) & \sigma_{42}(p) & \sigma_{43}(p) & \sigma_4^2(p) \end{bmatrix} \tag{3-11}$$

$$\Sigma_I = \begin{bmatrix} \sigma_1^2(i)/n'_{i1} & & & \\ & \sigma_2^2(i)/n'_{i2} & & \\ & & \sigma_3^2(i)/n'_{i3} & \\ & & & \sigma_4^2(i)/n'_{i4} \end{bmatrix} \tag{3-12}$$

$$\Sigma_{pI} = \begin{bmatrix} \sigma_1^2(pi)/n'_{i1} & & & \\ & \sigma_2^2(pi)/n'_{i2} & & \\ & & \sigma_3^2(pi)/n'_{i3} & \\ & & & \sigma_4^2(pi)/n'_{i4} \end{bmatrix} \tag{3-13}$$

Relative and absolute error variance-covariance matrices can then be obtained:

$$\Sigma_\delta = \Sigma_{pI} \tag{3-14}$$

$$\Sigma_\Delta = \Sigma_I + \Sigma_{pI} \tag{3-15}$$

The matrices in Equations 3-14 and 3-15 are diagonal for the $p^{\bullet} \times I^{\circ}$ design. The square roots of the relative and absolute error variances are the relative and absolute standard errors of measurement (SEMs). Equations 3-12 through 3-15 also show that the relative and absolute error variances are directly influenced by the number of items in each level of the fixed facet. In particular, when the number of items increases, the relative and absolute errors decrease. As a result, the G coefficient and the dependability index increase.

SSQ Profiles

SSQ profiles include four temperament scales. A composite score formed as the sum of a student's subscores on the four temperament scales does not make sense here. However, we can still assess the SSQ profiles as one unitary instrument in terms of composite universe score variance and composite relative and absolute errors. Brennan (2001a) recommends three kinds of weights for summing the variance and covariance components, one of which is a priori weights.
The rationale for a priori weights is that an appropriate composite universe score variance is a weighted average of the universe score variances for the levels of a fixed facet, where the weights are proportional to the number of items in each level. Such a weight vector can be defined as

$$\mathbf{w}' = \left[\frac{n_{i1}}{n_{+}} \quad \frac{n_{i2}}{n_{+}} \quad \frac{n_{i3}}{n_{+}} \quad \frac{n_{i4}}{n_{+}}\right], \tag{3-16}$$

where $n_{i1}$ to $n_{i4}$ are the numbers of items for each level of the fixed facet, and $n_{+}$ is the total number of items.

Composite Universe Score Variance, Relative and Absolute Score Variance

Using the weight vector $\mathbf{w}'$, the variance for the composite universe score is

$$\sigma_C^2(p) = \sum_{v} w_v^2\,\sigma_v^2(p) + \sum_{v \neq v'} w_v w_{v'}\,\sigma_{vv'}(p) \tag{3-17}$$

where C is the composite score and w is the weight. Similarly, the estimated relative and absolute error variances for the composite score are

$$\sigma_C^2(\delta) = \sum_{v} w_v^2\,\sigma_v^2(\delta) = \sum_{v} \frac{w_v^2}{n'_{iv}}\,\sigma_v^2(pi) \tag{3-18}$$

$$\sigma_C^2(\Delta) = \sum_{v} w_v^2\,\sigma_v^2(\Delta) = \sum_{v} \frac{w_v^2}{n'_{iv}}\left[\sigma_v^2(i) + \sigma_v^2(pi)\right] \tag{3-19}$$

The square roots of the relative and absolute error variances for the composite are the relative and absolute standard errors of measurement for the SSQ profiles.

G Coefficient and Dependability Index for SSQ Profiles

A multivariate generalizability coefficient ($E\rho_C^2$) for the profiles can be defined as the ratio of the composite universe score variance to itself plus the composite relative error variance:

$$E\rho_C^2 = \frac{\sigma_C^2(p)}{\sigma_C^2(p) + \sigma_C^2(\delta)} \tag{3-20}$$

A generalizability coefficient is often used for making norm-referenced interpretations. Similarly, a multivariate dependability index (phi coefficient) can be defined as the ratio of the composite universe score variance to itself plus the composite absolute error variance:

$$\Phi_C = \frac{\sigma_C^2(p)}{\sigma_C^2(p) + \sigma_C^2(\Delta)} \tag{3-21}$$

A dependability index is often used for making criterion-referenced interpretations.

Data Analysis Software for This Study

All computations were performed using the SPSS 14.0 package and EXCEL. The sums of squares and expected mean squares (EMS) were computed using the repeated measures option of the GLM procedure in SPSS.
EXCEL was used to calculate the estimated error variances from the EMS, the covariance components, the disattenuated correlations between universe scores, and all coefficients. A dedicated computer program, mGENOVA (Brennan, 2001b), offers a more convenient way to carry out multivariate generalizability analyses. A detailed description of mGENOVA is given in Brennan's book Generalizability Theory (2001a). One advantage of mGENOVA is that it produces output for large data sets in a few minutes. However, this study does not use mGENOVA.

Figure 3-1. Representation for a multivariate $p^{\bullet} \times i^{\circ}$ design. [Venn diagram with regions p, v, and i.]

CHAPTER 4
RESULTS

In this study, the dataset was divided into eight random cases; the sample size is 1000 for each of the first seven cases and 902 for the last case. The reasons for doing this are that (a) we want to examine whether the eight cases, representing eight random samples, are consistent with each other in terms of the magnitude of error variances, and (b) with a large dataset (i.e., all 7,902 students in the SSQ data), SPSS often runs out of memory before producing results.

Multivariate Generalizability Study Results

Items are the only random facet in this multivariate generalizability analysis of the SSQ data. Students are the objects of measurement. All items were presented to all students, so students are crossed with items, and the items are nested within each level of the fixed facet temperament. Therefore, the complete design in this multivariate study is denoted $p^{\bullet} \times i^{\circ}$, where p refers to the person effect and i to the items. The covariance components design applies to p, and the variance components design applies to p, i, and pi. Since the fixed facet of this multivariate design has four levels, the multivariate design involves 4 variance components designs and 4(4 − 1)/2 = 6 covariance components designs.
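The six covariance designs correspond to the six unordered pairs of the four scales. The following minimal Python sketch enumerates those pairs and applies the estimator of Equation 3-6 to each. It assumes that each student's mean score on every scale has already been computed; the function name and the toy scores for five students are hypothetical illustrations, not SSQ data.

```python
from itertools import combinations


def universe_score_covariance(x, y):
    """Equation 3-6: covariance between two universe scores.

    x and y hold the per-person mean scores on two scales,
    listed for the same persons in the same order.
    """
    n_p = len(x)
    mean_x = sum(x) / n_p
    mean_y = sum(y) / n_p
    mean_xy = sum(a * b for a, b in zip(x, y)) / n_p
    return n_p / (n_p - 1) * (mean_xy - mean_x * mean_y)


scales = ["EI", "PM", "TF", "OL"]
pairs = list(combinations(scales, 2))  # 4(4 - 1)/2 = 6 unordered pairs

# Hypothetical per-person mean scores (toy values, NOT SSQ data)
toy = {
    "EI": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PM": [2.0, 1.0, 4.0, 3.0, 5.0],
    "TF": [0.5, 1.5, 1.0, 2.5, 2.0],
    "OL": [3.0, 3.0, 2.0, 1.0, 1.0],
}

# One covariance component per pair of scales
cov = {(v, w): universe_score_covariance(toy[v], toy[w]) for v, w in pairs}
for pair, value in cov.items():
    print(pair, round(value, 3))
```

Because the factor n_p/(n_p − 1) is applied to the mean cross-product minus the product of means, the function returns the usual unbiased sample covariance of the person means.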
Tables 4-1 through 4-4 provide the mean squares and estimated variance components for EI, PM, TF, and OL in the Case 1 SSQ data using Equations 3-3 to 3-5, and Table 4-5 provides the covariance components for pairs of scales in the Case 1 SSQ data using Equation 3-6. Estimated variance and covariance components for the eight samples of the SSQ, obtained by following the procedure of Case 1, are listed in Tables 4-6 and 4-7. Each sample consisted of 1000 students (with the exception of the last case) in a random-groups design. Also provided are the means and standard deviations of the estimated variance and covariance components. According to Brennan (2001a), the standard deviations are empirical estimates of the standard errors, without making normality assumptions. The relatively small magnitude of these standard errors is reflected in the similarity of the eight estimates.

As shown in Tables 4-6 and 4-7, the variance and covariance component estimates for all eight samples were similar in magnitude. Estimated variance components for p were relatively small (0.28 to 0.80). Estimated variance components for i were similar to those for p. Estimated variance components for pi were the largest (about eight times larger than the variance component estimates for p and i), which indicates that the relative ranking of different students tended to vary across items. Among the four scales of temperament, the estimated variance components for persons were, on average, largest for TF (.69) and smallest for PM (.31).

Estimated covariance components for p were small compared to the estimated variance components for p. The covariance components for persons reflect the underlying correlation among the temperament components.
On average, PM-TF has the largest covariance component (.15) and EI-PM the smallest (.01), which indicates that persons who have a practical temperament style are also more likely to have a thinking temperament style, and persons who have an imaginative style are more likely to have a feeling temperament style. However, for persons with an extroverted or introverted temperament style, there is essentially no difference in preference for the practical or imaginative temperament style. The estimated variance components for items were largest for OL and smallest for TF, which indicates that item difficulty varied the most for OL. Estimated variance components for pi were largest for TF.

Using the means of the estimated variances and covariances in Tables 4-6 and 4-7, the G study matrices are

$$\hat{\Sigma}_p = \begin{bmatrix} .52 & .01 & .12 & .05 \\ .01 & .31 & .15 & .13 \\ .12 & .15 & .69 & .09 \\ .05 & .13 & .09 & .43 \end{bmatrix} \tag{4-1}$$

$$\hat{\Sigma}_i = \begin{bmatrix} .59 & & & \\ & .46 & & \\ & & .44 & \\ & & & .68 \end{bmatrix} \tag{4-2}$$

$$\hat{\Sigma}_{pi} = \begin{bmatrix} 4.73 & & & \\ & 3.68 & & \\ & & 4.89 & \\ & & & 4.16 \end{bmatrix} \tag{4-3}$$

Multivariate Decision Study Results

Using D study sample sizes of $n'_{i1} = 23$, $n'_{i2} = 16$, $n'_{i3} = 10$, and $n'_{i4} = 26$ (the same as the G study sample sizes), the D study estimated variance-covariance matrices are

$$\hat{\Sigma}_p = \begin{bmatrix} .52 & .01 & .12 & .05 \\ .01 & .31 & .15 & .13 \\ .12 & .15 & .69 & .09 \\ .05 & .13 & .09 & .43 \end{bmatrix} \tag{4-4}$$

$$\hat{\Sigma}_I = \begin{bmatrix} .026 & & & \\ & .029 & & \\ & & .044 & \\ & & & .026 \end{bmatrix} \tag{4-5}$$

$$\hat{\Sigma}_{pI} = \begin{bmatrix} .21 & & & \\ & .23 & & \\ & & .49 & \\ & & & .16 \end{bmatrix} \tag{4-6}$$

It follows that the estimated universe score, relative error, and absolute error matrices are

$$\hat{\Sigma}_p = \begin{bmatrix} .52 & \mathit{.02} & \mathit{.20} & \mathit{.11} \\ .01 & .31 & \mathit{.32} & \mathit{.36} \\ .12 & .15 & .69 & \mathit{.17} \\ .05 & .13 & .09 & .43 \end{bmatrix} \tag{4-7}$$

$$\hat{\Sigma}_\delta = \begin{bmatrix} .21 & & & \\ & .23 & & \\ & & .49 & \\ & & & .16 \end{bmatrix} \tag{4-8}$$

$$\hat{\Sigma}_\Delta = \begin{bmatrix} .236 & & & \\ & .259 & & \\ & & .534 & \\ & & & .186 \end{bmatrix} \tag{4-9}$$

where the italicized values above the diagonal of $\hat{\Sigma}_p$ are disattenuated correlations for universe scores, and the values on the diagonal are variance components. The first column (or row) of each matrix represents EI, the second PM, the third TF, and the fourth OL. The correlations range from −.17 to .36, which indicates that some types of temperament are correlated with each other to some degree.
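The D study error variances above, and the subscale coefficients that follow from them, can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, assuming the mean G study variance components of Equations 4-1 to 4-3 as inputs; the function name `d_study` is a hypothetical choice. Because the thesis rounds intermediate values to two or three digits, results computed from unrounded intermediates may differ from the reported figures in the last digit.

```python
def d_study(var_p, var_i, var_pi, n_items):
    """D study quantities for one level of the fixed facet.

    Returns relative error variance, absolute error variance,
    G coefficient, and dependability index (phi) for n_items items.
    """
    rel_error = var_pi / n_items               # sigma^2(delta)
    abs_error = var_i / n_items + rel_error    # sigma^2(Delta)
    g_coef = var_p / (var_p + rel_error)
    phi = var_p / (var_p + abs_error)
    return rel_error, abs_error, g_coef, phi


# Mean G study estimates: (var_p, var_i, var_pi, number of items)
g_study = {
    "EI": (0.52, 0.59, 4.73, 23),
    "PM": (0.31, 0.46, 3.68, 16),
    "TF": (0.69, 0.44, 4.89, 10),
    "OL": (0.43, 0.68, 4.16, 26),
}

for scale, (var_p, var_i, var_pi, n) in g_study.items():
    rel, ab, g, phi = d_study(var_p, var_i, var_pi, n)
    print(f"{scale}: rel={rel:.3f} abs={ab:.3f} G={g:.3f} phi={phi:.3f}")

# Projection: lengthening EI from 23 to 36 items
print(round(d_study(0.52, 0.59, 4.73, 36)[2], 2))  # 0.8
```

Changing only `n_items` gives the kind of projection used to ask how many items a scale would need to reach a target coefficient.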
The G coefficient and the dependability index within each level of the fixed facet temperament can be obtained using Equations 2-4 and 2-6; the results are given in the following matrix in the order EI, PM, TF, and OL, with G coefficients in the first row and dependability indices in the second row:

$$\begin{bmatrix} E\hat{\rho}^2 \\ \hat{\Phi} \end{bmatrix} = \begin{bmatrix} .71 & .57 & .58 & .73 \\ .69 & .54 & .56 & .70 \end{bmatrix} \tag{4-10}$$

To provide more options for the generalizability coefficient and dependability index of each temperament subscale, we carried out several D studies using numbers of items different from those in the SSQ subscales. Table 4-8 shows the generalizability coefficients (G coefficients), as well as the dependability coefficients, associated with the various numbers of items, beginning with the original G study sample sizes for EI, PM, TF, and OL.

Multivariate G and D Study Results for SSQ Profiles

To obtain the universe score variance and error variances of the SSQ profiles, we assume that the number of items contributing to each SSQ scale reflects the relative importance of that scale for the universe of generalization intended by the SSQ. Under this assumption, the a priori weights are $w_1$ = 23/75 = .31, $w_2$ = 16/75 = .21, $w_3$ = 10/75 = .13, and $w_4$ = 26/75 = .35. Using these weights, the estimated composite universe score variance from Equation 3-17 is

$$\hat{\sigma}_C^2(p) = (.31)^2(.52) + (.21)^2(.31) + (.13)^2(.69) + (.35)^2(.43) + 2(.31)(.21)(.01) + 2(.31)(.13)(.12) + 2(.31)(.35)(.05) + 2(.21)(.13)(.15) + 2(.21)(.35)(.13) + 2(.13)(.35)(.09) = .14$$

The estimated composite relative error variance from Equation 3-18 is

$$\hat{\sigma}_C^2(\delta) = (.31)^2(.21) + (.21)^2(.23) + (.13)^2(.49) + (.35)^2(.16) = .06$$

The estimated composite absolute error variance from Equation 3-19 is

$$\hat{\sigma}_C^2(\Delta) = (.31)^2(.236) + (.21)^2(.259) + (.13)^2(.534) + (.35)^2(.186) = .07$$

The generalizability coefficient for the composite score of overall temperament in this multivariate G study can be calculated using Equation 3-20. Since the true variances are always unknown, the estimated variance components are plugged into the equation.
The calculations are shown below:

$$E\hat{\rho}_C^2 = \frac{.14}{.14 + .06} = .70$$

This indicates that about 70% of the variability in individuals' scores was systematic and attributable to the universe score. This interpretation is similar to that of the reliability coefficient in CTT, since CTT concerns the relative standing of individuals. The dependability index was calculated using Equation 3-21:

$$\hat{\Phi}_C = \frac{.14}{.14 + .07} = .67$$

This coefficient should be used when absolute decisions are to be made using this measurement.

Table 4-1. ANOVA for G study p × i design for EI in Case 1
Source of variation    df       Mean square    Estimated variance component
p                      999      16.76          .52
i                      22       567.77         .56
pi                     21978    4.76           4.76
Note: EI = extrovert-introvert, n_p = 1000, n_i = 23.

Table 4-2. ANOVA for G study p × i design for PM in Case 1
Source of variation    df       Mean square    Estimated variance component
p                      999      8.68           .31
i                      15       401.32         .40
pi                     14985    3.72           3.72
Note: PM = practical-imaginative, n_p = 1000, n_i = 16.

Table 4-3. ANOVA for G study p × i design for TF in Case 1
Source of variation    df       Mean square    Estimated variance component
p                      999      12.08          .72
i                      9        419.11         .41
pi                     8991     4.87           4.87
Note: TF = thinking-feeling, n_p = 1000, n_i = 10.

Table 4-4. ANOVA for G study p × i design for OL in Case 1
Source of variation    df       Mean square    Estimated variance component
p                      999      16.03          .45
i                      25       604.07         .60
pi                     24975    4.21           4.21
Note: OL = organized-flexible, n_p = 1000, n_i = 26.

Table 4-5. Variance and covariance components for universe scores of SSQ data in Case 1
        EI     PM     TF     OL
EI      .52    .01    .08    .07
PM      .01    .31    .16    .09
TF      .08    .16    .72    .09
OL      .07    .09    .09    .45

Table 4-6. The G study variance and covariance components for person in SSQ
Case    σ²1(p)  σ²2(p)  σ²3(p)  σ²4(p)  σ12(p)  σ13(p)  σ14(p)  σ23(p)  σ24(p)  σ34(p)
1       .52     .31     .72     .45     .01     .08     .07     .16     .09     .09
2       .49     .32     .64     .43     .04     .09     .06     .15     .13     .09
3       .46     .29     .65     .41     .03     .12     .02     .16     .15     .10
4       .62     .33     .67     .42     .01     .12     .05     .15     .11     .13
5       .46     .31     .78     .44     .03     .15     .08     .09     .19     .05
6       .65     .30     .56     .42     .03     .15     .07     .12     .11     .12
7       .56     .36     .80     .41     .004    .16     .04     .19     .13     .10
8       .43     .28     .67     .44     .03     .10     .02     .16     .13     .03
Mean    .52     .31     .69     .43     .01     .12     .05     .15     .13     .09
SE      .08     .025    .077    .012    .023    .026    .022    .03     .026    .034
Note: In the variance and covariance notation (e.g., σ²1(p)), 1 = EI, 2 = PM, 3 = TF, 4 = OL.

Table 4-7. The G study variance components for item and person-by-item in SSQ
        Item                            Person × Item
Case    σ²1(i)  σ²2(i)  σ²3(i)  σ²4(i)  σ²1(pi)  σ²2(pi)  σ²3(pi)  σ²4(pi)
1       .56     .40     .41     .60     4.76     3.72     4.87     4.21
2       .63     .47     .50     .66     4.74     3.68     4.91     4.16
3       .57     .44     .43     .70     4.82     3.74     4.89     4.11
4       .64     .45     .51     .77     4.62     3.69     4.85     4.14
5       .58     .54     .44     .68     4.81     3.63     4.73     4.14
6       .58     .47     .43     .65     4.59     3.67     5.09     4.28
7       .65     .51     .46     .73     4.66     3.59     4.79     4.14
8       .52     .39     .31     .61     4.84     3.75     4.96     4.11
Mean    .59     .46     .44     .68     4.73     3.68     4.89     4.16
SE      .043    .046    .061    .05     .095     .052     .109     .054

Table 4-8. Reliability / dependability estimates for SSQ subscale D studies
Type of temperament    Number of items    G coefficient (reliability)    Φ (dependability)
EI                     23                 .71                            .69
                       36                 .80                            .78
PM                     16                 .57                            .54
                       28                 .70                            .66
                       47                 .80                            .74
TF                     10                 .58                            .56
                       17                 .70                            .68
                       28                 .80                            .78
OL                     26                 .73                            .70
                       23                 .70                            .67
                       39                 .80                            .78
Note: EI = extrovert-introvert, PM = practical-imaginative, TF = thinking-feeling, OL = organized-flexible.

CHAPTER 5
DISCUSSION AND CONCLUSION

In this study, we have addressed the problem of analyzing unbalanced data with a fixed facet (Shavelson & Webb, 1991; Brennan, 2001a): a univariate analysis of such data is limited by the difficulty of estimating variance components and by the relatively small amount of information that can be derived from only one universe score.
We have described another way of analyzing unbalanced data with a fixed facet, namely a multivariate analysis. Multivariate analysis avoids the complexity of an unbalanced design and allows us to investigate, and further decrease, the measurement error at the level of each subtest and of the whole test. In addition, multivariate analysis provides information about disattenuated correlations between universe scores. Following this strategy, we used Student Styles Questionnaire (SSQ) data to illustrate how to formulate and compute the variance components, covariance components, and generalizability and phi coefficients (dependability indices) at both the micro level (each profile) and the macro level (all profiles together). Our results support the view that multivariate analysis is a powerful method for investigating and designing reliable observations for unbalanced data with a fixed facet.

Discussion of the Results

The SSQ measures four types of temperament: extrovert-introvert (EI), practical-imaginative (PM), thinking-feeling (TF), and organized-flexible (OL). The SSQ includes 69 self-report bipolar items. Six SSQ items simultaneously measure two different types of temperament, which leads to 23 items measuring the EI style, 16 items measuring the PM style, 10 items measuring the TF style, and 26 items measuring the OL style. A total of 7,902 students completed the questionnaire, and eight random samples were produced for the convenience of data analysis.

Variance and covariance components for each sample were obtained through ANOVA procedures. The results showed that the magnitudes of the error variances are similar across the eight random samples. Results from the G study using the mean variance and covariance components over the eight samples indicated that, in the profiles (EI, PM, TF, and OL) of the SSQ, the variation due to persons (.31 to .69) and items (.44 to .68) is much smaller than the variation due to person-by-item interaction (3.68 to 4.89).
The large variance from the interaction indicates the extent to which persons are rank ordered differently by different items.

The generalizability coefficient and dependability index for each SSQ profile were also obtained. OL has the largest G and phi coefficients (.73 and .70), while PM has the smallest (.57 and .54). Across the four profiles of the SSQ, OL and EI have acceptable G and phi coefficients, but the G and phi coefficients for PM and TF are moderately low. Our subsequent D study results indicate that we can increase the G coefficient (reliability) to .70 for PM and TF if we add 12 more items to PM and 7 more items to TF. Furthermore, if we expect the G coefficient to be .80 for each SSQ subscale, we need 36 items for EI, 47 items for PM, 28 items for TF, and 39 items for OL.

Multivariate analysis allows us to observe the disattenuated correlations between SSQ profiles. OL and TF have the highest correlations with PM (.36 and .32), while EI has the lowest correlations with PM and OL (.02 and .11). This indicates that students who prefer the organized type also tend to prefer the thinking type and the practical type, but these preferences are not necessarily related to the extrovert or introvert type.

Based on a priori weights that reflect the number of items in each level of the fixed facet, the multivariate G and phi coefficients (.70 and .67) representing the overall SSQ profiles are acceptable in this study. About 70% of the observed-score variance across all forced-choice items is due to the object of measurement (the person) in this study.

Limitations of This Study

This study has several limitations. First, six of the 69 items measure two levels of the fixed facet (for example, item 41 measures OL as well as PM), so the items are not purely nested within the profiles of the SSQ.
For the convenience of illustrating the method, we ignored this complexity of the design by treating each of the six items as separate items for the two scales it measures (e.g., OL and PM), which brought the total number of items to 75. Second, generalizability theory is most useful when many random facets are included in the research design. In this study, we have only one random facet (items) besides the object of measurement, which limits the current study because G studies should usually involve as many potential sources of variation as possible. Third, SPSS and EXCEL, rather than mGENOVA (Brennan, 2001b), were used to carry out the data analysis, which led to somewhat less precise calculations because of rounding to two or three digits.

Closing Remarks

The data considered in this study involve unequal numbers of conditions nested within each level of a fixed facet, which is well suited to a multivariate design. Although a univariate method can also be used to analyze such data, estimating variance components would be much more difficult. Multivariate analysis is rather straightforward in this situation. When each level of the fixed facet contains an equal number of conditions, multivariate analysis constructs the variance components from the univariate analysis for each level of the fixed facet, and also accounts for the covariance components that univariate analysis cannot.

LIST OF REFERENCES

Allen, M. J., & Yen, W. M. (2002). Introduction to measurement theory. Long Grove, IL: Waveland Press.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.

Brennan, R. L. (1998). Raw-score conditional standard errors of measurement in generalizability theory. Applied Psychological Measurement, 22(4), 307-331.

Brennan, R. L. (2000). Performance assessments from the perspective of generalizability theory. Applied Psychological Measurement, 24(4), 339-353.

Brennan, R. L. (2001a). Generalizability theory. New York: Springer.

Brennan, R. L. (2001b). mGENOVA (Version 2.1) [Computer software and manual]. Iowa City, IA: American College Testing, Inc.

Brennan, R. L., Yin, P., & Kane, M. T. (2003). Methodology for examining the reliability of group mean difference scores. Journal of Educational Measurement, 40(3), 207-230.

Clauser, B. E., Harik, P., & Margolis, M. J. (2006). A multivariate generalizability analysis of data from a performance assessment of physicians' clinical skills. Journal of Educational Measurement, 43(3), 173-191.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Orlando, FL: Harcourt Brace Jovanovich.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley.

Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberation of reliability theory. The British Journal of Statistical Psychology, 16, 137-163.

Gao, X., & Brennan, R. L. (2001). Variability of estimated variance components and related statistics in a performance assessment. Applied Measurement in Education, 14(2), 191-203.

Haertel, E. H. (2006). Reliability. Educational measurement, ch. 3, 65-110.

Jung, C. G. (1971). Psychological types (R. F. C. Hull, revision of trans. by H. G. Baynes). Princeton, NJ: Princeton University Press.

Kane, M. (1996). The precision of measurements. Applied Measurement in Education, 9(4), 355-379.

National Council on Measurement in Education & American Council on Education. (2006). Educational measurement (4th ed., R. L. Brennan, Ed.).

Nußbaum, A. (1984). Multivariate generalizability theory in educational measurement: An empirical study. Applied Psychological Measurement, 8(2), 219-230.
Oakland, T., Glutting, J. J., & Horton, C. B. (1996). Student Styles Questionnaire manual. San Antonio, TX: Psychological Corporation.

Oakland, T., & Lu, L. (2006). Temperament styles of children from the People's Republic of China and the United States. School Psychology International, 27, 192-208.

Osterlind, S. J. (2005). Modern measurement: Theory, principles, and applications of mental appraisal. Upper Saddle River, NJ: Pearson Prentice Hall.

Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.

Solano-Flores, G., & Li, M. (2006). The use of generalizability (G) theory in the testing of linguistic minorities. Educational Measurement: Issues and Practice, 25(1), 13-22.

Webb, N. M., & Shavelson, R. J. (1981). Multivariate generalizability of General Educational Development ratings. Journal of Educational Measurement, 18(1), 13-22.

Yin, P. (2005). A multivariate generalizability analysis of the Multistate Bar Examination. Educational and Psychological Measurement, 65(4), 668-686.

BIOGRAPHICAL SKETCH

Youzhen Zuo received a Bachelor of Arts in English in 1992 from Wuhan University in China. She worked as an English teacher for over seven years at Fanchuan Middle School. In 2004, she was admitted to Ohio State University as a graduate student in the School of Educational Policy and Leadership. In 2005, she transferred as a graduate student to the Department of Educational Psychology at the University of Florida.