DOES PARTICIPATION IN 4-H IMPROVE STUDENT OUTCOMES? EVIDENCE FROM FLORIDA AND OHIO

By

TROY TIMKO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011
© 2011 Troy Thomas Timko
To my dad
ACKNOWLEDGMENTS

I thank my supervisory committee chair and advisor, Alfonso Flores-Lagunes, for his outstanding mentoring and support throughout my doctoral studies. His instruction and encouragement helped me to keep the motivation to complete my degree. I also thank the other members of my supervisory committee, Rodney Clouser, Carmen Carrion-Flores, and Marilyn Norman, for lending me their expertise throughout my research process.

I thank Joy Jordan and Nancy Johnson for their help in obtaining the Florida 4-H data and in understanding the 4-H organization. I also thank other members of Florida 4-H, Ohio 4-H, the Florida Department of Education, and the Ohio Department of Education for their help in obtaining the data, without which my research would not have been possible.

I thank my mother for her unconditional support in whatever path I chose for myself, and thanks go out to my brother and sister for being there whenever I needed them.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTER

1 INTRODUCTION
    Motivation
    Background
        4-H Background
        The Florida Comprehensive Assessment Test
        The Ohio Achievement Assessments
    Research Goals

2 LITERATURE REVIEW
    4-H Research
    Research on Youth Development and After-School Programs, and Other After-School Activities
        Other Youth Development and After-School Programs
        Other After-School or Extracurricular Activities
        Reviews of Evaluations and Meta-Analyses
    Standardized Test Scores, Schooling Behavioral Outcomes and Related Economic Literature
        Standardized Test Scores
        Test Score Outcomes and Economic Impacts
    Causal Inference and Program Evaluation
    Advancing Research on the Impacts of 4-H

3 DATA
    Florida Data
    Ohio Data
    Florida and Ohio Data Comparison

4 METHODOLOGY
    Establishing a Causal Relationship
    Ordinary Least Squares
    Fixed Effects
    Difference-in-Difference-in-Differences

5 IMPACT OF 4-H PARTICIPATION ON STUDENT OUTCOMES IN FLORIDA
    Florida Comprehensive Assessment Test Analysis
    Florida School Quality Results
    Florida Urban-Rural Results
    Florida Club Rate Results
    Florida Misconduct Results

6 IMPACT OF 4-H PARTICIPATION ON STUDENT OUTCOMES IN OHIO
    Ohio Achievement Assessment Results
    Ohio Student Misconduct Results

7 ASSUMPTION VIOLATION AND FALSIFICATION TESTS
    Violations of the Assumptions Employed to Identify Causal Effects
    Falsification Tests for Florida and Ohio

8 SUMMARY, POLICY IMPLICATIONS, AND CONCLUSION
    Summary
    Policy Implications and Conclusion
    Possible Extensions to this Research

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table

3-1 Florida home schooling statistics
3-2 Variables and descriptive statistics for Florida data set
3-3 Variable descriptive statistics for Ohio data set
5-1 Estimated effects of the extent of 4-H participation on student FCAT math subtest
5-2 Estimated effects of the extent of 4-H participation on student FCAT reading subtest
5-3 FCAT DDD model mathematics subtest regression results for FCAT levels one through five
5-4 FCAT DDD model mathematics subtest regression results for FCAT passing percent and mean scores
5-5 FCAT DDD model reading subtest regression results for FCAT levels one through five
5-6 FCAT DDD model reading subtest regression results for FCAT passing percent and mean scores
5-7 Joint significance tests for FCAT math subtest interaction terms and fixed effects
5-8 Joint significance tests for FCAT reading subtest interaction terms and fixed effects
5-9 Estimated impact of school quality factors
5-10 Urban and rural comparison using the DDD Model for the FCAT mathematics subtest
5-11 Urban and rural comparison using the DDD Model for the FCAT reading subtest
5-12 4-H and 4-H club participation rate effect comparison under the DDD Model for the FCAT math subtest
5-13 4-H and 4-H club participation rate effect comparison under the DDD Model for the FCAT reading subtest
5-14 Estimated effects of the extent of 4-H participation on student misconduct
5-15 Estimated effects of the extent of 4-H participation on student misconduct for urban and rural districts
5-16 Estimated effects of the extent of 4-H participation on student misconduct for middle and high school levels
6-1 Estimated effects of the extent of 4-H participation on student OAA mathematics subtest performance
6-2 Estimated effects of the extent of 4-H participation on student OAA reading subtest performance
6-3 Estimated effects of the extent of 4-H participation on student OAA science subtest performance
6-4 Estimated effects of the extent of 4-H participation on student OAA social studies subtest performance
6-5 Estimated effects of the extent of 4-H participation on student OAA writing subtest performance
6-6 Joint significance tests for OAA math subtest interaction terms and fixed effects
6-7 Joint significance tests for OAA reading subtest interaction terms and fixed effects
6-8 Joint significance tests for OAA science subtest interaction terms and fixed effects
6-9 Joint significance tests for OAA social studies subtest interaction terms and fixed effects
6-10 Joint significance tests for OAA writing subtest interaction terms and fixed effects
6-11 Estimated effects of 4-H participation on behavior outcomes in Ohio
6-12 Average counts of student misconduct incidences by grade level in Ohio
6-13 DDD regression results 4-H participation impact on student misconduct for seventh and eighth grades
7-1 FCAT mathematics subtest DDD using lagged extent of 4-H participation
7-2 FCAT reading subtest DDD using lagged extent of 4-H participation
7-3 OAA mathematics subtest DDD using lagged extent of 4-H participation
7-4 OAA reading subtest DDD using lagged extent of 4-H participation
7-5 OAA science subtest DDD using lagged extent of 4-H participation
7-6 OAA social studies subtest DDD using lagged extent of 4-H participation
7-7 OAA writing subtest DDD using lagged extent of 4-H participation
LIST OF FIGURES

Figure

3-1 Florida and Ohio data set comparison
6-1 Counts of student misconduct incidences for drug use or possession for alcohol, tobacco, and other drugs
LIST OF ABBREVIATIONS

ASVAB    Armed Services Vocational Aptitude Battery
AYP      Adequate Yearly Progress
DDD      Difference-in-Difference-in-Differences
FCAT     Florida Comprehensive Assessment Test
FE       Fixed Effects
FLDOE    Florida Department of Education
GDP      Gross Domestic Product
iLRC     interactive Local Report Card
NAEP     National Assessment of Educational Progress
NCLB     No Child Left Behind Act
OAA      Ohio Achievement Assessments
OLS      Ordinary Least Squares
Prob.    Probability
REDI     Rural Economic Development Initiative
S.D.     Standard Deviation
SET      Science, Engineering, and Technology programs
USDOE    United States Department of Education
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DOES PARTICIPATION IN 4-H IMPROVE STUDENT OUTCOMES? EVIDENCE FROM FLORIDA AND OHIO

By

Troy Timko

May 2011

Chair: Alfonso Flores-Lagunes
Major: Food and Resource Economics

The 4-H program is the largest youth development organization in the United States. Policy makers, taxpayers, program administrators, and parents of participants are all stakeholders concerned with the effectiveness of 4-H in influencing the lives of its participants. To evaluate the effectiveness of this program, reliable estimates of its causal impacts on objective outcomes such as standardized test scores are required. However, it is typically difficult to obtain impact estimates that can be given a causal interpretation, since confounding factors may be present that complicate the estimation of the causal relationship between 4-H participation and student outcomes. To overcome these pitfalls, my dissertation utilizes data on school districts and grade levels observed over several school years and a difference-in-difference-in-differences approach to control for these potential unobserved confounders of the causal relationship of interest.

Employing this methodology, I examine the impacts of participation in 4-H on test scores and behavioral outcomes in the states of Florida and Ohio. With respect to Florida, results for the Florida Comprehensive Assessment Test (FCAT) are combined
with 4-H participation information obtained from Florida 4-H. The results indicate that the extent of 4-H participation has a statistically significant and positive effect on student passing rates for both the mathematics and reading subtests of the FCAT, as well as statistically significant effects on the average FCAT reading scores. In the case of Ohio, test results on the Ohio Achievement Assessments (OAA) are combined with 4-H participation information obtained from Ohio 4-H. The analyses of the impact of 4-H participation on students in Ohio also show a positive effect of 4-H participation on OAA performance, although the effects are milder, which I attribute to characteristics of the data in Ohio relative to [...] evidence of significant effects. Supporting evidence about the validity of the underlying assumption in my model to uncover causal impacts is also presented. Lastly, I illustrate how my results can be translated into medium- and long-term economic impacts of the 4-H programs, and provide some policy recommendations.
CHAPTER 1
INTRODUCTION

Motivation

Programs must constantly compete for limited budget resources and must therefore frequently justify the funding they receive. This is especially true when the programs are funded by state and federal governments. With the strong pressure in the United States to improve school quality and student testing outcomes, programs that are able to provide evidence of their effect on student achievement have a significant advantage when competing for funding with those that cannot.

Although 4-H does not explicitly design its youth development programs with the intention of directly increasing standardized test score outcomes for its youth participants, there are several reasons why I might expect participation in 4-H to positively influence academic testing performance. Motivation behind this expectation can be derived from the fact that 4-H programs are based upon three core mission mandates: science, engineering, and technology; healthy living; and citizenship. 4-H programs are built on these mandates and constructed such that they positively engage youth in utilizing scientific reasoning, practicing healthy behaviors, and interacting in a positive manner with their peers and mentors.

Youth participation in 4-H [...] more direct ways that 4-H participation could lead to increases in their academic performance. SET projects within 4-H are designed with the experiential learning process in mind, such that participants are encouraged not only to think and make choices, but also to reflect afterward on what they have learned. This type of learning helps youth to be engaged in the learning process rather than passively absorbing
knowledge (California 4-H, 2011). Involvement in the learning process could help increase overall interest in subjects, such as mathematics and science, that the students might require as tools for their investigative processes.

Enabling youth to make healthy lifestyle choices not only enhances the quality of their lives, but may also carry over to their school performance. The focus of 4-H on healthy living is significant since both good nutrition and good exercise habits promote healthy growth and help children to resist disease (Silliman, 2007; Satcher, 2001). If children are able to fend off sickness more easily, it could lead to fewer days of missed school. Fewer sick days might in turn result in better test scores, simply because those students are present for more lessons and have less school work to make up.

The expectation that 4-H participation would have a positive impact on student standardized test outcomes can be looked at from another perspective. Consider the [...] and subsequent analyses that replicated their methods studying the effects of 4-H [...]. These studies, which will be described in greater detail later, provide evidence that students who attended 4-H reported that they had better academic outcomes, such as [...]. Because 4-H youth development programs follow similar guiding principles in all states, it might be reasonable to suspect that a similar effect would be seen among students in other locations like Florida and Ohio, which are the subject areas for my analyses. Although improvement in student class grades and improvement in standardized test scores are not exactly the same, there are reasons why I expect that they might be highly related.
In Florida, for example, school curricula and the Florida Comprehensive Assessment Test (FCAT) are based on the same standards, known as the Sunshine State Standards (FLDOE, 2010). It therefore seems reasonable to assume that if student participation in 4-H can lead to positive effects on student grades, that participation would also be linked to improvement in scores on standardized tests that are based on the same standards.

Even though the above reasoning suggests that 4-H could have a positive effect on youth standardized test score outcomes, identifying a causal relationship is still difficult. The reason for this difficulty lies in the existence of many possible confounding factors that could obscure a causal relationship between 4-H participation and these testing outcomes. Family characteristics that I do not have information on, such as the percentage of students living in single-parent households, [...] 4-H as well as test score outcomes.

I hold characteristics of school districts, years, and grade levels, as well as interactions between these variables, constant by applying a difference-in-difference-in-differences (DDD) model to the combined data set of 4-H participation rates, student racial demographic data, and standardized test outcomes from the Florida Comprehensive Assessment Test (FCAT) and the Ohio Achievement Assessments (OAA). Like a fixed effects model, this strategy allows me to control for stable unobserved characteristics of grade levels, school districts, and years. However, this model has the additional ability to eliminate other possible unobserved confounding factors by allowing me to also control for characteristics that are district-specific for particular years, year-specific for particular grade levels, and grade-level-specific for particular school districts.
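One way to write the DDD specification described above is sketched below. The notation is an assumption introduced here for illustration only and is not taken from the methodology chapter: $Y_{dgt}$ denotes a test outcome (for example, a passing rate) for school district $d$, grade level $g$, and school year $t$; $P_{dgt}$ denotes the extent of 4-H participation, assumed to vary across all three dimensions; and $X_{dgt}$ collects observed controls such as student racial demographics.

```latex
\begin{equation*}
Y_{dgt} = \beta P_{dgt} + X_{dgt}'\gamma
        + \alpha_d + \lambda_g + \tau_t
        + \theta_{dt} + \phi_{gt} + \psi_{dg}
        + \varepsilon_{dgt}
\end{equation*}
```

In this sketch, $\alpha_d$, $\lambda_g$, and $\tau_t$ absorb stable characteristics of districts, grade levels, and years, as in a standard fixed effects model, while $\theta_{dt}$, $\phi_{gt}$, and $\psi_{dg}$ absorb district-by-year, grade-by-year, and district-by-grade characteristics. Under these assumptions, the coefficient of interest $\beta$ is identified only from variation in participation that is not common to any district-year, grade-year, or district-grade cell.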
After applying this model to my data, my results suggest that the extent of 4-H [...]. These results apply to the passing rates within school districts for both the mathematics and reading subtests of the FCAT. The differences between these results and the results obtained utilizing other models, such as ordinary least squares (OLS), indicate the importance of controlling for unobserved confounding factors.

Background

4-H Background

4-H is the largest youth development organization in the United States, with over 6 million young Americans participating in the various programs it offers, such as organized clubs, day camps, or school enrichment programs. 4-H's mission is to empower youth to reach their full potential through working and learning in partnership with caring adults. 4-H accomplishes this mission through the dedicated efforts of more than 3,500 cooperative extension educators and 518,000 volunteers. Since 4-H's inception in 1902, more than 60 million youth have participated in its programs, including fourteen governors, thirty-three university presidents and chancellors, thirty-one chief executive officers, and even four astronauts.

The 4-H organization utilizes over 3,500 employees and 518,000 volunteers to deliver a wide array of youth development programs to more than 6 million participants nationwide. As a youth development organization, 4-H's reach is second to none. It can be found in every county in every state in the U.S. as well as in more than 80 countries around the world. Nor is 4-H found only in rural areas, as might be suspected due [...] youth who are involved in 4-H, nearly 45 percent live in either suburbs of major cities or
in cities with populations of over 50,000 (Florida 4-H, 2010). 4-H has adapted to this change in the makeup of its participants and includes a diverse range of subjects guided by 4-H's mission mandates: science, engineering, and technology; healthy living; and citizenship.

One of the primary ways that 4-H influences the lives of young people is through their participation in 4-H Clubs. These clubs, which are guided by one or more adults, are often thought of as the foundation of 4-H and can consist of groups of almost any size. Generally, the projects in which 4-H club members participate fall into one of ten areas (Agricultural Literacy, Animal Sciences, Career Development and Workforce Participation, Citizenship and Leadership Education, Communication Sciences and Expressive Arts, Environmental Education and Earth Sciences, Healthy Lifestyle Education, Plant Science, and Science and Technology). However, since the interests of members often vary and since members of the clubs choose their own projects, the actual experience of each 4-H participant is as personal as their projects (Florida 4-H, 2010).

4-H Afterschool is an initiative within 4-H that is designed to increase the quality and availability of after-school programs for millions of children of all grade levels in communities across America. These programs are offered at times when children are not in school and their parents are in need of safe, healthy, caring, and enriching environments for their children. Like many other facets of 4-H, the after-school programs are designed on principles of youth development in such a way that they address the interests through long-term, structured, and sequentially planned learning
experiences while also addressing their physical, cognitive, social, and emotional needs (4-H, 2011).

The Florida Comprehensive Assessment Test

The Florida Comprehensive Assessment Test (FCAT) is a standardized, comprehensive assessment test taken by students attending school in Florida. It is based on the Sunshine State Standards, which are a set of curricular standards for student achievement in Florida that allow school districts some flexibility in how they design their specific curricula (FLDOE, 2010). According to the Florida Department of Education, there are two main purposes of the FCAT. First, the test provides information to parents about their children's level of mastery of the skills tested. Second, it provides information to the public on the status of student education while allowing the public to hold schools and districts accountable for progress (FLDOE, 2010). This progress relates to Adequate Yearly Progress (AYP) as mandated by the No Child Left Behind Act (NCLB) of 2002 (PIRC, 2011).

FCAT scores are, of course, not only important to outside stakeholders. The students themselves must achieve certain levels of FCAT performance in order to advance academically. In order to graduate from high school with a standard high school diploma, students must earn scores on the Reading and Mathematics sections of the Grade 10 FCAT of 1926 and 1889, respectively. These scores correspond to achieving at least level 2 of the FCAT. Additionally, students must score level 2 or higher in order to be promoted from third to fourth grade. Specific FCAT scores for promotion at each grade level are not mandated by the state. However, students must perform at least at level 3 to be performing on grade level. Although the students in
middle grades that perform below level 3 may be allowed to move on, they must receive remediation as well as additional diagnostic assessments in order to assess the specific nature of their difficulty (FLDOE, 2010). I therefore consider Level 3 to be a passing score for my analyses (FLDOE, 2011).

The Ohio Achievement Assessments

The Ohio Achievement Assessments (OAA) are a set of comprehensive assessment tests taken by students attending school in Ohio, which measure progress toward AYP goals as required by NCLB, much like the FCAT in Florida. Content standards for the OAA are set and defined by the State Board of Education of Ohio. There are four main purposes of the OAA: assessing individual state content standards, providing data at [...] aforementioned measurement of AYP. For the OAA, the proficient level of performance can be thought of as analogous to level 3 performance on the FCAT, as it [...] (ODE, 2011b).

Students may be subject to retention in third grade if they perform in the limited performance range on the third grade reading subtest of the OAA. Specifically, districts have the options to retain the student in third grade, to promote the student to fourth grade by combined agreement of the teacher and principal that other measures of the [...], or to promote them with intensive intervention services in fourth grade (ODE, 2007).

Research Goals

I have several objectives for this dissertation. My first main objective is to determine whether there is evidence of a causal effect of the extent of 4-H participation on standardized test outcomes in Florida. In particular, I analyze the effect
that the extent of 4-H participation has on FCAT scores, including effects on FCAT mean scores, individual level test outcomes, and test passing rates.

My second main objective is to conduct analyses on data for Ohio similar to those I conduct on student test outcomes in Florida, to see whether students in another region of the country are experiencing effects on academic outcomes similar to those shown by students in Florida, specifically for outcomes on the state standardized tests.

My third main objective is to examine non-academic outcomes for students in both Florida and Ohio to determine whether there is evidence of a causal relationship between these outcomes and the extent of 4-H participation. In particular, I analyze the effects of the extent of 4-H participation on measures of student misconduct and school suspension rates for students in both Florida and Ohio. Fighting, harassment, and tobacco use are some examples of the types of outcomes I examine in the behavioral analyses.
CHAPTER 2
LITERATURE REVIEW

4-H Research

As one of the largest youth development organizations in the world, 4-H delivers its programs to millions of young people through land-grant universities in all fifty states. 4-H's [...] results in the organization drawing the attention of many stakeholders who want to know how the organization is impacting the lives of its youth participants. Surprisingly, however, large-scale studies investigating the impacts of 4-H have only recently begun.

One of the first such large studies was the National 4-H Impact Assessment Project (4-H, 2001), which was designed to gather information on youth and adult perceptions of the benefits of the 4-H Youth Development Program. The study used multiple classification analysis, a method which examines the relationships among several categorical independent variables (Andrews et al., 1973), to analyze their data. Using this technique, the authors analyzed feedback from 2,467 youth and 471 adults from across the nation to ascertain whether 4-H youth development included elements key to its participants developing in a positive manner. A few of the elements identified were opportunities for self-determination, a safe physical and emotional environment, and an inclusive atmosphere. According to the project results, eighty-eight percent of youth reported that they felt they could try new or different things, while ninety-four percent of youth agreed that in 4-H they felt safe to try new things. Also, ninety percent of youth agreed that 4-H helps them to accept the differences of others. These results served to
provide evidence that 4-H programs indeed contained features essential to positive growth and development of youth.

The authors of the above-mentioned study remarked that youth and adults associated with 4-H were [...] in order to gather data on the perceptions about the benefits of the 4-H Youth Development Program (4-H, 2001). Schwarz (1999) notes that self [...] changes to things such as the wording of a question, the format of that question, or the context in which that question had been asked can lead to drastic changes in the results obtained from survey responses. Schwarz (1999) also offers a few examples of these issues from some of the literature:

[...] list. Yet only 4.6% volunteered an answer that could be assigned to this category when no list was present (Schuman and Presser, 1981). When asked how successful they have been in life, 34% of a representative sample reported high success when the numeric values ranged from -5 to +5, whereas only 12% did so when the numeric values ranged from 0 to 10 (Schwarz, Knauper, Hippler, Noelle-Neumann, and Clark, 1991).

One undesirable feature of this study, from my point of view, is that it focuses only on youth and adults associated with 4-H to assess the impact of 4-H youth development. This is an issue since, by concentrating only on those associated with 4-H while ignoring non-participants, one overlooks the possibility that those associated with 4-H might be different (in either observable or non-observable ways) from other people. These differences might not be at all associated with the effects of the 4-H program but could influence the survey outcomes and thus prevent drawing causal inference about
the program being analyzed. Literature that developed along the lines of Rubin (1974) explores issues related to the ability to draw causal inferences.

Another large-scale 4-H study is the 4-H Study of Positive Youth Development, also known as the Tufts study. The study was a longitudinal investigation that surveyed youth and parents to measure the characteristics of positive youth development and of healthy, positive development among diverse adolescents (Lerner et al., 2005). The results from the 2005 data covered only the first wave of data and were therefore only cross-sectional. By the end of the fifth wave of the study, the researchers had collected data from 4,701 adolescents. Their cross-sectional results for ninth grade in wave five indicated that 4-H participants are 25 percent more likely to contribute to their families, themselves, and their communities. Additionally, the 4-H Study of Positive Youth Development found that the odds that 4-H participants expect to go to college were 1.70 times higher than for comparison youth who did not participate in 4-H (Lerner et al., 2009).

The 4-H Study of Positive Youth Development examines survey information from both youth development participants and non-participants. Although both the Tufts study and this current study are longitudinal in nature, there are some key differences between them. First, the questions the 4-H Study of Positive Youth Development attempts to target are different from those addressed in my study, especially with respect to the types of academic impacts examined. This study specifically addresses impacts on standardized test outcomes instead of general academic performance. Examples of the academic outcomes measured in their study include whether or not students expect to go to college and students' self-reports of the grades they mostly receive.
The 4-H Study of Positive Youth Development has the advantage of utilizing individual-level data for its analyses. While my study utilizes aggregate data, it has the advantage of implicitly including many more students, with more than 1.5 million students being part of the aggregate data in each year for Florida and around 800,000 per year in the Ohio data.

Impact studies of 4-H Youth Development programs have also been conducted at the state level by several states, including Montana, Idaho, Colorado, and Nevada. The Montana study, the Out-of-School Time Study, collected information from approximately 2,800 youth in fifth, seventh, and ninth grades (Astroth and Haynes, 2001). Their survey questionnaire, designed to help learn specifics about youth involvement in 4-H, gathered data on length of participation as well as on whether or not youth participated in other out-of-school activities. Astroth and Haynes used the data collected from these surveys to compare youth who were active in 4-H to those who were not, as well as to compare youth who were or were not active in any type of structured out-of-school activity. Their analyses compared students on variables related to academic performance as well as on variables measuring at-risk behaviors, like smoking. Results from the survey in Montana indicated that 33.4 percent of students active in 4-H ... only 19.6. Some of their other results indicated that 4-H youth were less likely to report smoking cigarettes (10.3 percent, as opposed to 16.5 percent among non-4-H youth).

The authors acknowledge that one drawback of their study is that they do not consider urban/rural comparisons. They offer several explanations for this, one of which
is that the urban/rural distinction may have a different meaning in other states than it has in Montana. Though this study does examine both 4-H participants and non-participants, it does not separate the impact of the 4-H experience from the attributes of the youth who join 4-H (Astroth and Haynes, 2001). Additionally, Goodwin et al. (2005) note that one weakness of the Montana study conducted by Astroth and Haynes was that it failed to collect ethnic demographic data.

Goodwin et al. (2005) replicated the Montana study for Idaho. Their study included 3,601 surveys from sixteen randomly selected counties across Idaho. Roughly 26 percent of youth respondents had been involved in 4-H for at least one year, while 16 percent had participated in 4-H for two years or more. Some of the results from the Idaho 4-H Impact Study were notably similar to those obtained from youth in Montana. Goodwin and his colleagues found that among students who had participated in 4-H, roughly 36 percent ... A similar pattern for smoking appeared in this study as well, with 6.2 and 8.6 percent of 4-H and non-4-H students, respectively, reporting that they smoked. Like the Montana study, the Idaho study by Goodwin and his colleagues also did not collect ethnic demographic data, because the anonymity of the subjects could be compromised if there were only one or two ethnic minorities present in a classroom where data were collected (Goodwin et al., 2005).

Many of the same ideas were utilized to conduct a similar study in Nevada (Lewis et al., 2009). Unlike the previous studies, the authors employed a multivariate statistical
analysis with age groups, gender, 4-H participation, and population density as the independent variables. Additionally, they used two strata, urban and rural, in their stratified random sample. These differences in design make direct comparison of results between studies difficult (Lewis et al., 2009). Nevertheless, their study seemed to corroborate the positive characteristics seen in 4-H participants by previous studies. They found that 4-H youth in Nevada were more likely to participate in leadership roles and that they had higher self-confidence. They also found that respondents who were involved in 4-H did not differ from non-4-H youth in the negative behaviors they practiced (Lewis et al., 2009).

Studies estimating the impact of 4-H on youth outcomes seldom go on to explore the economic impact of those effects. A study conducted at Texas A&M University bridged this gap by exploring the link between the leadership skills gained through 4-H and earnings levels later in life. The authors chose to utilize the wage effect from Kuhn and Weinberger (2005), since their calculations fell into a broad range of wage effects (4 percent to 33 percent). They used Social Security Administration wage data to calculate annual cumulative wage effects on 4-H club officers for the ten-year period from 1996 through 2005. With these data, their evidence suggests that leadership has a positive impact on earnings, amounting to a $45,000 ten-year leadership wage gain (McCorkle et al., 2007).

Even among the studies mentioned above that compare 4-H participants with non-participants, the reported impacts of 4-H rely almost exclusively on survey information from students, parents, or other adults on their perceptions of student performance levels, skills gained, or behaviors exhibited. As pointed out earlier, these self-reported
perceptions are considered by some to be imperfect sources of data (Schwarz, 2009). It should also be noted that the Astroth and Haynes (2001) Montana study, and those statewide studies replicating it, are correlational in nature, as they do not control for selection into 4-H. As such, they stop short of being sources of information on the causal effects of participation in 4-H. Nonetheless, these studies offer important insights into the various aspects of youth development that participation in 4-H might affect. Further work is needed to extend these results on the impact of 4-H on youth outcomes in order to help policy makers weigh the benefits of youth development against its cost to taxpayers.

Research on Youth Development and After-School Programs, and Other After-School Activities

4-H is one of many choices youth face when selecting after-school activities. These choices range from organized out-of-school time youth development organizations to school club participation and school athletics. Because of increases in scrutiny and accountability, organizations and activities need to show evidence that they are providing youth with benefits in line with the desires of parents and other stakeholders. This requirement has driven researchers to carefully examine the impacts of youth participation in after-school activities and youth development organizations.

Other Youth Development and After-School Programs

Because of logistical and ethical issues associated with conducting experimental evaluations of these youth activities, research into the effects of youth development organizations and after-school activities on their youth participants usually employs non-experimental methods. For this reason, the random assignment evaluation of the Big Brothers and Big Sisters Program (BBBS) by Grossman and Tierney (1998) is
especially informative. Eccles and Templeton (2002) highlighted this evaluation as one specifically based on mentoring. The design called for applicants to the agencies to be randomly assigned to treatment and control groups, with roughly half in each. Members of the treatment group, consisting of 571 applicants, were matched with mentors, whereas the control group members, composed of 567 applicants, were put on a waiting list for 18 months. The researchers determined the impact of BBBS by comparing the outcomes of these two groups at the end of the 18-month period, utilizing information collected in baseline and follow-up questionnaires as their primary sources of data. With this experimental design, the researchers were able to ensure that the only systematic difference between the groups was that the treatment youths had the opportunity to be matched with a mentor (Grossman and Tierney, 1998).

The authors compared the groups along several characteristic areas, including academic performance and attitudes, drug and alcohol use, and other antisocial behavior. Academically, Little Sisters reported grade point averages of 2.71, whereas their control group counterparts reported 2.63 (with the difference between the two statistically different from zero at the .10 level). The impact was even larger when comparing minority Little Sisters to minority girls in the control group, who reported GPAs of 2.83 and 2.62, respectively (statistically different from zero at the .10 level). They also found Little Brothers and Little Sisters to be 45.8 percent and 27.4 percent less likely than their non-participating counterparts to start using drugs or alcohol, respectively. Hitting behavior
among Little Brothers and Little Sisters was 32 percent less than for control group youth, while no impacts were seen on how often youth stole or damaged property (Grossman and Tierney, 1998).

Positive impacts have also been seen from a youth development program known as Teen Outreach. The Teen Outreach program was designed with the intention of reducing adolescent pregnancy, school failure, and dropout (Allen et al., 1990). In their 1990 evaluation, Allen et al. indicate that the overall effectiveness of Teen Outreach had been previously documented and that results from previous research showed that participants had significantly lower levels of suspension, school dropout, and pregnancy than did students in comparison groups (National Research Council, 1987; Philliber et al., 1989). The main focus of their evaluation was to examine why and when the Teen Outreach program is effective. The results of this study indicate that Teen Outreach sites were most successful when they worked with older (vs. younger) students, and that success also depended on the volunteer component of the program.

Their evaluation considered both participant and comparison groups. However, unlike the BBBS evaluation previously discussed, students were not randomly assigned into the two groups. The researchers assessed students at program entry and at program exit, and they made some attempts to control for confounding factors. One of the methods they used was controlling for problem behaviors at entry to the program. Controlling for problem behaviors at entry thus appears to be at least a moderately effective strategy for
handling important differences among students entering the program (Allen et al., 1990).

Other After-School or Extracurricular Activities

Other research on after-school activities examines the effects of multiple types of extracurricular activities in terms of their impacts on a variety of student outcomes. Some of these studies break the analyses up and compare the effects of individual types of activities. In one study of extracurricular involvement, the authors categorize these activities into pro-social activities, performance activities, team sports, academic clubs, and school involvement. They found that pro-social activities, which they define as attending church and/or participating in volunteer and community service type activities, were associated with positive educational trajectories and low rates of risky behaviors. On the other hand, they found that participation in team sports was also related to positive educational trajectories but to higher rates of drinking alcohol.

A less recent study by Landers and Landers (1978) categorized male students in one high school by the type of extracurricular activities in which they were involved. This study utilized archival data from senior directories in high school yearbooks that listed students' activities to generate the activity participation data. For their analysis, they separated students into categories based on whether they participated in service and leadership activities, sports, both types of activities, or no extracurricular activities. Using court records for incidences of delinquency, which in their case meant incidences of either misdemeanor or felony offenses, they compared frequencies of incidences among their established categories. Their analyses found that lower incidences of delinquency were observed among individuals who
participated in at least some form of extracurricular activity. They attempted to control for some measurement bias in the reporting of incidences of delinquency by accounting for the effects of socioeconomic status.

Other positive associations between extracurricular activities and various outcomes have been found. Hanks and Eckland (1978) found evidence of a direct effect of extracurricular activity participation on adult secondary associations or adult voluntary associations later in life. Otto (1975) showed that, independent of economic status or academic performance, participation in extracurricular activities plays a significant role in educational attainment. When using a measure of total extracurricular participation to examine the relationships between participation in these activities and senior and postsecondary outcomes such as academic self-concept or college attendance, Marsh (1992) found statistically significant positive, although small, correlations.

In contrast to the previous studies, which looked at the effects of various types of structured activities, Aizer (2004) looks at the simple effect of having adult supervision after school. She explores the behavioral impacts of after-school supervision on youth from ten to fourteen years of age. The outcomes of interest in her study were measures of negative youth behaviors such as skipping school or getting drunk. Aizer used ordinary least squares (OLS) to conduct her preliminary analyses. However, she noted that many studies outside of the economic literature fail to account for unobserved characteristics that may be correlated both with youth outcomes and with the decisions to allocate time to child care, which may impair the ability to draw causal inferences from the results of this body of literature. To assist in
ameliorating this problem, the author also conducted fixed effects (FE) regressions. Results from the study indicate that adult supervision of school-aged children has a negative association with incidents of risky behavior and that this relationship seems to hold after controlling for unobserved family characteristics. The probability of getting drunk or high was .075 for unsupervised children, whereas it was .048 for supervised children (difference statistically significant at the .01 level). A similar relationship was observed for the odds of a person hurting someone, which were .229 for an unsupervised child and .205 for a supervised child (difference significant at the .05 level).

Reviews of Evaluations and Meta-Analyses

Kane (2004) examines and interprets the results of evaluations of the impact of four different after-school programs: 21st Century Community Learning Centers, The After School Corporation, Extended Service Schools Initiative, and San Francisco Beacons Initiative. These program evaluations utilized different methods for conducting their analyses. The evaluation of The After School Corporation compared participants and non-participants, whereas the evaluation conducted on the 21st Century Community Learning Centers used both statistical controls and random assignment, depending on the school level of the evaluation. Kane notes, however, that the evaluations did not find a significant impact on achievement test scores after one year of participation (Kane, 2004). He adds that these results may, in part, be an artifact of the lack of statistical power in most studies.
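The point about statistical power can be illustrated with a short calculation. The sketch below is not taken from Kane (2004); it assumes a two-sided z-test comparing mean test scores of two equally sized groups, with scores standardized to unit variance, and the function name `power_two_sample` and the example effect size are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect_size_sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size_sd: true difference in means, in standard deviation units
                    (hypothetical input, not an estimate from any study).
    n_per_group:    number of students in each of the two groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value for the test
    shift = effect_size_sd * sqrt(n_per_group / 2)  # mean of the test statistic
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Illustrative comparison: a small true effect with small and large samples.
small_study = power_two_sample(0.1, 50)
large_study = power_two_sample(0.1, 1000)
```

Under these assumptions, a small true effect on test scores is far less likely to be detected in a study with a few dozen students per group than in one with a thousand, which is one way a program with a real effect could still show no significant impact.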
Hollister (2003) examines whether or not after-school programs are a good use of taxpayer dollars by looking at studies and reviews concerned with out-of-school time programs that focus on positive youth development. He draws conclusions based on ten studies that he considers to have utilized rigorous methodology to measure the impact of different youth development programs on an assortment of outcomes. He emphasizes the random assignment evaluation design (Hollister, 2003) and suggests that other criteria for study selection are more appropriate for determining whether program structures deserve more serious testing than for assessing the impacts of youth development programs. In examining these studies, Hollister breaks down specific program aspects that seem to have an effect on youth. One conclusion that the author is able to draw from these studies is that mentoring, or tutoring, appears to be an effective component of some youth development programs.

One useful aspect of this review is the way the author breaks down the issues related to youth development programs. He points out many of the facets of program implementation that need to be considered when evaluating the effectiveness of youth development programs, some of which include: the degree of targeting of particular problem behaviors, the participation of low-income youth after the transition from elementary to intermediate schools, the operating location of the programs, the stage of development for which the program is appropriate, and how the outcomes are measured for the program (Hollister, 2003).
Reviews of program evaluations are often more critical, and sometimes draw different conclusions about the success of youth development programs, than the initial evaluations themselves. In his review of the impact of the Quantum Opportunities Program (QOP), Hahn (1994) concludes that significant differences exist overall, as well as positive differences for each site, including QOP program members being less likely to have children while being more likely to be high school graduates. Roth et al. (1998) seemed to echo the positive findings of Hahn in their review of the QOP. In contrast, Hollister (2003) was far more critical of the conclusions drawn in Hahn (1994), noting that some of the reported effects did not occur for the same site where the academic outcomes were isolated.

Standardized Test Scores, Schooling Behavioral Outcomes and Related Economic Literature

Standardized Test Scores

Standardized test scores are often utilized in order to evaluate the effectiveness of organizations. This is especially true with respect to evaluations concerning the effectiveness of schools. Policy makers, parents, and other stakeholders are frequently interested in the effectiveness with which schools utilize resources, that is, how well they convert spending into measurable gains in student outcomes. This issue is in no way new, dating back at least to the Coleman report. It would be difficult to overstate the influence this report has had, inspiring an entire class of literature that seeks to analyze the link between school resources or other measures of school quality and student outcomes. The research studies conducted in order to understand the
impact of these factors on the outcomes of students are not merely academic exercises, since research results often find their way rapidly into policy debates (Hanushek, 2006). While few people would be able to effectively argue that school children would be better off if schools had less money, the effectiveness of the spending that does take place (as well as how to increase the effectiveness of this spending) is still in question, and this question serves as motivation for a deeper understanding of how school resources affect student outcomes. Note that, within the literature, terms such as "school resources" and "school quality" have some overlapping components, while studies in the literature often differ in their choice of the specific inputs and outcomes measured.

This literature serves as motivation for the use of a youth development program as another type of input to the school system in the production of positive student outcomes. Youth development programs could be viewed as another resource that, when existing alongside or directly utilized by schools, promotes student benefits that are aligned with gains in measures of academic achievement. Thus, youth development program participation might be positively related to these academic performance measures, such as standardized test scores, even though the main focus of youth development organizations is not always specifically the improvement of academic outcomes.

Much of the work examining student outcomes following the Coleman report has taken place within a production function framework; thus, this genre is often labeled the education production function literature (Hanushek, 1986). There is a rather large variety of studies conducted utilizing the education production function as a framework
for their analyses. The majority of education production function analyses relate measures of school quality to student outcomes in the form of various standardized test scores. Within these studies, school quality frequently consists of measures such as teacher-to-pupil ratios, teacher education (often whether or not teachers have a degree), teacher experience, and expenditures per pupil. Of course, there are other examples of school quality measures, such as in Figlio and Lucas (2004), where the authors examine the effect that high grading standards have on student performance on standardized tests (both the Florida Comprehensive Assessment Test (FCAT) and the Iowa Test of Basic Skills). Many of the studies addressed within Hanushek's article, which consisted of a comprehensive summary of the empirical evidence then available, also fall within this group of analyses.

Test Score Outcomes and Economic Impacts

Researchers have a multitude of outcomes other than standardized test scores available to examine the effectiveness of the education process, including incidences of crime and violence, dropout rates, and suspension rates. Unlike the effect of school quality on crime rates or dropout rates, the effect of schooling on test outcomes is not typically the final objective that stakeholders want to change. Rather, these performance measures tend to indicate effects that might have an economic impact at a later time, such as on the income levels of individuals or the economic performance of a region.

In the literature relating schooling to subsequent economic impacts, measured achievement on test scores has been shown to have an impact on earnings, even after controlling for various factors such as years of schooling attended. Hanushek (2006) notes that although many of the analyses focus on different aspects of individual
earnings, they generally find that there are substantial earnings returns to higher achievement on standardized tests. One example of this category of literature is found in Murnane et al. (2001), in which the authors examine several skills and look at how measures of those skills predict earnings later in life. Academic skills, measured by the Armed Services Vocational Aptitude Battery (ASVAB) test, were one characteristic that they found important in determining wages a decade after students finished school. Specifically, they found that roughly two-thirds of the wage differential between Black males and White males in their analyses could be explained by differences between these groups in academic skills. They also found that the difference in academic skills explained almost all of the difference between White males and Hispanic males.

Eventually, youth taking part in youth development programs such as 4-H finish school and enter the labor force. Research has been done examining the link between standardized test scores across countries, as measures of quality of schooling, and economic growth rates. Hanushek reports that measures of labor force quality from international mathematics and science test scores are strongly related to economic growth: a one standard deviation difference on test performance is related to a 1 percent difference in annual growth rates of gross domestic product (GDP) per capita (Hanushek, 2006).

Improved test scores themselves can be thought of as indicators of the effect of higher quality education leading to an increase in the human capital of the students. Thus, if 4-H has a positive effect on the standardized test score outcomes of individuals, this may give some indication of the improvement that 4-H brings to the quality of the labor force, since
entering participants might be thought of as having accumulated more human capital. It then follows that if 4-H youth development acts to enhance the quality of the labor force available in society through improvements in the human capital of 4-H participants, there could be noticeable impacts on economic growth. Higher economic growth could, in turn, have much more of an impact than the simple sum of the impacts on individuals, through the effect of economic growth on the standard of living. As Hanushek notes, the relationship between measured labor force quality and economic growth is perhaps even more important than the impact of human capital and school quality on individual productivity and incomes.

Causal Inference and Program Evaluation

In order to describe the ideal type of data that would be needed to perform the evaluation of a program, it is necessary first to understand the main problem in establishing the cause and effect link for a program. The main idea behind this problem is that, for any given individual, only one of two situations can be true: either that individual is treated (participates in the program) or that individual is untreated (does not participate in the program). Therefore, I cannot know exactly what the outcomes of interest for that individual would have been in the case that did not happen. This consequence of reality, that the counterfactuals are missing because the two states cannot occur simultaneously for the same individual, has been referred to as the identification problem. As a result, the outcome under the state that is not observed must instead be estimated to solve this problem and obtain a causal effect for the program under evaluation. Therefore, I am forced to rely on comparisons between different individuals
belonging to the untreated and treated categories to estimate this effect.

Within the realm of medical treatments, the solution to the identification problem is accomplished by random assignment to treatment and control groups (Imbens and Angrist, 1994). The random nature of the assignment mechanism guarantees that the treatment and control groups are comparable on average, which allows the average counterfactual for one group to be estimated from the average outcome for the other group. Conceptually, economic program evaluation can be seen as mimicking a randomized medical trial. Since randomized trials lead to results which are often considered to be both simple and uncontroversial, the evaluation of several U.S. programs has relied on random assignment of individuals within a program (Bloom et al., 1997; Schochet et al., 2001).

For many reasons, however, conducting randomized experiments is often infeasible. The cost of evaluating a large-scale program through randomized experiments could be substantial. There could also be ethical issues related to performing social experiments for the purpose of program evaluation. Despite the good intentions of the designers and conductors of the experiments, people might be harmed, even if only by being denied some advantage given to the experimental group. If the program was meant to confer some benefit, such as a more effective education, the people who were selected out of the treatment group might feel unjustly treated. In fact, people might navigate around their random assignment into treatment and control, selecting into or out of the experiment.

Random assignment nonetheless provides a clear benchmark to gauge the alternative assumptions that are necessary to estimate the
causal effects using data that do not come from randomized experiments. Social scientists often attempt to estimate average causal effects by utilizing assumptions that allow them to conceptually "restore" a random assignment mechanism (Heckman et al., 1999; Morgan and Winship, 2007). Methods used to restore the consequences of random assignment employ data on the outcomes of interest and on observed characteristics of both program participants and non-participants. The ability to use observational data to produce estimates that resemble estimates originating from random assignment comes at the cost of having to depend on additional assumptions. The validity of these additional assumptions should be evaluated to determine whether or not the estimates generated from the data actually represent causal effects.

Advancing Research on the Impacts of 4-H

I do not currently have access to individual-level data that relate to both test performance and 4-H participation. With respect to the data for Florida, the smallest unit of analysis I have is the grade level within each school district for each year of the data (grade-district-year). Within the 2,680 grade-district-year cells of the Florida data, only 132 instances of total non-participation in 4-H occur. This widespread coverage of 4-H programs cannot be attributed solely to student participation in in-school 4-H programs such as school enrichment, which contains the largest number of 4-H participants. For a specific example, consider 4-H club participation and 4-H school enrichment program participation for the 2004 to 2005 school year in Florida. Roughly 200,000 students participated in some sort of in-school 4-H program, whereas approximately 24,000 participated in some form of 4-H club, of which community clubs contain the largest numbers of participants. However, within these two categories of 4-H participation,
seven school districts had zero participation in school enrichment programs, whereas only one school district recorded zero participation in 4-H clubs. Within the 2,640 grade-county-year cells that I have in the data for Ohio, there are zero instances of total non-participation. I therefore conduct my analyses for both states using the participation rate in 4-H within these cells as a measure of the intensity or extent of 4-H participation, thus making better use of the information at my disposal.

My analyses for Florida utilize test outcomes from both the mathematics and reading subtests of the FCAT for each grade level and school district in Florida. By combining these data with demographic data from the Florida Department of Education and 4-H participation data from Florida 4-H, I am able to look at the effects that the extent of participation in the 4-H program has on youth test performance measures. I use a similar set of data obtained for Ohio (from Ohio 4-H and the Ohio Department of Education), though there I examine not only reading and mathematics test outcomes, but also outcomes for science, social studies, and writing.

In order to look at the effects that the extent of participation has on student outcomes, I treat 4-H participation as one input into the education process in an education production function framework. The inclusion of a set of relevant covariates, as well as the utilization of an extensive set of interaction terms in my model, allows me to eliminate many potential confounding factors. For Florida, my data set includes information on third through tenth graders from all 67 individual school districts in Florida, whereas I look at grades three through eight for Ohio. By employing these methods, I establish a causal relationship between the extent of 4-H participation and student test outcomes under some assumptions, which are
outlined in chapter 4. Establishing this kind of link will be an important step in developing valid estimates of the economic impacts of 4-H, which will in turn be helpful in producing informative cost-benefit analyses for policy makers and other stakeholders.

It is often a concern whether or not results obtained for analyses of one level of an organization (e.g., Florida or Ohio) are applicable at a broader level (nationally within 4-H). It would make sense that this would be true for 4-H because of its structure. The structure that 4-H has evolved over its more than one-hundred-year history is complex. However, even though each individual state may have its own particular organizational characteristics, the core philosophies, structures, mandates, and programs of these state-level programs of 4-H are overseen at the national level. For this reason, it is reasonable to assume that many conclusions that might be drawn concerning the effectiveness of the 4-H program might extend to the programs found in other states.
CHAPTER 3
DATA

I constructed two separate data sets to conduct my analyses on the impact of 4-H participation on student outcomes. The first section of this chapter describes the data that I obtained to construct the data set for the Florida analyses. The second section details the data that I gathered to conduct a similar series of analyses with respect to outcomes for students in Ohio.

Florida Data

This section discusses the data I utilize to conduct my analyses on the impact of the extent of 4-H participation on standardized test score outcomes. Archived Florida School Indicator Report (FSIR) data were available from the 2002-2003 to the 2006-2007 school years. According to the Florida Department of Education, the FSIR provides numerous indicators of school status and performance on public elementary, middle, and high schools (FLDOE, 2009). Some of the indicators contained within these data include per pupil expenditures (exceptional, regular, vocational, and at-risk); percentages of types of school staff (instructional, administrative, and support); and percentages of students classified as gifted and disabled.

The main outcomes available within the FSIR data are information on the performance of students within each school district on the mathematics and reading subtests of the Florida Comprehensive Assessment Test (FCAT). The actual scores for students in each district are not publicly available. Instead, the data from the Florida Department of Education contain the percentages of students in each school district that fall within each of the five achievement levels, from the lowest level (level 1)
to the highest level (level 5) on each of the subtests of the FCAT. For each of the five years of data, these FCAT results are available for each grade level from third grade up to tenth grade.

Data on racial/ethnic student membership were not readily available within the original FSIR data. However, the Florida Department of Education retained detailed demographic information in its record archives, which it provided to me upon request.

In order to analyze the impact of 4-H attendance, I also obtained data on 4-H participation. These data consist of attendance numbers for 4-H after-school, in-school, and community programs within each school district, compiled by Florida 4-H. The data represent unduplicated participation: if a student takes part in more than one of the programs that 4-H offers within that county, they are only counted once. Additionally, since these data are aggregates of the different types of 4-H program participation, they effectively weight participation in each of the different 4-H programs equally (i.e., community club participation is weighted the same as after-school participation).

The 4-H program is available in all counties in Florida, with each of these counties being its own school district. Some of the students attending 4-H could nonetheless cross into another county to attend 4-H in a different district than the one in which they attend school. I do not have information on the number of students who choose to attend 4-H in a different district. However, I believe this number to be relatively small in comparison to the number who attend 4-H in their home school districts.
There may also be some concern that the degree of homeschooling present in Florida could significantly impact the results of the analysis, especially since homeschooling participation in Florida has been increasing steadily. Nonetheless, the actual percentages of students in Florida who are homeschooled remain quite small. Table 3-1 provides a snapshot of the homeschooling statistics in Florida, showing that, over the years of my data set, only approximately 2% of students attending school in Florida were homeschooled (FLDOE, 2009).

Several steps had to be taken before my Florida data were ready to use for regression analyses. First, I had to sort the FSIR data by individual grade level for each year and each school level. Each school district had separate values of characteristics such as per pupil expenditures for each school level (elementary, middle, high). Data for each school level also included FCAT performance values, with separate columns of data for each achievement level of the FCAT for each grade level taking the exam within that school level. I had to rearrange these data by county so that the result contained separate FCAT performance values for each grade level within each school district. However, some of the information did not originally have different values by grade level, since the data had originally been arranged at the school level. After my transformation of the data, the value of such a characteristic would be the same for sixth, seventh, and eighth grades (middle school), but differ from the value for ninth and tenth grades (high school). Once this procedure had been separately completed for each of the five years of the data, I merged all five years into one combined set and set indicators for each year of the data.
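The reshaping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the sample values, and the mapping of elementary school to grades three through five are hypothetical and are not taken from the FSIR files; only the middle school (grades six to eight) and high school (grades nine and ten) groupings come from the text.

```python
# Illustrative sketch: field names, sample values, and the elementary grade
# range are assumptions, not the actual FSIR layout.
GRADES_BY_LEVEL = {
    "elementary": [3, 4, 5],   # assumed
    "middle": [6, 7, 8],
    "high": [9, 10],
}

def expand_to_grades(school_level_rows):
    """Turn one row per (district, year, school level) into one row per
    (district, year, grade), copying school-level values to each grade."""
    grade_rows = []
    for row in school_level_rows:
        for grade in GRADES_BY_LEVEL[row["level"]]:
            grade_rows.append({
                "district": row["district"],
                "year": row["year"],
                "grade": grade,
                "per_pupil_expenditure": row["per_pupil_expenditure"],
            })
    return grade_rows

def add_year_indicators(rows, years):
    """Add a 0/1 indicator column for each school year to every row."""
    for row in rows:
        for y in years:
            row[f"year_{y}"] = 1 if row["year"] == y else 0
    return rows

# Hypothetical input: one district, one year, two school levels.
sample = [
    {"district": "District A", "year": 2003, "level": "middle",
     "per_pupil_expenditure": 4500},
    {"district": "District A", "year": 2003, "level": "high",
     "per_pupil_expenditure": 4800},
]
cells = add_year_indicators(expand_to_grades(sample), years=[2003, 2004])
```

In this sketch the three middle school grades inherit the same expenditure value, while the two high school grades inherit a different one, which mirrors the pattern described for school-level characteristics.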
Next I needed to develop my participation rate for 4-H, which was relatively straightforward. I combined data on grade-level membership for each school district with data on 4-H participation by grade level within these same districts. The quotient of the 4-H participation values and the student memberships became my participation rate. Since these data were at the grade level for each district and year, I simply merged them with the data described in the previous paragraph.

I also obtained racial membership data from the Florida Department of Education, by grade level, for each school district and year. Initially these data were given as membership values for each of the six racial categories utilized by the Florida Department of Education (White, Black, Hispanic, Asian, Native American, and Multiracial). After transforming these membership numbers into percentages, I merged these racial demographics with the rest of the data to obtain my completed data set for use in my regression analyses.

As mentioned above, one of the key data transformations that I had to conduct was changing the 4-H attendance data into participation rates. This was necessary since, across the state of Florida, the numbers of students within each school district differ greatly, as do the numbers of students participating in 4-H programs. Using the raw 4-H data on participation levels within each school district instead of participation rates could therefore be very misleading when considering the effect of 4-H participation within a school district on student outcomes.

Table 3-2 provides descriptive statistics of the data used for my analyses on Florida. It includes statistics broken down by each of the five years in my data set. The first variable listed is the 4-H participation rate, which is the key variable for my analyses, as it represents the extent of 4-H involvement by students within a
school district at the grade level. With the exception of this variable, many of the variables in Table 3-2, such as the percent of teachers with advanced degrees (Advanced degree teachers), the percentage of English language learners (Percent English learners), per pupil expenditures, and the racial data, are commonly employed throughout the education production literature. The outcome variables are the various levels of FCAT math and reading outcomes as well as their passing percentages. Passing percentages include the percentages of students who scored at the third achievement level or higher on that particular subtest, math or reading. I do not have information on average teacher salaries, a statistic that is often included as a covariate in the education production literature. I do, however, include the average number of years of experience of teachers as well as the percentages of teachers having advanced degrees.

Ohio Data

In the previous section of this chapter, I discussed the data that I gathered to conduct my analyses with respect to Florida. In order to perform analyses similar to those done for Florida, I needed to obtain multiple consecutive years of educational data from Ohio as well as matching years of data from the 4-H organization in that same state. I was able to obtain appropriate data for the analyses, including the data required for implementing the full difference-in-difference-in-differences (DDD) model for Ohio.

The data used for the Ohio analyses come from two sources, Ohio 4-H and the Ohio Department of Education. The 4-H participation data for Ohio come from the yearly statistical reports prepared by Ohio 4-H, which are available online (Ohio 4-H, 2011). The school discipline data and standardized test data come from the automated
online interactive Local Report Card (iLRC), which allows internet users to create customizable reports of Ohio Department of Education data (ODE, 2011).

With hundreds of individual school districts in the state, the school districts of Ohio are much smaller units of observation than the school districts of Florida. Unfortunately, I was unable to take full advantage of this feature of the education data from Ohio, since the lowest level of aggregation available within the Ohio 4-H data was the county level. In order to merge the two sets of data, I was required to aggregate the Ohio schooling data to the county level. Hence, the data for Ohio are subdivided into 88 counties, with the units of observation for this data set consisting of grade-level county-year cells.

Table 3-3 provides descriptive statistics of the data used for my analyses on Ohio. It includes statistics for the full sample and also statistics broken down by each of the five years in my data set. As in the Florida analyses, the first variable listed in the table of descriptive statistics is the 4-H participation rate. The outcome variables are the various levels of the Ohio Achievement Assessment for math, reading, science, social studies, and writing outcomes. The percent proficient and above variables are analogous to the passing percentage variables in the Florida data. They include the percentages of students who scored in the proficient, accelerated, or advanced levels of a particular subtest. Not all of the subtests were administered to every grade level of the data set. The numbers of observations available for mathematics and reading are slightly different as a result. Science and social studies were only administered to fifth and eighth grade students. In addition, data for individual levels of science performance were only available for three of the five school years examined. Only fourth and seventh grade students participated in the writing subtest.
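The aggregation of district-level education data to county-grade-year cells, followed by the computation of a participation rate as the quotient of 4-H participants and enrollment, might look like the following sketch. All record layouts, names, and numbers here are hypothetical, and the rate is left as a raw proportion because the scaling used in the descriptive tables is not stated in this section.

```python
from collections import defaultdict

def county_participation_rates(district_enrollment, county_4h_counts):
    """Sum district enrollment into (county, grade, year) cells, then divide
    each cell's 4-H participant count by the cell's total enrollment."""
    enrollment = defaultdict(int)
    for rec in district_enrollment:
        key = (rec["county"], rec["grade"], rec["year"])
        enrollment[key] += rec["enrollment"]

    rates = {}
    for key, participants in county_4h_counts.items():
        total = enrollment.get(key, 0)
        if total > 0:  # skip cells with no matching enrollment data
            rates[key] = participants / total
    return rates

# Hypothetical example: two districts that fall within the same county.
districts = [
    {"district": "D1", "county": "County A", "grade": 4, "year": 2005,
     "enrollment": 300},
    {"district": "D2", "county": "County A", "grade": 4, "year": 2005,
     "enrollment": 200},
]
four_h = {("County A", 4, 2005): 125}
rates = county_participation_rates(districts, four_h)
```

Using a rate rather than the raw count keeps large and small cells comparable, which is the motivation given for the same transformation in the Florida data.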
Because data for the science and writing subtests were not available for all years I examined, several blank entries appear in Table 3-3. Note that I was able to obtain an additional year of results for science percent proficient or above (the science subtest passing rate) alone, but not for the individual science subtest levels. Many of the other control variables are analogous to those seen in the Florida data. Some of these include the racial data, percent of gifted students, percent of disabled students, percent of students with limited English, percent of teachers with advanced degrees, and per pupil expenditures.

Florida and Ohio Data Comparison

Though the data sets that I have constructed for Ohio and Florida are similar in many respects, there are a few differences between them that are important to keep in mind when looking at the analyses. One of the most obvious differences between the two data sets is in the numbers of grades and counties examined in Ohio versus the numbers examined for Florida. Figure 3-1A provides a concise comparison of the organizational dissimilarities between the Florida and Ohio data sets along the dimensions of which the county-grade-year cells are composed. This organization results in a lower number of county-grade-year cells in Ohio than in Florida, 2,640 versus 2,680. Nonetheless, it results in a larger set of interactions in the Ohio data set, as seen in the chart of data comparisons, Figure 3-1B.

One of the main differences between the Ohio and Florida data sets used for these analyses concerns the organization of school districts in these states. For both Ohio and Florida, local 4-H program organization occurs at the county level. Thus, data on 4-H participation were available at the county level for both
states. In the case of Florida, this meant that school district data from the FLDOE for county-wide school districts were already aligned with data from 4-H. For Ohio, however, school districts are arranged into much smaller-scale organizational bodies. Aggregation of the education data for Ohio from the 613 school districts to the county level produced the 2,640 county-grade-year cells, with mean enrollment counts roughly half those for Florida at the cell level.

This aggregation from the school district level to the county level solved the problem of merging the 4-H and education data for Ohio. However, there could be differences with respect to the types of circumstances that the DDD model specification actually accounts for when it controls for unobserved factors. In both states, for example, certain school districts might have mandatory class size restrictions in place for particular grade levels. For Florida, this is a systematic factor that the DDD specification takes into account. In the case of Ohio, though, policies set by one school district might not necessarily be adopted by neighboring districts that fall within the same county.

Besides the differences in the structure of school districts between states, other important differences could be present between Florida and Ohio that would be relevant for my analyses. In particular, differences in the relative difficulties of the state standardized tests can be considered one of the most pertinent areas where differences could occur between states with respect to my analyses on standardized test score outcomes. Recall that both states examined in my analyses construct their own standardized tests, which serve as measures of progress toward AYP goals as
required by NCLB. Though these tests are constructed for similar purposes, significant differences can occur between tests for each state, increasing the difficulty of making direct comparisons across state tests.

To contend with the issue of comparability of different state tests, I rely on the research report by de Mello et al. (2009), whose estimates enable the comparison of math and reading standards for proficiency across states, specifically for students in fourth and eighth grades. Examining their estimates of reading NAEP scale equivalents for fourth and eighth grades in 2007 shows Florida's standardized test to be a more difficult standard at both grade levels. Note that higher equivalent scores relate to more difficult standards. For fourth grade, state NAEP equivalent scores were 209 and 198 for Florida and Ohio, respectively. Eighth grade state NAEP equivalent scores were 262 and 240 for Florida and Ohio, respectively (de Mello et al., 2009).

Examining their estimates for mathematics shows a narrower gap in difficulty between the Florida and Ohio exams. Again, they look at NAEP scale equivalents for fourth and eighth grades in 2007. For fourth grade, state NAEP equivalent scores were 230 and 225 for Florida and Ohio, respectively. Eighth grade state NAEP equivalent scores were 266 and 265 for Florida and Ohio, respectively (de Mello et al., 2009).

Table 3-1. Florida home schooling statistics

School Year               2002-2003  2003-2004  2004-2005  2005-2006  2006-2007
Number Home schooled          45333      47151      51110      52613      55822
Total Attending School      2476244    2591566    2633719    2663389    2656720
Percent Homeschooled           1.83       1.82       1.94       1.98       2.10
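As a quick arithmetic check, the percentages in Table 3-1 and the NAEP-equivalent gaps quoted above can be recomputed from the reported figures. The short Python sketch below uses only numbers that appear in this chapter; the variable names are illustrative.

```python
# Counts as reported in Table 3-1.
home_schooled = {
    "2002-2003": 45333, "2003-2004": 47151, "2004-2005": 51110,
    "2005-2006": 52613, "2006-2007": 55822,
}
total_attending = {
    "2002-2003": 2476244, "2003-2004": 2591566, "2004-2005": 2633719,
    "2005-2006": 2663389, "2006-2007": 2656720,
}

# Percent homeschooled = 100 * homeschooled / total attending school.
percent_homeschooled = {
    year: round(100 * home_schooled[year] / total_attending[year], 2)
    for year in home_schooled
}

# 2007 NAEP scale equivalents quoted in the text, as (Florida, Ohio) pairs.
naep_equivalents = {
    ("reading", 4): (209, 198),
    ("reading", 8): (262, 240),
    ("math", 4): (230, 225),
    ("math", 8): (266, 265),
}
# Positive gap = Florida's proficiency standard is the more difficult one.
gaps = {key: fl - oh for key, (fl, oh) in naep_equivalents.items()}
```

The recomputed shares agree with the last row of Table 3-1, and the gaps (11 and 22 points in reading versus 5 and 1 points in mathematics) are consistent with the statement that the difference in difficulty is narrower for mathematics.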
Table 3-2. Variables and descriptive statistics for Florida data set

                              2002 to 2007    2002 to 2003    2003 to 2004    2004 to 2005    2005 to 2006    2006 to 2007
Variable                        Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.
4-H participation rate          0.28    0.48   32.92   51.84   27.79   44.56   29.30   44.88   28.61   46.33   21.94   49.19
FCAT math mean                310.27   16.13  304.67   16.02  307.30   15.91  310.25   15.42  313.41   15.23  315.70   15.59
FCAT reading mean             302.59   14.15  298.07   13.66  300.10   14.77  301.95   14.16  305.65   13.24  307.16   12.84
FCAT math % level 1             20.5    9.16   23.64    9.23   22.34    9.72   20.44    8.82   18.80    8.29   17.28    8.21
FCAT math % level 2            22.26    5.04   23.00    4.37   22.54    4.49   22.29    5.05   21.94    5.38   21.54    5.67
FCAT math % level 3            30.98    5.73   29.83    6.15   30.00    5.96   31.35    5.30   31.43    5.34   32.31    5.49
FCAT math % level 4            19.69    7.23   17.82    6.67   18.63    6.86   19.53    7.04   20.99    7.48   21.47    7.44
FCAT math % level 5             6.57     3.9    5.73    3.47    6.43    3.76    6.42    3.86    6.88    3.91    7.38    4.27
FCAT reading % level 1         25.02    10.9   27.83   10.05   27.18   11.12   25.60   10.93   22.70   10.75   21.79   10.33
FCAT reading % level 2         21.64    7.41   21.91    6.63   21.22    6.69   21.67    7.67   21.81    7.99   21.62    7.96
FCAT reading % level 3         29.51    7.52   28.38    6.67   28.17    7.54   29.08    7.45   30.77    7.61   31.17    7.78
FCAT reading % level 4         18.33    9.18   17.04    7.82   17.64    8.66   18.10    9.41   19.53   10.12   19.34    9.50
FCAT reading % level 5           5.5    3.05    4.86    2.61    5.79    2.98    5.59    3.01    5.20    2.93    6.05    3.53
FCAT math pass %               57.24   12.25   53.38   11.48   55.06   11.99   57.30   11.91   59.30   11.99   61.16   12.30
FCAT reading pass %            53.34   16.38   50.28   14.26   51.61   16.30   52.77   16.93   55.50   17.03   56.56   16.40
Advanced degree                30.77    7.26   31.31    7.58   31.34    7.16   31.00    6.94   30.33    7.22   29.87    7.32
% instructional staff          65.03    5.37   64.20    5.49   64.55    5.70   65.06    5.28   65.57    4.86   65.75    5.32
% English learners              3.69    4.21    3.24    4.09    3.62    4.16    3.65    4.14    3.87    4.22    4.05    4.42
Unemployment rate               4.17    1.01    5.08    0.88    4.55    0.89    3.75    0.71    3.36    0.66    4.10    0.89
White                          64.76   19.59   66.38   19.12   65.71   19.33   64.91   19.62   63.85   19.78   62.95   19.95
Black                          19.82   14.87   20.37   14.94   20.15   14.97   19.73   14.95   19.54   14.84   19.32   14.67
Hispanic                       11.95   13.15   10.54   12.40   11.12   12.68   11.91   13.14   12.70   13.46   13.48   13.84
Asian                           1.23    1.02    1.12    0.93    1.16    0.96    1.22    1.02    1.29    1.05    1.33    1.09
Native American times 100      33.27    41.6    0.30    0.36    0.30    0.38    0.33    0.41    0.36    0.44    0.36    0.48
Multiracial                     1.91    1.46    1.29    1.11    1.56    1.24    1.89    1.40    2.24    1.52    2.56    1.60
% gifted                        3.83    2.74    3.82    2.68    3.86    2.70    3.80    2.61    3.76    2.81    3.91    2.89
% disabled                     16.62    3.73   17.07    3.70   17.00    3.79   16.69    3.75   16.31    3.79   16.01    3.52
% absent over 21 days          13.13    1.99   11.33    5.44   11.19    5.61   11.76    6.07   11.77    5.88   10.96    5.62
Average experience              11.4    5.73   13.13    1.99   13.81    2.02   13.48    2.06   12.91    1.78   12.87    2.29
Per pupil expenditures          5258     828    4517     364    4831     455    5173     549    5586     678    6180     795

Note: The full sample, from 2002 to 2007, contains 2,680 observations, or cells. Each individual school year consists of 536 of these cells.
Table 3-3. Variable descriptive statistics for Ohio data set

                              2005 to 2009    2004 to 2005    2005 to 2006    2006 to 2007    2007 to 2008    2008 to 2009
Variable                        Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.
4-H participation rate         29.00   32.80   29.74   36.72   28.06   29.51   28.99   31.81   29.58   33.20   28.62   32.45
Math % advanced                15.45    8.98    7.60    5.22   14.24    7.67   15.53    9.75   17.37    8.30   18.59    9.02
Math % accelerated             19.32    5.44   16.69    5.30   18.42    4.83   19.07    5.19   20.14    5.95   20.98    5.12
Math % proficient              38.79    8.10   40.34    4.24   39.11    7.12   40.81   10.09   37.38    6.80   37.08    8.81
Math % basic                   17.43    6.27   25.21    7.35   19.02    5.56   16.19    5.47   16.19    5.10   14.46    4.32
Math % limited                  8.99    5.48   10.18    4.03    9.23    5.39    8.27    5.72    8.94    5.70    8.92    5.60
Reading % advanced             13.95    8.20   13.19    8.83   16.68    7.85   14.16    8.93   12.10    6.77   13.36    7.97
Reading % accelerated          25.93    6.56   25.25    7.15   24.69    4.88   26.59    6.47   27.46    7.20   25.43    6.67
Reading % proficient           39.69   11.30   40.16   12.21   38.31    9.94   39.96   11.73   39.92   11.95   40.25   10.74
Reading % basic                12.17    3.26   11.97    2.91   12.04    3.06   11.43    2.89   12.32    2.97   13.01    4.00
Reading % limited               8.26    4.01    9.48    3.89    8.30    4.03    7.73    3.40    8.23    4.39    7.97    4.07
Science % advanced             11.71    4.13                                   11.20    4.16   12.43    4.16   11.50    4.00
Science % accelerated          26.77   10.39                                   26.36   10.12   24.77    8.25   29.17   12.02
Science % proficient           30.34    6.74                                   31.34    6.07   30.16    7.43   29.52    6.56
Science % basic                26.82    6.06                                   26.36    5.75   27.36    5.40   26.74    6.92
Science % limited               4.24    2.57                                    4.34    2.50    5.30    2.80    3.09    1.84
Social % advanced              14.01    6.04                                    9.95    3.72   14.64    5.21   17.44    6.32
Social % accelerated           18.10    6.77                                   16.72    6.82   19.14    7.30   18.44    5.92
Social % proficient            25.81    4.43                                   28.44    3.51   27.39    2.74   21.59    3.46
Social % basic                 33.85    6.30                                   34.92    5.60   32.69    7.03   33.92    6.04
Social % limited                8.12    5.22                                    9.58    6.04    6.16    4.00    8.63    4.84
Writing % advanced              3.06    1.92    3.24    1.44    2.70    1.25    1.96    1.00    3.12    2.09    4.20    2.25
Writing % accelerated          31.20   11.05   20.60    4.98   25.65    5.10   29.44   10.28   36.12   13.80   36.11    6.76
Writing % proficient           49.22    9.67   54.70    3.01   58.58    3.82   51.37    8.80   45.21   11.78   43.65    6.01
Writing % basic                12.93    4.66   13.51    3.41   10.03    3.46   15.23    4.59   11.86    3.44   12.85    5.65
Writing % limited               3.60    3.35    7.92    3.43    3.06    1.60    2.00    1.86    3.71    3.68    3.21    2.93
Math proficient or above       73.55   10.01   64.63   10.21   71.74    9.10   75.41    9.93   74.89    8.94   76.64    8.99
Reading proficient or above    79.55    6.21   78.59    5.85   79.65    6.46   80.70    5.67   79.46    5.99   79.03    6.73
Science proficient or above    68.53    8.55                   67.69    9.67   68.90    8.53   67.34    7.49   70.18    8.13
Social proficient or above     57.91   10.30                                   55.10    9.76   61.16   10.35   57.46    9.92
Writing proficient or above    83.47    5.78   78.54    6.14   86.92    4.75   82.77    5.21   84.44    5.73   83.96    4.95
Table 3-3. Continued.

                              2005 to 2009    2004 to 2005    2005 to 2006    2006 to 2007    2007 to 2008    2008 to 2009
Variable                        Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.
White                          92.91   10.77   93.66   10.57   93.49   10.65   93.36   10.65   93.19   10.61   90.85   11.17
Multiracial                     1.50    1.89    0.88    1.30    1.09    1.51    1.28    1.70    1.49    1.87    2.76    2.31
Hispanic                        0.96    1.89    0.79    1.66    0.82    1.74    0.86    1.76    0.88    1.82    1.47    2.33
Black                           4.30    8.63    4.41    8.97    4.33    8.88    4.21    8.63    4.15    8.44    4.40    8.27
Asian                           0.33    0.78    0.26    0.63    0.27    0.67    0.29    0.73    0.29    0.75    0.53    1.01
Native American (x100)          0.28    2.34    0.21    1.59    0.15    1.36    0.14    1.29    0.14    1.20    0.78    4.44
% gifted                       18.37    7.55   18.43    7.06   18.93    7.47   18.56    7.58   18.09    7.86   17.87    7.74
% limited English               0.82    3.25    0.67    3.25    0.76    3.20    0.84    3.25    0.88    3.28    0.94    3.29
% disabled                     15.42    2.66   15.46    2.78   15.39    2.63   15.42    2.57   15.29    2.58   15.54    2.74
% teachers with masters        56.83    7.50   52.99    7.10   54.96    7.47   57.40    6.98   59.02    7.00   59.80    6.78
Unemployment rate               7.31    2.58    6.30    1.41    5.76    1.19    5.96    1.13    7.05    1.24   11.48    2.12
Teacher experience             14.83    1.88   14.51    1.61   15.13    1.96   15.00    1.95   15.01    1.99   14.49    1.76
Per pupil expenditures          8794     854    8265     710    8533     745    8796     759    9079     803    9295     830
Enrollment                      1541    2463    1584    2576    1559    2522    1534    2457    1517    2409    1513    2354

Note: No observations were available for the 2004-2005 through 2005-2006 school years for science or social studies OAA for specific proficiency ranges, with the exception of those scoring proficient or above, available for science in 2005-2006 only. Total observations for all subtests varied based on availability of data for specific subtests and years: Math, 2376; Reading, 2464; Social, 528; Writing, 704; Science, 528 (704 proficient or above).
Figure 3-1. Florida and Ohio data set comparison. A) Comparison of counts across each dimension of the Florida and Ohio data. B) Comparison of the number of interactions between the dimensions of the Florida and Ohio data. (Note: In Florida, the county level and school district level are analogous.)
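The counts compared in Figure 3-1 follow from the three dimensions of each panel. A minimal Python sketch, using the dimension sizes stated in the text (67 districts, grades three through ten, and five years for Florida; 88 counties, grades three through eight, and five years for Ohio), reproduces them. The function and key names are illustrative.

```python
def panel_counts(n_units, n_grades, n_years):
    """Number of cells and of pairwise interaction effects in a balanced
    unit-by-grade-by-year panel."""
    return {
        "cells": n_units * n_grades * n_years,
        "unit_grade": n_units * n_grades,
        "unit_year": n_units * n_years,
        "grade_year": n_grades * n_years,
    }

florida = panel_counts(n_units=67, n_grades=8, n_years=5)
ohio = panel_counts(n_units=88, n_grades=6, n_years=5)

# Total number of pairwise interaction effects in each data set.
florida_interactions = (florida["unit_grade"] + florida["unit_year"]
                        + florida["grade_year"])
ohio_interactions = ohio["unit_grade"] + ohio["unit_year"] + ohio["grade_year"]
```

Ohio has fewer cells than Florida (2,640 versus 2,680) but more interaction effects in total (998 versus 911), because its larger number of counties outweighs its smaller number of grades in the county-year dimension.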
CHAPTER 4
METHODOLOGY

Establishing a Causal Relationship

In the previous chapter, I discussed the data I employed in conducting my analyses on the impact of the extent of 4-H participation on standardized test score outcomes and student behavioral outcomes in both Florida and Ohio. This chapter details the methodology utilized in my analyses, including ordinary least squares (OLS) and fixed effects (FE) models, whose results I compare to the difference-in-difference-in-differences (DDD) specification, the preferred model specification used in my analyses.

The separation of participants into treatment and control groups by way of random assignment is an important aspect of being able to establish a causal relationship between the treatment and outcome within the realm of experimental treatment evaluations. However, one might ask why random assignment is needed to be able to state that the treatment has a causal effect on the outcome. The answer lies primarily in the elimination of alternative explanations for the relationship between the treatment and the outcome of interest (Schneider et al., 2007). Under effective random assignment, both the treatment group and control group would have the same average characteristics (both observable and unobservable), which would ensure that the only experimentally relevant difference between the two groups would be that one group would be exposed to the treatment and the other would not. Because experimental evaluation is often not feasible, non-experimental econometric methods have been adapted to the analysis of treatment effects in an attempt to restore the consequences of random assignment.
Ordinary Least Squares

In general, when dealing with observational data, the researcher cannot guarantee that the assignment to the treatment group or the comparison (control) group is random. This could lead the researcher to believe that the treatment indicator is correlated with the error term, which encompasses all unobservable factors related to the dependent variable (outcome). One way to attempt to correct for this is by estimating an OLS regression utilizing a set of control variables, X, to control for as many observable factors as possible, as in the equation below (Winship and Morgan, 1999).

Equation 3-1.

For this method to restore the effects of a randomly assigned treatment, the set of observable control variables must contain all variables that are important in the decision to be treated. This is the case of selection on observables, or observable selection bias (Barnow et al., 1980). If the entire set of variables that are important in the decision to be treated is not observed, omitted variable bias occurs and it is not possible to uncover a causal relationship between the treatment and the change in the outcome of interest. In this case, the relationship described by the OLS model merely represents a correlation.

How then can one regain the effects of randomization to establish a causal relationship between the treatment and changes in the outcome in the face of possible omitted variables? Depending on the characteristics of the available data, there may be several options. One possible solution to this dilemma presents itself when the available observational data consist of a panel data set. In this case, conducting fixed effects regressions may provide a solution.
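For orientation, a regression of the kind described in this section can be written as in the sketch below. The symbols for the intercept, coefficients, and error term are assumptions introduced here for illustration and are not necessarily the notation of Winship and Morgan (1999) or of Equation 3-1.

```latex
% Illustrative OLS specification (notation assumed, not taken from the source):
% Y_i : outcome for unit i
% T_i : treatment indicator (or measure of treatment intensity)
% X_i : row vector of observed control variables
\[
  Y_i = \alpha + \delta\, T_i + X_i \beta + \varepsilon_i
\]
% Under selection on observables, the error term is unrelated to treatment
% once X_i is held fixed:
\[
  E\left[\varepsilon_i \mid T_i, X_i\right] = E\left[\varepsilon_i \mid X_i\right]
\]
```

In this form, the coefficient on the treatment term carries a causal interpretation only when the second condition holds, which is the requirement that the controls contain every variable relevant to the decision to be treated.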
Fixed Effects

Using fixed effects regressions, the selection bias in the estimate of the treatment effect can be corrected for by eliminating individual (unit of observation) specific permanent effects. Using the model formulation presented in Winship and Morgan (1999):

Equation 3-2.

This formulation includes a fixed individual-specific effect. Though a fixed effects regression framework can aid in the elimination of unobservable variables that are also related to selection into treatment through the differencing out of these fixed individual effects, identification is still not guaranteed. As Winship and Morgan (1999) point out, "fixed effect models will only provide consistent estimates of the treatment effect if [the above equation] correctly models the time series structure of [Y] or if the fixed effects are the only unobservables that determine assignment to the treatment group."

Difference-in-Difference-in-Differences

I attribute changes in test outcomes to changes in 4-H participation by removing other confounding factors that might otherwise obscure this relationship. Factors that make this relationship harder to identify could exist at different levels. Unfortunately, even with the best data possible, many of these factors would still be unobservable. In order to identify the causal effect that participation in 4-H has on student test outcomes, however, one must find a way to eliminate these unobserved effects. Successful removal of these unobserved effects would allow the relationship between the extent of
4-H attendance and student performance on the FCAT to be interpreted as a causal relationship.

Given that I have panel data over a five-year period from the 2002-2003 school year through the 2006-2007 school year for Florida (2004-2005 through 2008-2009 for Ohio), I am able to use a DDD model to sweep away the effects of multiple unobserved factors. Consequently, this model allows me to distill the influence of 4-H on test outcomes from the influences of other unobserved factors that might lead to differences across grade levels, school districts (counties), and years. Similar to the model in Currie et al. (2009), in which the authors analyze the impact of pollution on school absences, I apply the DDD model below to analyze the impacts of the extent of 4-H participation on FCAT outcomes.

Equation 3-3.

For Florida, my data include outcomes that had already been aggregated at the school district level within each grade. In this case, my units of observation are a grade level within a school district for a particular year. Since 4-H data were not available at the school district level in Ohio, I aggregated the education data for Ohio to the county level. I use sub-indices (i) for each of the 67 school districts (88 counties) in Florida (Ohio), (r) for one of the five years of the data, and (g) for the grade level. Thus, Y represents student test score outcomes in a given school district, year, and grade; T is a measure of the extent of 4-H participation for a given school district (county), year, and grade; X is a vector containing student demographic characteristics that includes percentages of students who are Asian, Black, Hispanic, Native American, White (omitted), and Multiracial; Z is a vector containing other covariates such as percentages of students who are English language learners, percentages of teachers with advanced degrees,
per pupil expenditures, percent gifted, and other covariates which might be related to the decision to participate in 4-H or to test outcomes. W is a vector containing covariates that vary by school district and by year but do not vary by grade level, such as unemployment rates within a county. School district (county), grade, and year fixed effects are included. I also include sets of dummy (indicator) variables for the pairwise interactions among these three dimensions. These interaction terms help to control for a wide variety of unobserved factors that could potentially be impacting student test score outcomes, thereby allowing me to more clearly isolate the effect of 4-H participation on test scores.

For Florida, the school district-grade level effects control for differences between grade levels among different school districts that do not change over the five years of the data and are the most extensive set of interactions that I include, consisting of 536 effects. The county-grade level effects are also the largest set of interactions for Ohio, and consist of 528 county-grade level effects. Consider that some districts could have mandatory class size restrictions for particular grade levels, a difference that would not be captured by the observed covariates. This difference in learning conditions could, in turn, affect the test performance of students in those grades, and would therefore be an important factor to take into account when attempting to isolate the causal effect of the extent of 4-H participation on test outcomes. Note that this particular example might be more relevant to Florida, as opposed to Ohio, since in the latter state the school districts that set policies are a much smaller unit than the county level of aggregation.
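One way to write specifications consistent with the definitions of Y, T, X, Z, and W given in this chapter is sketched below. The coefficient symbols and the names of the effect terms are assumptions made here for exposition; they are not taken from Winship and Morgan (1999), from Currie et al. (2009), or from Equations 3-2 and 3-3.

```latex
% Illustrative fixed effects specification (notation assumed):
% \mu_i is a time-invariant effect specific to unit i, t indexes time.
\[
  Y_{it} = \mu_i + \delta\, T_{it} + X_{it}\beta + e_{it}
\]

% Illustrative DDD specification (notation assumed), with
% i = school district (county), r = year, g = grade:
\[
  Y_{irg} = \beta\, T_{irg} + X_{irg}\gamma + Z_{irg}\theta + W_{ir}\pi
          + \alpha_i + \lambda_r + \kappa_g
          + \phi_{ig} + \psi_{ir} + \omega_{rg}
          + \varepsilon_{irg}
\]
% \alpha_i, \lambda_r, \kappa_g : district (county), year, and grade effects
% \phi_{ig}   : district-by-grade effects (536 in Florida, 528 in Ohio)
% \psi_{ir}   : district-by-year effects  (335 in Florida, 440 in Ohio)
% \omega_{rg} : grade-by-year effects     (40 in Florida, 30 in Ohio)
```

Written this way, the coefficient on the participation measure is identified only from variation in participation that remains within a cell after all three sets of pairwise effects have been removed.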
The 335 school district-year effects for Florida, and the 440 county-year effects for Ohio, are my second largest sets of interactions, controlling for factors that change over time for different school districts or counties but that are common to all grade levels. With respect to Florida, these interactions could account for changes over time in relative test preparation intensities across different school districts. Variations in these intensity levels might also be directly and indirectly affecting the extent of 4-H participation. Consider a school district under a high degree of scrutiny for worsening test scores of its students. An increase in test preparation assignments might limit the time students have available to spend on outside interests, such as 4-H. These interactions could also reflect changes in the characteristics of the county populations occurring over the years of the data set that are not already captured by the observed covariates. An example could include an increase in the percentage of parents in the district with college degrees resulting from an increase in high-level job opportunities from a newly established hospital or college. This example might be more applicable to the setting of Ohio with respect to this set of interactions. Additionally, parents with college education might be more aware that universities are often concerned with [...] participate in activities such as 4-H. Thus, these levels of parental education could also [...] to participate in 4-H and affect the extent of participation within the school district or county.

The third type of interaction I include consists of forty grade-year effects for Florida and thirty grade-year effects for Ohio. These interactions control for factors that change across years and are different for each grade level yet remain constant across school
districts or counties. As standardized tests evolve, they often try new questions or cycle different questions in and out of the test banks, varying the overall difficulty of the tests from year to year. With this in mind, these interactions could conceivably capture differences in the relative difficulty of the FCAT or the OAA for each grade level from year to year.

Notice that in equations 3-1 through 3-3, I model the extent of 4-H participation as entering linearly. I investigated the possibility that 4-H participation could enter these equations in a nonlinear manner by estimating the DDD model with the 4-H participation rate and also its squared and cubed terms. Using F tests of the joint significance of the nonlinear treatment effect terms (the square and cube), I was unable to reject the null hypothesis that the nonlinear components are equal to zero at the 5 percent level of significance in all but one of the fourteen regressions. For this reason, I concluded that the assumption that the treatment effect enters the model linearly is statistically supported by the data.

Since the DDD model controls for the largest possible number of potential confounding factors, given the data I have on hand, it is my preferred specification. Using this model, I am able to identify the causal effect of the extent of 4-H participation on measures of FCAT performance as long as any remaining unobserved factors impacting FCAT scores that are not captured by the variables and fixed effects are uncorrelated with the extent of participation in 4-H. This is my underlying identifying assumption, and although it is inherently untestable, I do conduct falsification tests in chapter 7 in an attempt to indirectly assess its validity.
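The sizes of the three interaction sets quoted above follow from simple products of the panel dimensions. The short check below is a sketch under one assumption: the numbers of grade levels (8 for Florida, 6 for Ohio) are not stated in this passage and are inferred here from 536 / 67 and 528 / 88. The function name is hypothetical.

```python
def interaction_counts(n_units, n_grades, n_years):
    """Number of dummies in each pairwise interaction set of the DDD model."""
    return {
        "unit_grade": n_units * n_grades,   # district (county) x grade
        "unit_year": n_units * n_years,     # district (county) x year
        "grade_year": n_grades * n_years,   # grade x year
    }

# Grade counts are inferred from the reported totals, not given in the text.
florida = interaction_counts(n_units=67, n_grades=8, n_years=5)
ohio = interaction_counts(n_units=88, n_grades=6, n_years=5)

print(florida)
print(ohio)
```

With these dimensions the counts reproduce the 536, 335, and 40 effects reported for Florida and the 528, 440, and 30 effects reported for Ohio.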
As a final note to this section, and noting that my analysis is carried out with aggregate data, I briefly discuss its relationship with a model employing individual-level data (if it were available). In Flores-Lagunes and Timko (2011) a model is developed which illustrates that the aggregate-level model of this study identifies the individual-level effect of 4-H when interpreted as the average effect of 4-H [...] performance. Additionally, the aggregated model potentially avoids identification issues that may arise in an individual-level analysis.
CHAPTER 5
IMPACT OF 4-H PARTICIPATION ON STUDENT OUTCOMES IN FLORIDA

Florida Comprehensive Assessment Test Analysis

The estimated effects of the extent of 4-H participation on performance on the mathematics subtest of the Florida Comprehensive Assessment Test (FCAT), along with the associated p-values, are reported in Table 5-1. Likewise, the estimated effects of the extent of 4-H participation on student performance on the reading subtest of the FCAT are reported in Table 5-2. Both of these tables present results for the coefficient on the 4-H participation rate under each of the four model specifications: ordinary least squares (OLS), Model 1; OLS with control variables, Model 2; fixed effects (FE), Model 3; and difference-in-difference-in-differences (DDD), Model 4. The coefficients for the effect of the extent of 4-H participation on mean FCAT scores as well as on passing levels (percent scoring level 3 and above) are also shown for both subtests. I also show the results for each of the five individual performance levels of the FCAT, with level five being the highest level and one being the lowest. Note that only the results for the coefficient of interest (4-H participation rate) are presented in the aforementioned two tables, whereas the results for the coefficients for the other control variables under the preferred DDD model are presented in Table 5-3 for levels one through five of the mathematics subtest, with results for passing percent and mean FCAT scores presented in Table 5-4. Similarly, for the reading subtest, results for levels one through five are presented in Table 5-5 and results for passing percent and mean FCAT scores are presented in Table 5-6.

Because of positive influences of the 4-H youth development program, I expect that a higher degree of 4-H participation will be positively related to better performance
on the FCAT. Therefore, I expect a positive relationship with both average scores and passing rates. Likewise, I anticipate that as the extent of 4-H participation increases I should witness an increase in the percentage of students performing in the highest level of the FCAT, also known as level 5. While level 5 is the highest level of FCAT performance, level 1 is the lowest level of performance. Since I expect 4-H participation to be positively related to student performance, as the extent of 4-H participation increases it would result in a decrease in the percent of students that perform poorly. I therefore expect a negative relationship between the extent of 4-H participation and the percent of students that score in the lowest performance category, level 1. Because the top level of performance, level 5, and the bottom level of performance, level 1, are on the extreme ends of the performance spectrum, the expectations for the influence of 4-H on the percent performing in these levels are straightforward.

I also present results related to the intermediate levels of FCAT performance, level 2 through level 4. Note that the tables containing the main results, like Table 5-1, present changes in a single outcome (e.g., percent of students achieving level 5 FCAT performance), which are looked at independently from changes in other outcome categories (e.g., percent of students achieving level 4 FCAT performance). However, if I take each of the intermediate performance levels, level 2 through level 4, and look at them in isolation, it is unclear what the expectation for these levels should be. To elaborate, consider a situation where 4-H has a positive effect on student performance on the FCAT at all levels of performance. I expect that as performance increases, students should move up from one performance level to the next. The percentage of
students in a particular intermediate level of performance is essentially the net percent left in that intermediate category, since some portion of those students might improve to the next higher category (decreasing the total percent in that category) and some students might improve from the performance category below (increasing the percent of students in the category being considered). However, if the percent of students that improve and move into an intermediate category is equal to the percent of students that improve and move up to the next category, then the percent falling into that particular intermediate level of performance will remain unchanged, even with an improvement in student performance. Also consider an alternative situation where students improved performance on the FCAT, but more students improved and moved to the next higher level than improved from the level below. This would result in a decrease in the percent of students achieving that intermediate performance level even though performance gains were made overall by students.

I therefore have no particular expectation for the relationship between 4-H participation and these intermediate levels of performance. This is because, taken in isolation, changes in the percent of students falling into these intermediate categories do not give information on the presence of improvement (or declines) in performance. This means that even if I have a prior expectation that 4-H participation would improve FCAT performance for various reasons, I cannot look at changes in the intermediate categories of performance and say whether or not those changes, of either sign, indicate an improvement or decline in FCAT performance as the extent of 4-H participation increases. For this reason, the results I focus on are the percentages of students in the highest (level 5) and lowest (level 1) categories of FCAT performance, as well as the average FCAT
scores and percent passing the FCAT. These categories have unambiguous expected signs if 4-H participation improves FCAT performance.

Next, before presenting the results themselves, I describe the tables containing the main results in more detail so that the reader can more easily understand the reported findings. The main results for the effects of 4-H participation on outcomes related to FCAT performance are presented in Tables 5-1 and 5-2. The main outcomes observed, in separate regressions, are presented on the left side of these tables. These include outcomes such as the average score and the percent of students achieving the highest level of FCAT performance, level 5. Each of the models under which these results are generated, as described in the chapter on the methodology, is presented from left to right, with the models listed in the column headings and the preferred DDD model presented furthest to the right. For each of these models and outcomes, the coefficient for the extent of 4-H participation is presented. This coefficient represents the change in that particular outcome of interest for an increase in the extent of 4-H participation. Thus a positive coefficient indicates that a change in the extent of 4-H participation is positively correlated with a change in that outcome (with the estimated coefficients of the DDD model having a causal interpretation under my assumptions).

For a specific example of the coefficient interpretation, consider the result for the average test score outcome for the reading subtest. The value of the coefficient for 4-H participation under the preferred DDD model is 1.020, as seen in Table 5-2. This estimated effect implies that a one percentage point increase in the extent of participation in 4-H results in an increase of 1.02 points in the average reading scores of the FCAT. Additionally, the asterisk beside
the coefficient in the table indicates that this estimated effect is statistically significant at the ten percent level.

I expect that 4-H participation is positively related to improvements in student test performance. In relation to tests of statistical significance on student test score outcomes, however, my null hypothesis is that the extent of 4-H participation has no influence on student test outcomes. I use asterisks to indicate statistical significance at different levels within the tables. Because the statistical significance levels conventionally used are chosen arbitrarily, each reader might have his or her own reason for considering certain levels of significance as statistically significant or not. For this reason, I also include the p-value for each of these coefficients. In simple terms, the p-value can be thought of as the lowest level of significance at which the null hypothesis can be rejected.

The first models included in Tables 5-1 and 5-2 are the OLS regressions of the measures of performance on the FCAT against the 4-H participation rate. These OLS results are obtained from the model that contains the rate of 4-H participation as the independent variable with no other control variables, fixed effects, or interaction dummies. This relatively naive model, which is mainly shown for comparison purposes, provides a simplistic representation of the correlation between FCAT performance measures and 4-H participation. A notable consequence of this simplicity is that many of the signs of the estimated coefficients are counter to prior expectations based on positive influences of 4-H on testing outcomes. In fact, this model shows a negative effect on mathematics subtest scores of 2.528 (for average scores) and 4.853 (for percent passing) for this subtest that are both highly statistically significant (with p
values of 0.000). There are many unaccounted-for factors that could contribute to these counterintuitive results. The consequences of failing to take into account other information on factors related to both FCAT performance and 4-H participation are illustrated by the initial OLS results as described above. Even though many of the Model 1 results are statistically significant, the associations they show are unlikely to describe a causal relationship because relevant variables are omitted.

In Table 5-1, the second model, OLS with controls, contains a set of observed control variables in order to better isolate the effect of 4-H. These control variables include student characteristics such as racial composition, percent of English language learners, percent of gifted students, and percent of disabled students. Polynomial terms are also included for each racial category (with whites being the base category in my regressions) as well as interaction variables for race and unemployment. Additionally, I incorporate into this set of controls [...] teachers with advanced degrees, and average years of experience of teachers. Even though this model includes information I obtained on observed variables, it still fails to control for the possible presence of unobserved factors and therefore is not the best model I can use, given the data available, for representing the causal effects of 4-H participation on testing outcomes.

Even though Model 2 may be insufficient for uncovering causal effects, some interesting changes occur between Model 1 and this model, which contains available control variables. Most notably, the signs for the effect of the extent of 4-H participation on both the mathematics and reading average FCAT score have switched from negative
to positive, although the coefficient for the effect on the mathematics subtest is no longer statistically significant (with a p-value of .684). The coefficient for 4-H participation on average reading scores, however, which went from a negative value of 0.757 in the simple OLS model to a positive 1.334 in the model that adds observed control variables, is statistically significant (with a p-value of 0.003). Additionally, some of the counterintuitive results for the effects within various levels of FCAT performance have decreased in magnitude. For example, the effect on the lowest level of mathematics performance decreased from 2.279 (with a p-value of 0.000) in the simple OLS model to 0.846 (with a p-value of 0.026) in the control variable OLS model. Other effects have also switched to the expected sign, such as the effect on the percentage performing in the highest level of the FCAT reading subtest, which is positive in the model with controls at 0.299 (with a p-value of 0.007). Once the previously confounding missing information, in the form of the observed covariates, is included in the OLS model, the extent of 4-H participation itself explains less of the variation in testing outcomes.

The third model analyzed takes into account fixed effects and includes separate effects for years, grade levels, and school districts in addition to the control variables introduced in Model 2. The estimates obtained from this fixed effects model show positive effects for the passing rate for reading of 0.497 (with a p-value of 0.071) as well as for level 5, the highest level of the reading subtest, of 0.222 (with a p-value of 0.056). Many of the other coefficients for the fixed effects model were, however, not statistically significant at even the ten percent level. Even though this model seems to be lacking in terms of the statistical significance of the estimated effects, it is interesting that many of these coefficients now have their expected signs.
For example, the effect
of the extent of 4-H participation on the highest performance level of the math subtest, which was negative for the two OLS models, is now positive in Model 3, with a coefficient value of 0.275 (and a p-value of .158). Since the fixed effects model does not include the full set of fixed effects available with the use of my data, it is not the best model I can employ for capturing relevant unobserved factors.

The DDD specification, my preferred model, includes the available covariates from Model 2, the fixed effects included in Model 3, as well as interactions between each dimension of fixed effects. I believe this most fully controls for unobserved effects and best isolates the effect of 4-H on FCAT test outcomes. Joint significance tests for the interaction terms added in the DDD model are provided in Tables 5-7 and 5-8. For all levels of the FCAT, as well as for average and passing score outcomes, the sets of interactions are highly statistically significant. This shows that the interactions do matter in explaining the unobserved variation in test outcomes, and it highlights the importance of controlling for these factors when seeking the causal effect of 4-H participation on test outcomes.

The effects on the average FCAT scores for both mathematics and reading are positive and have nearly identical magnitudes, 1.021 for the mathematics subtest and 1.020 for the reading subtest (with p-values of 0.159 and 0.061, respectively). Although only the effect for reading is statistically significant at the ten percent level, both of these effects are larger than their corresponding estimated effects under the FE model. The estimates from this model also suggest that the extent of 4-H participation positively affects passing rates for both math and reading. The coefficient for the effect of 4-H on the percent passing mathematics was 1.102 (with a p-value of 0.060), which
is similar in magnitude to the statistically significant coefficient of 0.803 (with a p-value of 0.065) for the effect on the percent passing reading. In this case, the effects on passing rates for each subtest seem to be more informative than the effects on the percentages of students in each separate achievement category of the FCAT. However, the results for the DDD model also provide evidence of a significant impact of 4-H on the top level of reading subtest scores, level 5, of 0.374 (with a p-value of 0.086). The lowest level of mathematics scores also showed a statistically significant negative impact of 4-H of 0.879 (with a p-value of 0.054). Recall that this is the expected sign for the effect of 4-H on the lowest level of FCAT performance, since a reduction in the percent of students in the lowest level would indicate that 4-H was having a positive effect on low-performing students. Note also that there is a statistically significant impact for level 4 of the mathematics subtest (with a p-value of 0.098). However, I did not have prior expectations for the signs of intermediate levels of the FCAT, since improvements in student performance could result in coefficients of any sign, as explained at the beginning of this section.

Florida School Quality Results

It would have been surprising if I had found positive impacts of factors commonly used as measures of school quality, given the relative scarcity of results in the literature showing significant positive impacts resulting from changes in these characteristics. However, as seen in Table 5-9, I do find some effects of these factors when looking at my full DDD model containing controls for observed covariates, fixed factors, and interactions between these factors. The average years of experience of teachers has a statistically significant impact on student test performance on both the top and bottom levels of the reading subtest of the FCAT, of 0.10 for the top level and 0.22 for the
bottom level of the reading subtest (with p-values of 0.03 and 0.01, respectively). For mathematics, per pupil expenditures seem to be the factor most strongly associated with students achieving the highest levels of mathematics performance, with a coefficient of 0.44 (with a p-value of 0.03) for level 5 of the math subtest. There does not appear to be a strong relationship between per pupil expenditures and the performance of students in the lowest achievement levels in either mathematics or reading. However, for the lowest achievement level, level 1, of both subtests, the percentage of instructional staff is highly statistically significant, with coefficients of 0.19 (with a p-value of 0.00) for math and 0.14 (with a p-value of 0.03) for reading.

Florida Urban Rural Results

I analyzed the original data again, but separated the sample into urban and rural school districts. In order to decide which counties to consider urban or rural, I utilized the definition of rural in Florida Statute 288.0656, Rural Economic Development Initiative, as a county with a population of 75,000 or less or a county with 125,000 or less that is contiguous with a county of population 75,000 or fewer (REDI, 2009). This splits the original sample into 32 rural and 35 urban school districts. The results for the urban and rural comparison for the mathematics and reading subtests, Table 5-10 and Table 5-11 respectively, seem to indicate that the effects of 4-H participation in urban areas might be driving the results of positive impacts of 4-H. Compared to the sample used for the original model, the subsample of urban counties shows larger magnitudes of effects for both mathematics and reading average FCAT scores as well as for passing rates. Several of these results are highly statistically significant. The results for the average mathematics FCAT scores, for example, have a
coefficient of 3.417 (with a p-value of 0.007). Also, the results for level 1 of the mathematics subtest, the lowest level of performance, show a coefficient of 2.364 (with a p-value of 0.005). Recall again that for the lowest level of the FCAT a negative coefficient for the effect of 4-H participation is the sign that conforms to the expectation of a positive effect of 4-H on student test performance. Additionally, under the results for the reading subtest, the coefficient for the extent of 4-H participation on the percent passing the FCAT is positive (1.598) and statistically significant (with a p-value of 0.029). The results of this section might be surprising to some who think of 4-H as a rural institution because of its traditionally agricultural roots. It is possible that these results indicate that disadvantaged urban youth are able to experience more benefits from participation in 4-H.

Florida Club Rate Results

In addition to comparing the effects of urban and rural segments of 4-H, I used the current data and the DDD model to focus on a specific type of 4-H participation, 4-H club participation. Tables 5-12 and 5-13 contain models identical to the DDD model examined previously, with the exception of the addition of an interaction term for the rate of participation in 4-H clubs (as opposed to the original measure that looks at any type of 4-H involvement). As seen in Table 5-12 for mathematics, and Table 5-13 for reading, when the club rate interaction is added to the model the general effects of any type of 4-H participation seem to lose their statistical significance. In fact, neither the reading nor the mathematics subtest shows statistically significant impacts for average scores or passing rates. At the same time, however, the impacts of club rate participation seem to eclipse the magnitudes of the previous effects seen from the original model. The coefficient for the effect of 4-H clubs on the percent passing the
math subtest of the FCAT is 5.75 and is statistically significant at the ten percent level (with a p-value of 0.093), while the coefficient for the effect on the lowest level of the math subtest is 4.731 and is statistically significant (with a p-value of 0.071). Among the various avenues through which 4-H delivers its youth development programs, 4-H clubs are the method that most strongly adheres to the philosophies of positive youth development. In other words, 4-H clubs are a more intensive form of 4-H participation relative to other delivery methods of 4-H. These results seem to support the idea that more intensive 4-H programs have a better chance of having positive effects on students.

Florida Misconduct Results

In the previous sections, I concentrated on the impact of the extent of participation in 4-H on academic testing performance, utilizing a DDD approach to control for potential confounders of the causal relationship. Although these results are informative about the academic effects of 4-H, they do not address potential impacts of 4-H on other aspects of youth development. I extend the previous analyses below, using data from Florida to examine the effects of 4-H participation on several measures of student misconduct as well as suspension rates.

The grade-level 4-H participation count data were aggregated to match the level of aggregation in the suspension and school misconduct data as reported by the Florida Department of Education (FLDOE), which is only at the school type level for elementary, middle, and high schools. This new level of aggregation results in a reduction both in the number of cells (observations) and in the number of fixed effects (interaction dummy variables and simple fixed effect dummy variables) used in the analyses. Specifically, the number of observations decreases from 2,680 district-grade-year cells to only 1,005 district-school
type-year cells. The aggregation also results in a decrease in the number of fixed effects from 991 to 626, when including both interactions and simple fixed effects. These 626 effects include 201 school type-district effects, 15 school type-year effects, 335 district-year effects, and 75 simple fixed effect dummy variables.

New 4-H participation rates were then calculated for each school district and school level over each of the years of the data. The demographic data for the six racial/ethnic categorizations were aggregated to school type levels in the same manner. These newly calculated participation rates and student demographic data were then combined with the school misconduct data into one data set for use in conducting DDD analyses utilizing econometric techniques similar to those employed in the analysis of the impacts of 4-H on standardized test score outcomes. Therefore, the preferred DDD model utilized to analyze the impacts of 4-H participation on these other outcomes is similar but not identical to the previously used DDD model.

The estimated effects of the extent of 4-H participation on both the categories of student misconduct and the two suspension types, along with their associated p-values, are reported in Table 5-14. The coefficients presented in Table 5-14 represent the percentage change in the number of incidents of that particular type of student misconduct for a one percent change in 4-H participation. Exposure to the influences of positive youth development as youth participate in 4-H programs should result in students interacting better with one another and with adults. Thus, the expectation is that participation in 4-H programs is negatively related to students engaging in undesirable behaviors. Results for four models are shown for comparison purposes, as in the analysis of the impacts of 4-H participation on test outcomes. Again these models
include the OLS model, the OLS model with control variables, a simple fixed effects model, and my preferred DDD model.

Both the simple OLS model and the OLS with controls model have many coefficients for student misconduct that are of the expected signs. Model 2 relies on the observed variables to isolate the effect of 4-H on student misconduct. Fixed effects for years, school levels, and school districts are added in Model 3 to begin to account for potential unobserved factors. Under the model including fixed effects, none of the coefficients for the relationship of 4-H participation with any of the outcomes are statistically significant. When the interactions between the fixed effects are included in Model 4, some of the estimated effects are again statistically significant. These estimated effects are in the opposite direction of the prior expectations of the effect of 4-H participation on undesirable student behaviors. Specifically, the estimated effect of 4-H on acts against persons is positive and statistically significant at the 5 percent level (with a p-value of 0.048), while the estimated effect on out-of-school suspensions is positive and significant at the ten percent level (with a p-value of 0.096). As I stated, these effects are in the opposite direction of prior expectations, since I would expect positive youth development participation to have a negative relationship with all categories of student misconduct.

One of the reasons this could be occurring is that for these outcomes the data are only collected at the school type level, whereas the data on FCAT performance were available at the grade level. This loss of detail could result in the DDD methodology no longer controlling for all relevant unobserved factors, which can obscure the true causal relationship with these outcomes. Therefore, I decided to explore whether alternative ways of analyzing the sample
can shed light on these counterintuitive results. I divided the sample into the subgroups that are possible with the current level of aggregation of these data. One of the analyses undertaken was to separate the sample into urban and rural school districts, similar to the supplementary analyses performed in the section related to FCAT scores. The results for these analyses are presented in Table 5-15. When running the DDD model separately on these two subgroups, I found some evidence of positive influences of 4-H participation for the urban subgroup. The effect on drug (including alcohol, tobacco, and other drugs) use or possession showed a coefficient of 1.374 and was statistically significant (with a p-value of 0.025). Also, among urban school districts, the coefficient for the effect on fighting and harassment was 4.197 and statistically significant at the ten percent level (with a p-value of 0.087). Not all of the effects for the urban districts were in the expected direction, however. The coefficient for the effect of 4-H participation on weapon possession, in particular, was 0.198 and still statistically significant at the ten percent level (with a p-value of 0.097). For the rural subgroup of school districts, none of the effects were statistically significant.

Since many occurrences of discipline problems happen after elementary school, I also looked at student misconduct measures including only the high school and middle school levels for school types. The results for this analysis are presented in Table 5-16. Note that looking at the middle school level or high school level separately was not possible under the DDD model because the number of cells of observations would be reduced too greatly for this model to be feasible. Under this analysis with the DDD model, the effects of 4-H participation for several of the outcome measures were of the correct sign: 0.573 (drug use or possession), 0.261
(property crimes), and 0.327 (fighting and harassment). However, none of these effects were statistically significant. The effect on out-of-school suspensions, however, was statistically significant (with a p-value of 0.023) and positive at 2.409. Though roughly double in magnitude, this effect is still in the same direction as in the original DDD model for the effect of the extent of 4-H participation on student misconduct measures, and is also in the opposite direction of my prior expectations of a positive influence of 4-H participation on student behavior.

From this analysis of the effect of 4-H participation on student misconduct, I conjecture that the level of aggregation with which these data are reported for Florida is not amenable to the application of my econometric methodology. Further discussion of this is provided in section eight of this study.
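The significance statements above, and the asterisks in the tables that follow, rest on the same three thresholds (ten, five, and one percent). As a minimal sketch of that convention, the helper below maps a p-value to its marker. The function name `significance_stars` is hypothetical and is not part of the original analysis, and the sketch assumes the thresholds are applied as strict inequalities to unrounded p-values.

```python
def significance_stars(p_value: float) -> str:
    """Return the asterisk marker for a p-value.

    Hypothetical helper for illustration only. Assumes strict
    thresholds applied to unrounded p-values:
    *** below 0.01, ** below 0.05, * below 0.10, otherwise no marker.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# p-values quoted in the misconduct discussion above
for label, p in [
    ("acts against persons (Model 4)", 0.048),
    ("out-of-school suspensions (Model 4)", 0.096),
    ("out-of-school suspensions (middle and high school)", 0.023),
]:
    print(f"{label}: p = {p:.3f} {significance_stars(p)}")
```

Under these assumed thresholds, the p-value of 0.048 falls in the five percent band and the p-value of 0.096 falls in the ten percent band, matching the wording used in the text.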
Table 5-1. Estimated effects of the extent of 4-H participation on student FCAT math subtest

                      Model 1:            Model 2:            Model 3:            Model 4:
                      OLS                 OLS with controls   simple FE           DDD
FCAT measure          Coef.      p-value  Coef.      p-value  Coef.      p-value  Coef.      p-value
Average score         2.528 ***  0.000    0.225      0.635    0.322      0.670    1.021      0.159
% passing             4.853 ***  0.000    2.763 ***  0.000    0.242      0.382    1.102      0.060
% level 5 (highest)   2.154 ***  0.000    0.892 ***  0.000    0.273 **   0.021    0.025      0.905
% level 4             2.783 ***  0.000    1.162 ***  0.000    0.067      0.713    0.596      0.098
% level 3             0.085      0.717    0.710 **   0.003    0.036      0.843    0.481      0.293
% level 2             2.501 ***  0.000    1.870      0.000    0.097      0.597    0.287      0.424
% level 1 (lowest)    2.279 ***  0.000    0.846      0.007    0.362      0.087    0.879      0.054

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.

Table 5-2. Estimated effects of the extent of 4-H participation on student FCAT reading subtest

                      Model 1:            Model 2:            Model 3:            Model 4:
                      OLS                 OLS with controls   simple FE           DDD
FCAT measure          Coef.      p-value  Coef.      p-value  Coef.      p-value  Coef.      p-value
Average score         0.757      0.188    1.334 ***  0.001    0.105      0.849    1.020      0.061
% passing             7.716 ***  0.000    4.797 ***  0.000    0.497      0.053    0.803      0.065
% level 5 (highest)   0.587 ***  0.000    0.299 ***  0.008    0.222 **   0.011    0.374      0.086
% level 4             4.188 ***  0.000    2.487 ***  0.000    0.062      0.727    0.411      0.130
% level 3             4.115 ***  0.000    2.012 ***  0.000    0.338      0.074    0.018      0.959
% level 2             4.233 ***  0.000    2.505 ***  0.000    0.102      0.548    0.149      0.610
% level 1 (lowest)    3.501 ***  0.000    2.327 ***  0.000    0.421      0.092    0.612      0.126

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-3. FCAT DDD model mathematics subtest regression results for FCAT levels one through five

                                Percent level 5    Percent level 4    Percent level 3    Percent level 2    Percent level 1
Variable                        Coef.     p-value  Coef.     p-value  Coef.     p-value  Coef.     p-value  Coef.     p-value
4-H participation rate          0.02      0.91     0.60      0.10     0.48      0.29     0.29      0.42     0.88      0.05
FCAT math % tested              2.63      0.05     4.75      0.14     1.91      0.51     0.19      0.95     9.44 ***  0.01
% English learners              0.11      0.07     0.26      0.08     0.03      0.82     0.32 **   0.02     0.06      0.67
% gifted                        0.13      0.02     0.05      0.70     0.08      0.34     0.11      0.32     0.02      0.85
Average teacher experience      0.07      0.12     0.15      0.12     0.07      0.61     0.13      0.17     0.01      0.95
Per pupil expenditures (x1000)  0.44      0.03     0.70      0.09     0.17      0.71     0.64      0.11     0.41      0.30
% disabled                      0.08      0.06     0.27 ***  0.00     0.05      0.59     0.12      0.18     0.28 **   0.01
% absent over 21 days (x100)    0.33      0.87     0.28      0.94     7.84      0.06     2.53      0.50     5.10      0.29
% advanced degree               0.09 **   0.04     0.10      0.26     0.09      0.25     0.14      0.08     0.05      0.58
Unemployment rate               0.33      0.55     0.74      0.51     1.62      0.18     0.90      0.38     0.19      0.86
% instructional staff           0.03      0.27     0.07      0.27     0.02      0.73     0.12 **   0.01     0.19 ***  0.00
Black                           1.54      0.88     13.0      0.60     5.72      0.78     25.0      0.20     8.06      0.74
Hispanic                        1.53      0.89     36.5      0.19     45.8      0.08     14.7      0.54     27.6      0.33
Asian                           53.8      0.37     13.6      0.89     159       0.15     106       0.31     4 0 .1    0.99
Native American                 17.2      0.80     112       0.35     96.8      0.43     2.74      0.98     2.12      0.99
Multi-racial                    24.0      0.44     26.0      0.71     115       0.06     90.3      0.18     61.1      0.41
Black squared                   6.92      0.11     20.4 **   0.04     24.7 **   0.03     19.4      0.06     18.0      0.22
Hispanic squared                12.0      0.26     29.8      0.18     21.0      0.36     13.4      0.55     7.31      0.74
Asian squared                   702       0.26     382       0.70     2127      0.06     1374      0.13     57.6      0.96
Native American squared         1587      0.05     1799      0.47     1114      0.52     1713      0.52     36.2      0.98
Multi-racial squared            309       0.31     208       0.67     940       0.06     685       0.20     379       0.44
Black*unemployment              2.61      0.22     4.95      0.33     0.98      0.83     1.21      0.77     8.49      0.13
Hispanic*unemployment           0.46      0.85     0.96      0.86     7.67      0.25     2.01      0.70     11.6      0.06
Asian*unemployment              6.90      0.56     11.07     0.58     25.3      0.25     15.5      0.47     2.52      0.91
Native American*unemployment    7.83      0.58     40.61     0.13     5.84      0.85     14.5      0.67     13.4      0.67
Multi-racial*unemployment       3.67      0.53     8.03      0.54     18.4      0.15     14.7      0.25     10.9      0.44
Constant                        14.5 ***  0.00     33.7 ***  0.00     39.8 ***  0.00     13.6 **   0.04     0.07      0.99

Note: Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-4. FCAT DDD model mathematics subtest regression results for FCAT passing percent and mean scores

                                Percent passing    FCAT mean
Variable                        Coef.     p-value  Coef.     p-value
4-H participation rate          1.10      0.06     1.02      0.16
FCAT math % tested              9.30 **   0.05     14.6 ***  0.01
% English learners              0.40 **   0.04     0.44      0.06
% gifted                        0.09      0.59     0.05      0.78
Average teacher experience      0.15      0.31     0.15      0.42
Per pupil expenditures          0.97      0.10     1.16      0.10
% disabled                      0.40 ***  0.00     0.57 ***  0.00
% absent over 21 days (x100)    7.89      0.21     6.02      0.43
% advanced degree               0.10      0.41     0.14      0.36
Unemployment rate               0.55      0.74     0.68      0.75
% instructional staff           0.07      0.37     0.21 **   0.04
Black                           20.3      0.53     19.5      0.61
Hispanic                        10.8      0.76     8.28      0.85
Asian                           91.1      0.55     48.6      0.77
Native American                 2.25      0.99     66.6      0.78
Multi-racial                    165       0.10     140       0.25
Black squared                   2.61      0.88     3.84      0.85
Hispanic squared                3.20      0.91     0.97      0.98
Asian squared                   1808      0.20     715       0.66
Native American squared         1327      0.72     1361      0.76
Multi-racial squared            1041      0.18     521       0.58
Black*unemployment              6.57      0.35     17.5      0.04
Hispanic*unemployment           9.10      0.24     10.4      0.27
Asian*unemployment              7.31      0.82     6.32      0.86
Native American*unemployment    26.9      0.53     6.08      0.91
Multi-racial*unemployment       30.0      0.11     27.7      0.22
Constant                        87.7 ***  0.00     346 ***   0.00

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-5. FCAT DDD model reading subtest regression results for FCAT levels one through five

                                Percent level 5    Percent level 4    Percent level 3    Percent level 2    Percent level 1
Variable                        Coef.     p-value  Coef.     p-value  Coef.     p-value  Coef.     p-value  Coef.     p-value
4-H participation rate          0.37      0.09     0.41      0.13     0.02      0.96     0.15      0.61     0.61      0.13
FCAT reading % tested           3.01      0.07     1.65      0.46     4.22      0.09     1.03      0.64     9.69 ***  0.01
% English learners              0.01      0.92     0.03      0.81     0.27 **   0.03     0.11      0.36     0.14      0.33
% gifted                        0.11 **   0.05     0.04      0.62     0.10      0.25     0.07      0.43     0.19      0.18
Average teacher experience      0.10 **   0.03     0.08      0.25     0.01      0.94     0.08      0.49     0.22 ***  0.01
Per pupil expenditures (x1000)  0.18      0.28     0.07      0.84     0.03      0.93     0.21      0.56     0.04      0.93
% disabled                      0.08 **   0.04     0.15      0.06     0.19 **   0.02     0.07      0.36     0.37      0.00
% absent over 21 days (x100)    0.47      0.79     2.21      0.38     1.60      0.62     4.27      0.18     7.27      0.13
% advanced degree               0.01      0.77     0.06      0.34     0.06      0.43     0.00      0.95     0.01      0.90
Unemployment rate               0.33      0.43     0.34      0.65     0.73      0.49     0.78      0.42     1.80      0.17
% instructional staff           0.04      0.20     0.01      0.90     0.04      0.50     0.04      0.42     0.14 **   0.03
Black                           14.3      0.23     7.45      0.64     10.6      0.60     11.7      0.51     2.44      0.92
Hispanic                        13.7      0.13     6.53      0.69     18.1      0.41     40.0 **   0.03     41.2      0.16
Asian                           0.38      0.99     42.5      0.56     30.4      0.73     30.7      0.71     32.8      0.76
Native American                 53.9      0.36     245 **    0.02     76.5      0.51     190       0.09     1.76      0.99
Multi-racial                    51.7      0.10     39.8      0.39     119 **    0.01     142 ***   0.01     170 ***   0.01
Black squared                   2.74      0.52     19.4 **   0.01     30.0 ***  0.00     5.55      0.59     2.10      0.86
Hispanic squared                3.16      0.73     20.2      0.22     60.0 ***  0.01     8.26      0.63     37.5      0.12
Asian squared                   174       0.70     1090      0.20     89.5      0.93     3.85      0.99     839       0.45
Native American squared         396       0.52     326       0.83     987       0.53     733       0.47     99.4      0.95
Multi-racial squared            265       0.34     401       0.50     20.3      0.96     1252 ***  0.00     380       0.39
Black*unemployment              0.99      0.67     6.33      0.09     0.96      0.83     2.92      0.49     9.61      0.11
Hispanic*unemployment           2.06      0.29     0.79      0.82     8.99 **   0.05     8.91      0.06     17.4 ***  0.00
Asian*unemployment              2.33      0.78     4.55      0.77     3.82      0.83     1.73      0.92     3.03      0.89
Native American*unemployment    18.1      0.16     52.1 **   0.03     19.3      0.49     34.4      0.15     8.73      0.77
Multi-racial*unemployment       5.97      0.27     5.04      0.62     24.1 **   0.01     23.3      0.05     38.6 ***  0.01
Constant                        6.23      0.10     18.9 ***  0.00     37.5 ***  0.00     26.6 ***  0.00     12.4      0.16

Note: Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-6. FCAT DDD model reading subtest regression results for FCAT passing percent and mean scores

                                Percent passing    FCAT mean
Variable                        Coef.     p-value  Coef.     p-value
4-H participation rate          0.80      0.07     1.02      0.06
FCAT reading % tested           8.88 ***  0.01     17.6 ***  0.00
% English learners              0.24      0.12     0.11      0.58
% gifted                        0.25      0.07     0.21      0.27
Average teacher experience      0.18      0.21     0.37 ***  0.01
Per pupil expenditures (x1000)  0.29      0.52     0.42      0.48
% disabled                      0.42 ***  0.00     0.63 ***  0.00
% absent over 21 days (x100)    4.28      0.32     5.32      0.40
% advanced degree               0.01      0.94     0.08      0.53
Unemployment rate               0.71      0.54     2.23      0.14
% instructional staff           0.08      0.22     0.17      0.06
Black                           11.2      0.70     0.07      0.99
Hispanic                        2.07      0.94     24.0      0.52
Asian                           12.5      0.91     83.0      0.55
Native American                 223       0.13     333       0.08
Multi-racial                    27.8      0.67     25.0      0.76
Black squared                   7.78      0.46     11.6      0.46
Hispanic squared                43.0      0.08     32.5      0.34
Asian squared                   826       0.48     1172      0.44
Native American squared         1057      0.55     170       0.94
Multi-racial squared            686       0.19     230       0.72
Black*unemployment              6.30      0.29     16.3 **   0.03
Hispanic*unemployment           7.72      0.17     14.3      0.07
Asian*unemployment              3.06      0.89     13.9      0.64
Native American*unemployment    50.9      0.13     83.5      0.05
Multi-racial*unemployment       13.1      0.34     15.1      0.37
Constant                        62.3 ***  0.00     320 ***   0.00

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-7. Joint significance tests for FCAT math subtest interaction terms and fixed effects

               District-year   Grade-year      District-grade  Years           Grades          Districts
               F     Prob.>F   F     Prob.>F   F     Prob.>F   F     Prob.>F   F     Prob.>F   F     Prob.>F
Average score  6.30  0.00      19.0  0.00      228   0.00      1.13  0.34      17.1  0.00      13.7  0.00
% passing      6.08  0.00      10.3  0.00      217   0.00      0.33  0.80      12.5  0.00      11.3  0.00
% level 5      4.16  0.00      27.0  0.00      346   0.00      6.39  0.00      9.34  0.00      9.91  0.00
% level 4      3.09  0.00      7.76  0.00      382   0.00      1.01  0.39      23.7  0.00      9.77  0.00
% level 3      3.38  0.00      9.01  0.00      344   0.00      2.27  0.08      10.7  0.00      4.14  0.00
% level 2      3.11  0.00      14.7  0.00      92.9  0.00      0.88  0.45      3.13  0.00      4.95  0.00
% level 1      4.84  0.00      12.4  0.00      141   0.00      1.69  0.17      10.9  0.00      10.5  0.00

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance.

Table 5-8. Joint significance tests for FCAT reading subtest interaction terms and fixed effects

               District-year   Grade-year      District-grade  Years           Grades          Districts
               F     Prob.>F   F     Prob.>F   F     Prob.>F   F     Prob.>F   F     Prob.>F   F     Prob.>F
Average score  5.67  0.00      66.0  0.00      147   0.00      8.91  0.00      12.1  0.00      17.0  0.00
% passing      5.21  0.00      66.3  0.00      272   0.00      10.8  0.00      102   0.00      9.70  0.00
% level 5      3.22  0.00      29.0  0.00      208   0.00      1.76  0.15      14.7  0.00      6.87  0.00
% level 4      3.63  0.00      51.5  0.00      210   0.00      2.30  0.08      82.1  0.00      8.37  0.00
% level 3      3.97  0.00      38.7  0.00      277   0.00      5.49  0.00      35.1  0.00      3.55  0.00
% level 2      5.52  0.00      27.2  0.00      568   0.00      2.65  0.05      51.1  0.00      4.38  0.00
% level 1      4.82  0.00      69.5  0.00      372   0.00      14.3  0.00      37.4  0.00      9.62  0.00

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance.

Table 5-9. Estimated impact of school quality factors

                             Percent level 5     Percent level 1     Percent level 5     Percent level 1
                             math                math                reading             reading
Variable                     Coef.      p-value  Coef.      p-value  Coef.      p-value  Coef.      p-value
4-H participation rate       0.03       0.91     0.88       0.05     0.37       0.09     0.61       0.13
Teacher experience           0.07       0.12     0.01       0.95     0.10 **    0.03     0.22 ***   0.01
Per pupil expenditures       0.44 **    0.03     0.41       0.30     0.18       0.28     0.04       0.93
Percent instructional staff  0.03       0.27     0.19 ***   0.00     0.04       0.20     0.14 **    0.03
Percent advanced degrees     0.09 **    0.04     0.05       0.58     0.01       0.77     0.01       0.90

Note: Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-10. Urban and rural comparison using the DDD model for the FCAT mathematics subtest

                  Rural               Urban               Original DDD model
Variable          Coef.      p-value  Coef.      p-value  Coef.      p-value
Average score     0.387      0.647    3.417 ***  0.007    1.021      0.159
Percent passing   0.793      0.239    2.203      0.063    1.102      0.060
Percent level 5   0.039      0.879    0.412      0.249    0.025      0.905
Percent level 4   0.386      0.367    1.365      0.067    0.596      0.098
Percent level 3   0.446      0.344    0.427      0.674    0.481      0.293
Percent level 2   0.418      0.316    0.321      0.595    0.287      0.424
Percent level 1   0.492      0.332    2.364 ***  0.005    0.879      0.054

Note: Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.

Table 5-11. Urban and rural comparison using the DDD model for the FCAT reading subtest

                  Rural               Urban               Original DDD model
Variable          Coef.      p-value  Coef.      p-value  Coef.      p-value
Average score     0.879      0.155    1.687      0.102    1.020      0.061
Percent passing   0.619      0.223    1.598 **   0.029    0.803      0.065
Percent level 5   0.401      0.145    0.339      0.283    0.374      0.086
Percent level 4   0.267      0.433    0.674      0.121    0.411      0.130
Percent level 3   0.049      0.906    0.585      0.389    0.018      0.959
Percent level 2   0.098      0.782    0.351      0.490    0.149      0.610
Percent level 1   0.475      0.273    1.207      0.121    0.612      0.126

Note: Level 5 is the highest level of performance on the FCAT and level 1 is the lowest level of performance. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-12. 4-H and 4-H club participation rate effect comparison under the DDD model for the FCAT math subtest

                  4-H rate            Club rate           Original DDD model
Variable          Coef.      p-value  Coef.      p-value  Coef.      p-value
Average score     0.416      0.605    6.211      0.180    1.021      0.159
Percent passing   0.543      0.412    5.745      0.093    1.102      0.060
Percent level 5   0.094      0.670    1.221      0.346    0.025      0.905
Percent level 4   0.293      0.439    3.114      0.156    0.596      0.098
Percent level 3   0.344      0.552    1.410      0.640    0.481      0.293
Percent level 2   0.151      0.740    1.397      0.613    0.287      0.424
Percent level 1   0.419      0.413    4.731      0.071    0.879      0.054

Note: 4-H rate indicates the effect of 4-H participation when an interaction for 4-H club participation, indicated by Club rate, is added to the model. Percent passing indicates percent scoring level 3 or above out of 5. All estimated standard errors are robust to heteroskedasticity of unknown form. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.

Table 5-13. 4-H and 4-H club participation rate effect comparison under the DDD model for the FCAT reading subtest

                  4-H rate            Club rate           Original DDD model
Variable          Coef.      p-value  Coef.      p-value  Coef.      p-value
Average score     0.315      0.617    7.249      0.054    1.020      0.061
Percent passing   0.406      0.392    4.080      0.108    0.803      0.065
Percent level 5   0.320      0.139    0.561      0.633    0.374      0.086
Percent level 4   0.101      0.748    3.185      0.079    0.411      0.130
Percent level 3   0.015      0.974    0.334      0.905    0.018      0.959
Percent level 2   0.158      0.677    0.089      0.975    0.149      0.610
Percent level 1   0.208      0.681    4.152      0.214    0.612      0.126

Note: 4-H rate indicates the effect of 4-H participation when an interaction for 4-H club participation, indicated by Club rate, is added to the model. Percent passing indicates percent scoring level 3 or above out of 5. All estimated standard errors are robust to heteroskedasticity of unknown form. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.

Table 5-14. Estimated effects of the extent of 4-H participation on student misconduct

                               Model 1:            Model 2:            Model 3:            Model 4:
                               OLS                 OLS with controls   simple FE           DDD
Offenses and punishments       Coef.      p-value  Coef.      p-value  Coef.      p-value  Coef.      p-value
Acts against persons           0.226 ***  0.003    0.102      0.337    0.057      0.761    0.433 **   0.048
Alcohol, tobacco, other drugs  1.124 ***  0.000    1.240 ***  0.000    0.119      0.820    0.387      0.206
Property crime                 0.271 ***  0.000    0.212 ***  0.009    0.018      0.831    0.074      0.474
Fighting and harassment        0.658      0.228    1.132      0.126    0.572      0.562    1.027      0.494
Weapons possession             0.037      0.476    0.050      0.470    0.092      0.426    0.054      0.570
In-school suspension           4.457 ***  0.001    6.919 ***  0.000    1.720      0.167    0.461      0.804
Out-of-school suspension       0.371      0.772    0.565      0.581    0.295      0.618    1.331      0.096

Note: All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
Table 5-15. Estimated effects of the extent of 4-H participation on student misconduct for urban and rural districts

                               Urban               Rural               Original DDD model
Offenses and punishments       Coef.      p-value  Coef.      p-value  Coef.      p-value
Acts against persons           0.253      0.151    0.489      0.116    0.433 **   0.048
Alcohol, tobacco, other drugs  1.374 **   0.025    0.027      0.942    0.387      0.206
Property crime                 0.111      0.570    0.083      0.590    0.074      0.474
Fighting and harassment        4.197      0.087    1.312      0.370    1.027      0.494
Weapons possession             0.198      0.097    0.055      0.662    0.054      0.570
In-school suspension           2.607      0.341    1.646      0.517    0.461      0.804
Out-of-school suspension       0.844      0.549    1.407      0.160    1.331      0.096

Note: All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.

Table 5-16. Estimated effects of the extent of 4-H participation on student misconduct for middle and high school levels

                               Middle and high     Original DDD model
                               school only
Offenses and punishments       Coef.      p-value  Coef.      p-value
Acts against persons           0.576      0.205    0.433 **   0.048
Alcohol, tobacco, other drugs  0.573      0.212    0.387      0.206
Property crime                 0.261      0.205    0.074      0.474
Fighting and harassment        0.327      0.814    1.027      0.494
Weapons possession             0.007      0.959    0.054      0.570
In-school suspension           4.284      0.188    0.461      0.804
Out-of-school suspension       2.409 **   0.023    1.331      0.096

Note: All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the district-grade-year level. Statistical significance is indicated with asterisks, with *, **, and *** indicating statistical significance at the 10, 5, and 1 percent level, respectively.
CHAPTER 6
IMPACT OF 4-H PARTICIPATION ON STUDENT OUTCOMES IN OHIO

4-H Youth Development programs are provided through land-grant universities, allowing their programs to be delivered to youth in all 50 states. Because of the broad geographic scope of 4-H programs, it is important to determine how the impacts of participation on testing outcomes seen by its youth participants in one state, such as Florida, compare to the impacts seen in other states. Such a comparison provides some insight into the generalizability of my results, as well as additional evidence for the types of results seen in studies based on self-reports, such as the 4-H study of positive youth development (Lerner et al., 2005).

In Lerner et al. (2005), the authors present the findings of the first wave of their longitudinal 4-H study on positive youth development. At the time of the 2005 report, the study included data gathered from student and parent questionnaires in sites in forty cities in thirteen states. Their research suggests goal of improvement of academic competence. They do not, however, address the effect of 4-H on standardized test achievement, which is a high-profile measure related to student academic skill. Additionally, the data for the 4-H study are based on self-reports; therefore, the results they generate could vary greatly depending on question wording, format, and context (Schwarz, 1999).

Conceivably, estimates of the impacts of 4-H in other states could also help 4-H by providing another means through which the 4 its youth development programs. However, because of the national oversight in the organization of 4-H Youth Development programs, it seems reasonable to expect that
the impacts of 4-H on students in other states and regions would be similar to those effects seen in Florida.

Ohio Achievement Assessment Results

The reasoning behind the methodology employed to analyze the impact of 4-H participation on student outcomes in Ohio parallels the rationale used in the Florida analyses. This includes the use of two ordinary least squares (OLS) models (with and without control variables) and a fixed effects (FE) model, as well as the preferred difference-in-difference-in-differences (DDD) model. For the Florida Comprehensive Assessment Test (FCAT), levels were listed as levels one through five, with five being the highest and one being the lowest level of performance. The five levels of performance on the OAA, from highest to lowest, are advanced, accelerated, proficient, basic, and limited.

The expectations for the directions of the impacts of 4-H participation are similar to the expectations for the Florida analyses. Specifically, I expect that a higher degree of 4-H participation will be positively related to better performance on the Ohio Achievement Assessments (OAA) due to the positive influences of 4-H youth development programs. Therefore, I expect a positive relationship with passing rates. The passing rate for the OAA includes the total percentage of students scoring in the proficient level of performance or above. I also anticipate that as the extent of 4-H participation increases, I should witness an increase in the percentage of students performing in the highest level of the OAA, the advanced level of performance. My expectation is that there will also be a negative relationship between the extent of 4-H participation and the percent of students that score in the lowest performance category, since I expect that an increase in the extent of 4-H participation would result in a decrease in the percent of students that perform poorly. Thus, increased 4-H
participation is expected to decrease the percent of students scoring in the limited category of performance. I do not have a specific prior expectation for the direction of the effects in the intermediate levels of performance, even with positive impacts of 4-H participation. This is because the signs for the intermediate levels of performance do not have a clear interpretation, as students could be moving up and into these levels as well as up and out of them as performance gains are made. A more detailed explanation of the expectations for the intermediate levels of performance is provided in the previous chapter.

The first model included in each table from Table 6-1 through Table 6-5 is the OLS regression of the measures of performance on the five Ohio Achievement Assessments (OAA) against the 4-H participation rate. No other control variables, fixed effects, or interaction dummies are present in this relatively naive OLS model, a specification which provides a simplistic representation of the correlation between OAA performance measures and 4-H participation. Some of the results for this model seem to conform to prior expectations of the effects of 4-H participation on student test performance. Specifically, the results for the mathematics subtest in Table 6-1 under the simple OLS model, Model 1, show a statistically significant (with a p-value of 0.000) positive relationship between changes in the 4-H participation rate and the percent of students achieving the advanced category of performance, the accelerated level of performance, and the percent passing the mathematics OAA subtest. Within the same subtest, the results also show a statistically significant negative relationship of 1.269 (with a p-value of 0.000) between changes in 4-H participation and the percent achieving the limited category of performance.
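The ambiguity in the intermediate performance levels noted earlier in this section can be illustrated with a small numerical sketch. The level shares and the sizes of the upward movements below are made-up numbers chosen only for illustration; they are not taken from the Ohio data, and the helper names `shift_up` and `passing_rate` are hypothetical.

```python
# Hypothetical shares of students (in percent) across the five OAA levels,
# ordered from lowest to highest. All numbers are invented for illustration.
LEVELS = ["limited", "basic", "proficient", "accelerated", "advanced"]


def shift_up(shares, moves):
    """Move moves[i] percentage points from level i up to level i + 1."""
    new = list(shares)
    for i, points in enumerate(moves):
        new[i] -= points
        new[i + 1] += points
    return new


def passing_rate(shares):
    """Percent scoring proficient or above (the top three levels)."""
    return sum(shares[2:])


def changes(before, after):
    """Change in each level's share, keyed by level name."""
    return {name: a - b for name, a, b in zip(LEVELS, after, before)}


before = [20, 25, 30, 15, 10]

# Scenario A: the same number of points moves up across every boundary.
even = shift_up(before, [4, 4, 4, 4])
# Scenario B: uneven upward movement across the boundaries.
uneven = shift_up(before, [6, 2, 5, 1])

print(changes(before, even), passing_rate(even) - passing_rate(before))
print(changes(before, uneven), passing_rate(uneven) - passing_rate(before))
```

In both hypothetical scenarios every movement is an improvement, the limited share falls, and the advanced share and the passing rate rise. The intermediate shares, however, either do not change at all (scenario A) or change with mixed signs (scenario B), which is why no sign is predicted for those levels.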
At this point it is likely that there are still both observed and unobserved potential confounders of the causal relationship to take into account. However, suppose for a moment that all relevant variables had indeed been taken into account in this simple model. This would imply that the correlation seen from the simple OLS model could also represent a causal relationship. It might then be reasonable to assume that, through the effects of positive youth development, other subject areas closely related to mathematics might also show similar positive effects from changes in 4-H participation. At the very least, I would not expect the same source of change to affect two similar subject areas in dissimilar directions. However, this is precisely what is seen to occur under the simple OLS model, where the effect of 4-H on the advanced performance level in the science subtest is negative at 2.075 and is highly statistically significant (with a p-value of 0.003).

The second model, OLS with controls, improves on Model 1 by including a set of control variables. These control variables consist of both student characteristics and school quality variables. I suspect that this model still fails to control for unobserved confounding factors, resulting in the continued presence of contradictory results from related subtests. For example, the OLS model with controls shows a highly positive effect on achievement of proficient or above in social studies of 3.191 (with a p-value of 0.051). This result is accompanied by a negative effect on the percent proficient or above on the reading subtest of 1.297, which is highly statistically significant (with a p-value of 0.001). These continued contradictory results might indicate the presence of more confounders of the causal effect of interest that we do not directly observe. The
next step would then be to introduce fixed effects as a means of control for some of these possible unobserved factors.

The fixed effects model, Model 3, includes separate effects for years, grade levels, and counties in addition to the control variables introduced in Model 2. Even under the fixed effects model, effects on passing rates (those performing at proficient or above) for the science and writing subtests are still in the opposite direction of prior expectations. Recall again that my prior expectations are that the positive effects of 4-H youth development lead to positive effects on student achievement test scores. This should translate to positive impacts on the percent of students achieving in the highest levels of the achievement tests as well as on passing rates, while I expect 4-H participation to have a negative impact on the percent of students achieving the lowest levels of those tests. Under the fixed effects model, however, the science subtest shows a negative coefficient for the effect of 4-H participation of 1.594 that is statistically significant at the ten percent level (with a p-value of 0.098). Though the magnitude of the coefficient of 4-H participation for the percent passing the writing subtest is lower at 0.894, this value is still in the direction opposite to prior expectations and is statistically significant at the ten percent level (with a p-value of 0.058). The majority of the effects of 4-H participation on the other subtests, at least those for which I have prior expectations, are not statistically significant.

Another level of controls can still be implemented by including the interactions between the various fixed effects that were utilized in Model 3. This results in the DDD model specification. It is the preferred specification and contains the available covariates from Model 2, the fixed effects in Model 3, and the interactions between each
dimension of those fixed effects. Thus, the DDD model, Model 4, is expected to most fully control for unobserved effects given the current data set.

Under the DDD specification, there are no longer statistically significant results for the effect of 4-H participation on student testing outcomes that are contrary to their expected signs. Additionally, the effect on the highest performance level for mathematics is now positive at 1.055 and is statistically significant at the one percent level (with a p-value of 0.004). The result for the highest level of performance for the reading subtest is also of the expected sign (positive), though smaller in magnitude, at 0.597 and is statistically significant at the five percent level (with a p-value of 0.035). Other values for mathematics are of the expected sign but are not close to statistical significance, such as the effect on the lowest level of math performance, which is 0.216 (with a p-value of 0.463). Some of the results for the intermediate levels of the writing subtest are statistically significant, such as the coefficients for the effect on the percent accelerated or the percent proficient. However, recall that for the intermediate levels of performance, I do not have a prior expectation for the sign of the effect. All other results for the DDD model on achievement test outcomes for Ohio are not statistically significant. Indeed, I remind the reader that for some of the subtests the number of observations is considerably smaller than in the previous analyses.

I provide joint significance tests for the interaction terms and other fixed effects in Tables 6-6 through 6-10. For all levels of the OAA outcomes, the sets of interactions are highly statistically significant. This shows that the interactions do matter in explaining some of the variation in test outcomes.
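The progression from Model 1 to Model 4 described in this section can be summarized compactly. The notation below is shorthand introduced here for illustration and is an assumption, not notation taken from the analysis itself: $y_{cgt}$ denotes an OAA outcome for county $c$, grade $g$, and year $t$; $P_{cgt}$ is the 4-H participation rate; $X_{cgt}$ collects the observed student and school quality controls; $\mu_c$, $\lambda_g$, and $\tau_t$ are county, grade, and year effects; and $\theta_{cg}$, $\phi_{ct}$, and $\psi_{gt}$ are their pairwise interactions.

```latex
\begin{align*}
\text{Model 1 (OLS):}\quad
  & y_{cgt} = \alpha + \beta P_{cgt} + \varepsilon_{cgt} \\
\text{Model 2 (OLS with controls):}\quad
  & y_{cgt} = \alpha + \beta P_{cgt} + X_{cgt}'\gamma + \varepsilon_{cgt} \\
\text{Model 3 (FE):}\quad
  & y_{cgt} = \beta P_{cgt} + X_{cgt}'\gamma + \mu_c + \lambda_g + \tau_t + \varepsilon_{cgt} \\
\text{Model 4 (DDD):}\quad
  & y_{cgt} = \beta P_{cgt} + X_{cgt}'\gamma + \mu_c + \lambda_g + \tau_t
      + \theta_{cg} + \phi_{ct} + \psi_{gt} + \varepsilon_{cgt}
\end{align*}
```

Under this shorthand, $\beta$ corresponds to the 4-H coefficient reported for each model, and the joint significance tests concern the sets of interaction terms $\theta_{cg}$, $\phi_{ct}$, and $\psi_{gt}$ together with the separate county, grade, and year effects.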
Ohio Student Misconduct Results

In Chapter 5, I used information on school discipline and misconduct within the Florida data to examine the impacts of 4-H participation on these outcomes. I also
conducted a similar analysis utilizing data from the Ohio Department of Education on incidents of student misconduct and disciplinary measures. The data for student misconduct in Ohio are separated into more narrowly defined (disaggregated) categories, resulting in thirteen separate categories of offenses, as opposed to the five categories found in the Florida education data. Additionally, in Ohio, information on student misconduct was available at the grade level. Therefore, additional aggregation to the school category level, as was needed to analyze these outcomes for Florida, was not necessary. These characteristics of the Ohio data are expected to result in a better analysis relative to Florida.

I expect that exposure to the influences of positive youth development as youth participate in 4-H programs would result in students interacting better with one another and with adults. This should translate into students having fewer discipline problems in school. Thus, the expectation is that participation in 4-H programs is negatively related to students committing, and being caught committing, undesirable behaviors.

Table 6-11 provides the estimates from the four models for the categories of student misconduct as well as for in-school and out-of-school suspensions. The simple OLS model, Model 1, results in several coefficients that conform to prior expectations of the relationship between 4-H participation and student misbehavior. As a reminder, however, this model contains no form of controls that could serve to isolate the effect of 4-H participation on measures of student misconduct. With control variables added through Model 2, the relationships between 4-H participation and several of the dependent variables analyzed are no longer statistically significant, though the majority still conform to the expected sign. Under Model 3, which begins to account for potential
unobserved factors through the inclusion of fixed effects, none of the coefficients for the relationship between 4-H participation and any of the outcomes are statistically significant and of the expected sign (negative). Here the effects on truancy, vandalism, theft, weapons, and out-of-school suspension are all statistically significant at the one percent level, but are all positive (the opposite direction of prior expectations). When the interactions between the fixed effects are included in Model 4, the DDD model, the estimated effect of 4-H on fighting is statistically significant at the five percent level. This estimated effect is also in the opposite direction of the prior expectations of the effect of 4-H participation on undesirable student behaviors. Results for the other categories of student misconduct were not statistically significant at the ten percent level.

Recall that, in the behavioral data for Ohio, categories were broken down into more specific components relative to Florida. For example, the Ohio data contained tobacco use or possession and alcohol use or possession as separate offenses. In the Florida data, alcohol, tobacco, and other drugs were lumped into a single measure. Also remember that the Florida data on behavioral outcomes were available only at the school category level. The above results are especially surprising given this improvement in the available data for analyzing the impacts of 4-H participation on student misbehavior in Ohio. It seems unreasonable that participation in an organization that tries to encourage youth to develop positive directions would lead to increases in fighting or misbehavior among students. This led me to take a closer look at the data to see if something else was at work that could be leading to these counterintuitive results. I suspected that
many of the categories of student misconduct would have very low numbers of incidences at lower grade levels. Table 6-12 provides a view of the average counts of student misconduct incidences by grade level in Ohio. It can be clearly seen from this table that many of the categories of misconduct do increase greatly as grade level increases. In particular, the alcohol and tobacco categories have average counts of zero until seventh grade. Figure 6-1 provides another look at how incidences for the three categories of drug use vary over grade levels. Other categories of misconduct, such as the disobedience and vandalism categories, nearly double when going from sixth grade to seventh grade.

In order to shed more light on how this feature of the behavioral data was influencing the previous results of the DDD model, I performed these regressions again while focusing on seventh and eighth grades only. The results for the DDD model that examines the impacts on student misconduct for seventh and eighth grades can be seen in Table 6-13. Focusing on seventh and eighth grades for misconduct greatly affects the results. First, no counterintuitive results are statistically significant. Two of the categories that changed as a result of focusing on these grades were the fighting and harassment outcomes. These categories of student misconduct could be closely tied to bullying, which has been a growing concern faced by students in areas all across the country. The sign on fighting reverses to the expected direction, with a magnitude of 2.106, and is also statistically significant at the one percent level (with a p-value of 0.002). In addition, the coefficient for 4-H participation on the harassment outcome variable is highly statistically significant and in the expected direction. The coefficient for tobacco use and possession is also in the expected direction, with a magnitude of 0.250, and falls just short of statistical significance at the ten percent
level (with a p-value of 0.107). These results strongly suggest that both using the more disaggregated data in Ohio (relative to Florida) and making the focus of analysis the grades where misconduct largely occurs (i.e., grades seven and eight) result in a better analysis of the relationship between 4-H participation and student misconduct.
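The concentration of misconduct in the upper grades can be checked directly from the averages reported in Table 6-12. The short sketch below copies a subset of those averages and computes two summary figures; the helper names (`upper_grade_share`, `sixth_to_seventh_ratio`) are hypothetical, and the share is taken over the summed grade-level averages, which matches the share of incidents only under the assumption that each grade contributes a similar number of cells.

```python
# Average incident counts for grades 3 through 8, copied from Table 6-12.
GRADES = [3, 4, 5, 6, 7, 8]
avg_counts = {
    "Alcohol": [0.00, 0.00, 0.00, 0.00, 0.27, 0.53],
    "Tobacco": [0.00, 0.00, 0.00, 0.00, 2.02, 5.81],
    "Other drugs": [0.00, 0.00, 0.00, 0.08, 1.44, 4.28],
    "Disobedience": [47.1, 66.0, 106, 290, 506, 551],
    "Vandalism": [0.42, 0.63, 0.98, 2.19, 3.77, 4.17],
    "Fighting": [39.6, 56.6, 76.1, 126, 142, 130],
}


def upper_grade_share(counts, cutoff=7):
    """Share of the summed grade-level averages at or above the cutoff grade."""
    total = sum(counts)
    upper = sum(c for grade, c in zip(GRADES, counts) if grade >= cutoff)
    return upper / total


def sixth_to_seventh_ratio(counts):
    """Ratio of the seventh grade average to the sixth grade average."""
    sixth, seventh = counts[GRADES.index(6)], counts[GRADES.index(7)]
    return seventh / sixth if sixth else None


for category, counts in avg_counts.items():
    share = upper_grade_share(counts)
    ratio = sixth_to_seventh_ratio(counts)
    ratio_text = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"{category:13s} share in grades 7-8: {share:.2f}  7th/6th: {ratio_text}")
```

For the alcohol and tobacco categories the sixth grade average is zero, so the ratio is undefined and the entire sum falls in grades seven and eight, which is the data feature that motivates restricting the sample.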
Table 6-1. Estimated effects of the extent of 4-H participation on student OAA mathematics subtest performance

| Outcome | Model 1: OLS | p-value | Model 2: OLS with controls | p-value | Model 3: simple FE | p-value | Model 4: DDD | p-value |
|---|---|---|---|---|---|---|---|---|
| Percent passing | 3.726*** | 0.000 | 3.212*** | 0.000 | 0.432 | 0.384 | 0.790 | 0.276 |
| Percent advanced | 2.009*** | 0.000 | 3.178*** | 0.000 | 0.349 | 0.385 | 1.055*** | 0.004 |
| Percent accelerated | 1.901*** | 0.000 | 2.061*** | 0.000 | 0.751 | 0.038 | 0.377 | 0.373 |
| Percent proficient | 0.118 | 0.760 | 1.947*** | 0.000 | 0.593 | 0.101 | 0.460 | 0.317 |
| Percent basic | 2.497*** | 0.000 | 2.802*** | 0.000 | 0.437 | 0.171 | 0.573 | 0.219 |
| Percent limited | 1.269*** | 0.000 | 0.428 | 0.223 | 0.059 | 0.803 | 0.216 | 0.463 |

Note: Model contains 2,376 observations for this subtest. Percent passing indicates percent scoring proficient level or above. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the county-grade-year level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively.

Table 6-2. Estimated effects of the extent of 4-H participation on student OAA reading subtest performance

| Outcome | Model 1: OLS | p-value | Model 2: OLS with controls | p-value | Model 3: simple FE | p-value | Model 4: DDD | p-value |
|---|---|---|---|---|---|---|---|---|
| Percent passing | 0.876** | 0.040 | 1.297*** | 0.001 | 0.137 | 0.708 | 0.079 | 0.849 |
| Percent advanced | 0.462 | 0.440 | 1.144 | 0.068 | 0.040 | 0.894 | 0.597** | 0.035 |
| Percent accelerated | 1.023** | 0.018 | 1.200*** | 0.007 | 0.055 | 0.860 | 0.277 | 0.415 |
| Percent proficient | 2.310*** | 0.003 | 3.585*** | 0.000 | 0.183 | 0.524 | 0.306 | 0.369 |
| Percent basic | 0.254 | 0.218 | 0.119 | 0.560 | 0.544 | 0.004 | 0.238 | 0.389 |
| Percent limited | 0.616** | 0.026 | 1.206*** | 0.000 | 0.303 | 0.166 | 0.042 | 0.892 |

Note: Model contains 2,464 observations for this subtest. Percent passing indicates percent scoring proficient level or above. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the county-grade-year level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively.

Table 6-3. Estimated effects of the extent of 4-H participation on student OAA science subtest performance

| Outcome | Model 1: OLS | p-value | Model 2: OLS with controls | p-value | Model 3: simple FE | p-value | Model 4: DDD | p-value |
|---|---|---|---|---|---|---|---|---|
| Percent passing | 1.848 | 0.145 | 2.175 | 0.065 | 1.594 | 0.098 | 1.019 | 0.383 |
| Percent advanced | 2.075*** | 0.003 | 1.119 | 0.069 | 0.297 | 0.619 | 0.287 | 0.649 |
| Percent accelerated | 4.050** | 0.014 | 4.707*** | 0.010 | 0.842 | 0.112 | 0.118 | 0.859 |
| Percent proficient | 1.111 | 0.279 | 3.769*** | 0.001 | 0.525 | 0.491 | 0.585 | 0.517 |
| Percent basic | 0.838 | 0.396 | 0.907 | 0.381 | 1.565 | 0.060 | 0.371 | 0.733 |
| Percent limited | 1.510*** | 0.000 | 0.208 | 0.575 | 0.262 | 0.279 | 0.496 | 0.228 |

Note: Model contains 704 observations for percent passing and 528 observations for the individual level outcomes (advanced through limited). Percent passing indicates percent scoring proficient level or above. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the county-grade-year level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively.
Table 6-4. Estimated effects of the extent of 4-H participation on student OAA social studies subtest performance

| Outcome | Model 1: OLS | p-value | Model 2: OLS with controls | p-value | Model 3: simple FE | p-value | Model 4: DDD | p-value |
|---|---|---|---|---|---|---|---|---|
| Percent passing | 1.986 | 0.191 | 3.191 | 0.051 | 1.632 | 0.201 | 1.473 | 0.381 |
| Percent advanced | 2.067** | 0.023 | 0.691 | 0.404 | 0.182 | 0.837 | 0.620 | 0.600 |
| Percent accelerated | 2.695*** | 0.010 | 3.499*** | 0.003 | 0.644 | 0.217 | 1.594 | 0.054 |
| Percent proficient | 1.364 | 0.062 | 0.391 | 0.582 | 0.822 | 0.271 | 0.521 | 0.535 |
| Percent basic | 1.174 | 0.248 | 0.178 | 0.867 | 0.725 | 0.545 | 0.117 | 0.943 |
| Percent limited | 2.948*** | 0.000 | 2.475*** | 0.001 | 0.114 | 0.852 | 0.315 | 0.773 |

Note: Model contains 528 observations for this subtest. Percent passing indicates percent scoring proficient level or above. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the county-grade-year level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively.

Table 6-5. Estimated effects of the extent of 4-H participation on student OAA writing subtest performance

| Outcome | Model 1: OLS | p-value | Model 2: OLS with controls | p-value | Model 3: simple FE | p-value | Model 4: DDD | p-value |
|---|---|---|---|---|---|---|---|---|
| Percent passing | 2.128** | 0.012 | 1.696** | 0.038 | 0.894 | 0.058 | 0.980 | 0.617 |
| Percent advanced | 0.715*** | 0.000 | 0.025 | 0.868 | 0.084 | 0.649 | 0.534 | 0.393 |
| Percent accelerated | 9.385*** | 0.000 | 6.747*** | 0.000 | 0.535 | 0.386 | 4.501** | 0.033 |
| Percent proficient | 7.979*** | 0.000 | 5.086*** | 0.000 | 0.444 | 0.502 | 2.956** | 0.023 |
| Percent basic | 0.073 | 0.891 | 0.414 | 0.456 | 0.279 | 0.495 | 0.967 | 0.522 |
| Percent limited | 2.179*** | 0.000 | 2.102*** | 0.000 | 0.620 | 0.153 | 0.054 | 0.937 |

Note: Model contains 704 observations for this subtest. Percent passing indicates percent scoring proficient level or above. All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the county-grade-year level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively.
Table 6-6. Joint significance tests for OAA math subtest interaction terms and fixed effects

| Outcome | County-year F | Prob. > F | Grade-year F | Prob. > F | County-grade F | Prob. > F | Years F | Prob. > F | Grades F | Prob. > F | County F | Prob. > F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| % proficient or above | 18.5 | 0.00 | 82.9 | 0.00 | 1597 | 0.00 | 13.34 | 0.00 | 57.8 | 0.00 | 11.4 | 0.00 |
| % advanced | 11.5 | 0.00 | 119 | 0.00 | 466 | 0.00 | 6.33 | 0.00 | 98.0 | 0.00 | 17.8 | 0.00 |
| % accelerated | 12.5 | 0.00 | 102 | 0.00 | 11684 | 0.00 | 13.32 | 0.00 | 54.6 | 0.00 | 2.35 | 0.00 |
| % proficient | 21.5 | 0.00 | 86.1 | 0.00 | 2082 | 0.00 | 2.84 | 0.02 | 119 | 0.00 | 3.39 | 0.00 |
| % basic | 22.5 | 0.00 | 71.2 | 0.00 | 1087 | 0.00 | 16.81 | 0.00 | 26.7 | 0.00 | 4.14 | 0.00 |
| % limited | 34.1 | 0.00 | 62.4 | 0.00 | 2072 | 0.00 | 3.58 | 0.01 | 121 | 0.00 | 17.5 | 0.00 |

Note: Percent passing indicates percent scoring proficient level or above on the indicated subtest of the OAA.
Table 6-7. Joint significance tests for OAA reading subtest interaction terms and fixed effects

| Outcome | County-year F | Prob. > F | Grade-year F | Prob. > F | County-grade F | Prob. > F | Years F | Prob. > F | Grades F | Prob. > F | County F | Prob. > F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| % proficient or above | 22.2 | 0.00 | 116 | 0.00 | 1066 | 0.00 | 25.8 | 0.00 | 41.0 | 0.00 | 10.2 | 0.00 |
| % advanced | 12.3 | 0.00 | 170 | 0.00 | 2953 | 0.00 | 4.11 | 0.01 | 50.2 | 0.00 | 4.57 | 0.00 |
| % accelerated | 11.4 | 0.00 | 144 | 0.00 | 1125 | 0.00 | 14.2 | 0.00 | 60.8 | 0.00 | 6.34 | 0.00 |
| % proficient | 6.24 | 0.00 | 155 | 0.00 | 343 | 0.00 | 0.28 | 0.84 | 63.6 | 0.00 | 2.98 | 0.00 |
| % basic | 8.81 | 0.00 | 111 | 0.00 | 449 | 0.00 | 10.7 | 0.00 | 48.5 | 0.00 | 5.59 | 0.00 |
| % limited | 8.82 | 0.00 | 84.6 | 0.00 | 819 | 0.00 | 16.4 | 0.00 | 43.5 | 0.00 | 7.11 | 0.00 |

Note: Percent passing indicates percent scoring proficient level or above on the indicated subtest of the OAA.

Table 6-8. Joint significance tests for OAA science subtest interaction terms and fixed effects

| Outcome | County-year F | Prob. > F | Grade-year F | Prob. > F | County-grade F | Prob. > F | Years F | Prob. > F | Grades F | Prob. > F | County F | Prob. > F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| % proficient or above | 54.5 | 0.00 | 26.6 | 0.00 | 147 | 0.00 | 4.53 | 0.03 | 27.4 | 0.00 | 8.70E+05 | 0.00 |
| % advanced | 230 | 0.00 | 196 | 0.00 | 6.60 | 0.00 | 4.77 | 0.01 | 7.60 | 0.01 | 14.6 | 0.00 |
| % accelerated | 2032 | 0.00 | 166 | 0.00 | 15.8 | 0.00 | 18.0 | 0.00 | 60.2 | 0.00 | 25.8 | 0.00 |
| % proficient | 14853 | 0.00 | 62.3 | 0.00 | 9.17 | 0.00 | 0.03 | 0.97 | 7.85 | 0.01 | 5.34 | 0.00 |
| % basic | 362 | 0.00 | 67.8 | 0.00 | 4.31 | 0.00 | 15.3 | 0.00 | 16.8 | 0.00 | 10.0 | 0.00 |
| % limited | 1708 | 0.00 | 21.4 | 0.00 | 11.2 | 0.00 | 1.54 | 0.22 | 0.37 | 0.54 | 8.83 | 0.00 |

Note: Percent passing indicates percent scoring proficient level or above on the indicated subtest of the OAA.

Table 6-9. Joint significance tests for OAA social studies subtest interaction terms and fixed effects

| Outcome | County-year F | Prob. > F | Grade-year F | Prob. > F | County-grade F | Prob. > F | Years F | Prob. > F | Grades F | Prob. > F | County F | Prob. > F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| % proficient or above | 130 | 0.00 | 7.9 | 0.00 | 9.44 | 0.00 | 7.29 | 0.00 | 26.4 | 0.00 | 19.4 | 0.00 |
| % advanced | 16984 | 0.00 | 127 | 0.00 | 10.2 | 0.00 | 21.3 | 0.00 | 23.6 | 0.00 | 16.9 | 0.00 |
| % accelerated | 507 | 0.00 | 18.3 | 0.00 | 6.01 | 0.00 | 2.79 | 0.06 | 50.7 | 0.00 | 8.56 | 0.00 |
| % proficient | 1964 | 0.00 | 142 | 0.00 | 9.85 | 0.00 | 19.6 | 0.00 | 1.60 | 0.21 | 4.44 | 0.00 |
| % basic | 1045 | 0.00 | 63.1 | 0.00 | 10.2 | 0.00 | 26.4 | 0.00 | 2.54 | 0.11 | 16.2 | 0.00 |
| % limited | 87.0 | 0.00 | 93.6 | 0.00 | 16.7 | 0.00 | 127 | 0.00 | 14.9 | 0.00 | 4.88 | 0.00 |

Note: Percent passing indicates percent scoring proficient level or above on the indicated subtest of the OAA.
Table 6-10. Joint significance tests for OAA writing subtest interaction terms and fixed effects

| Outcome | County-year F | Prob. > F | Grade-year F | Prob. > F | County-grade F | Prob. > F | Years F | Prob. > F | Grades F | Prob. > F | County F | Prob. > F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| % proficient or above | 158 | 0.00 | 135 | 0.00 | 7.63 | 0.00 | 6.68 | 0.00 | 3.34 | 0.68 | 4.80E+07 | 0.00 |
| % advanced | 82.4 | 0.00 | 179 | 0.00 | 6.30 | 0.00 | 10.2 | 0.00 | 15.2 | 0.00 | 3.80E+06 | 0.00 |
| % accelerated | 9.63 | 0.00 | 261 | 0.00 | 11.1 | 0.00 | 31.0 | 0.00 | 21.2 | 0.00 | 3.40E+06 | 0.00 |
| % proficient | 107 | 0.00 | 229 | 0.00 | 5.41 | 0.00 | 32.0 | 0.00 | 0.12 | 0.73 | 2127 | 0.00 |
| % basic | 30.1 | 0.00 | 153 | 0.00 | 17.0 | 0.00 | 5.07 | 0.00 | 0.92 | 0.34 | 1.40E+06 | 0.00 |
| % limited | 3128 | 0.00 | 71.4 | 0.00 | 9.78 | 0.00 | 16.9 | 0.00 | 12.9 | 0.00 | 7.40E+07 | 0.00 |

Note: Percent passing indicates percent scoring proficient level or above on the indicated subtest of the OAA.

Table 6-11. Estimated effects of 4-H participation on behavior outcomes in Ohio

| Variable | Model 1: OLS | p-value | Model 2: OLS with controls | p-value | Model 3: simple FE | p-value | Model 4: DDD | p-value |
|---|---|---|---|---|---|---|---|---|
| Truancy | 0.536*** | 0.000 | 0.062 | 0.457 | 0.343*** | 0.001 | 0.116 | 0.363 |
| Fighting | 3.542*** | 0.000 | 1.675*** | 0.000 | 0.133 | 0.691 | 0.643** | 0.020 |
| Vandalism | 0.068*** | 0.000 | 0.012** | 0.014 | 0.023*** | 0.003 | 0.003 | 0.347 |
| Theft | 0.140*** | 0.000 | 0.014** | 0.048 | 0.048*** | 0.001 | 0.002 | 0.854 |
| Guns | 0.002*** | 0.000 | 0.000 | 0.202 | 0.000 | 0.745 | 0.000 | 0.991 |
| Weapons | 0.031*** | 0.000 | 0.001 | 0.310 | 0.007*** | 0.007 | 0.002 | 0.156 |
| Tobacco | 0.033 | 0.071 | 0.004 | 0.832 | 0.066 | 0.120 | 0.008 | 0.731 |
| Alcohol | 0.006*** | 0.000 | 0.000 | 0.499 | 0.001 | 0.452 | 0.001 | 0.673 |
| Other drugs | 0.031*** | 0.000 | 0.004 | 0.713 | 0.035** | 0.029 | 0.006 | 0.598 |
| Disobedience | 9.940*** | 0.000 | 4.524*** | 0.000 | 1.430 | 0.286 | 1.156 | 0.113 |
| Harassment | 0.465*** | 0.000 | 0.153*** | 0.001 | 0.079 | 0.284 | 0.009 | 0.909 |
| Sexual conduct | 0.040*** | 0.000 | 0.003 | 0.052 | 0.017** | 0.011 | 0.001 | 0.721 |
| Serious bodily injury | 0.011 | 0.462 | 0.034** | 0.013 | 0.036 | 0.216 | 0.020 | 0.214 |
| Suspended in school | 5.299*** | 0.000 | 2.656*** | 0.000 | 0.645 | 0.577 | 0.196 | 0.828 |
| Suspended out of school | 8.015*** | 0.000 | 2.996*** | 0.000 | 1.764*** | 0.005 | 0.327 | 0.187 |

Note: All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the county-grade-year level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively.
Table 6-12. Average counts of student misconduct incidences by grade level in Ohio

| Category | 3rd Grade | 4th Grade | 5th Grade | 6th Grade | 7th Grade | 8th Grade |
|---|---|---|---|---|---|---|
| Alcohol | 0.00 | 0.00 | 0.00 | 0.00 | 0.27 | 0.53 |
| Disobedience | 47.1 | 66.0 | 106 | 290 | 506 | 551 |
| Fighting | 39.6 | 56.6 | 76.1 | 126 | 142 | 130 |
| Guns | 0.07 | 0.10 | 0.08 | 0.10 | 0.20 | 0.23 |
| Harassment | 3.14 | 4.39 | 7.23 | 14.9 | 23.4 | 23.9 |
| Other drugs | 0.00 | 0.00 | 0.00 | 0.08 | 1.44 | 4.28 |
| Serious bodily injury | 0.82 | 0.85 | 1.14 | 1.69 | 1.80 | 1.80 |
| Theft | 1.67 | 2.14 | 2.47 | 4.15 | 6.95 | 7.55 |
| Tobacco | 0.00 | 0.00 | 0.00 | 0.00 | 2.02 | 5.81 |
| Truancy | 0.16 | 0.21 | 1.09 | 12.5 | 32.5 | 47.6 |
| Sexual conduct | 0.40 | 0.35 | 0.73 | 1.64 | 2.80 | 3.22 |
| Vandalism | 0.42 | 0.63 | 0.98 | 2.19 | 3.77 | 4.17 |
| Weapons | 0.52 | 0.64 | 0.82 | 1.17 | 2.02 | 2.15 |

Table 6-13. DDD regression results: 4-H participation impact on student misconduct for seventh and eighth grades

| Variable | Estimate | p-value |
|---|---|---|
| Truancy | 0.592 | 0.428 |
| Fighting | 2.106*** | 0.002 |
| Vandalism | 0.024 | 0.498 |
| Theft | 0.071 | 0.246 |
| Guns | 0.000 | 0.771 |
| Weapons | 0.002 | 0.726 |
| Tobacco | 0.250 | 0.107 |
| Alcohol | 0.006 | 0.398 |
| Other drugs | 0.023 | 0.854 |
| Disobedience | 1.331 | 0.787 |
| Harassment | 1.745*** | 0.001 |
| Sexual conduct | 0.004 | 0.692 |
| Serious bodily injury | 0.101 | 0.292 |
| Suspended in school | 4.289 | 0.360 |
| Suspended out of school | 0.438 | 0.716 |

Note: All estimated standard errors are robust to heteroskedasticity of unknown form and clustered at the county-grade-year level. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively.
Figure 6-1. Counts of student misconduct incidences for drug use or possession for alcohol, tobacco, and other drugs.
CHAPTER 7
ASSUMPTION VIOLATION AND FALSIFICATION TESTS

Violations of the Assumptions Employed to Identify Causal Effects

Causal effects of the extent of 4-H participation on the observed outcome variables are identified at the grade-school district-year cell level in Florida and the grade-county-year cell level in Ohio for my analyses under the preferred difference-in-difference-in-differences (DDD) model specification. This DDD model specification controls for the largest possible number of potential confounders of the causal relationship. A necessary condition for identification is that any remaining unobserved factors affecting the outcomes of interest that are not captured by the control variables or fixed effects are uncorrelated with the extent of participation in 4-H. Unfortunately, this underlying identifying assumption is inherently untestable. If this identifying assumption were violated, the estimated coefficients for the effect of the extent of 4-H participation on the outcome of interest would be biased estimates of the true causal effect of 4-H participation. The extensive set of control variables and fixed effects reduces the chance that such a situation would occur at the level of my analyses. However, it is more plausible that the control variables added in the Florida analyses would be more effective in controlling for unobserved factors than those in the Ohio analyses. One reason for this is the difference in school district and county alignment between states. In Florida, school districts and counties are aligned, whereas in Ohio, school districts are much smaller organizational units. Ohio contains roughly 613 school districts that fall into its 88 counties. These school districts, nevertheless, are the organizational units that dictate local school policies. Even though the DDD model in
Ohio will still be controlling for changes at the county level, the model will not capture policy changes that differ across school districts within the same county.

Falsification Tests for Florida and Ohio

I indirectly test my identification assumption in this section by conducting a kind of falsification test. Although my identifying assumption is inherently untestable, this test can provide some indirect evidence to support the validity of my identification strategy. This falsification exercise employs the lag of the extent of participation in 4-H in place of the actual extent of participation in 4-H. If the estimated coefficients were to be significant, that would cast doubt on my identification strategy, since it would be likely that such significance is due to unobserved factors (confounders) varying over time that are simultaneously related to standardized test performance and the actual treatment variable.

The results of the falsification exercise for the Florida data are reported in the first model in Tables 7-1 and 7-2. I have reproduced the original DDD results from Tables 5-1 and 5-2 in the second model in these tables for ease of reference. The results for the falsification tests for Ohio are presented in a similar fashion in Tables 7-3 through 7-7, with the second models in these tables coming from the results of the DDD models in Tables 6-1 through 6-5.

The model for the falsification test employs a one-period lag of the extent of participation in 4-H instead of the actual 4-H participation rate. In the Florida analyses, the use of a lag reduces the sample size from 2,680 to 2,144 observations, as a result of the loss of 536 cells (i.e., one year of data) from the regression. Employing a lagged measure of the 4-H participation rate also reduces the number of observations available for the regressions for Ohio.
However, in the case of Ohio, the number of observations available for the subtest for each subject area varies from subject to subject. This
results in different numbers of observations being utilized for the falsification test for each subtest. For the mathematics subtest of the Ohio Achievement Assessment (OAA), the original model consists of 2,376 observations and the lagged model consists of 1,901 observations. The reading subtest for Ohio contains the most observations. The original model for the reading subtest consists of 2,464 observations, and the lagged model consists of 1,975 observations. The numbers of observations for the science subtest are down even further, and they differ by outcome, since I was able to acquire additional observations for the percent passing the science subtest but not for the individual levels of performance (percent advanced through percent limited) of the science subtest. For the percent passing variable alone within the science subtest, the original model consists of 704 observations, and the lagged model consists of 575 observations. For the individual performance level variables, the original model has 528 observations and the lagged model contains 430 observations. The social studies subtest has the fewest observations, with the original model having 528 observations and the lagged model having 430 observations. Finally, the original model for the writing subtest consists of 704 observations, whereas the lagged model for this subtest consists of 566 observations.

The falsification test regression results for Florida show that all estimated effects become statistically insignificant, with the smallest p-value estimated at 0.123 (for percent level 5 for mathematics). Moreover, many of the estimated coefficients drop in magnitude considerably, making it less likely that the insignificance of these results is exclusively due to the reduction in the number of cells as a consequence of the use of one lag. Therefore, the results for the falsification tests for Florida show that the lagged 4-H participation variable does not result in statistically significant impact estimates for
any of the testing outcomes. While this exercise does not represent hard evidence in favor of my identifying assumption, which is inherently untestable, it does clearly increase my confidence in it. Thus, I am more confident in the validity of the results provided by the DDD analyses of the impact of 4-H participation on FCAT performance.

A difference in the results of these falsification tests becomes evident when I shift the focus of the tests from the DDD analyses of the impact of 4-H participation on standardized test outcomes in Florida to the impacts under the DDD model for Ohio. In several of the subtests of the OAA, statistically significant results appear for the effect of the lagged variable of 4-H participation. Under the lagged 4-H participation rate DDD model for the mathematics subtest, the coefficient for the effect on the percent of students achieving in the limited category is 0.447 and statistically significant at the ten percent level (with a p-value of 0.076). For the reading subtest of the OAA, the coefficient for the effect on the percent achieving in the accelerated category of performance is 0.971, and is statistically significant at the one percent level (with a p-value of 0.008). The effect on the percent passing for the social studies subtest is 3.764 and statistically significant at the ten percent level (with a p-value of 0.062). Additionally, for this same subtest, the coefficient for the effect on the percent achieving in the accelerated category is 2.852 and statistically significant at the five percent level (with a p-value of 0.030). Finally, the coefficient for the effect on the advanced level of the writing subtest is 1.169 and is statistically significant at the ten percent level (with a p-value of 0.096). Also note that these statistically significant results for the lagged 4-H participation rate in Ohio occur despite the losses in the numbers of observations from the generation of the lagged values.
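The construction of the lagged regressor, and the resulting loss of one year of cells, can be sketched as follows. This is an illustrative reconstruction rather than the code used for the analysis: the cell identifiers, the year labels, and the participation values are hypothetical, and only the counts (536 cells per year, 2,680 observations in total) come from the Florida figures reported above.

```python
from itertools import product


def add_one_period_lag(participation):
    """Map each (cell, year) to the previous year's participation rate.

    Observations whose previous year is not in the data (the first year
    of the panel) have no lag and are dropped.
    """
    lagged = {}
    for cell, year in participation:
        previous = (cell, year - 1)
        if previous in participation:
            lagged[cell, year] = participation[previous]
    return lagged


# 536 grade-by-district cells per year, as implied by the Florida counts.
cells = range(536)
# Hypothetical year labels; five years are implied by 2,680 / 536.
years = range(2003, 2008)

# Placeholder participation rates, for illustration only.
participation = {(cell, year): 0.01 * ((cell + year) % 7)
                 for cell, year in product(cells, years)}

lagged_participation = add_one_period_lag(participation)
print(len(participation), "observations in the original model")
print(len(lagged_participation), "observations in the lagged model")
```

The falsification regression would then use `lagged_participation` in place of the contemporaneous rate, on the reduced set of cells.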
Therefore, it appears that the DDD methodology does not work as well for Ohio as it does for Florida, since several of the results using the lagged 4-H participation rate are statistically significant. Even though the DDD methodology itself did not change from the Florida analyses to those conducted in Ohio, dissimilarities in the data used in the analyses could be driving the difference in the ability of the DDD model to control for unobserved confounding factors. Specifically, in Ohio, school districts do not align with counties as is the case in Florida. I discussed this data difference in more detail earlier in Chapter 3 in the section on the comparison of the Florida and Ohio data sets. Recall that the county level of observation is a much more aggregated level, given that in Ohio there are 88 counties but 613 school districts (an average of about 7 districts per county). Because counties in Ohio can contain many school districts that do not necessarily set the same policies, those policies that change at the school district level might not be captured by the county-level fixed effects and their interactions. For example, student performance on standardized tests is an important factor that school districts are likely to consider when deciding whether to adopt various policies, and these factors, which are unobserved by the analyst, will not be adequately accounted for by the county fixed effects and their interactions. The fact that counties and school districts do not align in Ohio could, therefore, be important in determining whether the DDD model is able to control for unobserved confounders related to school policies, and could therefore affect its ability to disentangle the effects of these unobserved factors from the impacts of 4-H participation.

Educational data are already available for Ohio at the school district level. In order to improve this analysis, I need better data on the 4-H side. Specifically, I would need
matching data on 4-H participation rates at the school district level by grade for each of the years of the analysis. This would allow the analyses for Ohio to be conducted at the school district level, the level at which school policies are enacted. Obtaining 4-H data at the school district level for Ohio would also result in a model that contains far more observations. Considering just the years and grade levels available in the current data set, obtaining school district level data for the 4-H participation rates would increase the number of cross-sectional units from 88 to more than 600, resulting in a corresponding increase in the maximum number of available observations from 2,640 to over 18,000. This would represent a vast improvement in the level of detail of the analysis and could provide a much better estimate of the impact of 4-H participation on student test outcomes for this state.
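The observation counts quoted above can be verified with a few lines of arithmetic. The sketch below assumes that the 2,640 maximum observations are spread evenly over the 88 counties and that the same grade-year cells would be available for each of the 613 school districts; the variable names are illustrative.

```python
counties = 88
districts = 613
max_obs_county_level = 2640

# Grade-year cells per county implied by the current maximum sample size.
cells_per_unit, remainder = divmod(max_obs_county_level, counties)

# Maximum sample size if the same cells were observed for every district.
max_obs_district_level = districts * cells_per_unit

# Average number of school districts per county.
districts_per_county = districts / counties

print(cells_per_unit, "grade-year cells per unit")
print(max_obs_district_level, "maximum district-level observations")
print(round(districts_per_county, 2), "districts per county on average")
```

Under these assumptions each unit contributes 30 grade-year cells, so moving to the district level would allow up to 18,390 observations, which is consistent with the figure of over 18,000 given in the text.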
Table 7-1. FCAT mathematics subtest DDD using lagged extent of 4-H participation

| Variable | Lagged 4-H DDD | p-value | Original DDD model | p-value |
|---|---|---|---|---|
| Average score | 0.050 | 0.930 | 1.021 | 0.159 |
| % passing | 0.226 | 0.676 | 1.102 | 0.060 |
| % level 5 (highest) | 0.215 | 0.123 | 0.025 | 0.905 |
| % level 4 | 0.088 | 0.771 | 0.596 | 0.098 |
| % level 3 | 0.529 | 0.248 | 0.481 | 0.293 |
| % level 2 | 0.163 | 0.675 | 0.287 | 0.424 |
| % level 1 (lowest) | 0.372 | 0.343 | 0.879 | 0.054 |

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively. The lagged model contains 2,144 observations; the original model has 2,680.

Table 7-2. FCAT reading subtest DDD using lagged extent of 4-H participation

| Variable | Lagged 4-H DDD | p-value | Original DDD model | p-value |
|---|---|---|---|---|
| Average score | 0.294 | 0.554 | 1.020 | 0.061 |
| % passing | 0.253 | 0.503 | 0.803 | 0.065 |
| % level 5 (highest) | 0.030 | 0.854 | 0.374 | 0.086 |
| % level 4 | 0.175 | 0.557 | 0.411 | 0.130 |
| % level 3 | 0.398 | 0.210 | 0.018 | 0.959 |
| % level 2 | 0.066 | 0.817 | 0.149 | 0.610 |
| % level 1 (lowest) | 0.375 | 0.344 | 0.612 | 0.126 |

Note: Percent passing indicates percent scoring level 3 or above out of 5 on the specified subtest of the FCAT. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively. The lagged model contains 2,144 observations; the original model has 2,680.

Table 7-3. OAA mathematics subtest DDD using lagged extent of 4-H participation

| Variable | Lagged 4-H DDD | p-value | Original DDD model | p-value |
|---|---|---|---|---|
| % passing | 0.588 | 0.163 | 0.790 | 0.276 |
| % advanced | 0.049 | 0.885 | 1.055*** | 0.004 |
| % accelerated | 0.540 | 0.113 | 0.377 | 0.373 |
| % proficient | 0.048 | 0.913 | 0.460 | 0.317 |
| % basic | 0.300 | 0.311 | 0.573 | 0.219 |
| % limited | 0.447 | 0.076 | 0.216 | 0.463 |

Note: Percent passing indicates percent scoring proficient level or above. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively. The lagged model consists of 1,901 observations and the original model consists of 2,376 observations.
Table 7-4. OAA reading subtest DDD using lagged extent of 4-H participation

| Variable | Lagged 4-H DDD | p-value | Original DDD model | p-value |
|---|---|---|---|---|
| % passing | 0.686 | 0.128 | 0.079 | 0.849 |
| % advanced | 0.508 | 0.109 | 0.597** | 0.035 |
| % accelerated | 0.971*** | 0.008 | 0.277 | 0.415 |
| % proficient | 0.019 | 0.972 | 0.306 | 0.369 |
| % basic | 0.266 | 0.348 | 0.238 | 0.389 |
| % limited | 0.035 | 0.884 | 0.042 | 0.892 |

Note: Percent passing indicates percent scoring proficient level or above. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively. The lagged model consists of 1,975 observations and the original model consists of 2,464 observations.

Table 7-5. OAA science subtest DDD using lagged extent of 4-H participation

| Variable | Lagged 4-H DDD | p-value | Original DDD model | p-value |
|---|---|---|---|---|
| % passing | 0.547 | 0.674 | 1.019 | 0.383 |
| % advanced | 0.129 | 0.877 | 0.287 | 0.649 |
| % accelerated | 2.493 | 0.128 | 0.118 | 0.859 |
| % proficient | 1.912 | 0.191 | 0.585 | 0.517 |
| % basic | 0.197 | 0.881 | 0.371 | 0.733 |
| % limited | 0.346 | 0.557 | 0.496 | 0.228 |

Note: Percent passing indicates percent scoring proficient level or above. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively. For the % passing variable, the lagged model consists of 575 observations and the original model consists of 704 observations. For the individual level variables (% advanced through % limited), the lagged model consists of 430 observations and the original model consists of 528 observations.

Table 7-6. OAA social studies subtest DDD using lagged extent of 4-H participation

| Variable | Lagged 4-H DDD | p-value | Original DDD model | p-value |
|---|---|---|---|---|
| % passing | 3.764 | 0.062 | 1.473 | 0.381 |
| % advanced | 0.869 | 0.613 | 0.620 | 0.600 |
| % accelerated | 2.852** | 0.030 | 1.594 | 0.054 |
| % proficient | 0.055 | 0.968 | 0.521 | 0.535 |
| % basic | 2.842 | 0.109 | 0.117 | 0.943 |
| % limited | 0.887 | 0.336 | 0.315 | 0.773 |

Note: Percent passing indicates percent scoring proficient level or above. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent level, respectively. The lagged model consists of 430 observations and the original model consists of 528 observations.
Table 7-7. OAA writing subtest DDD using lagged extent of 4-H participation

Variable         Lagged 4-H DDD   p-value   Original DDD model   p-value
% passing        0.701            0.740     0.980                0.617
% advanced       1.169            0.096     0.534                0.393
% accelerated    1.743            0.376     4.501**              0.033
% proficient     2.174            0.460     2.956**              0.023
% basic          1.228            0.614     0.967                0.522
% limited        0.514            0.455     0.054                0.937

Note: Percent passing indicates percent scoring proficient level or above. *, **, and *** indicate statistical significance at the 10, 5, and 1 percent levels, respectively. The lagged model consists of 566 observations and the original model consists of 704 observations.
CHAPTER 8
SUMMARY, POLICY IMPLICATIONS AND CONCLUSION

Summary

Evaluation of the causal impacts of youth development programs such as 4-H is important to stakeholders at many levels, including policy makers, 4-H administrators, and the parents of participating youth. Significant difficulties exist in obtaining impacts that can be given a causal interpretation, including the presence of many confounding factors that could obscure a causal relationship. Although I was not able to conduct randomized experiments, which would yield analyses with clear causal interpretations, I made the most of my data by eliminating a large share of the possible confounding factors. I combined grade- and district-level outcome data with 4-H participation rates. I applied a difference-in-difference-in-differences (DDD) approach to control for potential confounders of the causal relationship at the level of school districts, grade levels, and years.

Using the full model, I found that the extent of 4-H participation has a statistically significant positive effect on passing rates on both the math and reading subtests of the Florida Comprehensive Assessment Test (FCAT), as well as statistically significant effects on average reading FCAT scores. Additionally, the difference between these outcomes and those of the simpler OLS models, which includes changes not only in the magnitude but also in the direction of the effect, reflects the importance of controlling for confounding factors. When I looked at the relationship between other commonly used measures of school quality and student test outcomes, I found that teacher experience was a statistically significant factor in student performance on the reading subtest of the FCAT.
When examining other facets of the relationship between the extent of 4-H participation and FCAT scores, other interesting results surfaced. The effects of 4-H participation seemed to be stronger among urban school districts. Also, the results provided some evidence that participation in 4-H clubs has a more pronounced impact on test outcomes than does general participation in 4-H's various programs.

The examination of the impact of 4-H participation on student behavioral outcomes in Florida did not show evidence of a positive influence of 4-H on student misconduct rates or suspension rates. In particular, the coefficient for 4-H participation was positive and statistically significant at the five percent level. This result is in the opposite direction of prior expectations, since the influences of positive youth development should not result in higher incidence rates of student misconduct. Nevertheless, I note that the data used for the behavioral outcomes for Florida were not available at the same level of aggregation as the data on the FCAT. Therefore, the grade-level dimension was replaced by a school-level dimension (elementary, middle, and high).

A data set similar to that used to conduct the analyses in Florida was obtained from the Ohio Department of Education and Ohio 4-H. The standardized test results examined for Ohio were from the Ohio Achievement Assessment (OAA), an achievement test required of Ohio students and taken in third through eighth grades. In Ohio, the results indicated that the extent of 4-H participation has a statistically significant positive effect on the percent of students performing at the highest performance level on both the mathematics and reading subtests of the OAA under the full DDD model. The results for both the mathematics and reading subtests
agree with my prior expectations and are also in line with the results seen in Florida in terms of the magnitude of the effects. The other subtests examined in the Ohio analyses (science, social studies, and writing) did not show any statistically significant results under the full DDD model. It is worth mentioning, however, that far fewer observations were available for these other subtest areas; science, for example, was given only to fifth and eighth grades.

Although the results for Ohio are in line with prior expectations and agree with those from Florida in terms of magnitude, there are some interesting differences. In Florida, under the DDD model specification, the results for passing percentages on both subtests were statistically significant at the ten percent level. In the analyses of Ohio, neither of the results for the passing percentages on these subtests was statistically significant; instead, the reading and mathematics subtest results were statistically significant at the five percent level only for the advanced level.

I also conducted an analysis of behavioral outcomes for students in Ohio. Unlike in Florida, the Ohio data did not need to be re-aggregated in order to conduct this analysis, because the behavioral outcomes were available at the grade level, the same level as the data for standardized test outcomes in Ohio. Additionally, the particular misconduct offenses were separated into more narrowly defined categories. Despite these two differences, only the result for the impact of 4-H participation on fighting was statistically significant, and it was in the opposite direction from that expected. However, the fact that the Ohio behavioral data were provided at a greater level of detail allowed me to examine the incidence of misconduct more closely. By focusing my analysis of student misconduct on later grades (seventh and eighth
grades) where incidents of misconduct were much more prevalent, I found that the results given by the preferred model were in the expected direction and statistically significant for both the fighting and harassment variables.

Policy Implications and Conclusion

My findings provide evidence that the extent of 4-H participation has a statistically significant and positive effect on student test outcomes. The evidence first found in the Florida analysis was corroborated to a large degree by the finding of similar effects in Ohio for both reading and mathematics. This indicates that the 4-H program is beneficial to its youth participants from an academic standpoint in the key subject areas of reading and mathematics.

It should be noted, however, that the falsification tests that I performed for Florida yielded more convincing results than those conducted for Ohio. One reason for this could be that school districts and counties do not align in Ohio, while the analyses for Ohio had to be conducted at the county level. This factor could be another reason why, apart from the results for the mathematics and reading subtests, few other results were statistically significant. It is also possible that the failure of the analyses of the other subject areas to show statistically significant impacts of 4-H participation is due, in part, to the far fewer observations available for these subject areas at the time of this study, which results in lower precision of the estimates. This reduction in precision is compounded by the fact that, even though the number of observations is lower in the Ohio data, there are more fixed effects dummy variables, which increases the total number of variables considerably in comparison to the number in the Florida analyses.
The initial analyses of the impact of 4-H participation on student behavioral outcomes did not yield results that conformed to my prior expectations for the effect of 4-H on student misconduct measures, whether I looked at the data for Florida or for Ohio. In Florida, I found some evidence of positive effects of 4-H participation on the urban subgroup when the sample was subdivided into urban and rural subgroups. For the urban subgroup, 4-H participation seemed to have a negative and statistically significant impact on the alcohol, tobacco, and other drug use outcome and on the fighting and harassment outcome. I also looked at the effects of 4-H participation when considering only the middle and high school levels, which did not yield any statistically significant impacts even though several variables had the expected signs.

The supplementary analysis conducted on the Ohio data for the later grades also provided some evidence that 4-H participation may have some positive influence on student behavior. Among seventh and eighth grade students in Ohio, 4-H participation seems to have a negative effect on both fighting and harassment. Although few categories of behavioral outcomes showed statistically significant results, it is reassuring that the outcomes that did show significant results in the supplementary analyses for each state were related. In particular, I refer to the categories related to fighting and harassment. One can readily see how these two categories of student misconduct could be closely related to bullying, a problem that has received growing attention as awareness of the problems faced by youth in schools increases. If participation in 4-H can help reduce some of the outwardly violent components of bullying, then positive youth development programs such as 4-H could have impacts that reach far beyond their possible effects on test
scores. What might this imply for a school system dedicated to reducing bullying in schools? One thing it could mean is that getting students engaged in positive youth development programs like 4-H could be another avenue for reducing bullying. This could be important, since it indicates that another benefit of 4-H positive youth development is that it could serve as a complement to increased law enforcement coordination with schools and to other measures specifically aimed at reducing bullying.

Though my results indicate that there are positive impacts of 4-H participation, the analyses could be improved given the availability of better data for both Florida and Ohio. Currently, the behavioral variables for Florida are calculated only at the school type level (elementary, middle, and high school). If the Florida Department of Education were to calculate these statistics at the grade level, there would be less need for additional aggregation and, hence, better estimates of the impact of 4-H participation could be calculated. Similarly, many of these behavioral measures are lumped together into groups of offenses (e.g., alcohol, tobacco, and other drugs). Calculating these categories of student misconduct separately would also allow for greater detail in the analyses. In the case of Ohio, the major improvement in the data would come instead from the 4-H side, where data on 4-H participation are calculated at the county level. If 4-H participation statistics could be calculated at the school district level for Ohio, there would be a dramatic improvement in the level of detail of the impact analysis for this state.

An important implication of my results is that they can be used to estimate the economic significance of the impacts of 4-H participation on youth. As mentioned
previously in Chapter 2, research has linked gains in test scores to economic growth through improvements in the quality of human capital entering the labor market. For example, Hanushek and Kimko (2000) found that an increase of one standard deviation in their measure of standardized test scores was related to a boost in the growth rate of real GDP per capita of 1.4 percentage points per year. Taking this calculation at face value, my results for FCAT reading average scores suggest that a one percentage point increase in 4-H participation is associated with a boost in the growth rate of real GDP per capita of roughly 0.1 percentage points per year. This figure is calculated by taking the change in the mean score on the reading subtest for a one percent change in 4-H participation (1.02) and dividing it by the number of points in one standard deviation on that subtest (14.15 points). Therefore, a one percent increase in 4-H participation yields an increase of roughly 0.07 standard deviations in the mean reading subtest score. Multiplying the 1.4 percentage point boost in the GDP growth rate found in Hanushek and Kimko (2000) by the 0.07 standard deviations gained for a one percent increase in 4-H participation gives a calculated boost in the GDP growth rate of roughly 0.1 percentage points.

Alternatively, consider that Kane and Staiger (2002) calculated that the present value of lifetime earnings for a fourth grader from an increase of one standard deviation in standardized test scores was between $90,000 and $210,000. As noted in the previous example, a one percent increase in 4-H participation yields an increase of roughly 0.07 standard deviations in the mean reading subtest score. Taking this increase of 0.07
standard deviations for a one percent increase in 4-H participation, these results imply that the present value of lifetime earnings from 4-H participation, according to my results for the FCAT reading subtest, is between $6,300 and $14,700 on average.

Several things must be kept in mind when translating the impacts of 4-H participation on test scores into economic impacts for cost-benefit analyses. First of all, unlike some other youth development programs, 4-H as a whole is not designed with the sole purpose of improving test scores. These improvements are therefore most likely byproducts of positive youth development. For example, improvements in test scores could result from increased student interest in academically related topics, improved general motivation for school, or more productive interaction with instructors and peers. Thus, improvements in FCAT or OAA outcomes might represent a minimum of the 4-H program's benefits.

Additionally, laws and guidelines regarding the specifics of student academic evaluations, including those for standardized tests such as the FCAT and OAA, are constantly evolving. As methods for the evaluation of students change, it is possible that programs designed to target specific tests might become less effective. In this way, programs such as 4-H that focus instead on overall positive youth development could provide students with benefits that persist as evaluation methods change.

On the cost side of the equation, another important factor to consider is that 4-H makes use of a vast pool of volunteers in the delivery of its programs. With over 518,000 volunteers and 3,500 cooperative extension educators nationwide, there are roughly 148 volunteers per extension educator. These are factors that are important to keep in mind
not only for future research but also for policy makers when considering the benefits and costs of the 4-H program.

Possible Extensions to this Research

There are several possible avenues of extension for this dissertation. The first obvious extension is replication of the analyses in this dissertation on data for other states as both additional 4-H data and other state education data become available. As the standardized test scores for various states are not directly comparable, it might also prove of interest to develop a method for utilizing the NAEP mapping of state proficiency standards. This would allow data from multiple states to be combined into a single data set for analysis while taking into account relative differences in the difficulty of the standardized tests across these states.

Another worthwhile extension would be to conduct analysis on the same outcome variables using individual level data when available. This extension would probably go the furthest in establishing the causal link between 4-H participation and standardized test score outcomes or student behavioral outcomes. Additionally, I found evidence that the effects of 4-H participation seemed to be stronger among urban school districts. While I mentioned that these results may indicate that more disadvantaged urban youth are able to experience more benefits from participation in 4-H, it would be enlightening to focus further investigation on the source of these differences.

Overall, this research serves as a useful addition to the literature on the impacts of 4-H by providing some evidence that there is a causally interpretable effect of participation in 4-H youth development programs on student outcomes for students in both Florida and Ohio.
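The back-of-the-envelope figures reported in the Policy Implications and Conclusion section (the boost in the GDP growth rate, the range for lifetime earnings, and the volunteer ratio) can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only. It uses only the numbers quoted in the text, rounds the standard deviation gain to 0.07 as the text does, and all variable names are hypothetical labels chosen for this example.

```python
# Inputs quoted in the text; the variable names are illustrative assumptions.
score_gain_per_pct = 1.02            # change in mean FCAT reading score per one percent change in 4-H participation
score_sd_points = 14.15              # points in one standard deviation on the FCAT reading subtest
gdp_boost_per_sd = 1.4               # Hanushek and Kimko (2000): percentage points per year, per standard deviation
earnings_per_sd = (90_000, 210_000)  # Kane and Staiger (2002): present value of lifetime earnings per standard deviation

# Step 1: express the score gain in standard deviation units (the text rounds to 0.07).
sd_gain = round(score_gain_per_pct / score_sd_points, 2)

# Step 2: implied boost in the growth rate of real GDP per capita (roughly 0.1).
gdp_boost = gdp_boost_per_sd * sd_gain

# Step 3: implied range for the present value of lifetime earnings, in dollars.
earnings_range = tuple(round(sd_gain * value) for value in earnings_per_sd)

# Cost side: volunteers per cooperative extension educator.
volunteers_per_educator = 518_000 / 3_500

print(f"standard deviation gain: {sd_gain}")
print(f"GDP growth boost: {gdp_boost:.3f} percentage points per year")
print(f"lifetime earnings range: {earnings_range}")
print(f"volunteers per educator: {volunteers_per_educator:.0f}")
```

Carrying the unrounded quotient (about 0.072) through the later steps would shift the earnings range upward slightly, so the dollar figures in the text are best read as approximations.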
LIST OF REFERENCES

4-H, 2001. National 4-H Impact Assessment Project. Prepared and engaged youth. Washington, DC: United States Department of Agriculture. (Available at: http://www.national4-hheadquarters.gov/about/4h_programs.htm)
4-H, 2011. 4-H Youth Science Programs. (Available at: http://www.4-h.org/youth-development-programs/4-h-afterschool/)
Aizer, A., 2004. Home alone: Supervision after school and child behavior. Journal of Public Economics 1835-1848.
Allen, J.P., Philliber, S., Hoggson, N., 1990. School based prevention of teen age pregnancy and school dropout: Process evaluation of the national replication of the Teen Outreach Program. American Journal of Community Psychology 18, 505-524.
Andrews, F., Morgan, J., Sonquist, J., Klem, L., 1973. Multiple Classification Analysis, Second Edition. Institute for Social Research, University of Michigan, Ann Arbor, MI.
Astroth, K.A., Haynes, G.W., 2001. Final report of the Montana public school students' out-of-school time study. Report No. 01 0301. Montana State University Extension Service, 4-H Center for Youth Development.
Barnow, B.S., Cain, G.G., Goldberger, A.S., 1980. Issues in the Analysis of Selectivity Bias. Evaluation Studies Review Annual 43-59.
Bloom, H.S., Orr, L.L., Bell, S.H., Cave, G., Doolittle, F., Lin, W., Bos, J.M., 1997. The Benefits and Costs of JTPA Title II-A Programs: Key Findings from the National Job Training Partnership Act Study. The Journal of Human Resources 549-576.
California 4-H, 2011. University of California 4-H Youth Development Program. (Available at: http://www.ca4h.org/About/Mission/EL/)
Coates, D., 2003. Education Production Functions Using Instructional Time as an Input. Education Economics 273-292.
Coleman, J.S., Campbell, E.Q., Hobson, C.J., McPartland, J., Mood, A.M., Weinfeld, F.D., York, R.L., 1966. Equality of Educational Opportunity. Washington, DC: U.S. Government Printing Office.
Currie, J., Hanushek, E.A., Kahn, E.M., Neidell, M., Rivkin, S.G., 2009. Does Pollution Increase School Absences? The Review of Economics and Statistics 682-694.
De Mello, V.B., Blankenship, C., McLaughlin, D., Rahman, T., 2009. Mapping State Proficiency Standards Onto NAEP Scales. U.S. Department of Education, NCES 2010 0456. (Available at: http://nces.ed.gov/nationsreportcard/pdf/studies/2010456.pdf)
Eccles, J.S., Barber, B.L., 1999. Student council, volunteering, basketball, or marching band: What kind of extracurricular involvement matters? Journal of Adolescent Research 14, 10-43.
Eccles, J.S., Templeton, J., 2002. Extracurricular and other after school activities for youth. Review of Research in Education 26, 113-180.
Figlio, D.N., Lucas, M.E., 2004. Do High Grading Standards Affect Student Performance? Journal of Public Economics 1815-1834.
FLDOE, 2009. Florida Department of Education, Florida Schools Indicator Reports Data. (Available at: http://www.fldoe.org/eias/eiaspubs/fsir.asp)
FLDOE, 2010. Florida Department of Education, Sunshine State Standards. (Available at: http://www.fldoe.org/bii/curriculum/sss/sss1996.asp)
FLDOE, 2011. Florida Department of Education, FCAT Frequently Asked Questions. (Available at: http://www.fldoe.org/faq/default.asp?Dept=202&ID=660)
Flores-Lagunes, A., Timko, T., 2011. Does Participation in 4-H Improve Schooling Outcomes? Evidence from Florida. Working Paper, University of Florida.
Florida 4-H, 2010. Florida 4-H Youth Development, What is a 4-H club? (Available at: http://florida4h.org/clubs/about.shtml)
Goodwin, J., Carroll, J.B., Oliver, M., 2005. Public School Research Report, Colorado State University.
Grossman, J.B., Tierney, J.P., 1998. Does mentoring work? An impact study of the Big Brothers Big Sisters program. Evaluation Review 22, 403-426.
Hahn, A., Leavitte, T., Aaron, P., 1994. Evaluation of the Quantum Opportunities Program (QOP): Did the program work? Waltham, MA: Brandeis University, Heller Graduate School, Center for Human Resources.
Hanks, M., Eckland, B., 1978. Adult Voluntary Associations and Adolescent Socialization. The Sociological Quarterly 3, 481-490.
Hanushek, E.A., 1986. The Economics of Schooling: Production and Efficiency in Public Schools. Journal of Economic Literature 1141-1177.
Hanushek, E.A., Kimko, D.D., 2000. Schooling, labor force quality, and the growth of nations. American Economic Review 1184-1208.
Hanushek, E.A., Welch, F., 2006. Handbook of the economics of education. Amsterdam: Elsevier.
Heckman, J., LaLonde, R.J., Smith, J.A., 1999. The economics and econometrics of active labor market programs. In: Ashenfelter, O., Card, D. (Eds.), Handbook of labor economics. Amsterdam: Elsevier, 1865-2097.
Hollister, R., 2003. The growth in after school programs and their impact. Brookings Institution, Washington, DC.
Imbens, G.W., Angrist, J.D., 1994. Identification and Estimation of Local Average Treatment Effects. Econometrica 62, 467-475.
Kane, T.J., 2004. The Impact of After School Programs: Interpreting the Results of Four Recent Evaluations. Working paper of the William T. Grant Foundation 33.
Kane, T.J., Staiger, D., 2002. The Promise and Pitfalls of Using Imprecise School Accountability Measures. Journal of Economic Perspectives 16 (4), 91-114.
Kuhn, P., Weinberger, C., 2005. Leadership Skills and Wages. Journal of Labor Economics 395-436.
Landers, D., Landers, D., 1978. Socialization via Interscholastic Athletics: Its Effect on Delinquency. Sociology of Education 51, 299-301.
Lerner, R.M., Lerner, J.V., Almerigi, J.B., Theokas, C., Phelps, E., Gestsdottir, S., 2005. Positive youth development, participation in community youth development programs, and community contributions of fifth grade adolescents: Findings from the first wave of the 4-H study of positive youth development. Journal of Early Adolescence 25 (1), 17-71.
Lerner, R.M., Lerner, J.V., Almerigi, J.B., Phelps, E., 2009. Waves of the Future: The First Five Years of the 4-H Study of Positive Youth Development. Institute for Applied Research in Youth Development, Tufts University. (Available at: http://4-h.org/d/Assets/National%204H%20Tufts%20Report_highres%20%282%29.pdf)
Lewis, S.R., Murphy, T.H., Baker, M., 2009. The Impact of the 4-H Program on Nevada Public School Youth. Journal of Extension. (Available at: http://www.joe.org/joe/2009june/rb3.php)
Marsh, H.W., 1992. Extracurricular activities: Beneficial Extension of the Traditional Curriculum or Subversion of Academic Goals? Journal of Educational Psychology 84, 553-562.
McCorkle, D.A., Howard, J., Klose, S.L., Hanselka, D., Lepely, T., 2007. Economic Benefit of Youth Leadership Experience through Texas 4-H. Texas Agrilife Extension Document, MKT 3557P (n.d.): 3.
Morgan, S.L., Winship, C., 2007. Counterfactuals and causal inference. Cambridge University Press.
Murnane, R.J., Willett, J.B., Braatz, M.J., Duhaldeborde, Y., 2001. Do different dimensions of male high school students' skills predict labor market success a decade later? Evidence from the NLSY. Economics of Education Review 311-320.
National Research Council, 1987. Adolescent sexuality, pregnancy and childbearing. In: Hayes, C.D. (Ed.), Risking the Future. National Academy Press, Washington, DC.
ODE, 2007. Ohio Department of Education Office of Assessment, Ohio Statewide Testing Program Rules Book. (Available at: http://www.noacsc.org/rsit/Rules%20Book%202-8-07.pdf)
ODE, 2011. Ohio Department of Education, Interactive Local Report Card home. (Available at: http://ilrc.ode.state.oh.us/)
ODE, 2011b. Ohio Department of Education, National Assessment of Educational Progress in Ohio vs. Ohio Assessments. (Available at: http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicRelationID=1069&ContentID=8395&Content=100602)
Ohio 4-H, 2011. Ohio 4-H Yearly Statistical Reports. (Available at: http://www.ohio4h.org/about/statistics.html)
Otto, L.B., 1975. Extracurricular Activities in the Educational Attainment Process. Rural Sociology 40, 162-176.
Philliber, S.P., Allen, J.P., Hoggson, N., McNeil, W., 1988. Teen Outreach: A three year evaluation of a program to prevent teen pregnancy and school dropout. Junior Leagues of America, Washington, DC.
PIRC, 2011. Parental Information and Resource Center, No Child Left Behind: Accountability FCAT and School Report Cards. (Available at: http://www.flairs.org/pirc/nclbaccountability.pdf)
REDI, 2009. Rural Economic Development Initiative, Florida Statutes Chapter 288.0656. (Available at: http://www.myfloridahouse.gov/FileStores/Web/Statutes/FS09/CH0288/Section_0288.0656.HTM)
Roth, J.L., Brooks-Gunn, J., Murray, L., Foster, W., 1998. Promoting healthy adolescents: Synthesis of youth development program evaluations. Journal of Research on Adolescence 423-459.
Rubin, D.B., 1974. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 66, 688-701.
Satcher, D., 2001. The Surgeon General's Call To Action To Prevent and Decrease Overweight and Obesity. (Available at: http://www.surgeongeneral.gov/topics/obesity/calltoaction/toc.htm)
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W., Shavelson, R., 2007. Estimating causal effects using experimental and observational designs. Report from the Governing Board of the American Educational Research Association Grants Program. American Educational Research Association, Washington, DC.
Schochet, P., Burghardt, J., Glazerman, S., 2001. National Job Corps Study: The Impacts of Job Corps on Participants' Employment and Related Outcomes. Mathematica Policy Research, Inc., Princeton, NJ.
Schuman, H., Presser, S., 1981. Questions and answers in attitude surveys. Academic Press, New York.
Schwarz, N., 1999. Self Reports: How the Questions Shape the Answers. American Psychologist 54, 93-105.
Schwarz, N., Knauper, B., Hippler, H., Noelle-Neumann, E., Clark, F., 1991. Rating Scales: Numeric Values May Change the Meaning of Scale Labels. Public Opinion Quarterly 55, 570-582.
Silliman, B., 2007. Critical Indicators of Youth Development Outcomes for 4-H National Mission Mandates. (Available at: http://www.national4-hheadquarters.gov/library/Indicators_4H_MM.pdf)
Winship, C., Morgan, S.L., 1999. The Estimation of Causal Effects from Observational Data. Annual Review of Sociology 659-707.
BIOGRAPHICAL SKETCH

Troy Timko is a PhD student in the Food and Resource Economics Department at the University of Florida. Troy also received his Bachelor of Science in economics and his Master of Science in forest resources and conservation from the University of Florida. Following the completion of his master's degree, he spent a year in Japan teaching English as a foreign language. During his experiences overseas he realized his affinity for teaching, which resulted in his return to the United States to pursue his doctorate in food and resource economics. During the research phase of his studies he has conducted analyses of the impacts of the 4-H youth development program on student outcomes for both Florida and Ohio.