PARENTAL ATTITUDE TOWARD STATEWIDE STANDARDIZED TESTING AND REPORT CARD GRADES: SOMETIMES IN SYNC AND SOMETIMES AT ODDS

By

ADAM P. DENNY

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS IN EDUCATION

UNIVERSITY OF FLORIDA

2010
2010 Adam P. Denny
To my mom
ACKNOWLEDGMENTS

I would like to start by thanking my research advisor, Dr. David Miller, who has shown patience and diligence with my progression; without his guidance I would not have been able to complete this paper. I would also like to thank my committee member, Dr. David Therriault, who was instrumental in the completion of the paper. I thank Dr. Anne Donnelly and Dr. Doug Levey, who allowed me to use the Science Partners in Inquiry-based Collaborative Education (SPICE) program to collect the data, as well as the teachers, students, and parents at the Alachua County middle schools who assisted with the completion of this survey. Without the data, none of this would have been possible. I would also like to thank my wife, who sacrificed hours away from me so that I could complete this paper and the coursework that preceded it. I would also like to thank Dr. Brian Schnoover, without whose example I would not have gone as far as I have. And last but not least, I would like to thank my Mom, who was always reminding me to finish.
TABLE OF CONTENTS

page

ACKNOWLEDGMENTS .......................................................... 4

LIST OF TABLES ................................................................... 6

LIST OF FIGURES ................................................................. 7

ABSTRACT ............................................................................ 8

CHAPTER

1 INTRODUCTION ............................................................... 9

Validity ................................................................................... 9
Construct Irrelevant Variance ............................................... 10
Large Scale Testing .............................................................. 12
............................................................................................... 18
Report Card Grades ............................................................. 19

2 METHOD ......................................................................... 22

Questionnaire ....................................................................... 22
Participants ........................................................................... 22
The Parent ............................................................................ 23
The Child .............................................................................. 23

3 RESULTS ......................................................................... 25

4 DISCUSSION ................................................................... 32

Discussion and Findings ...................................................... 32
Limitations and Future Studies ............................................ 35

APPENDIX: PARENTAL ATTITUDES SURVEY ON FCAT SCORES AND REPORT CARD GRADES ................. 37

LIST OF REFERENCES ....................................................... 42

BIOGRAPHICAL SKETCH ................................................... 45
LIST OF TABLES

Table  page

3-1  Parent reported grades in math and science. ........................... 26

3-2  Mean responses and standard deviations. ............................... 27

3-3  Selected questions that the majority of respondents agreed with and the percentage that agreed with each. ........................... 28

3-4  Selected questions that the majority of respondents disagreed with and the percentage that disagreed with each. ................... 28

3-5  Independent t-test for gender on math report card grades, science report card grades, math FCAT scores, and science FCAT scores. ................................................................................. 29

3-6  ... current academic standing. ................................................... 30
LIST OF FIGURES

Figure  page

3-1  The amount of time that parents spent helping their child with homework in English and Mathematics. ................................... 25
Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Arts in Education

PARENTAL ATTITUDE TOWARD STATEWIDE STANDARDIZED TESTING AND REPORT CARD GRADES: SOMETIMES IN SYNC AND SOMETIMES AT ODDS

By

Adam P. Denny

December 2010

Chair: David Miller
Major: Research and Evaluation Methodology

Parent opinion about standardized assessments such as the FCAT versus traditional reporting systems such as report card grades could pose a valuable resource in future endeavors for statewide implementation of these tests. Parents' interpretations represent a source of construct irrelevant variance that can pose a threat to test validity. The researcher used a forty-seven item survey to measure parental perceptions in three specific areas: comparisons between FCAT scores and report card grades, the consequences of those scores, and who is believed to have the greatest control over report card grades and FCAT scores. General information was gathered on the individuals to make comparisons and conclusions. Questions were also separated to illustrate support for either the FCAT or report card grades. Additional analysis was done on gender differences and on whether future expectations impacted current scores. Approximately eighty surveys were returned, and it was concluded that, although parents want a method for accountability and comparability, they do not believe the FCAT necessarily fulfills that role. They want to hold school officials accountable for learning and to compare the education that their child is receiving with that of others, but parents still rely on report card grades to gauge their child's progress.
CHAPTER 1
INTRODUCTION

Tests are used in various capacities, from evaluating progress in education to setting minimum standards for graduation or retention. Often these tests report results only as a total score. The score is a summary of the entire test (or subject area) and is then interpreted as an examinee's level of competence in that content area. In order to have a test that can be taken by large numbers of people within a limited amount of time for administration and scoring, while still remaining reliable, valid, and comparable across groups, these tests have moved toward standardization methods that can be used for large-scale testing. Large-scale testing has become a controversial topic and is debated extensively; Educational Measurement: Issues and Practice dedicated a special issue to just such a subject in the Winter 2006 edition. In the first of five articles, Ferrara (2006) suggests making construct irrelevance central in defining the construct to be measured.

Validity

Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses. Even if the test is the same, the examinees the same, the content the same, and the scores the same, the validity can change from one interpretation or use to the next. It is the interpretation or use of the test that is being validated. Validation is an investigative process in which an argument has to be made for the interpretation or use of test scores. Once this argument has been made, evidence has to be collected and evaluated to support the validity of the interpretation (Haladyna and Downing, 2004). As Haladyna and Downing (2004) state, the most fundamental step in validation is defining the construct. For state standardized testing, these include defining the domain of
knowledge and skills and/or cognitive ability. If a test effectively measures a set of knowledge and skills, it would be assumed that a student with high marks on the exam has a high amount of knowledge in the subject area that the exam covers. Multiple choice formats are a good way to evaluate this (Haladyna and Downing, 2004). Cognitive abilities may require a series of more complex tasks. These are harder to evaluate in a large-scale testing format because there are often multiple pathways that can arrive at an answer. They are represented by performance exams and large-scale tests that carry an underlying assumption: if the examinee can master the content knowledge, he or she has the aptitude to perform higher-level tasks. The definition of the construct is where additional factors can become an unintended part of the assessment and dilute the construct being measured.

Construct Irrelevant Variance

Haladyna and Downing (2004) list construct irrelevant variance as one of the five major threats to validity. Construct irrelevant variance (CIV) is error variance that is caused by systematic error. Random error is so labeled because of its randomness; we cannot measure or anticipate it. Systematic error, however, is not random; CIV is predictable. It can be defined in person or group terms, and it can systematically increase scores just as well as it can systematically decrease them. Mathematically, systematic errors are correlated with observed scores and true scores, while the expected value of random error is zero. Random error is uncorrelated with either true or observed scores, and if CIV is present, a nonzero expected value for the errors results. One example of this is when two test forms with different difficulties are used and are not equated. This is systematic error by group. A second example of CIV, which is commonly mentioned, is the effect of reading ability on math or science assessments. If the test
is designed to measure science knowledge, but those with lower reading abilities score lower, then what is being measured is not just science ability but reading ability. Reading has become a factor. This is especially true on timed tests. The effect of this error is at the individual level, and the construct irrelevant effect is reading comprehension. The interpretations given to results affected by either of these errors would be biased. Other types of CIV include test anxiety, fatigue, and motivation (Haladyna and Downing, 2004). Wise, Bhola, & Yang (2006) used warning messages in low-stakes testing to increase motivation, decrease CIV, and thus increase the validity of test scores. They used the amount of time that a student spends responding to an item, and if the student responded too quickly to have given genuine effort, a warning was given. Their results report an increase in validity, but the drawbacks are that their procedures are limited to computer-based testing and were only used for low-stakes testing. It is debatable whether high-stakes testing has the same lack of motivation that low-stakes testing does. Construct irrelevant variance should be fair across all examinees and can be linked with four areas of fairness. Those four areas are: 1) tests should have no form of bias, 2) examinees should all be treated the same, 3) if groups are equal in ability then their scores should be equal, and 4) students should have an equal opportunity to learn the knowledge before being tested. Haladyna and Downing (2004) describe several points that can affect fairness or CIV, including test preparation, test scoring, test item format, and test administration. Even though students are the central stakeholders in determining CIV, parents can alter CIV in ways that threaten validity.
Parents can alter motivation, pay for test preparation, and pressure politicians to change policies or interpretations of the test results. As voters, they can influence elected officials and thus change the interpretation of the exam without changing the exam itself.
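The mathematical distinction drawn earlier in this section, that random error has an expected value of zero and is uncorrelated with scores while systematic error (CIV) has a nonzero expected value and correlates with observed scores, can be demonstrated with a short simulation. The code below is illustrative only and is not part of the original study; the score scale, group sizes, and the fixed 5-point reading penalty are all assumed values.

```python
import random
import statistics

def pearson(x, y):
    """Population Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

random.seed(42)
n = 10_000
true_score = [random.gauss(100, 15) for _ in range(n)]

# Random error: unpredictable noise with expected value zero.
rand_err = [random.gauss(0, 5) for _ in range(n)]

# Systematic error (CIV): a hypothetical fixed 5-point penalty for half the
# examinees, e.g. low reading ability depressing scores on a science test.
civ = [-5.0 if i < n // 2 else 0.0 for i in range(n)]

observed = [t + r + c for t, r, c in zip(true_score, rand_err, civ)]

print("mean random error:", round(statistics.mean(rand_err), 2))  # near zero
print("mean CIV:", round(statistics.mean(civ), 2))                # -2.5, not zero
print("corr(CIV, observed):", round(pearson(civ, observed), 2))   # positive, nonzero
print("corr(random error, true):", round(pearson(rand_err, true_score), 2))  # near zero
```

The penalized group's scores are shifted down in a predictable direction, which is why the CIV term correlates with observed scores while the random noise does not.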
Such actions would alter the validity of the tests. Understanding parental attitudes toward standardized testing and report card grades is important because of the influence that parental perceptions have on CIV.

Large Scale Testing

With respect to validity in large-scale testing, Haertel (1999) broke the argument down into three points. First, accountability is the rationale that the public gives for using large-scale testing: the public and media want to be able to evaluate their school systems compared to previous years and to other schools. Second, these tests are designed to focus attention on educational concerns. Using this method to compare school systems makes it easier to write about in the mainstream media. After all, it is the parents and other people within the community who are footing the bill for the education system, and it is presumed that the money should lead to positive results. And last, the results of large-scale testing should reduce changes in the parts of the educational system that are showing positive results and lead to an overhaul of those that do poorly. The results allow teachers, parents, students, and administrators to realize what is being taught in their school system and how well the system is succeeding. When comparing the results to previous years, stakeholders would want to see improvements. Large-scale testing has come a long way in public education. It has grown tremendously, beginning with a series of publications in the mid-1980s, with A Nation at Risk in 1983 and John Cannell's 1987 report on nationally normed testing in America's public schools, aimed at creating measurable standards in the absence of statewide curriculums (DePascale, 2003). Large-scale testing has many advantages, including ease of administration and the production of reliable, objective, quantifiable results (Abu-Alhija, 2007). Teachers
can use large-scale testing to align their curriculum with standards, and also to enhance student learning and achievement (Stecher, 2002). Administrators can use the results to direct resources and change school policies and programs, enabling student learning to increase. Hanushek & Raymond (2004) found that states with accountability systems had more gains in math and science than those that did not. Large-scale testing is often high stakes because the results are used to make significant educational decisions about schools, teachers, and students. Government levels will then use results to evaluate the progress of schools, and public officials will examine the multiple uses of the assessments. Florida ties its assessments to a wide range of decisions, including high school graduation, school grades, teachers' merit pay, and student promotion from 3rd to 4th grade. Recently and arguably (Nagy, 2000; Serafini, 2001), large-scale testing has been used for providing instructional diagnosis. Tests with higher stakes generally carry stronger consequences (Kane, 2002). These stronger consequences are both positive and negative. The No Child Left Behind Act (NCLB) of 2001 exemplifies how our current society relies heavily on test scores, and in particular standardized testing, to measure individual progress. The NCLB is a federal law designed to increase the use of diagnostic feedback to measure public school progress. It funds federal programs designed to improve student performance in the United States by increasing accountability by states and schools for any school that receives federal funding. Establishing measurable goals can increase individual outcomes in education. Each state has to develop assessments with its own individual
standards and provide a statewide standardized test, which is then approved by the federal government. Progress is expected to be made from year to year, or the school will face decreased funding after an initial period given to allow time for the school to improve. If the school does not meet this improvement, students can be removed from that school to attend a higher-performing one. Schools that do not make adequate progress can also be closed. Since the act's inception in 2001, federal funding of education has increased approximately 40% (US Department of Education Press Release, December 10, 2008). With such a large amount of funds given to educational assessment, the government is trying to increase outcomes with measurable goals; diagnostic assessment poses a great advantage over a general estimate-of-ability score. To increase performance, high-stakes testing needs to provide beneficial information to each level of the education system to create, plan, and implement an individual assessment plan. There are also negative effects of large-scale testing (see Eisner, 2001; Thompson, 2001). Aggravation and anxiety (Elliot & Branden, 1997) for the examinees are two effects that accompany the stereotype often associated with large-scale testing. Because of the consequences attached to the scores, teachers may focus on raising students' test scores as opposed to increasing their mastery of the subject (Stecher, 2002; Lewis, 2001). This means teachers teaching to certain content forms and being less adaptive to the larger content domain, but it can also include changing the curriculum to such an extent that it excludes other subjects not included in the test. Teachers will often sacrifice learning time in other subject areas while focusing on teaching material that will be covered on the exam (Kohn, 2000). This has been argued in the
opposite direction by those claiming that, in a positive way, it aligns instruction with state content standards: it takes curriculums that are not in compliance with what students should know and molds them into what students are supposed to know by specifying the content to be taught (Kane, 2002). Some have argued that test items are constructed and chosen not exclusively for their ability to gauge the achievement of the student, but also for their ease of scoring and ease of use, leaving some abilities untested. The problem with most standardized tests is that they are designed to be quickly scored, meaning that the multiple choice questions are set up so that they emphasize simple memorization of facts, not higher-level abilities like interpretation. Lewis (2001) acknowledges that these tests are not made to measure other qualities that are needed for success (i.e., leadership and creativity) at Fortune 1000 companies. Kohn (2000) wrote that clever test-taking strategies can supplant actual knowledge in getting higher scores. Politicians in several states have claimed that they are satisfied with their students' increase in scores, but it is not known whether this reflects an improvement in ability or just the narrowing of the curriculum to improve test scores. It is assumed that the increase in scores will reflect an increase in knowledge of the state standards, not test-taking strategies. It is also assumed that this will be done without losses in other areas (Kane, 2002). This has prompted some colleges to rely less on standardized tests for entrance. Large-scale testing is less likely to
have these negative effects if higher-order thinking and problem solving are included (Abu-Alhija, 2007). Other studies have found differences in exam scores corresponding to race, ethnicity, and socioeconomic status (Borg et al., 2007; Hanushek & Raymond, 2004). In terms of equality, one of the biggest arguments for standardized large-scale testing is to make sure that high statewide standards are demanded from all schools. Kohn (2000) addresses this issue of fairness by first arguing that standardized tests are biased because their questions often require a set of knowledge and skills that are more likely to be attained by more privileged children. This knowledge would have been gained outside of school. More privileged children can also afford better test preparation. Borg et al. (2007) found that schools with more new teachers had a higher failure rate on a standardized exam, while schools with teachers holding advanced degrees had a higher passing percentage. Kohn (2000) argues that low-income and minority students are disadvantaged by the emphasis on tougher standards, which establishes a system of rewards and punishments that is destructive to learning. He argues that the tests are not designed to measure the domain that they were intended for, and that the tests discourage students from thinking, replacing it with recall. As alluded to earlier, an additional emphasis is placed on the speed with which one can complete the test as opposed to the thoughtfulness or thoroughness of the work (Kohn, 2000). For younger test takers, the test could be testing the capacity with which the student can sit still. Downing (2002) reports findings that CIV increases with poorly drafted test questions. His study uses a low-stakes medical exam but does report that poor item writing can cause excess difficulty and interfere with the interpretation of the results.
When a mistake is made in scoring the exam, a huge ripple effect can be felt throughout all the levels that are contingent on the results. An example of this is when Florida incorrectly graded its state standardized exam higher in 2006 and did not discover the mistake until over a year later, when the exam scores decreased considerably. Because the exam had been tied to graduation and yearly retention, the effects were felt across the state. Florida is one of the few states that have placed so much weight on the scores of standardized testing, but as Cech (2007) points out, examinees had scores that were off by 10 to 200 points, as reported in the Los Angeles Times (Trounson & Silverstein, 2006). Although incorrect scoring procedures do not take away the validity of the exam, and the 4000 affected examinees were less than one percent of the total test takers for that year, such a mistake does hit media markets and become a sore spot for readers that can ultimately affect public opinion toward standardized exams. In 1992, a poll found that 71% of parents favored measuring the academic achievement of students through the use of standardized national tests in public schools (Elam, Rose & Gallup, 1992). This differed from a study by the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) in 1995 that was designed to examine parents' opinions on standardized testing and found different results. In the CRESST poll, they surveyed 3rd-grade parents and found that only 46% supported national tests. Additionally, parents were found to put more emphasis on report card grades than standardized tests in determining student progress (43% to 14%). Seven years later, in 2002, USA Today reported that parents were evenly split on whether standardized exams were a good indicator
of student performance (49.6% said they were and 48.5% said they were not). When asked what was a better indicator of performance, report card grades or standardized test scores, parents responded that grades were a better indicator at 33.6% compared to 29.4% for standardized tests (the two other factors offered were high school graduation rates and student behavior). Increasingly, tests are used not simply as measures of progress but rather as instruments intended to promote progress. The state of Florida uses the FCAT for five purposes. One of the uses, touched on earlier, is the accountability that the No Child Left Behind Act of 2001 requires. The act requires each state to establish yearly improvements for each district and school and to establish the standardized tests that produce the results, called Adequate Yearly Progress (AYP). Included in the results are various subgroups (i.e., ethnicity, socioeconomic status, disability). Every school must have all students and subgroups participate in the state assessment program. This means that no school can pad its results by withholding groups from taking the exam, and all schools must incorporate all subgroups into the classrooms for equal learning. The state sets the passing rate for each year, and the NCLB of 2001 requires that 100 percent of students be proficient by the 2013-14 school year. The other four uses of the test are requirements specific to the state of Florida. The second use is the assignment of school performance grades. These grades give an evaluation number to a school so that parents can understand and compare the school that their child is attending with other schooling options. Ultimately, it provides the parent with a measure of the educational system.
Another use Florida makes of the FCAT score is to promote students to the next grade or out of high school. Although Florida has been doing this for a long time, as of 2002, 18 other states also required passing the state standardized exam in order to receive a high school diploma (Amrein & Berliner, 2002). This means that even if students were to receive grades sufficient to pass high school, they could be denied a diploma if they fail to pass the statewide standardized exam. The final two uses of the FCAT are for progress monitoring and remedial training. If a score is below a certain level on the test, the student is required to have intensive remedial training. Low scores may also result in a progress monitoring plan that tracks the student's learning at a more scrutinized level. The actual uses of the assessment vary by grade level. For example, if a student is in the third grade, the reading portion of the test is used for promotion to the next grade, AYP, school grades, and progress monitoring plans. Testing students year after year assumes that students learn at the same pace and should improve evenly across grades, or at a minimum, that everyone would improve evenly on average.

Report Card Grades

This study also asks parents about their beliefs on report card grades. Report card grades have been around for a long time; schools have been issuing report cards to parents and grandparents of students since long before the advent of large-scale testing. They are issued by school systems throughout the United States and involve many interpretations. The purpose of grades is to communicate a student's performance at school, and the validity of these grades has been questioned by many researchers (Allen, 2005; Friedman & Frisbie, 1995; Gallagher, 1998). With grades being a heavy factor in educational decisions such
as which schools to get into, colleges to attend, jobs to apply for, and honors to receive, it is important to understand the interpretation that parents have with regard to report card grades. Allen (2005) believes that teachers often add other extraneous factors, such as effort, attitude, or neatness, to the grade that is reported. The weight of these external factors differs from teacher to teacher, and their addition undermines the interpretation of the grade system. These are important constructs to illustrate to parents, but they should not be included in the letter grade. There is widespread difference in how much these other factors should weigh into grades. Because of this widespread variability, the validity is hard to measure (Allen, 2005). Studies have tried to understand the difference between parents' and teachers' understanding of report card grades (Friedman & Frisbie, 1995; Waltman & Frisbie, 1994). Both studies report a lack of understanding by each group; however, Waltman & Frisbie (1994) report that 94% of teachers and parents believe that effort should be included in grades. Teachers will also view grades as a motivational tool to develop good study habits and promote healthy classroom behaviors. With all of these factors in the grade itself, the grade has become a function of many variables. As a result, it can lead to miscommunication and confusion (Allen, 2005). Waltman & Frisbie (1994) also found that teachers and parents disagreed on the spread of grades when parents reported what the spread of teacher-assigned grades was. A child bringing home a C, for example, would be interpreted by teachers as illustrating that the student is below average, whereas parents would understand that as average. With validity one of the most fundamental principles of educational measurement, it would be important to make sure that teachers are trained on measurement principles, but few are (Allen, 2005). Critics argue that even when teachers are taught the basic measurement principles, they still rely on their value judgments to assign grades (Brookhart, 1993). This is probably due to the fact that teachers have a difficult time accepting measurement principles that do not align with their experiences (Allen, 2005).
CHAPTER 2
METHOD

Questionnaire

A survey was created with 45 questions split into two parts: the first part consisted of biographical information and the second part of opinions. Questions were printed on both sides of the paper and numbered from one to forty-five. Information on parental help with homework and the child's participation in school was also included. Grades for both report cards and FCAT scores were self-reported by the parent. The biographical items asked respondents to choose from a set of options, some of which allowed multiple selections. Opinion items were broken into several areas: general comparisons between the FCAT and report card grades (items 16-25), consequences of those scores (items 26-33), and who has control over the scores or grades (items 34-45). The opinion questions operated on a four-point Likert scale ranging from Strongly Disagree to Strongly Agree. There were no open-ended questions on this survey, but some parents wrote additional information in the margins or at the end of the survey that will be discussed later in this paper.

Participants

Surveys were given to teachers to hand out at the end of the school year to students who participated in the Science Partners in Inquiry-based Collaborative Education (SPICE) program. The SPICE program is a National Science Foundation funded program that promotes inquiry-based learning in under-resourced middle school classrooms. The program takes University of Florida (UF) graduate students focusing in areas of science, technology, engineering, and mathematics and teaches them how to apply hands-on training techniques in the classroom. The UF graduate students are then paired with a middle school teacher and teach throughout a full school year. Under-resourced middle schools applied to the SPICE program to participate, and the program selected the most qualified individuals to fit its needs. The breakdown of the audience of the survey was determined by the grade level that the teachers in the SPICE program taught. The students were then given the survey to take home to their parents and bring back. If the parent had multiple children, the participant was asked to answer the questions based on the student who brought the survey home. All surveys were collected within two weeks of being distributed. Approximately 600 students were involved in the SPICE program, of which 497 were given surveys. One teacher had issues, not involving the program but with her certifications, and her students were not considered part of the study. Out of the 497 surveys that were handed out, 86 were returned. Of those 86, five were not usable in the analysis, eight were not finished, and four were filled out only on the front side of each page. The five that were unable to be used in the analysis were not legible enough to be understood. Depending on how one calculates it, this gives a response rate between 13 and 17%.

The Parent

Sixty percent of respondents who filled out the survey were female, and almost 60% were in the age group of 30-39. Although I included five categories on the survey for age group, no respondent was between the ages of 20-29 or over 60 years of age, while around 20% were in each of the two remaining categories of 40-49 and 50-59 years of age. All respondents had three children or fewer, with 80% reporting two or three children. Almost 90% of all those surveyed reported that they were employed full time.

The Child

The grade of the child, as mentioned before, had more to do with the layout of the classes than anything else; the dependence on the classes from the SPICE program determined the
24 breakdown of the grade level. The survey was only conducted at middle schools, so only 6 th through 8 th grade students were targeted Out of those grades, eighth grade had the least representation at 21%. Grades six and seven had only a two person difference and were split at approx imately 40 percent. The gender of the child was slightly in favor of male child over the female child (56% to 4 4 %).
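The 13-17% response-rate range quoted above follows directly from the survey counts. A minimal sketch of the arithmetic (the counts come from the text; the variable names are mine):

```python
# Response-rate range from the survey counts reported above.
distributed = 497          # surveys handed out
returned = 86              # surveys that came back
unusable = 5 + 8 + 4       # illegible + unfinished + front-side-only

raw_rate = returned / distributed                  # counts every returned survey
usable_rate = (returned - unusable) / distributed  # counts only fully usable surveys

print(f"raw: {raw_rate:.1%}, usable: {usable_rate:.1%}")
```

The raw rate works out to about 17.3% and the fully usable rate to about 13.9%, which brackets the 13-17% range reported.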
CHAPTER 3
RESULTS

All measurements were entered into Microsoft Office Excel 2007 and then analyzed using SPSS 16.0 Graduate Pack. Figure 3-1 shows the breakdown of the amount of time respondents spent helping the student with homework in English and Mathematics. As is evident, there is a much wider spread in the amount of time spent helping with English than with Math. The respondents who were unsure how much time they spent helping their child with English homework all knew how much time they spent helping with Math, and those times were all on the lower end of the scale. Another question asked about time spent helping with homework in other subjects; 80% of those answers fell at 3 hours or less.

Figure 3-1. The amount of time that parents spent helping their child with homework in English and Mathematics.

Self-reported report card grades were requested in this questionnaire, and Science grades were reported (Table 3-1) as higher overall, with 41% achieving a grade of A; in comparison, A grades were reported in Math for only 20% of the students.

Table 3-1. Parent-reported grades in math and science.
In Math?   In Science?
20.3% A    40.5% A
59.5% B    39.2% B
20.3% C    20.3% C
   -  D       -  D
   -  F       -  F

By far, the most frequently unanswered question on the survey was the one asking for the child's FCAT scores, which was left blank 33% of the time. This compares with no missing responses for report card grades. Parents in this sample appear to be far more aware of report card grades than of FCAT scores.

The last question before the Likert-scale questions asked parents which they would believe, the FCAT score or the report card grade, if the two were not consistent. There were only four possible choices: one favoring FCAT scores as correct, one favoring school grades as correct, one saying they are both accurate but measure different aspects of achievement, and one saying the parent would not know what to think. No one chose the option favoring FCAT scores, and no one chose that they would not know what to think; parents were split between the remaining two options, heavily favoring school grades being accurate and FCAT scores not (58.2%). Thirty-seven percent thought the two each measure different aspects of achievement (5% did not answer the question).

Questions 16-45 were all four-point Likert-scale questions with options for strongly disagree, disagree, agree, and strongly agree. When running analyses on these data, strongly disagree was coded as 1, disagree as 2, agree as 3, and strongly agree as 4. Table 3-2 shows the number who responded to each question, the mean, and the standard deviation.

Table 3-2. Mean responses and standard deviations.
Survey Question N Mean SD
My child enjoys homework. 75 2.00 .000
Homework helps my child learn. 74 3.20 .405
Homework helps my child get better FCAT scores. 74 2.39 .808
Homework helps my child get better report card grades. 74 2.99 1.104
My child gets enough homework from school. 72 2.40 .494
My child has enough time to do his/her homework. 72 3.25 .436
I end up doing more of the homework than my child. 72 2.01 .661
[...] fairly. 65 2.29 .458
FCAT scores are more important than report card grades. 71 2.79 .754
My child's FCAT scores reflect his/her report card grades. 71 2.17 .737
FCAT scores are important. 75 2.77 .421
The FCAT is important in determining your child's academic development. 75 2.17 .381
The FCAT is a good representation of your child's academic ability. 74 1.96 .629
The FCAT is a good indicator of your child's future academic success. 73 1.97 .623
Report card grades are important. 73 3.42 .498
Report card grades are important in determining your child's academic development. 73 3.22 .417
Report card grades are a good representation of your child's academic ability. 73 3.03 .645
Report card grades are a good indicator of your child's future academic success. 73 3.01 .656
You have a strong influence on your child's FCAT scores. 72 1.78 .419
You are the greatest influence on your child's FCAT scores. 72 2.10 .342
It is up to my child how well he/she does on FCAT scores. 72 3.38 .488
The teacher is the greatest influence on your child's FCAT scores. 72 3.03 .627
Your child can do better on the FCAT. 72 2.79 .409
The educational system is responsible for your child's FCAT scores. 72 3.00 .168
You have a strong influence on your child's report card grades. 72 3.14 .387
You are the greatest influence on your child's report card grades. 72 2.76 .741
It is up to my child how well he/she does on report card grades. 72 2.79 .730
The teacher is the greatest influence on your child's report card grades. 72 2.58 .783
Your child can do better on his or her grade. 72 3.21 .409
Table 3-2. Continued
Survey Question N Mean SD
The educational system is responsible for your child's report card grades. 72 2.79 .409

Respondent answers for questions 16-45 were collapsed into whether the respondent agreed with the statement (selected agree or strongly agree) or did not (selected disagree or strongly disagree). Table 3-3 lists the questions that the majority agreed with and the percentage that agreed; conversely, Table 3-4 lists the questions that the majority disagreed with and the percentage that disagreed.

Table 3-3. Selected questions that the majority of respondents agreed with and the percentage that agreed.
Survey Question N Agreed
Homework helps my child learn. 74 93.7%
Homework helps my child get better FCAT scores. 74 55.7%
Homework helps my child get better report card grades. 74 74.7%
My child has enough time to do his/her homework. 72 91.1%
FCAT scores are more important than report card grades. 71 53.2%
FCAT scores are important. 75 73.4%
Report card grades are important. 73 92.4%
Report card grades are important in determining your child's academic development. 73 92.4%
Report card grades are a good representation of your child's academic ability. 73 74.7%
Report card grades are a good indicator of your child's future academic success. 73 73.4%
The teacher is the greatest influence on your child's FCAT scores. 72 74.7%
Your child can do better on the FCAT. 72 72.2%
The educational system is responsible for your child's FCAT scores. 72 98.6%
You have a strong influence on your child's report card grades. 72 98.6%
You are the greatest influence on your child's report card grades. 72 53.2%
It is up to my child how well he/she does on report card grades. 72 55.7%
Your child can do better on his or her grade. 72 100.0%
The educational system is responsible for your child's report card grades. 72 79.2%

Table 3-4. Selected questions that the majority of respondents disagreed with and the percentage that disagreed.
Survey Question N Disagreed
My child enjoys homework. 75 94.9%
My child gets enough homework from school. 72 54.4%
I end up doing more of the homework than my child. 72 70.9%
[...] fairly. 71 58.2%
My child's FCAT scores reflect his/her report card grades. 71 57.0%
The FCAT is important in determining your child's academic development. 75 78.5%
The FCAT is a good representation of your child's academic ability. 74 77.2%
The FCAT is a good indicator of your child's future academic success. 73 75.9%
You have a strong influence on your child's FCAT scores. 72 91.1%
You are the greatest influence on your child's FCAT scores. 72 83.5%
It is up to my child how well he/she does on FCAT scores. 72 91.1%
[...] 72 54.4%

Several studies have found differences between males and females both in grade point average (which is calculated from report card grades) and in standardized testing (Willingham & Cole, 1997; Johnson et al., 2005; Bartels, Rietveld, Van Baal, & Boomsma, 2002; Hicks et al., 2007). This data set was analyzed using independent-samples t-tests, as reported in Table 3-5. The results show that although there was no significant difference between males and females on report card grades, there was a statistically significant difference between genders on FCAT scores (p < .001 for math and p = .004 for science).

Table 3-5. Independent t-tests for gender on math report card grades, science report card grades, math FCAT scores, and science FCAT scores.
 t df Sig. (2-tailed)
Math Grade .000 36.860 1.000
Science Grade 1.738 50.088 .088
FCAT Math 6.322 34.068 .000
FCAT Science 3.031 49.301 .004

Other studies have examined parents' educational expectations for their children and the correlation with academic performance (Mau, 1995; Seginer & Vermulst, 2002), so this study also examined whether parental expectations of future educational attainment are related to current grades and/or standardized exam scores. Raty and Kasanen (2007) found that parents' perceptions in Finland shape what the parent believes the child's competence in a given subject to be. Parents begin to form stereotypes of what the child can accomplish as early as the third grade and then solidify those beliefs. This expectation can increase motivation or may bring additional resources with which to attain higher scores, thereby adding to CIV.

An ANOVA was conducted to test for a relationship between what the parents planned for their child after high school and the child's current academic standing. The ANOVA was run in SPSS, and the results are reported in Table 3-6. There were five choices from which parents could choose for their child; two of them, "do not know" and "work full time," were not chosen. The other three choices, attending a four-year college, attending a community college or technical school, and joining the military, were all selected and tested. SPSS reported no significant difference between the groups in math report card grades, F(2, 72) = 0.01, p = 0.990. There was, however, a significant effect of post-high-school plans on the child's science report card grade, F(2, 72) = 5.885, p = 0.004. There were also significant effects for the FCAT scores in both subjects: F(1, 49) = 8.901, p = 0.004 in math and F(1, 49) = 5.378, p = 0.025 in science.

Table 3-6. ANOVA of parents' post-high-school plans for their child on the child's current academic standing.
 Sum of Squares df Mean Square F Sig.
Report Card Grade in Math
 Between Groups .009 2 .004 .010 .990
 Within Groups 30.978 72 .430
 Total 30.987 74
Report Card Grade in Science
 Between Groups 5.984 2 2.992 5.885 .004
 Within Groups 36.603 72 .508
 Total 42.587 74
FCAT Score in Math
 Between Groups 2.249 1 2.249 8.901 .004
 Within Groups 12.379 49 .253
 Total 14.627 50
FCAT Score in Science
 Between Groups 1.179 1 1.179 5.378 .025
 Within Groups 10.742 49 .219
 Total 11.922 50
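The two analyses above were run in SPSS, but the same tests can be sketched with standard statistical software. A minimal illustration in Python using invented scores (the lists and their values are hypothetical, not the study's data); `equal_var=False` requests the Welch correction, which is consistent with the fractional degrees of freedom reported in Table 3-5:

```python
from scipy import stats

# Hypothetical score lists; NOT the study's data.
boys_fcat = [3, 4, 3, 5, 4, 3, 4]
girls_fcat = [2, 3, 3, 2, 4, 3]

# Independent-samples t-test, equal variances not assumed (Welch),
# analogous to the gender comparison in Table 3-5.
t, p = stats.ttest_ind(boys_fcat, girls_fcat, equal_var=False)
print(f"t = {t:.3f}, p = {p:.3f}")

# One-way ANOVA across three hypothetical post-high-school plan groups,
# analogous to the F tests reported in Table 3-6.
four_year = [4, 3, 4, 4, 3]
two_year = [3, 3, 2, 3]
military = [3, 2, 3]
f, p = stats.f_oneway(four_year, two_year, military)
print(f"F = {f:.3f}, p = {p:.3f}")
```

The Welch variant is the safer default when group variances may differ, which is presumably why the SPSS output shows non-integer degrees of freedom.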
CHAPTER 4
DISCUSSION

Discussion and Findings

The survey was designed to measure three areas of parental opinion about report card grades and FCAT scores: general opinions, consequences of scores, and locus of control. Parents agreed that homework helps their child learn (93.7%), but more parents believed that homework helps report card grades than believed it helps FCAT scores (74.7% compared to 55.7%), even given the reported validity issues of report card grades. This could point to a threat to validity, since homework should build content knowledge and both the FCAT and report card grades are based on content. When asked whether their child receives enough homework, parents were more evenly split (54% vs. 36%), but the majority agreed that plenty of time was given to complete homework. These answers came after parents reported helping their child with homework for an average of 1 to 3 hours of English homework, 1 hour of Math homework, and 1 hour of additional homework in other subjects per week. The results for helping with English homework seem counterintuitive: respondents either helped their child for less than one hour (38.7%) or for 3 to 5 hours (36.0%), while only 5.3% helped for 1 to 3 hours. I would hypothesize that this is driven by student need: either the child does not do well in English and needs the extra time, or the child does well in English and does not need the extra help. It is also possible that, since the question asked how much homework help was given each week, some respondents simply took one hour a day and multiplied it by five. With 92% of respondents working full time, available time is probably not the explanation, as nearly all of these parents work 40 or more hours a week.
When comparing FCAT scores against report card grades, question 15 directly asked parents what they would believe if the two measures did not match. As reported earlier, 58.2% responded that school grades are accurate but FCAT scores are not. This seems almost contradictory to the responses to a later question asking whether FCAT scores are more important than report card grades, with which 53.2% agreed or strongly agreed. A different but similar question asked whether FCAT scores are important, with which 73.4% agreed. It is believed that this is because parents do not agree that FCAT scores are reflective or supportive of report card grades, nor do they see the FCAT as the ideal evaluation of content, but parents do believe that FCAT scores are important for other things. As discussed previously, FCAT scores are used for five things. Although this questionnaire did not survey for it specifically, it can be inferred that people want to hold schools accountable and want a measure that compares students across schools and shows progress being made, but that they do not perceive the FCAT as the most beneficial means of testing academic achievement, which goes to the heart of validity. Parents do not believe in the interpretations for which FCAT scores are being used, yet still rely on faith in report card grades to accurately portray proper construct representation. This is further supported by the 78.5% of parents who disagree that FCAT scores are important in determining academic development, and who also disagree that the FCAT is a good representation of academic ability or future success. Parents do not agree that the FCAT is a valid indicator of achievement; they consistently respond that report card grades are the better indicators for these measures.

It is important to note that the CRESST study in 1995 found that parents were more supportive of standardized testing at the beginning of the year than at the end. The surveys conducted here were all given at the end of the year and were not spaced over time. A future study could analyze differences in public opinion at multiple times of the year to determine whether support for FCAT scores changes.

The final portion of the survey measured opinions about who has control over grades and FCAT scores. The questions aimed to determine who was believed to have more control: parents, teachers, or the child. Parents agreed that the teacher has the greatest influence on the child's scores, and that neither the parent nor the child has the greatest influence. Parents see themselves as having an important role, but not as the greatest influence. Parents agree that the educational system itself is responsible for both FCAT scores and report card grades. It is possible that parents do not find that the FCAT measures the construct areas accurately, or that there is substantial CIV that undermines the validity of the scores. Although standardized testing was introduced into the educational system to curb the unreliability and poor validity of school report card grades, parents believe these standardized exams are not doing a good job. Parents still rely on report card grades to measure the construct. Parents do see a need for an instrument that measures abilities across schools and measures progress, but they do not favor the FCAT to provide this information. No matter what the measure of achievement was, parents believed that their child was not living up to his or her potential: seventy-two percent believed their child could do better on the FCAT, and every respondent who participated in this survey believed their child could do better on his or her report card grade.
Limitations and Future Studies

A key limitation of this study is that the FCAT scores and report card grades were self-reported by the parent. For the purposes of the study, the reported scores were assumed to be accurate. Future studies, if time and resources are available, should try to obtain exact scores from the county and match them to the individuals.

A second limitation is the lack of generalizability due to the area of study. The sample population comes from a small area in the northern part of Florida, specifically Alachua County, and can be further narrowed to lower-economic-status schools and classrooms, since that is where the participants came from. Again, if time and resources are available, a broader sample would make the results easier to generalize to a larger scale.

This survey did not contain open-ended items for parents to record comments or concerns, but that did not keep them from writing in additional opinions as they saw necessary. Those with the strongest opinions were the ones who wrote in the margins. These parents believe that FCAT scores serve the school more than they serve their students. A survey designed with this specifically in mind, containing more open-ended questions, could pursue that issue.

Parent opinions of the FCAT have also been affected by the opinions of teachers. This is a classic situation in which two sides are competing for a solution and fighting over the method of use. On one side are politicians who want accountability; on the other are teachers who believe that a standardized exam decreases the versatility and freedom to teach students things that may not be measurable on a standardized exam. The parents who wrote additional opinions in the margins seem to side with both: they explain that teachers dislike the FCAT, yet the parents still agree on the importance of the FCAT.
The following statement sums up a general opinion of parents and something that needs to be captured better in further studies:

There is so much emphasis of the school trying to get an A that the everyday teaching is [...] much sense. You close a school and that will cause overcrowding in the surrounding schools. This is not fixing the problem, it is causing other problems. The FCAT would be a great tool if it was used correctly. (Anonymous)
APPENDIX
PARENTAL ATTITUDES SURVEY ON FCAT SCORES AND REPORT CARD GRADES

[...] school curriculum. Therefore, it is important to gather your opinions on FCAT scores and report card grades. Please take the time to answer the following questions. When finished, please have your child turn this survey back in to his/her teacher. Thank you for your time.

INSTRUCTIONS: PLEASE CHECK THE CORRESPONDING ANSWER. IF YOU HAVE MORE THAN ONE CHILD, PLEASE BASE THE ANSWERS ON THE ONE CHILD THAT BROUGHT THIS HOME TO YOU.

1. What is your gender?
___ Male
___ Female

2. What is your age category?
___ 20-29
___ 30-39
___ 40-49
___ 50-59
___ >60

3. How many children do you have in school?
___ 1
___ 2
___ 3
___ 4
___ >5

4. What grade is your child in?
___ 6
___ 7
___ 8

5. What is the gender of your child?
___ Male
___ Female

6. What is your employment status?
___ Unemployed
___ Part Time
___ Full Time
___ Homemaker

7. In which of the following activities does your child participate after school? (Check all that apply.)
___ Tutoring (either private or at school)
___ An after-school program at school
___ An after-school program that does NOT take place at school
___ Individual lessons (such as ballet, karate, musical instrument, etc.)
___ Sports team(s)
___ My child does not participate in any after-school activities.

8. In a typical week, in which of the following activities does your child participate after school? (Check all that apply.)
___ Getting help on homework from my parent(s)/guardian(s)
___ Getting help on homework at the after-school program
___ Going to the library
___ Doing research on the Internet
___ Participating in community service
___ My child does not participate in any of these activities.

9. Approximately how much time do you spend helping your child with English homework each week?
___ Less than 1 hour
___ 1 to 3 hours
___ 3 to 5 hours
___ More than 5 hours
___ Unsure

10. Approximately how much time do you spend helping your child on Math homework each week?
___ Less than 1 hour
___ 1 to 3 hours
___ 3 to 5 hours
___ More than 5 hours
___ Unsure

11. Approximately how much time do you spend helping your child on homework for classes other than English or Math?
___ Less than 1 hour
___ 1 to 3 hours
___ 3 to 5 hours
___ More than 5 hours
___ Unsure

12. What grade did your child receive on his/her most recent report card:
In math?   In Science?
___ A      ___ A
___ B      ___ B
___ C      ___ C
___ D      ___ D
___ F      ___ F

13. What was your child's most recent FCAT score:
In math?   In Science?
___ 5      ___ 5
___ 4      ___ 4
___ 3      ___ 3
___ 2      ___ 2
___ 1      ___ 1

14. Which of the following best describes your current plans for what your child will do after they finish high school?
___ Attend a four-year college
___ Attend a community college, business school, or technical school
___ Work full time
___ Join the military
___ Do not know

15. If your child's FCAT scores and report card grades were not consistent, which would you believe:
___ FCAT scores are accurate and school grades are not
___ School grades are accurate and FCAT scores are not
___ Both are accurate but measure different aspects of achievement
___ I would not know what to think

INSTRUCTIONS: INDICATE THE EXTENT TO WHICH YOU AGREE WITH THE FOLLOWING STATEMENTS REGARDING YOUR OPINION OF THE FCAT AND REPORT CARD GRADES (CIRCLE YOUR RESPONSE).

SD = Strongly Disagree   D = Disagree   A = Agree   SA = Strongly Agree

16. My child enjoys homework. SD D A SA
17. Homework helps my child learn. SD D A SA
18. Homework helps my child get better FCAT scores. SD D A SA
19. Homework helps my child get better report card grades. SD D A SA
20. My child gets enough homework from school. SD D A SA
21. My child has enough time to do his/her homework. SD D A SA
22. I end up doing more of the homework than my child. SD D A SA
23. [...] fairly. SD D A SA
24. FCAT scores are more important than report card grades. SD D A SA
25. My child's FCAT scores reflect his/her report card grades. SD D A SA

Consequences

26. FCAT scores are important. SD D A SA
27. The FCAT is important in determining your child's academic development. SD D A SA
28. The FCAT is a good representation of your child's academic ability. SD D A SA
29. The FCAT is a good indicator of your child's future academic success. SD D A SA
30. Report card grades are important. SD D A SA
31. Report card grades are important in determining your child's academic development. SD D A SA
32. Report card grades are a good representation of your child's academic ability. SD D A SA
33. Report card grades are a good indicator of your child's future academic success. SD D A SA

Locus of Control

34. You have a strong influence on your child's FCAT scores. SD D A SA
35. You are the greatest influence on your child's FCAT scores. SD D A SA
36. It is up to my child how well he/she does on FCAT scores. SD D A SA
37. The teacher is the greatest influence on your child's FCAT scores. SD D A SA
38. Your child can do better on the FCAT. SD D A SA
39. The educational system is responsible for your child's FCAT scores. SD D A SA
40. You have a strong influence on your child's report card grades. SD D A SA
41. You are the greatest influence on your child's report card grades. SD D A SA
42. It is up to my child how well he/she does on report card grades. SD D A SA
43. The teacher is the greatest influence on your child's report card grades. SD D A SA
44. Your child can do better on his or her grade. SD D A SA
45. The educational system is responsible for your child's report card grades. SD D A SA
LIST OF REFERENCES

Abu-Alhija, F. N. (2007). Large-scale testing: Benefits and pitfalls. Studies in Educational Evaluation, 33, 50-68.

AERA, APA, & NCME (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Allen, J. D. (2005). Grades as valid measures of academic achievement of classroom learning. The Clearing House, 78(5), 218-223.

Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved February 2, 2009, from http://epaa.asu.edu/epaa/v10n18/

Bartels, M., Rietveld, M. J. H., Van Baal, G. C. M., & Boomsma, D. I. (2002). Heritability of educational achievement in 12-year-olds and the overlap with cognitive ability. Twin Research, 5, 544-553.

Borg, M. O., Plumlee, J. P., & Stranahan, H. A. (2007). Plenty of children left behind: High-stakes testing and graduation rates in Duval County, Florida. Educational Policy, 21(5), 695-716.

Brookhart, S. M. (1993). Teachers' grading practices: Meaning and values. Journal of Educational Measurement, 30(2), 123-142.

Cannell, J. J. (1987). Nationally normed elementary achievement testing in America's public schools: How all 50 states are above the national average (2nd ed.). Daniels, WV: Friends of Education.

Cech, S. J. (2007, June 13). Florida scoring glitch sparks broad debate. Education Week, 26(41).

DePascale, C. A. (2003). The ideal role of large-scale testing in a comprehensive assessment system. National Center for the Improvement of Educational Assessment. Retrieved February 16, 2009, from http://data.memberclicks.com/site/atpu/volume%205%20issue%201%20The%20ideal%20role.pdf

Downing, S. M. (2002). Construct-irrelevant variance and flawed test questions: Do multiple-choice item-writing principles make any difference? Academic Medicine, 77(10), S103-S104.

Eisner, E. (2001). What does it mean to say that a school is doing well? Phi Delta Kappan, 82, 367-372.

Elliott, S., & Braden, J. P. (1997). Educational assessment and accountability for all students: Facilitating the meaningful participation of students with disabilities in district and statewide assessment programs. Madison, WI: Wisconsin Department of Public Instruction.

Ferrara, S. (2006). Toward a psychology of large-scale educational achievement testing: Some features and capabilities. Educational Measurement: Issues and Practice, 25(4), 2-5.

Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39(3), 193-202.

Friedman, S. J., & Frisbie, D. A. (1995). The influence of report card grades on the validity of grades reported to parents. Educational and Psychological Measurement, 55, 5-26.

Gallagher, J. D. (1998). Classroom assessments for teachers. Upper Saddle River, NJ: Merrill/Prentice Hall.

Haertel, E. H. (1999). Validity arguments for high-stakes testing: In search of the evidence. Educational Measurement: Issues and Practice, 18(4), 5-9.

Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17-27.

Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297-327.

Hicks, B. M., Johnson, W., Iacono, W. G., & McGue, M. (2008). Moderating effects of personality on the genetic and environmental influences of school grades helps to explain sex differences in scholastic achievement. European Journal of Personality, 22, 247-268.

Jeynes, W. H. (2007). The relationship between parental involvement and urban secondary school student academic achievement: A meta-analysis. Urban Education, 42(1), 82-110.

Johnson, W., McGue, M., & Iacono, W. G. (2005). Disruptive behavior and school achievement: Genetic and environmental relationships in 11-year-olds. Journal of Educational Psychology, 97, 391-405.

Kane, M. (2002). Validating high-stakes testing programs. Educational Measurement: Issues and Practice, 21(1), 31-41.

Kohn, A. (2000). Burnt at the high stakes. Journal of Teacher Education, 51(4), 315-327.

Lewis, A. C. (2001, February 21). Washington commentary: Heads in the sand. Phi Delta Kappan, 83(1).

Nagy, P. (2000). The three roles of assessment: Gatekeeping, accountability, and instructional diagnosis. Canadian Journal of Education, 25(4), 262-279.

National Commission on Excellence in Education (1983, April). A nation at risk: The imperative for educational reform. A report to the nation and the Secretary of Education, United States Department of Education. Retrieved January 15, 2009, from www.ed.gov/pubs/NatAtRisk

Parents disagree with lawmakers over reform. (2002, April). USA Today, p. 14.

Raty, H., & Kasanen, K. (2007). [...]. Sex Roles, 56, 117-124.

Serafini, F. (2001). Three paradigms of assessment: Measurement, procedure and inquiry. The Reading Teacher, 54, 384-393.

Shepard, L. A., & Bleim, C. L. (1995). An analysis of parent opinions and changes in opinions [...] (CSE Technical Report 397). National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Stecher, B. M. (2002). Consequences of large-scale, high-stakes testing on school and classroom practice. In L. Hamilton, B. M. Stecher, & S. P. Klein (Eds.), Making sense of test-based accountability in education (pp. 79-100). Santa Monica, CA: RAND.

Thompson, S. (2001). The authentic standards movement and its evil twin. Phi Delta Kappan, 82, 358-362.

Trounson, R., & Silverstein, S. (2006, March 9). October SAT scores were false for 4,000 students. Los Angeles Times. Retrieved January 20, 2009, from http://articles.latimes.com/2006/mar/09/local/me-sat9

U.S. Department of Education press release. Retrieved December 10, 2008, from http://www.ed.gov/news/pressreleases/2006/02/02062006.html

Waltman, K. K., & Frisbie, D. A. (1994). Parents' understanding of their children's report card grades. Applied Measurement in Education, 7(3), 223-240.

Willingham, W., & Cole, N. (1997). Gender and fair assessment. Mahwah, NJ: Erlbaum.

Wise, S. L., Bhola, D. S., & Yang, S. (2006). Taking the time to improve the validity of low-stakes tests: The effort-monitoring CBT. Educational Measurement: Issues and Practice, 25(2), 21-30.
BIOGRAPHICAL SKETCH

Adam Phillip Denny was born in London, England. He moved to south Florida when he was seven years old and attended public schools in Miami-Dade and Broward counties. He attended the University of South Florida and received his Bachelor of Science in Social Science Education there. His education was put on hiatus by yearlong deployments to Iraq beginning in February 2003 and May 2009, during which he served in multiple capacities in multiple locations. Between deployments he applied and was accepted to the University of Florida. His younger sister, Laura, has since finished her Bachelor of Science in Business Management and works in the Gainesville area with her husband, Jon. Adam is married to Holly, and they have one son, Vander, who is three years old. They are expecting their second child in January 2011. They live in Gainesville, Florida, where they continue their military work and school studies concurrently.