
Responsiveness Comparison of Two Upper-Extremity Outcome Measures

Permanent Link: http://ufdc.ufl.edu/UFE0021936/00001

Material Information

Title: Responsiveness Comparison of Two Upper-Extremity Outcome Measures
Physical Description: 1 online resource (153 p.)
Language: english
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: measurement, rehabilitation, upper
Rehabilitation Science -- Dissertations, Academic -- UF
Genre: Rehabilitation Science thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The primary focus in the rehabilitation clinics where patients with upper-extremity impairments are treated is enabling improvement: not just progression at the impairment level, but advances in functional ability (i.e., changes that affect patients' daily lives). To ensure this goal is achieved, assessments that are reliable, valid, and able to detect change are essential. While the reliability and validity of many hand/upper-extremity assessments have been well established, this is not the case for the ability to detect change, a property of instruments called responsiveness. The responsiveness of the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire was compared to that of the Upper-Extremity Functional Index (UEFI). Areas under the ROC curves were almost identical for the two assessments (.67 and .65), as were correlations between global ratings and the change scores (r = 0.332 and 0.350). Given that the responsiveness calculations for both assessments are very similar, there appears to be no real advantage of one instrument over the other in detecting change. One concern highlighted in the comparison was the use of a patient-reported global rating of change as a "gold standard." The correlation between the global rating and the person measure change scores was in the range of what is considered low; thus, the lack of a true "gold standard" against which to compare the change scores is a problem. In essence, both the assessment and the "gold standard" in this study were patient-reported outcomes. This finding suggests that patients may not be accurate in reporting the amount of change that has occurred in their functional ability. The item-level psychometrics and factor structure of the DASH were also investigated.
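The ROC comparison summarized above can be sketched in a few lines of code. This is an illustrative example with synthetic change scores and a made-up dichotomized global rating (none of these numbers come from the study); the AUC is computed by pairwise comparison, the Mann-Whitney U formulation: the probability that a randomly chosen improved patient has a larger change score than a randomly chosen non-improved patient.

```python
# Illustrative sketch only: synthetic data, not the study's DASH/UEFI scores.

def roc_auc(change_scores, improved):
    """Area under the ROC curve via pairwise comparisons; ties count 0.5."""
    pos = [s for s, y in zip(change_scores, improved) if y]
    neg = [s for s, y in zip(change_scores, improved) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical change scores (discharge minus admission person measures)
# and a dichotomized global rating of change (1 = improved).
dash_change = [1.2, 0.8, -0.1, 0.9, 0.2, 1.5, 0.0, 0.7]
improved    = [1,   0,   0,    1,   0,   1,   0,   1]

print(round(roc_auc(dash_change, improved), 2))  # 0.94
```

An AUC of .5 is chance; the study's values of .67 and .65 sit only modestly above that line, which is consistent with neither instrument showing a clear advantage.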
Results of exploratory factor analyses (EFAs) and confirmatory factor analyses (CFAs) were inconclusive as to whether a one-factor solution was plausible. A number of items misfit (ten at admission and eleven at discharge). DASH item point measure correlations were all above .30; however, many (19 at admission and 20 at discharge) were above .70, indicating that some items may be redundant (i.e., add no information to the test beyond what other items already capture). Item difficulty hierarchies support the theory of motor control. Several items displayed significant differential item functioning (DIF) from admission to discharge using p-value guidelines; however, using the 0.5-logit difference guideline, no items displayed significant DIF. Furthermore, mean person ability measures were not affected by the inclusion of DIF items. The intraclass correlation coefficient (ICC) was high, indicating reliable measurement between the two time points. Since none of the item discrimination values were zero or negative, all items contribute in a similar fashion to the overall test. Item discrimination did not appear to dramatically affect person measures, although person ability measures calculated using the different models diverged more at discharge. Similar test information curves for admission and discharge data provide further evidence of the stability of the test at the two time points. Finally, the potential for creating clinically useful data collection forms through Rasch methodologies was demonstrated. The general keyform output from the Winsteps Rasch analysis program(1) was used as the basis for creating the data collection form. Creating such a form proved easy and potentially offers several advantages over the forms used in clinics today.
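The 0.5-logit DIF guideline mentioned above is simple to apply once item difficulties have been calibrated separately at each time point: an item is flagged only when its difficulty estimate shifts by more than 0.5 logits between admission and discharge. The sketch below uses hypothetical item names and logit values, not the actual DASH calibrations.

```python
# Hypothetical admission and discharge item difficulty calibrations (logits).
admission_logits = {"open_jar": -0.8, "write": -0.3, "carry_bag": 0.2,
                    "wash_back": 0.9, "recreation": 1.4}
discharge_logits = {"open_jar": -0.6, "write": -0.4, "carry_bag": 0.3,
                    "wash_back": 1.5, "recreation": 1.3}

def flag_dif(cal_a, cal_b, threshold=0.5):
    """Flag items whose difficulty shifts by more than `threshold` logits."""
    return [item for item in cal_a
            if abs(cal_a[item] - cal_b[item]) > threshold]

print(flag_dif(admission_logits, discharge_logits))  # ['wash_back']
```

In this toy data only one item exceeds the guideline; under a p-value criterion, by contrast, even small shifts can reach significance in large samples, which is the tension the study describes.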
Instructions were placed at the top of the form, and items (activities for patients to rate) were listed to the left of the possible answer choices (both numbers to circle and descriptions of what the numbers mean; for example, no difficulty placed above 5). A person measure scale ranging from zero to 100 was placed at the bottom. As depicted by forms filled out for patients of differing admission ability levels (high, medium, and low), it is possible at a glance to get an idea of where an individual's overall ability to perform functional tasks requiring the upper extremity falls. Furthermore, this type of data collection form can be used in goal setting and treatment planning: by observing where an individual starts to have mild to moderate difficulty with a substantial number of items, the clinician can get a real sense of which activities should be set as first goals for the patient.
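A keyform-style form of the kind described can be mocked up in a few lines. Everything here is hypothetical: the item names, difficulty values, and category labels are stand-ins, not the Winsteps keyform output. The point is the layout the abstract describes: instructions at the top, items ordered by difficulty at the left, answer choices to the right, and a zero-to-100 person measure scale at the bottom.

```python
# Hypothetical DASH-like items with difficulty calibrations (logits);
# easier items (lower logits) are listed first, as on a keyform.
items = [("Open a tight jar", 1.4), ("Wash your back", 0.9),
         ("Carry a shopping bag", 0.2), ("Write", -0.3)]
categories = "1 unable   2 severe   3 moderate   4 mild   5 no difficulty"

def form_lines(items, categories):
    """Build the form: instructions first, then items easiest to hardest."""
    lines = ["Circle the number that best describes your ability this week."]
    for name, _difficulty in sorted(items, key=lambda x: x[1]):
        lines.append(f"{name:<22} {categories}")
    # Person measure scale across the bottom of the form.
    lines.append("Person measure: 0 " + "-" * 40 + " 100")
    return lines

print("\n".join(form_lines(items, categories)))
```

Because the items are printed in difficulty order, the row where a patient's circled responses drift from "no difficulty" toward "moderate" gives the at-a-glance ability estimate and goal-setting cue the abstract describes.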
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Velozo, Craig A.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2010-05-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0021936:00001





Full Text


RESPONSIVENESS COMPARISON OF TWO UPPER-EXTREMITY OUTCOME MEASURES

By

LEIGH A. LEHMAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008


2008 Leigh A. Lehman


To Daddy, Moma, and Lynn: I can never thank you enough.


ACKNOWLEDGMENTS

First, I would like to thank God, who was with me and went beside me each step of the way along this journey, who in his infinite wisdom brought me to this place, this place in my journey along the path of life. Thanks to Him, who sees the whole picture and knows where each piece fits in, even at times when it seems ambiguous to us. When the struggles and frustrations were immense and no one was physically present to encourage me, when I felt at times lonely, I knew I was not alone. I knew that God never left my side. All the countless times that I wanted to quit and give up, He renewed my strength. Over and over again, He renewed His promise, "But they that wait upon the Lord shall renew their strength; they shall mount up with wings as eagles; they shall run, and not be weary; and they shall walk, and not faint."1

In addition to His deeply felt presence in my life at every moment, every second, every breath, He graced me with people in my life who never ceased to provide stability and a constant wellspring of support. I would like to thank my family, the three dearest people in the world to me (Jerry, Faye, and Lynn Lehman). I can never thank them enough! Enormous thanks go to Daddy, who twice, at two of the roughest points along this journey, came to stay with me. He took a sabbatical and came to encourage me for a semester when I was doing my internships. Then again, in my final month of working on my dissertation, he came to stay with me that I might reach the finish line. I will never forget the morning, after I came home ready to quit, when he ran into my room very early, awakened me, and said, "Get up, Leigh! GO GATORS! You are going back to Gainesville and I am going with you!" Tremendous thanks go to Moma, who without any regret quit her job as an elementary school librarian to come and stay with me,

1 Isaiah 40:31


at another low point along the way. I will never forget the day that I was so depressed, I thought I would never make it through the long hours ahead, and she was there by my side and helped me through it. Undoubtedly, her commitment to my educational pursuits has sustained me during the tough times. A greater love than my parents have shown to me, I can never imagine. I would have given up on the quest for this degree a million times over without their continual love and support. My deepest thanks to Lynn, my beloved sister, who paid my rent several semesters and never ceased to give to me, literally thousands of dollars of which I may never be able to repay, and for all the shoes, all the clothes. I am truly blessed to have been given such a sister. I am so unbelievably lucky to have been blessed with such an extraordinary family. I thank God for placing these three most special people in my life.

I would like to thank my mentor, Craig Velozo, for all the many hours he spent with me, teaching me to write and always emphasizing the correct approach to conducting research. He gave an incredible amount of time to mentoring and instructing me. I have never met anyone who works so hard and with such dedication. Without his guidance and support, I could never have made it to this point. I would like to thank my committee members, Orit Shechtman, Lorie Richards, and James Algina, for their support throughout my dissertation work. I would like to thank Dennis Hart and Focus on Therapeutic Outcomes (FOTO), Inc. for providing me with the data sets for my dissertation project.

Finally, I would like to thank all of the people in the Rehabilitation Science Doctorate (RSD) program that truly became my RSD family. They opened my eyes to discover that I could do and achieve more than I ever dreamed possible. To all of them, I am forever indebted for this.
Thanks to Dennis, Mike, Arlene, Sande, Jamie, Rick, Bhagwant, Inga, Patricia, Pey-Shan, JiaHwa (big thanks for Hawaii and all the trips!), Jessica, Megan, Michelle, Milap, Eric, Margaret,


Todd, Dr. Mann, and so many more, too many to name, in the department of Occupational Therapy at UF, who showed me such kindness and expanded my life vision. Each one changed and touched my life in ways they do not even realize. May each of their journeys in life take them to the best of places!

The Florida dream, the Florida experience, now culminates in the completion of this Ph.D. degree. As the sun sets on this time in my life, I pray that I might use this degree and all that I have learned from my experiences along the way, to fulfill God's will for my life, that I might help someone, as those that have helped me along this journey. To Him "that is able to keep you from falling, and to present you faultless before the presence of his glory with exceeding joy,"2 I offer abundant thanks for the Florida experience: for giving me this time in my life, for helping me to reach this end in the completion of my Ph.D., for placing these people in my life who are so deserving of thanks. As a new day dawns and I find myself now, humbly, as Dr. Lehman, I pray for the opportunity to more fully serve Him with all the knowledge I have gained on this journey.

2 Jude 24


TABLE OF CONTENTS

ACKNOWLEDGMENTS .......... 4
LIST OF TABLES .......... 10
LIST OF FIGURES .......... 11
LIST OF ABBREVIATIONS .......... 12
ABSTRACT .......... 14

CHAPTER

1 ABILITY TO DETECT CHANGE IN PATIENT FUNCTION: RESPONSIVENESS DESIGNS AND METHODS OF CALCULATION .......... 17
   Introduction .......... 17
   Types of Responsiveness Designs .......... 21
      Single-Group Designs .......... 21
      Multiple-Group Designs .......... 22
   Discussion/Conclusions .......... 24

2 RESPONSIVENESS OF DISABILITIES OF THE ARM, SHOULDER, AND HAND (DASH) VERSUS THE UPPER-EXTREMITY FUNCTIONAL INDEX (UEFI) .......... 33
   Introduction .......... 33
   Methods .......... 37
      Sample .......... 37
      Materials .......... 38
      Procedures .......... 39
      Statistical Analysis .......... 39
         Rasch analysis to obtain comparable measures .......... 39
         Analysis of variance (ANOVA) .......... 41
         Sensitivity and specificity .......... 41
         Correlation .......... 42
   Results .......... 43
   Discussion .......... 44
   Conclusions .......... 47

3 ITEM CHARACTERISTICS OF THE DISABILITIES OF THE ARM, SHOULDER, AND HAND (DASH) OUTCOME QUESTIONNAIRE .......... 53
   Introduction .......... 53
   Methods .......... 55
      Sample .......... 55
      Assessment .......... 56
      Procedures .......... 57
         Data analysis .......... 57
         Unidimensionality .......... 57
         Item difficulty .......... 59
         Person-item match .......... 59
         Differential item functioning .......... 60
         Item discrimination .......... 61
         Test information .......... 62
   Results .......... 62
      Unidimensionality .......... 62
         Admission data EFA .......... 62
         Admission data CFA .......... 63
         Discharge data EFA .......... 63
         Discharge data CFA .......... 64
         Rasch-derived fit statistics and point measure correlations .......... 64
      Item Difficulty .......... 65
      Person-Item Match .......... 65
      Differential Item Functioning .......... 66
      Item Discrimination .......... 68
      Test Information .......... 68
   Discussion .......... 69
   Conclusion .......... 73

4 CREATING A CLINICALLY USEFUL DATA COLLECTION FORM FOR THE DISABILITIES OF THE ARM, SHOULDER, AND HAND OUTCOME QUESTIONNAIRE USING THE RASCH MEASUREMENT MODEL .......... 92
   Introduction .......... 92
   Methods .......... 95
      Sample Characteristics .......... 95
      Assessment .......... 96
      Disabilities of the Arm, Shoulder, and Hand (DASH) Outcome Questionnaire .......... 96
      Procedures .......... 97
      Analyses .......... 98
         Item infit analysis and previous factor analysis with larger sample to test Rasch measurement model fit .......... 98
         Test of logic of difficulty hierarchy order .......... 99
         Obtaining a general keyform in Winsteps for DASH admission data .......... 100
         Creation of data collection form .......... 100
   Results .......... 102
      Item Infit Analysis and Previous Factor Analysis with Larger Sample to Test Rasch Measurement Model Fit .......... 102
      Test of Logic of Difficulty Hierarchy Order .......... 102
      Creation of Data Collection Form .......... 103
   Discussion .......... 106

5 CONCLUSIONS .......... 125

APPENDIX

A ITEMS ON THE DISABILITIES OF THE ARM, SHOULDER, AND HAND (DASH) OUTCOME QUESTIONNAIRE .......... 137
B ITEMS ON THE UPPER-EXTREMITY FUNCTIONAL INDEX (UEFI) .......... 139
C GLOBAL RATING OF CHANGE .......... 140

LIST OF REFERENCES .......... 141
BIOGRAPHICAL SKETCH .......... 153


LIST OF TABLES

Table .......... page
1-1 Single-group designs and coefficients .......... 30
1-2 Multiple-group designs and coefficients .......... 31
2-1 Sample of 214 demographics .......... 49
2-2 Sensitivity, specificity, and overall error at various cutoff points for the DASH and UEFI .......... 51
3-1 Sample of 991 demographics .......... 75
3-2 Item factor loadings on first factor .......... 76
3-3 Admission item factor loadings on three factors after varimax rotation .......... 77
3-4 Discharge item factor loadings on three factors after varimax rotation .......... 78
3-5 Infit statistics at admission and discharge .......... 79
3-6 Point measure correlations at admission and discharge .......... 80
3-7 Item difficulty estimates and differential item functioning (N = 960, **p<0.002, *p<0.05) .......... 83
3-8 Item discrimination estimates .......... 88
4-1 Sample of 37 demographics .......... 116
4-2 Admission item fit and difficulty estimates .......... 117
4-3 Discharge item fit and difficulty estimates .......... 118


LIST OF FIGURES

Figure .......... page
2-1 Calculating sensitivity and specificity, where sensitivity = a/(a+c) and specificity = d/(b+d) .......... 50
2-2 Receiver operating characteristic (ROC) curves for the DASH (circles) and UEFI (triangles), showing the accuracy of different cutoff points (diagonal line represents an assessment with a 50/50 chance of detecting improvement) .......... 52
3-1 DASH admission scree plot .......... 81
3-2 DASH discharge scree plot .......... 82
3-3 Admission (left) and discharge (right) person-item maps (arranged so zeros line up to allow comparison) .......... 84
3-4 DASH admission item calibrations vs. DASH discharge item calibrations .......... 86
3-5 Mean person ability measures (y-axis, logits) at admission and discharge, with all items and with DIF items deleted .......... 87
3-6 One-parameter model versus two-parameter model mean person ability measures (y-axis, logits) at admission and discharge .......... 89
3-7 Admission scale information function using one-parameter model .......... 90
3-8 Discharge scale information function using one-parameter model .......... 91
4-1 Data collection form for the DASH (based on analysis of admission data) .......... 119
4-2 Data collection forms with individuals' responses .......... 120
4-3 Goal setting data collection form .......... 123


LIST OF ABBREVIATIONS

ARAT     Action Research Arm Test
ANOVA    Analysis of Variance
AU       Area Under (Receiver Operating Characteristic) Curve
ECOS-16  Assessment of health related quality of life in osteoporosis
BQ       Boston Questionnaire
CFI      Comparative Fit Index
CFA      Confirmatory Factor Analysis
df       Degrees of freedom
DASH     Disabilities of the Arm, Shoulder, and Hand outcome questionnaire
ES       Effect Size
EFA      Exploratory Factor Analysis
FOTO     Focus on Therapeutic Outcomes, Inc.
FMA      Fugl-Meyer Assessment
GMAE     Gross Motor Ability Estimator
GRI      Guyatt's Responsiveness Index
ICF      International Classification of Functioning, Disability, and Health
ICIDH    International Classification of Impairments, Disabilities, and Handicaps
ICC      Intraclass Correlation Coefficient
IRT      Item Response Theory
JT-HF    Jebsen Taylor Hand Function Test
JCAHO    Joint Commission
MAP      Maximum a Posteriori estimation
MLE      Maximum Likelihood Estimation
MnSq     Mean Square Standardized residuals


MHQ      Michigan Hand Outcome Questionnaire
MCID     Minimally Clinically Important Difference
MRMT     Minnesota Rate of Manipulation Test
MAS      Motor Assessment Scale
PWRE     Patient-Rated Wrist Evaluation
PRO      Patient-Reported Outcome instrument
ROC      Receiver Operating Characteristic
RC       Reliable Change Index
RMSEA    Root Mean Square Error of Approximation
SD       Standard Deviation
SE       Standard Error
SEM      Standard Error of Measurement
SPADI    Shoulder Pain and Disability Index
SRM      Standardized Response Mean
SRMR     Standardized Root Mean Square Residual
TLI      Tucker-Lewis Index
UEFI     Upper Extremity Functional Index
UEFS     Upper Extremity Functional Scale
WMFT     Wolf Motor Function Test
WHO      World Health Organization


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

RESPONSIVENESS COMPARISON OF TWO UPPER-EXTREMITY OUTCOME MEASURES

By

Leigh A. Lehman

May 2008

Chair: Craig A. Velozo
Major: Rehabilitation Science

The primary focus in the rehabilitation clinics where patients with upper-extremity impairments are treated is enabling improvement: not just progression at the impairment level, but advances in functional ability (i.e., changes that affect patients' daily lives). To ensure the achievement of this goal, assessments that are reliable, valid, and able to detect change are essential. While the reliability and validity of many hand/upper-extremity assessments have been well established, this is not the case for the ability to detect change, a property of instruments called responsiveness.

The responsiveness of the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire was compared to the Upper-Extremity Functional Index (UEFI). Areas under the ROC curves were almost identical for the two assessments (.67 and .65), as were correlations between global ratings and the change scores (r = 0.332 and 0.350). Given that the responsiveness calculations for both assessments are very similar, there appears to be no real advantage of one instrument over the other in detecting change. One concern highlighted in the comparison of the responsiveness of the DASH to the UEFI centered on use of a patient-reported global rating of change as a "gold standard." The correlation between the global rating and the person measure change scores was in the range of what is considered low. Thus, there is a


problem with not having a "gold standard" with which to compare the change scores. In essence, in this study both the assessment and the "gold standard" were patient-reported outcomes. This study suggests that patients are not accurate in reporting the amount of change that has occurred in their functional ability.

The item-level psychometrics and factor structure of the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire were investigated. Results of exploratory factor analyses (EFAs) and confirmatory factor analyses (CFAs) were inconclusive as to whether or not a one-factor solution was plausible. A number of items misfit (ten at admission and eleven at discharge). DASH item point-measure correlations were all above .30. However, the observation that many (19 at admission and 20 at discharge) were above .70 indicates that some of the items may be redundant (i.e., add no further information to the test beyond what other items already obtain). Item difficulty hierarchies support the theory of motor control. Several items displayed significant differential item functioning (DIF) from admission to discharge using p-value guidelines. However, using the 0.5-logit difference guideline, no items displayed significant DIF. Furthermore, mean person ability measures were not affected by the inclusion of DIF items. The intraclass correlation coefficient (ICC) was high, indicating reliability in measurement between the two time points. Since none of the item discrimination values were zero or negative, all items contribute in a similar fashion to the overall test. Item discrimination did not appear to dramatically affect person measures, although there is more difference between the person ability measures calculated using the different models at discharge. Similar curves for test information for admission and discharge data provide further evidence for the stability of the test at the two time points.


Finally, the potential for creating clinically useful data collection forms through the use of Rasch methodologies was demonstrated. The general keyform output from the Winsteps Rasch analysis program(1) was used as the basis for creating the data collection form. Creation of such a form proved to be an easy process and potentially offers several advantages over the forms used in clinics today. Instructions were placed at the top of the form, and items (activities for patients to rate) were listed to the left of the possible answer choices (both numbers to circle and descriptions of what the numbers mean; for example, "no difficulty" placed above 5). A person measure scale ranging from zero to 100 was placed at the bottom. As depicted by forms filled out according to patients of differing admission ability levels (high, medium, and low), at a glance it is possible to get an idea of where an individual's overall ability to perform functional tasks requiring the upper extremity would fall. Furthermore, an advantage of this type of data collection form is that it can be used in goal setting and treatment planning. By observing the location where an individual starts to have mild to moderate difficulty with a substantial number of items, the clinician can get a real sense of what activities should be set as first goals for the patient.


CHAPTER 1
ABILITY TO DETECT CHANGE IN PATIENT FUNCTION: RESPONSIVENESS DESIGNS AND METHODS OF CALCULATION

Introduction

In clinics where patients with hand impairments are treated, the primary focus is on enabling patient improvement: not just progression at the impairment level, but advances in functional ability (i.e., changes that affect patients' daily lives). To ensure the achievement of this goal, assessments that are reliable, valid, and able to detect change are essential. While the reliability and validity of many hand/upper-extremity assessments have been well established, this is not the case for the ability to detect change, a property of instruments called responsiveness(2). Recently, other fields have realized the importance of studying responsiveness. For instance, many studies have been conducted to assess the capacity of health-related quality of life measures to detect change(3-7). Since hand/upper-extremity clinicians spend much time and energy trying to bring about change specifically to the hand/upper-extremity ailments of their patients, using assessments that have been validated as having the ability to detect a change in the problem of interest is of foremost importance.

Outcomes of hand therapy are generally measured by therapists using three types of measures: impairment measures, performance measures, and self-report measures. For each of these measures, various authors have reported on reliability and validity; however, responsiveness has received less attention in the literature. Measurement at the impairment level has been the primary focus of evaluating outcomes for many years in hand therapy(8). Impairment-level measures most commonly used in hand therapy include manual muscle testing (MMT), grip/pinch strength, and sensibility testing. MMT standards have been well established(9-13). Cuthbert and Goodheart(14) reviewed over 100 studies providing evidence for the reliability and validity of MMT. Likewise, techniques for


grip and pinch strength measures have been well documented(15). Reliability and validity of grip strength measures have been studied by many researchers(16-26). For various grip strength measures, studies show calculations of test-retest reliability resulting in very high intraclass correlation coefficients (ICC), ranging from .95 to .99(16, 23, 25, 27). Likewise, investigators have reported concurrent validity (intraclass correlation coefficients) comparisons of various types of dynamometers to range from .98-.99(16, 27). Pinch strength measures have also been shown to be reliable and valid(28-32). Researchers have documented interrater reliabilities ranging from .82-.99 for pinch measures (intraclass correlation coefficients)(29, 31, 32). Additionally, there is evidence for concurrent validity between manual and computerized methods of assessing pinch strength (ANOVA, p > .54)(29). Although detailed procedures exist for performing sensibility testing, there is less evidence for the reliability and validity of these measures(8). However, tactile discrimination and object identification have been shown to be related to functional ability(33-36). Compared to the wealth of studies establishing the reliability and validity of impairment measures, there is a dearth of evidence as to their ability to detect patient change related to everyday performance of tasks.

Although perhaps used more to assess patients with stroke, performance measures are also sometimes used in hand/upper-extremity therapy clinics(37, 38). Various authors have shown commonly used performance measures to be both reliable and valid. Sabari and colleagues(39) showed the interrater reliability of the Motor Assessment Scale (MAS) to range from .95 to .99 on the three subscales (Upper-Arm Function Scale, Hand Movements Scale, and Advanced Hand Activities). Similarly, these authors showed the MAS to correlate .88 with the Fugl-Meyer Assessment (FMA). The Fugl-Meyer Assessment has also been established by many other authors to be reliable and valid(37, 40-45). The Wolf Motor Function Test (WMFT) and the


Action Research Arm Test (ARAT) have shown similar reliability and validity properties(46-48), (45, 49, 50). Another performance measure, more frequently used with problems in the hand clinic, the Jebsen Taylor Hand Function Test (JT-HF), has been shown to provide stable estimates across testing sessions(38, 51-53). Nevertheless, like assessments of impairment, there are very few studies of performance measures which assess responsiveness or ability to detect change in daily function(45).

Recently, the Food and Drug Administration has noted the importance of self-reports, termed patient-reported outcome (PRO) instruments, in measuring patient function(54). This is because several aspects of a patient's condition are only known to the patient (e.g., level of pain, satisfaction with treatment). Furthermore, the patient holds a unique perspective on the treatment process that may be useful to his/her therapist. Increasing use of self-reports is emerging in hand clinics today due to their ability to efficiently investigate patient abilities in areas of daily activity and participation in society(8). A wealth of information exists on the reliability and validity of various self-report instruments. For example, satisfactory reliability and validity of the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire have been reported in numerous studies(55-61). Beaton, Katz, Fossel, Wright, Tarasuk, and Bombardier(57) found that the DASH showed good test-retest reliability (intraclass correlation coefficient = 0.96). Additionally, these authors found the DASH to have discriminative validity, differentiating between those currently able to work and those unable to work (t = -7.51, p < 0.0001). Convergent validity was also demonstrated, with all correlations with other upper-extremity measures exceeding 0.70 (Pearson). Similarly, several authors have published studies relating to the reliability and validity of the ABILHAND(62-65).
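Test-retest reliability figures like the ICCs cited above (e.g., 0.96 for the DASH) can be computed directly from raw repeated scores. The sketch below assumes the two-way random-effects, absolute-agreement, single-measures form, ICC(2,1); the cited studies may have used other ICC variants, and the scores are invented for illustration.

```python
# Sketch: test-retest intraclass correlation, ICC(2,1) (two-way random
# effects, absolute agreement, single measures). The cited studies may
# have used other ICC variants; the scores below are invented.

def icc_2_1(ratings):
    """ratings: one list of k repeated scores per subject."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    subj_means = [sum(r) / k for r in ratings]
    sess_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_sess = n * sum((m - grand) ** 2 for m in sess_means)
    ms_subj = ss_subj / (n - 1)                       # between-subjects MS
    ms_sess = ss_sess / (k - 1)                       # between-sessions MS
    ms_err = (ss_total - ss_subj - ss_sess) / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k * (ms_sess - ms_err) / n)

# Invented grip-strength scores (kg) for five patients, two sessions.
scores = [[30, 32], [45, 44], [22, 25], [50, 51], [38, 37]]
print(round(icc_2_1(scores), 3))   # high test-retest agreement (about 0.986)
```

When the two sessions agree perfectly, the error and session mean squares vanish and the coefficient is exactly 1, which is a convenient sanity check on any implementation.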


While responsiveness of a few of these self-reports has been assessed, for many of them no studies on responsiveness have been conducted. The DASH has been perhaps the most studied in terms of responsiveness. Beaton, Katz, Fossel, Wright, Tarasuk, and Bombardier(57) demonstrated the DASH's ability to reveal change in patients who were expected to change (Standardized Response Mean = 0.74-0.80) and in those patients who either said that their problem was better overall or that their ability to function had improved (Standardized Response Mean = 0.92-1.40). Similarly, Gummesson, Atroshi, and Ekdahl(56) found that the DASH was able to detect change in function before and after surgical treatment for upper-extremity conditions (Mean Score Change = 15; Effect Size = .70; Standardized Response Mean = 1.2). Likewise, Kotsis and Chung(66) found the DASH to be moderately responsive (Standardized Response Mean = .70). Much of the evidence about responsiveness of other upper-extremity assessments has come from studies comparing these assessments to the DASH. Beaton and colleagues(57) found that the DASH was as responsive as two other joint-specific measures (the Shoulder Pain and Disability Index and the Brigham carpal tunnel questionnaire). Similarly, Greenslade, Mehta, Belward, and Warwick(67) found the DASH to be as responsive as the disease-specific Boston questionnaire (also used with individuals with carpal tunnel syndrome). In contrast, MacDermid and colleagues(68, 69) found the Patient-Rated Wrist Evaluation (PWRE) to be more responsive than the DASH, and Gay, Amadio, and Johnson(70) found the Carpal Tunnel Questionnaire to be more responsive than the DASH. The Michigan Hand Outcome Questionnaire (MHQ) has been shown to detect change in patients after carpal tunnel release (Standardized Response Mean for pain scale = 0.9; Standardized Response Mean for function scale = 0.6; Standardized Response Mean for combined function/symptom scale = 0.7).


In spite of these few studies, the study of responsiveness in upper-extremity assessments is lacking. Additionally, studies comparing the responsiveness of multiple similar assessments have been inconclusive as to which assessments are foremost in ability to detect patient change(57, 71, 72). Since the majority of a clinician's efforts are intended to effect change in patient ability, it is clear that the capacity of instruments to measure change has been under-evaluated. The purpose of this review is to present the state-of-the-art responsiveness designs and methods, to highlight problems associated with comparing responsiveness coefficients, and to suggest a step toward increasing the existing knowledge of the ability of currently used hand/upper-extremity assessments to measure patient change.

Types of Responsiveness Designs

There are two categories of responsiveness designs, as outlined by Stratford and colleagues(73). These categories are based on how many groups are involved: 1) single-group designs and 2) multiple-group designs. Single-group designs include the before-after (no baseline) design and the before-after design with a baseline. Multiple-group designs include three types: 1) those that compare patients who are randomly assigned to receive either a previously proven effective treatment or a placebo, 2) those that compare two or more groups whose health status, based on prior evidence, is expected to change by different amounts, and finally, 3) those that compare two groups expected to change by different amounts based on responses to an external standard (e.g., a global rating of change score). What follows is a description of each type of design and the responsiveness coefficients that are appropriate to use with each type of design. These designs/coefficients are summarized in Tables 1-1 and 1-2.
Single-Group Designs

In single-group designs, patients who are expected to undergo a change in health status may be measured before and after an intervention is implemented (before-after, no baseline designs).


Alternatively, a baseline may be taken (before intervention), followed by measurements just before and after the intervention (before-after, with baseline designs)(73). See Table 1-1. With the before-after (no baseline) design, the acceptable choices for responsiveness coefficients include effect size (ES), standardized response mean (SRM), and the paired t value(73). These coefficients represent the average change in score divided by a standard deviation (SD), a measure of the average variability of scores. The difference between the formulas used for these three coefficients (ES, SRM, and paired t value) is the SD used: for ES, the SD of the initial scores; for SRM, the SD of the change scores; and for the paired t value, the SD of the change scores divided by the square root of the number of subjects. These coefficients only detect whether or not a change has occurred and cannot be used to assess various amounts of change. In contrast to the above coefficients, which are appropriate when no baseline measure is taken, with the before-after designs with a baseline the appropriate coefficient to calculate is Guyatt's Responsiveness Index (GRI), also called the responsiveness statistic or responsiveness index(72). This statistic represents the ratio of observed change after treatment to the random variability observed in the baseline measure.

Multiple-Group Designs

Multiple-group designs provide a more refined estimate of change over time(73). These designs include groups of patients who are expected to undergo different amounts of change. Various coefficients can be used to calculate responsiveness when using each design.
Guyatt's Responsiveness Index (GRI), the t value for independent change scores, Analysis of Variance (ANOVA) of change scores, Norman's Srepeat, Norman's Sancova, and area under the Receiver Operating Characteristic (ROC) curve are all responsiveness coefficients that can be calculated when placebo and intervention groups are being compared. These coefficients, with the exception of Norman's Sancova, can also be used when comparing two or more groups whose


health status is expected to change by different amounts. All of these indices, in addition to correlation, can be used when comparing two groups expected to change by different amounts based on responses to an external standard (e.g., a global rating of change score obtained from the patient or therapist, or the average of both ratings).

Each of the coefficients used with multiple-group designs varies in formula. First, consider the coefficients that involve two or more groups, expected to change by different amounts, which do not require an external measure of change (see Table 1-2). GRI represents the ratio of observed change in a group of patients expected to undergo a change (i.e., the proven intervention group) to the variability in the group not expected to change (i.e., in this design, the placebo group). In contrast, calculating a t value for independent change scores compares the change in patients who are expected to undergo an important change with the change in patients who are not expected to undergo an important change (i.e., here, the proven intervention group versus the placebo group). ANOVA is similar to the calculation of a t value, only it allows for more than two groups. Both of Norman's coefficients essentially compare group variance to group and error variance. Norman's Srepeat represents the variance of the group x time interaction divided by the group x time interaction plus error variances, while Norman's Sancova represents the group variance divided by the group plus error variance, with the dependent variable being the follow-up score and the covariate being the initial score (see Table 1-2).

A special type of responsiveness coefficient used with multiple-group designs is one that requires an external measure of change. With the other coefficients, there is no comparison measure; thus, when change is detected, one is still unsure whether the instrument used correctly identified change.
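The two group-comparison coefficients above, the t value for independent change scores and the ANOVA of change scores, can be sketched as follows. The change scores are invented; with exactly two groups, the F value is simply the square of the t value, which is a handy cross-check.

```python
# Sketch: t value for independent change scores and one-way ANOVA of
# change scores, for groups expected to change by different amounts.
# All change scores below are invented.
import math

def ind_t(change_a, change_b):
    """Pooled-variance t value comparing two groups' change scores."""
    na, nb = len(change_a), len(change_b)
    ma, mb = sum(change_a) / na, sum(change_b) / nb
    ss = lambda xs, m: sum((x - m) ** 2 for x in xs)
    pooled_var = (ss(change_a, ma) + ss(change_b, mb)) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def anova_f(*groups):
    """One-way ANOVA F across two or more groups of change scores."""
    all_x = [x for grp in groups for x in grp]
    n, g = len(all_x), len(groups)
    grand = sum(all_x) / n
    means = [sum(grp) / len(grp) for grp in groups]
    ss_between = sum(len(grp) * (m - grand) ** 2
                     for grp, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for grp, m in zip(groups, means) for x in grp)
    return (ss_between / (g - 1)) / (ss_within / (n - g))

acute = [12, 15, 9, 14, 11]   # expected to change by a large amount
chronic = [3, 5, 2, 4, 6]     # expected to change by a small amount
print(ind_t(acute, chronic), anova_f(acute, chronic))
```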
If a strong external measure is used, this provides an advantage. However, since the external measure is considered "truth" in these designs, if the external measure is


faulty, there is much error associated with these coefficients. Two coefficients requiring an external criterion are Receiver Operating Characteristic (ROC) curves and correlation coefficients. A brief description of these two follows. First, a ROC curve graphs sensitivity on the Y-axis against 1 - specificity on the X-axis(74). Sensitivity is defined as the number of patients correctly identified by an assessment as undergoing an important change divided by all patients who undergo an important change as determined by the external criterion. (Note: Correct versus incorrect identification by an assessment is based on agreement with the external criterion. Thus, the external criterion here is being considered a "gold standard.") In contrast, specificity is defined as the number of patients who are correctly identified by an assessment as not undergoing an important change divided by the number of patients who did not undergo an important change as determined by the external criterion. The greater the area under the ROC curve, the greater the assessment's ability to distinguish patients who did and did not undergo an important change (as defined by the external criterion). Second, in calculating a correlation for this type of responsiveness design, the external standard evaluating change (e.g., global rating of change scores) is correlated with change scores from the assessment. Higher correlations indicate a stronger association between the results of the assessment and the external standard. Assuming the external standard is correct, higher correlations indicate a more responsive assessment.

Discussion/Conclusions

There are limitations associated with each of the designs presented above and problems associated with using various coefficients. Perhaps the foremost concerns are associated with single-group designs.
A disadvantage associated with the before-after, no baseline design is that if a change is not detected, it is unclear whether the measure was unable to detect the change or whether the patients did not undergo the expected change. This design is considered


the weakest of the designs. With the before-after designs with a baseline, a major disadvantage is that the period during which stability is measured (i.e., the period from baseline to the measurement taken just before intervention) is often shorter than the period during which change is assessed (i.e., the measure taken just before intervention to the measure taken after intervention). Thus, this design may underestimate the magnitude of random variability that occurs over longer periods in patients whose health status is truly stable(73).

The addition of a comparison group gives multiple-group designs an advantage over single-group designs. However, each of the multiple-group designs also has limitations. One disadvantage when comparing patients who receive either the intervention or a placebo involves the difficulty of finding an intervention that is known to be effective (i.e., an intervention that has a high probability of resulting in changes to the treatment group). In a similar way, the major disadvantage when comparing groups expected to change by different amounts is that often it is difficult to find groups who meet this criterion (i.e., often there are not two groups of patients available whose condition is expected to change by different amounts). When comparing two groups which have been defined by some external standard (e.g., global rating of change scores), the validity of the study may be compromised for two reasons. First, patients complete both the measure under study and the criterion rating; thus, the criterion measure is not independent of the measure under study (i.e., a patient who responds to the questions on the measure indicating they have changed is also likely to respond to the criterion measure in the same manner).
Secondly, when a global rating of change measure is used as the external criterion, evidence suggests that patients have difficulty recalling their initial state and, consequently, may be inaccurate when assessing the amount of change that has taken place(75). Although an outside criterion provides the advantage of being a benchmark by which to assess whether change has


really occurred, the accuracy of results depends on the validity of the criterion as an effective measure of change.

In addition to the limitations related to specific designs, a problem with measuring the property of responsiveness involves the discrepant results obtained when different coefficients are used. For example, when several similar instruments are compared to determine which assessment is the most responsive, the ranking of these instruments often differs depending on which design and which coefficient is used to calculate responsiveness(72, 76, 77). For instance, Wright and Young(72) compared five different assessments using five different responsiveness coefficients (GRI, SRM, the relative efficiency statistic, ES, and correlation). These investigators showed that these different responsiveness coefficients provided different rank orderings of the responsiveness of the assessments. Thus, an important question to answer before beginning a responsiveness study is: What is the most appropriate responsiveness design and coefficient to use when calculating responsiveness? Since no gold standard exists for measuring responsiveness of hand/upper-extremity assessments and multiple possible responsiveness coefficients can be calculated (with conflicting results), there is a need for clarification as to which designs require the use of which coefficients. Erroneously, many researchers calculate an array of different coefficients(78). However, this approach is inconsistent with mathematical logic. Since the formulas for each of the coefficients differ slightly, the different indices can lead to conflicting results.

A more theoretically sound approach is to determine which responsiveness coefficient is most appropriate for your study design and sample. If your design involves a single group, one of the coefficients outlined above as being appropriate for single-group studies must be used.
Alternatively, if your design includes more than one group, you should choose from those


coefficients outlined above for use with multiple-group designs. Also, with multiple-group designs you should consider the amount of change expected of your groups. That is, what differences in change do you expect between placebo versus treatment groups, between acute versus chronic groups, or between groups divided based on an outside criterion? Considering these two elements will lead away from calculations of multiple coefficients and, thus, toward a more planned, structured approach. However, it should be noted that even when the appropriate coefficients for the given number of groups and sample characteristics are used, there may be discrepancies in the calculations of different coefficients due to minor variations in the formulas for each. Highly sophisticated statistical studies are needed to help determine which results are most credible in these cases. Because hand/upper-extremity therapists' foremost concern is inciting change in the functional ability of their patients, more studies leading to higher standards of responsiveness (equal to the standards of reliability and validity) should be mandated.

One final area of concern involves confusion over terminology relating to the measurement of patient change(79, 80). Sensitivity, as described above in relation to ROC curve analyses, refers to an assessment's ability to obtain positive results when the condition is present(74). However, some authors use the term sensitivity to refer to the ability to detect a change using statistical analyses, whether or not that change is clinically significant(81). Furthermore, various slightly different methods for evaluating the significance of clinical change have been proposed. Two of these methods are the Minimally Clinically Important Difference (MCID) and the Reliable Change (RC) index.
The Minimally Clinically Important Difference (MCID) has been defined as the smallest difference between the scores on a questionnaire that the patient perceives to be beneficial(7, 82). Similarly, the Reliable Change (RC) index has been described as a method to determine statistically reliable change that is also clinically


significant(83). While each of these methods has advantages and disadvantages, no conclusive evidence shows one to be superior to the others(84). A brief discussion of the Minimally Clinically Important Difference (MCID) and the Reliable Change (RC) index follows.

As an example of the use of the Minimally Clinically Important Difference (MCID), Badia, Diez-Perez, Lahoz, Lizan, Nogues, and Iborra(7) calculated the MCID for the Assessment of health related quality of life in osteoporosis, abbreviated ECOS-16. They looked at the change in total scores (the difference between scores at baseline and those at 6 months follow-up) of those individuals who indicated on a general health status item that their condition was slightly better. They found that a change of 0.5 points (approximately a half standard deviation) in ECOS-16 score represented the smallest amount of change that was meaningful to these patients. This is consistent with findings from other studies(85-87). Norman, Sloan, and Wyrwich(87) conducted a systematic review to identify studies that computed an MCID. In all but six out of thirty-eight studies, the MCID was close to one half standard deviation. These researchers thus conclude that in most circumstances, the threshold for detecting change in health status is approximately one half standard deviation. Of note, others have found that one standard error of measurement (SEM) is equivalent to an MCID of one half standard deviation(88). However, several authors(85, 89, 90) argue that the MCID is context-specific, dependent on several factors including severity of patients, individual perspectives, disease processes, and the method used for determination. Beaton(85) asserts that using the half standard deviation as a rule for the MCID in all cases is too simple, loses the individual in the average, and necessitates the acceptance of many exceptions where the rule does not work.
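The ECOS-16 approach above, anchoring the MCID to the mean change score of patients who rate themselves "slightly better" and comparing it against the half-standard-deviation benchmark, can be sketched as follows; all scores and ratings are invented for illustration.

```python
# Sketch: estimating an MCID as the mean change score of patients who
# rated themselves "slightly better" on a global rating item, compared
# against the half-standard-deviation rule of thumb discussed above.
# All scores and ratings below are invented.
import statistics

baseline = [42, 55, 38, 60, 47, 51, 44, 58]
followup = [47, 58, 45, 66, 50, 57, 47, 65]
rating = ["slightly better", "unchanged", "slightly better", "much better",
          "unchanged", "slightly better", "unchanged", "much better"]

changes = [f - b for b, f in zip(baseline, followup)]
# MCID estimate: mean change among the "slightly better" raters.
mcid = statistics.mean(c for c, r in zip(changes, rating)
                       if r == "slightly better")
# Norman et al.'s benchmark: half the SD of the baseline scores.
half_sd = statistics.stdev(baseline) / 2

print(mcid, round(half_sd, 2))
```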
Jacobson, Follette, and Revenstorf(91) emphasize that clinically significant change can be conceptualized as patients entering therapy as part of a population with functional limitations and

PAGE 29

29 leaving as part of one without these limitations This implies that discharge scores on the variable of interest should fall within (or at least closer than admissi on scores to) what is considered to be normal(83). The Reliable Change (RC) index was developed to assess statistically significant change that is also clin ically meaningful. The RC index assesses whether a patients score at follow-up is two standard deviations better than the initial score. This degree of improvement is considered to be true change(92). The above presentation of responsiveness ope ns a new and important area for studying upper-extremity measures. We need to go beyon d typical reliability a nd validity indices in determining the most appropriate measures. The above review is intended to introduce the reader to designs and indices of responsiveness. While requiring further research, they provide us with a means to identify the measures most sensitive in assessing the effectiveness of our treatments.
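The commonly cited Jacobson-Truax formulation of the RC index divides the observed change by the standard error of the difference between two administrations, which is where the roughly two standard deviation criterion comes from. A minimal Python sketch of that formulation, with all numbers hypothetical:

```python
import math

def reliable_change_index(pre, post, sd_baseline, reliability):
    """Reliable Change index: observed change divided by the standard
    error of the difference between two test administrations."""
    se_measurement = sd_baseline * math.sqrt(1 - reliability)
    se_difference = math.sqrt(2) * se_measurement
    return (post - pre) / se_difference

# Hypothetical example: a 15-point improvement on a scale with
# baseline SD = 10 and test-retest reliability = .90
rc = reliable_change_index(pre=40, post=55, sd_baseline=10, reliability=0.90)
# An RC above 1.96 is conventionally taken as change beyond
# what measurement error alone would be expected to produce
```

The reliability term shrinks the error band for highly reliable instruments, so the same raw change is more likely to count as "real" on a more reliable measure.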


Table 1-1. Single-group designs and coefficients

Before-after (no baseline) design:
  ES = average change / SD of initial scores
  SRM = average change / SD of change scores
  Paired t value = average change / (SD of change scores / √n)

Before-after design with a baseline:
  GRI = average change between T2 and T3 / SD of difference between T1 and T2 scores

Note: ES = Effect Size, SRM = Standardized Response Mean, GRI = Guyatt's Responsiveness Index, SD = Standard Deviation, n = number of subjects, T = time period. [Design diagrams showing the assessment occasions (Initial and Follow-up at T1 and T2; for the baseline design, an added baseline period before T2 and follow-up at T3) omitted.]
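For a before-after design without a baseline, the coefficients in Table 1-1 reduce to a few lines of arithmetic. A sketch in Python; the admission and discharge scores below are hypothetical:

```python
import math
import statistics

def responsiveness_coefficients(initial, follow_up):
    """Single-group coefficients from Table 1-1 (before-after design):
    Effect Size, Standardized Response Mean, and the paired t value."""
    change = [f - i for i, f in zip(initial, follow_up)]
    avg_change = statistics.mean(change)
    es = avg_change / statistics.stdev(initial)       # ES: change vs. baseline spread
    srm = avg_change / statistics.stdev(change)       # SRM: change vs. change spread
    t = avg_change / (statistics.stdev(change) / math.sqrt(len(change)))
    return es, srm, t

# Hypothetical admission/discharge scores for five patients
initial = [30, 45, 50, 62, 70]
follow_up = [42, 55, 58, 75, 78]
es, srm, t = responsiveness_coefficients(initial, follow_up)
```

Note that ES and SRM share a numerator and differ only in which standard deviation is used, which is why a measure can look responsive on one index and not the other.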


Table 1-2. Multiple-group designs and coefficients

Treatment and placebo groups (random assignment):
  GRI = average Δ from T1 to T2 / SD of Δ in stable group
  t value = (average Δ in "important change" group − average Δ in "not important change" group) / √[pooled Var × (1/nic + 1/nnic)]
  F (ANOVA) = [between SS / (g − 1)] / [within SS / (n − g)]
  Norman's S_repeat = Group × Time Var / (Group × Time Var + Error Var)
  Norman's S_ancova = Group Var / (Group Var + Error Var)
  ROC curve = plot of sensitivity against 1 − specificity

Groups expected to change by different amounts (nonrandom, e.g., acute vs. chronic):
  GRI = average Δ from T1 to T2 / SD of Δ in stable group
  t value = (average Δ in "important change" group − average Δ in "not important change" group) / √[pooled Var × (1/nic + 1/nnic)]
  F (ANOVA) = [between SS / (g − 1)] / [within SS / (n − g)]
  Norman's S_repeat = Group × Time Var / (Group × Time Var + Error Var)
  ROC curve = plot of sensitivity against 1 − specificity

Note: GRI = Guyatt's Responsiveness Index, Δ = change, Var = variance, nic = number in "important change" group, nnic = number in "not important change" group, SD = standard deviation, g = number of groups, n = number of subjects, SS = sum of squares, T = time period. [Design diagrams (random assignment to effective therapy vs. placebo; nonrandom acute vs. chronic groups; each group measured at Initial T1 and Follow-up T2) omitted.]


Table 1-2. Continued

Groups determined by external criterion:
  GRI = average Δ from T1 to T2 / SD of Δ in stable group
  t value = (average Δ in "important change" group − average Δ in "not important change" group) / √[pooled Var × (1/nic + 1/nnic)]
  F (ANOVA) = [between SS / (g − 1)] / [within SS / (n − g)]
  Norman's S_repeat = Group × Time Var / (Group × Time Var + Error Var)
  ROC curve = plot of sensitivity against 1 − specificity
  Correlation coefficient: Spearman's or Pearson's can be used

[Design diagram (Initial T1 and Follow-up T2, with a criterion measure of change classifying patients into "important change" and "not important change" groups for the change construct) omitted.]


CHAPTER 2
RESPONSIVENESS OF DISABILITIES OF THE ARM, SHOULDER, AND HAND (DASH) VERSUS THE UPPER-EXTREMITY FUNCTIONAL INDEX (UEFI)

Introduction

Responsiveness, the ability of assessments to measure change, is critical to outcome measurement in the area of upper-extremity rehabilitation for three reasons. Outcome measures need to be assessed for reliability, validity, and responsiveness before they are used in 1) research, 2) clinical applications, and 3) policy decisions. While substantial evidence exists pertaining to the reliability and validity of upper-extremity assessments, more work is needed in the arena of responsiveness. First, upper-extremity rehabilitation research depends on the existence of responsive assessments. Responsive assessments allow clinical trials to be completed with fewer study participants(72, 76). This is because responsive assessments lead to increased effect size, making the same power achievable with fewer participants. The need for fewer study participants decreases the overall cost of clinical trials and helps to expedite the availability of treatments that have evidence of their effectiveness. Furthermore, clinical trials often test interventions that may have similar outcomes (e.g., comparing various splinting techniques). Responsive assessments can detect potentially significant yet small differences in change resulting from a treatment(72). Second, responsive assessments are essential for clinical applications: they must be available in order for patients and therapists to gauge improvement or decline in function resulting from various treatments. Finally, in order to influence healthcare policies, evidence of treatment effectiveness must be available. This requires assessments with the ability to detect clinically meaningful change(93, 94). Thus, selecting the most responsive measure is essential to upper-extremity rehabilitation. 
The Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire is one of the most widely used and researched assessments in upper-extremity rehabilitation(55, 57, 95-97). The DASH is a measure of general upper-extremity function, meaning it was not designed specifically for one upper-extremity condition. There are over 47 studies on the psychometrics of the DASH(55, 57, 61, 70, 95, 97, 98). Studies have shown that the DASH has good discriminative validity (differentiates between dissimilar constructs), construct validity (accurately represents a construct), and test-retest reliability (scores can be replicated). These studies have tested the DASH on patients with a wide range of upper-extremity conditions(55, 57-61). For instance, Beaton and colleagues(57) confirmed the test-retest reliability of the DASH (intraclass correlation coefficient = 0.96). These researchers showed that the DASH had good discriminative validity, differentiating between those currently able to work and those unable to work. Convergent validity was also demonstrated, with correlations to the Shoulder Pain and Disability Index (SPADI) subscales and the Brigham (carpal tunnel) questionnaire subscales exceeding 0.70. 

Relevant to the present study, the DASH has demonstrated the ability to detect clinical change (responsiveness). For instance, Gummesson and colleagues(56) found that the DASH was effective in detecting change in function before and after surgical treatment for upper-extremity conditions (effect size = .70; standardized response mean = 1.2). Effect sizes of 0.2 and below are considered small, 0.5 medium, and 0.8 or higher large(99). The same values hold for the standardized response mean: a standardized response mean of 0.2 is considered small, 0.5 moderate, and 0.8 large(99). Beaton and Bombardier(57) showed that in a group of patients with diverse musculoskeletal conditions, the DASH was comparable to or better than joint-specific assessments at detecting clinical change. 
Similarly, Kotsis and Chung(66) found the DASH to be moderately responsive (standardized response mean = .70) when used to evaluate outcomes after carpal tunnel surgery.


Studies that have compared the DASH to disease-specific assessments (as opposed to general upper-extremity function assessments) have produced conflicting evidence. Some of these studies have found the DASH to be as responsive as disease-specific assessments(57, 67), while others have found disease-specific scales to be more responsive(68-70). In support of the DASH as a responsive measure, Beaton and colleagues(57), in a sample of two hundred patients with wrist/hand or shoulder problems, found standardized response means to be equivalent or greater for the DASH compared to two other joint-specific assessments (the Shoulder Pain and Disability Index and the Brigham carpal tunnel questionnaire). Likewise, with individuals with carpal tunnel syndrome, Greenslade and colleagues(67) found the DASH to be as responsive as the disease-specific Boston questionnaire (BQ), reporting standardized response means of 0.66, 1.07, and 0.62 for the DASH, BQ-symptoms scale, and BQ-function scale, respectively. In contrast, in support of disease-specific scales being more responsive, MacDermid and colleagues(68, 69) found that the Patient-Rated Wrist Evaluation (PRWE) had slightly higher responsiveness than the DASH (standardized response mean = 1.51 vs. 1.37) in a cohort of 60 patients (36 hand problems, 24 wrist problems). Similarly, Gay and colleagues(70) found the Carpal Tunnel Questionnaire to be more responsive than the DASH in a sample of 34 patients after carpal tunnel release (Carpal Tunnel Questionnaire: effect size = 1.71, standardized response mean = 1.66; DASH: effect size = 1.01, standardized response mean = 1.13). 

However, there is a lack of studies comparing the responsiveness of the DASH to other self-report assessments of general upper-extremity function (those intended for any upper-extremity ailment). Despite its widespread use, the DASH may be similar to or better than other self-reports at measuring change. 
A similar, though less researched, measure of general upper-extremity function is the Upper-Extremity Functional Index (UEFI)(71). Like the DASH, the UEFI is a self-report assessment on which individuals rate how much difficulty they have performing tasks relating to general upper-extremity function. Items on the UEFI relate to daily activities such as grooming your hair, opening doors, and cleaning. However, the UEFI is shorter than the DASH (20 items versus the 30 items on the DASH) and does not include pain items, making it more appropriate for individuals who are not experiencing the symptom of pain with their injury. Most patients complete the UEFI in three to five minutes, and scoring takes only approximately 20 seconds. The UEFI has been shown to have high test-retest reliability (r = .95)(71). Additionally, the UEFI has been shown to be similar to another upper-extremity assessment (the Upper-Extremity Functional Scale, UEFS) in its ability to detect change in function (correlation between UEFI and UEFS change scores greater than .60)(71). If the UEFI is equivalent to or better than the DASH in its ability to detect change, the quick completion and scoring times of the UEFI would make it ideal for use in the clinic. 

Because these two assessments attempt to measure the same construct (i.e., upper-extremity function) and use similar items and rating scales (response categories) to do so, it is likely that they are similar in their ability to measure patient change. The purpose of the current study is to compare the responsiveness of the DASH and UEFI. The specific aims are: 1) compare person measure change for the DASH and UEFI, 2) identify the total score change on each assessment that indicates a change in functional ability, 3) compare overall error in predicting clinically important change for the two assessments (using area under Receiver Operating Characteristic, ROC, curves), and 4) determine the correlation between person measures on these assessments and a global patient-reported measure of change. 
The specific hypotheses are: 1) The difference between the person measure change on the DASH and UEFI will not be significant. 2) The total score change that indicates a change in upper-extremity status will be similar for the DASH and the UEFI. 3) Overall error in predicting clinically important change (determined using area under the ROC curves) will be similar for these two assessments. 4) Correlations between person measure change on these assessments and global patient-reported change will be high and similar for both assessments. 

Methods

Sample

Data was collected at various outpatient clinics throughout the United States by Focus on Therapeutic Outcomes (FOTO), Inc. FOTO collects outcomes data on various assessments of physical functioning and provides outcomes reports for rehabilitation facilities. Through the use of this assessment package, a large amount of data is generated for research purposes(100, 101). Patients who had completed the assessments at two time points (admission and discharge) were selected from the larger data sets. This reduced the original DASH data set from 2487 study participants to 960. For the UEFI, the original data set was reduced from 3949 to 1953. These data sets were further reduced to one data set, which included only participants who had completed both the DASH and the UEFI. This resulted in a final data set of 214. However, for analyses that required the global rating, the data set was further reduced to 178 because of missing global rating data. Fifty-one percent of the 214 patients who completed both the DASH and UEFI were female. The mean age was 49.0 ± 14.3 years. Study participants most commonly experienced shoulder impairments (59%), followed by neck impairments (27%); the fewest individuals had forearm impairments (1%). About 70% of the study participants had chronic conditions. The mean number of days in rehabilitation was 52.5 ± 33.5, and the mean total number of treatment sessions was 14.5 ± 10.2 (see Table 2-1).


Materials

The DASH was designed to measure impairment based on patient report, as well as to capture limitations in activities and participation imposed by single or multiple disorders of the upper extremity(102). It is based on the World Health Organization (WHO) Model of Health, at the time of development called the International Classification of Impairments, Disabilities, and Handicaps (ICIDH) but since revised as the International Classification of Functioning, Disability, and Health (ICF)(103). The DASH is a standardized instrument and evaluates impairments and activity limitations, as well as participation restrictions, for both leisure and work(103). It consists of two components: 30 disability/symptom questions and four optional high-performance sport/music or work questions(104). The DASH includes both disability and symptom items. Examples of disability items on the DASH include: "Open a tight or new jar," "Write," "Make a bed," and "Use a knife to cut food." Examples of symptom items include: "Arm, shoulder or hand pain," "Weakness in your arm, shoulder or hand," and "Stiffness in your arm, shoulder or hand." Response choices for disability items range from 1 to 5 (1: no difficulty to 5: unable)(104). Response choices for symptom items also use a five-point scale, but vary from none to extreme. At least 27 of the 30 disability/symptom questions must be completed for a score to be calculated(104). Typically, in order to produce a score out of one hundred, the response choices are summed and averaged, producing a score out of five(104). This value is then transformed to a score out of 100 by subtracting one and multiplying by 25(104). For example, if a patient responded to all the questions with a 5, the average score would be 5. Subtracting one gives 4, and multiplying by 25 gives this individual, with the highest responses possible, a score of 100. 
Normally, the DASH produces scores between 0 and 100, with a higher score indicating more disability(104). For this study, however, the rating scale was recoded such that a higher score indicated more ability.
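The standard scoring arithmetic described above (average the 1-5 item responses, subtract one, multiply by 25) can be written out as a short function. With this study's recoding, the same arithmetic applies; only the direction of the scale changes. A sketch:

```python
def dash_score(responses):
    """Score the 30 DASH disability/symptom items (each rated 1-5).
    At least 27 items must be answered; the mean response is shifted
    to a 0-4 range and rescaled to 0-100."""
    answered = [r for r in responses if r is not None]
    if len(answered) < 27:
        raise ValueError("fewer than 27 of 30 items answered; no score")
    return (sum(answered) / len(answered) - 1) * 25

# A patient answering every item with 5 gets the maximum score
print(dash_score([5] * 30))   # 100.0
# All 1s give the minimum
print(dash_score([1] * 30))   # 0.0
```

Averaging the answered items before rescaling is what lets the instrument tolerate up to three missing responses without changing the score range.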


Additionally, when change scores were calculated, total scores for admission and discharge were used as they were and were not converted to scores out of 100, making the range of possible scores on the DASH 30 to 150 (the sum of 30 items rated 1 to 5). Appendix A includes a list of items on the DASH. 

The Upper-Extremity Functional Index (UEFI) is also based on the International Classification of Functioning, Disability, and Health (ICF)(71). The UEFI is a self-report assessment containing twenty items assessing general upper-extremity function. Examples of items on the UEFI include: "Grooming your hair," "Dressing," "Opening doors," "Cleaning," and "Throwing a ball." Participants' responses are on a 5-point scale (0: extreme difficulty or unable to perform activity to 4: no difficulty). Total UEFI scores are obtained by adding the responses to all questions; thus, total scores vary from 0 to 80, with higher scores indicating greater functional status. Most people can complete the UEFI in three to five minutes, and scoring takes about 20 seconds. Appendix B includes a list of items on the UEFI. 

Procedures

When the data was obtained, response choices on the DASH in the FOTO data set had already been recoded so that a higher number response indicated more ability; thus, 5 = no difficulty and 1 = unable. Additionally, in the FOTO data set the response scale on the UEFI was recoded to be more consistent with the DASH response scale: 0 was recoded to 1, 1 to 2, 2 to 3, 3 to 4, and 4 to 5. Because of this recoding of the rating scale, total UEFI scores in this study varied from 20 to 100. Data extraction was performed using the Statistical Package for the Social Sciences (SPSS)(105). 

Statistical Analysis

Rasch analysis to obtain comparable measures

Since the DASH and UEFI have different rating scales and ranges, Rasch analysis, a one-parameter item response theory model, was used to place measures from the two instruments on the same scale. A co-calibration analysis, in which DASH and UEFI admission data were run simultaneously, was conducted using Winsteps(1). Then separate analyses were run for DASH admission data, DASH discharge data, UEFI admission data, and UEFI discharge data, anchoring the items and rating scale at the estimates obtained in the co-calibration analysis. Person measures (in logits) obtained during these analyses were used to compute person measure change scores (person measure at discharge minus person measure at admission) for both the DASH and the UEFI. These person measure change scores were used in all of the analyses except the sensitivity/specificity analyses. Since the sensitivity/specificity analyses would help to identify a cutoff score for evaluating whether or not change had occurred, it was felt that actual assessment change scores would be more useful to clinicians than calculated person measure change scores. 

Statistical analyses included ANOVA, sensitivity/specificity, and correlations. ANOVA was calculated for the DASH and UEFI using a single-group design (e.g., all DASH data were treated as a single group). For both the sensitivity/specificity and the correlational analyses, two patient groups were created based on global rating scores. For this global rating, patients rated the perceived amount they had changed since admission on a scale of -7 to +7, with a negative number indicating that the condition had worsened since admission, zero indicating no change, and positive numbers indicating varying degrees of improvement (see Appendix C). Based on previous literature(73, 106), for the current study, people with scores below +3 were considered to be different from those whose scores were +3 and above. Therefore, the groups were divided at +3, with individuals reporting a global rating of change of +3 and above placed in the "improved" group and those with less than +3 placed in the "not improved" group.


However, of note, the "not improved" group also contained those who rated their condition as worse at discharge. 

Analysis of variance (ANOVA)

A within-subjects ANOVA was conducted using SPSS(105). The goal of this analysis was to determine whether there was a significant difference between change in person measures on the DASH and change in person measures on the UEFI. In addition, to allow further comparison, mean person measure change, range, and standard deviations were reported for the two assessments, and a Pearson correlation between person measure change scores on the DASH and UEFI was computed in SPSS(105). 

Sensitivity and specificity

The sensitivity and specificity analysis was performed to identify an assessment change score that indicates a difference in functional ability. Sensitivity is defined as the ability of an assessment tool to detect a condition when it is present, or to provide true positive results(107, 108). Specifically, in this study, sensitivity was calculated as the number of patients correctly identified (based on the groups created from their global ratings) by the DASH or UEFI as undergoing an important change, divided by the number of patients whose global rating indicated they had actually changed (positive three or above). In contrast, specificity is defined as the ability of an assessment tool to not detect a condition when the condition does not exist, or to provide true negative results(107, 108). In the current study, specificity was calculated as the number of patients who were correctly identified by the DASH or UEFI as not undergoing an important change, divided by the number of patients who had actually not changed based on their global ratings. The best cutoff point (i.e., the number of points by which an individual's total assessment score on the DASH or UEFI must change to indicate a difference in functional ability) was identified on the basis of lowest overall error rate, specificity, and sensitivity. This overall error rate was calculated using the formula [(1 − sensitivity) + (1 − specificity)]. If cutoff values had a similar error rate, the cutoff value with higher specificity was chosen, as it was considered better to err on the side of continuation of services versus discharging a patient too soon. Sensitivity, specificity, and overall error were calculated at thirty-point intervals of change on the DASH from the minimum observed change to 109 points of change (the maximum observed change), and on the UEFI at 15-point intervals from the minimum observed change to 77. The optimal combination of sensitivity and specificity was confirmed by generating a Receiver Operating Characteristic (ROC) curve. This type of curve plots sensitivity on the Y-axis against 1 − specificity on the X-axis; one minus specificity represents the false positive rate. Figure 2-1 depicts how specificity and sensitivity were calculated. Areas under the ROC curves (indices of discrimination) were calculated by counting the number of squares beneath the curve and dividing by the total number of squares (the number of squares above the curve plus the number below). The greater the area under the ROC curve (AU), the less the overall error in predicting clinically important change. The AU for the DASH was compared to the area calculation for the UEFI. 

Correlation

Correlation is another commonly used statistic for measuring responsiveness. Person measure changes from admission to discharge on the DASH and UEFI were correlated with patient global rating scores(73). The magnitudes of the correlations obtained for the DASH and the UEFI were compared. Correlation values commonly considered strong are .60 and larger, moderate between .40 and .60, and weak below .40(109). 
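The sensitivity, specificity, and overall-error computations described above can be sketched in a few lines. The change scores, the +3 global-rating grouping, and the candidate cutoffs below are all hypothetical:

```python
def cutoff_statistics(change_scores, improved, cutoff):
    """Sensitivity, specificity, and overall error for one candidate
    cutoff on an assessment's total change score.  `improved` holds the
    external criterion (global rating of +3 or above)."""
    tp = sum(c >= cutoff for c, imp in zip(change_scores, improved) if imp)
    fn = sum(c < cutoff for c, imp in zip(change_scores, improved) if imp)
    tn = sum(c < cutoff for c, imp in zip(change_scores, improved) if not imp)
    fp = sum(c >= cutoff for c, imp in zip(change_scores, improved) if not imp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    overall_error = (1 - sensitivity) + (1 - specificity)
    return sensitivity, specificity, overall_error

# Hypothetical change scores and global-rating groupings
changes = [25, 5, 30, -3, 18, 40, 2, 12]
improved = [True, False, True, False, True, True, False, False]
for cutoff in (10, 20, 30):
    print(cutoff, cutoff_statistics(changes, improved, cutoff))
```

Sweeping the cutoff this way produces the (1 − specificity, sensitivity) pairs that trace out the ROC curve, and the cutoff with the lowest overall error (broken toward higher specificity) is the one reported in the Results.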
Once again, since not all of the individuals in the sample completed the global rating of change, the sample size for these analyses was 178 for both the DASH and the UEFI.


Results

Contradicting the first hypothesis, the ANOVA between person measure change scores on the DASH and the UEFI was significant (F = 12.47, p = .001). This indicates that the person measure change scores on the two assessments differ significantly. However, the mean person measure change was only slightly larger for the UEFI (DASH mean person measure change = .80, UEFI mean person measure change = 1.1), the range was wider for the UEFI but comparable (for the DASH .45 to 8.85, for the UEFI .08 to 9.20), and the standard deviation was slightly larger for the UEFI (DASH standard deviation of person measure change scores = 2.30, UEFI standard deviation of person measure change scores = 2.67). Finally, the Pearson correlation between the change scores was .90, p < .001. Thus, the second hypothesis was upheld in that person measure change, indicating an alteration in ability and symptoms, was similar for the DASH and UEFI. 

For the DASH, the cutoff point with the best combination of sensitivity and specificity was twenty (sensitivity = .52, specificity = .79, overall error rate = .69). For the UEFI, the cutoff point with the best combination of sensitivity and specificity was fifteen (sensitivity = .45, specificity = .77, overall error rate = .78). Table 2-2 shows sensitivity, specificity, and overall error rate for the cutoff values for the DASH and UEFI. This table illustrates the compromise between sensitivity and specificity (as sensitivity increases, specificity decreases, and vice versa). 

Correspondingly, in support of hypothesis three, overall error in predicting clinically important change (determined using area under the ROC curves) was similar for the two assessments. For the DASH the area under the ROC curve was approximately .67, while for the UEFI the area was approximately .65 (Figure 2-2). The area under the ROC curve is an index of the degree of separation between the distributions of true positives vs. false alarms(110). The almost identical areas under the curves for the DASH and UEFI indicate that the sensitivity of these two assessments is comparable.
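The square-counting estimate of the area under the ROC curve described in the Methods is, numerically, the trapezoidal rule applied to the plotted points. A sketch with purely hypothetical ROC points:

```python
def roc_auc(points):
    """Trapezoidal area under a ROC curve given (1 - specificity,
    sensitivity) points; points are sorted by false positive rate."""
    points = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # area of one trapezoid
    return area

# Hypothetical ROC points from a handful of cutoffs, anchored at the
# chance-line endpoints (0, 0) and (1, 1)
pts = [(0.0, 0.0), (0.21, 0.52), (0.5, 0.8), (1.0, 1.0)]
print(round(roc_auc(pts), 2))
```

An area near .5 means the curve hugs the diagonal (discrimination no better than chance), which is the pattern behind the concern, raised below, that the curves in this study looked "more like a line."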


Finally, correlational analyses confirmed hypothesis four. Correlations between patients' global ratings and person measure change scores from intake to discharge were similar for the two assessments. For the DASH, the correlation between the global rating scores and person measure change scores was r = .33, while for the UEFI the correlation was r = .35. While both correlations reached levels that could only be considered small (i.e., less than r = .40)(57), they were significant at the p = .01 level. 

Discussion

The results of the current study indicate that for outpatients with upper-extremity problems, the DASH questionnaire and the UEFI measure patient change in upper-extremity condition very similarly. A within-subjects ANOVA revealed a statistically significant but small difference in person measure change between the two assessments. Additionally, areas under the ROC curves were almost identical for the two assessments (.67 and .65), as were correlations between global ratings and the change scores (r = 0.33 and 0.35). While both assessments identify change in patient function, since the responsiveness calculations for both assessments are very similar, there appears to be no real advantage of one instrument over the other in detecting change. 

One reason that these assessments are similar in measuring change may be the similarity of the items making up these assessments (see Appendices A and B). For example, both have items that involve lifting objects above one's head, and both have an item about opening a jar lid. Furthermore, the DASH and the UEFI both include comparable items involving grooming, food preparation, and dressing. For example, the DASH includes the item "Wash or blow dry your hair," while the UEFI includes the item "Grooming your hair." Likewise, the DASH has the item "Use a knife to cut food" and the UEFI has the similar item "Preparing food (e.g., peeling, cutting)." 
The DASH has the item "Put on a pullover sweater," while the UEFI contains a broader item, "Dressing." Because the items on these assessments are comparable, it is likely that the two assessments discriminate between people of different abilities in a similar way. 

No studies have investigated the responsiveness of the DASH and UEFI using methods comparable to the ones used in the current study. That is, no other studies have used item response theory methodologies to convert scores on these assessments to person measures and equate the two rating scales and score ranges before comparing them. However, in the existing literature on the DASH, the findings have been comparable to those of the current study(56, 66, 111). For example, Beaton and colleagues(57) found correlations between DASH change scores and global measures of change to be r = .40 (Pearson) and R = .43 (Spearman). Also, these researchers found the area under the curve to be approximately 70%, and they calculated standardized response means of .74-.80. 

The ROC curves generated in this study demonstrate the trade-off between specificity and sensitivity. A larger total change score on the DASH or UEFI (a more stringent criterion) yields smaller sensitivity and greater specificity; a smaller total change score (a less stringent criterion) yields greater sensitivity and smaller specificity. Clinicians who use assessment change scores need to make a compromise of sorts when deciding what type of error they are willing to make. By using a lower total score change, clinicians increase the chance of assuming a patient has changed when they have not. Conversely, using a greater total score change increases the chance of assuming a patient has not changed and continuing to treat the condition after the patient's functional ability has returned to the desired state. 
Portney and Watkins(112) state that those who use a screening tool must decide what levels of sensitivity and specificity are acceptable: sensitivity is more important when the risk associated with missing a diagnosis is high, as in the case of life-threatening disease or a debilitating deformity; specificity is more important when the cost or risks associated with further intervention are substantial. In the case of assessment change scores, it could be argued that it is better to continue to treat a patient longer than actually needed than to discharge them too early. 

Of particular concern among the findings of this study is that the correlation between the global rating and the person measure change scores was in the range of what is considered weak. Additionally, the ROC curve was really more of a line, indicating low probabilities of specificity and sensitivity (i.e., close to a 50% chance). Thus, there is a problem with not having a "gold standard" with which to compare the change scores. In essence, this study compared two patient-reported outcomes, which tends to imply that patients are not accurate in reporting the amount of change that has occurred in their functional ability. The authors felt that using total assessment change scores to calculate the best cutoff score would be more meaningful for therapists (i.e., give them an idea of what actual amount of change in total score on the assessment would indicate that a patient had changed). However, since the DASH and UEFI are not on the same scale and have different ranges of possible scores, this analysis is limited in the conclusions that can be drawn about the similarity of the two assessments. 

There are several limitations pertaining to the statistical methods used in the calculation of responsiveness. A major disadvantage when there is no external criterion is that if a change is not detected, it is not clear whether the measure was unable to detect a change or whether the patients did not undergo the expected change(73). 
Correspondingly, if a statistically significant change is detected, it is uncertain whether this is due to some property of the measure (i.e., the measure is so sensitive that it has picked up a change too small to be functionally significant) or whether a significant change in function was detected. This problem is somewhat solved by the use of an outside criterion (such as the global rating in the current study), as is done with the ROC curve analysis and the correlation(73). However, if the outside criterion is not valid, the results obtained will be misleading. It appears this was the case in the current study: the global rating was a very poor gold standard, and thus the results of this study may be of questionable validity. 

An additional limitation involves the sample used in this study. While large, it was a sample of convenience. Moreover, the size of the sample was much reduced once individuals who did not complete the assessments at discharge and those who did not complete the patient global rating were eliminated. However, based on previous analyses conducted by the investigators, it appeared that the reduced sample was comparable to the larger sample on all demographic characteristics. Finally, the data was collected at various clinics across the country and from patients with diverse upper-extremity diagnoses. Therefore, conditions in the clinics and treatments used by therapists are likely to have differed. While this increases the generalizability of the results, it is unclear what treatments were used with which patients and which contributed to the most advantageous outcomes. 

Conclusions

The results of this study indicate that neither the DASH nor the UEFI has a clear advantage over the other in responsiveness. Thus, if time is an issue, the shorter UEFI is perhaps the better choice, even though it is a lesser-known instrument. However, if comparative outcomes data are needed or communication of outcomes with other therapists is desired, the DASH may be preferred, since it is the more widely used and studied. Additionally, when change scores on assessments are used, careful consideration should be given to the values chosen (i.e., the degree of sensitivity vs. specificity desired). Further research is needed using additional samples and investigating responsiveness methods. 
Moreover, there is a need for the development of a gold standard for sensitivity and specificity analyses. Research focusing on how to create more responsive assessments is needed so that guidelines might be established to aid future assessment development.


Table 2-1. Sample of 214 demographics

Mean age ± SD: 49.0 ± 14.3 (N = 202)

Gender (N = 214)
  Female: 108 (50.5%)
  Male: 106 (49.5%)

Primary body part injured (N = 202)
  Shoulder: 120 (59.4%)
  Neck: 55 (27.2%)
  Elbow: 9 (4.5%)
  Upper arm: 6 (3.0%)
  Wrist: 6 (3.0%)
  Hand: 4 (2.0%)
  Forearm: 2 (1.0%)

Symptom acuity (N = 202)
  Chronic: 142 (70.3%)
  Subacute: 41 (20.3%)
  Acute: 19 (9.4%)

Mean number of days treated ± SD: 52.5 ± 33.5 (N = 157)
Mean number of treatment sessions ± SD: 14.5 ± 10.2 (N = 157)

Note: Many of the totals presented in the table differ from the sample total due to missing data.


Test (person measure change score on DASH or UEFI)

Global rating positive (GR > 3):
  Score above cutoff: a (true positive)
  Score below cutoff: c (false negative)
Global rating negative (GR < 3):
  Score above cutoff: b (false positive)
  Score below cutoff: d (true negative)

Figure 2-1. Calculating sensitivity and specificity, where sensitivity = a/(a + c) and specificity = d/(b + d)
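The cell logic of Figure 2-1 translates directly into a short calculation. The sketch below uses hypothetical counts, not values from the study data:

```python
def sensitivity_specificity(a, b, c, d):
    """Compute sensitivity and specificity from a 2x2 table.

    a: true positives  (changed per global rating, score above cutoff)
    b: false positives (unchanged, score above cutoff)
    c: false negatives (changed, score below cutoff)
    d: true negatives  (unchanged, score below cutoff)
    """
    sensitivity = a / (a + c)  # proportion of truly changed patients detected
    specificity = d / (b + d)  # proportion of truly unchanged patients detected
    return sensitivity, specificity

# Hypothetical counts for illustration only
sens, spec = sensitivity_specificity(a=45, b=20, c=15, d=60)
print(sens, spec)  # 0.75 0.75
```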


Table 2-2. Sensitivity, specificity, and overall error at various cutoff points for the DASH and UEFI


Figure 2-2. Receiver operating characteristic (ROC) curves for the DASH (circles) and UEFI (triangles), showing the accuracy of different cutoff points; the diagonal line represents an assessment with a 50/50 chance of detecting improvement. Best cutoff points: DASH cutoff = 20 (sensitivity = .52, specificity = .79, overall error = .69); UEFI cutoff = 15 (sensitivity = .45, specificity = .77, overall error = .78).
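The area under a ROC curve built from a handful of cutoff points, as in Figure 2-2, is conventionally computed by the trapezoidal rule over (1 − specificity, sensitivity) coordinates. A sketch with hypothetical operating points (not the study's values):

```python
def roc_auc(points):
    """Trapezoidal area under a ROC curve.

    points: (1 - specificity, sensitivity) pairs, sorted by the first
    coordinate and including the endpoints (0, 0) and (1, 1).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between adjacent points
    return area

# The diagonal (chance-level) line always yields an AUC of 0.5
diagonal = [(0.0, 0.0), (1.0, 1.0)]
print(roc_auc(diagonal))  # 0.5

# A hypothetical curve with one intermediate cutoff point
curve = [(0.0, 0.0), (0.21, 0.52), (1.0, 1.0)]
print(roc_auc(curve))
```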


CHAPTER 3
ITEM CHARACTERISTICS OF THE DISABILITIES OF THE ARM, SHOULDER, AND HAND (DASH) OUTCOME QUESTIONNAIRE

Introduction

One of the most widely used and researched assessments in upper-extremity rehabilitation is the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire(55, 57, 95-97). There are over 47 studies on the psychometrics of the DASH(55, 57, 61, 70, 95, 97, 98). Studies have shown that the DASH has good discriminative validity (differentiates between dissimilar constructs), construct validity (accurately represents a construct), and test-retest reliability (scores can be replicated). These studies have tested the DASH on patients with a wide range of upper-extremity conditions(55, 57-61). For instance, Beaton and colleagues(57) confirmed the test-retest reliability of the DASH (intraclass correlation coefficient = 0.96). These researchers also found that the DASH showed discriminative validity, differentiating between those currently able to work and those unable to work. Convergent validity was demonstrated as well, with correlations to the Shoulder Pain and Disability Index (SPADI) subscales and the Brigham (carpal tunnel) questionnaire subscales exceeding 0.70. The DASH has demonstrated the ability to detect clinical change (responsiveness) at the level of the assessment as a whole. For instance, Gummesson and colleagues(56) found that the DASH was effective in detecting change in function before and after surgical treatment for upper-extremity conditions (effect size = .70; standardized response mean = 1.2). Effect sizes of 0.2 and below are considered small, 0.5 medium, and 0.8 large(99). Beaton and Bombardier(57) showed that in a group of patients with diverse musculoskeletal conditions, the DASH was comparable to or better than joint-specific assessments at detecting clinical change.
Similarly, Kotsis and Chung(66) found the DASH to be moderately responsive (standardized response mean = .70) when used to evaluate outcomes after carpal tunnel surgery.
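Under their usual definitions, the effect size divides the mean change score by the standard deviation of the baseline scores, while the standardized response mean divides it by the standard deviation of the change scores. A minimal sketch, with invented scores for five hypothetical patients:

```python
import statistics

def effect_size(admission, discharge):
    """Effect size = mean change / SD of the admission (baseline) scores."""
    change = [d - a for a, d in zip(admission, discharge)]
    return statistics.mean(change) / statistics.stdev(admission)

def standardized_response_mean(admission, discharge):
    """Standardized response mean = mean change / SD of the change scores."""
    change = [d - a for a, d in zip(admission, discharge)]
    return statistics.mean(change) / statistics.stdev(change)

# Hypothetical admission and discharge scores (not study data)
adm = [30, 45, 50, 60, 70]
dis = [45, 55, 70, 75, 80]
print(effect_size(adm, dis))
print(standardized_response_mean(adm, dis))
```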


Despite the wealth of studies examining the properties of the DASH as a whole, no published analyses have thoroughly investigated the item characteristics of the DASH or how these item characteristics affect the ability of this assessment to measure change in patient condition. Item response theory (IRT) methodologies were used to eliminate DASH items in the creation of a Quick-DASH version of the instrument; however, in that study the only item-level statistic reported was misfit(97). Knowledge of item-level psychometrics is important for a number of reasons(113-115). First, using item response theory methodologies, the extent to which all items on a test represent the same underlying trait can be examined. Additionally, IRT analyses provide item and person ability estimates that are independent of the sample of individuals and the set of items used(114). Item difficulty estimates allow for a comparison of the challenge of the items making up an assessment relative to each other. IRT also allows investigation of the equivalence of items for different samples, or for the same sample at different time points(113). Estimates of item discrimination derived from item response theory analyses indicate an item's ability to distinguish between individuals who possess different amounts of the trait of interest(114). Furthermore, information curves show at which person ability levels a test is most useful. Thus, the purpose of the present study was to examine item-level characteristics of the DASH, including: 1) unidimensionality of the items (i.e., do they all represent the same construct), 2) item difficulty hierarchies at admission and discharge, 3) person-item match, 4) differential item functioning (DIF) from admission to discharge, 5) item discriminations at admission and discharge, and 6) comparison of test information curves produced from admission and discharge data.


Methods

Sample

Data were collected at various outpatient clinics throughout the United States by Focus on Therapeutic Outcomes (FOTO), Inc. FOTO provides a patient evaluation tool and an aggregate data management service. The Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire is one of the many assessments on which FOTO collects data. FOTO has accumulated the largest external database for rehabilitation outcomes, and it is included on the Joint Commission's (JCAHO) list of acceptable performance measurement systems. Large amounts of data collected by FOTO are used for research purposes(100, 101). Data extraction was performed using the Statistical Package for the Social Sciences (SPSS)(105). Patients who had completed the assessments at two time points (admission and discharge) were selected from the larger data sets. This reduced the original DASH data set from 2487 study participants to 991. For comparison, demographic information on the larger and the reduced data sets is presented in Table 3-1. Nine hundred ninety-one individuals with upper-extremity conditions from various outpatient clinics throughout the United States completed the DASH at two time points (i.e., admission and discharge) as part of standard treatment procedures. Fifty-seven percent of the sample was female. Study participants most commonly experienced shoulder impairments (51%), followed by neck impairments (20%), with the fewest individuals having forearm impairments (1%). About half of the study participants had chronic conditions. The mean number of days in rehabilitation was 48.6 ± 32.2, and the mean number of treatment sessions was 13.3 ± 10.0 (see Table 3-1).
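The selection of patients with assessments at both time points (which reduced the DASH data set from 2487 to 991) amounts to a paired-records filter. A simplified sketch, assuming each record carries a patient identifier and a time-point label (field names here are hypothetical, not FOTO's schema):

```python
from collections import defaultdict

def keep_complete_pairs(records):
    """Keep only patients who have both an admission and a discharge record.

    records: list of dicts, each with 'patient_id' and 'timepoint'
             ('admission' or 'discharge').
    """
    timepoints = defaultdict(set)
    for r in records:
        timepoints[r["patient_id"]].add(r["timepoint"])
    # Patients with both time points present
    complete = {pid for pid, tps in timepoints.items()
                if {"admission", "discharge"} <= tps}
    return [r for r in records if r["patient_id"] in complete]

# Hypothetical records: patient 2 lacks a discharge assessment and is dropped
data = [
    {"patient_id": 1, "timepoint": "admission"},
    {"patient_id": 1, "timepoint": "discharge"},
    {"patient_id": 2, "timepoint": "admission"},
]
print(len(keep_complete_pairs(data)))  # 2
```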


Assessment

The Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire was designed to measure impairment subjectively, as well as to capture limitations in activities and participation imposed by single or multiple disorders of the upper extremity(102). It is based on the World Health Organization (WHO) model of health, at the time of development called the International Classification of Impairments, Disabilities, and Handicaps (ICIDH) and since revised as the International Classification of Functioning, Disability, and Health (ICF)(103). The DASH is a standardized instrument that evaluates impairments and activity limitations, as well as participation restrictions for both leisure and work(103). It consists of two components: 30 disability/symptom questions and four optional high-performance sport/music or work questions(104). The DASH includes both disability and symptom items. Examples of disability items include: "Open a tight or new jar," "Write," "Make a bed," and "Use a knife to cut food." Examples of symptom items include: "Arm, shoulder or hand pain," "Weakness in your arm, shoulder or hand," and "Stiffness in your arm, shoulder or hand." Response choices for disability items range from 1 to 5 (1: no difficulty; 5: unable)(104). Response choices for symptom items also use a five-point scale, but vary from "none" to "extreme." At least 27 of the 30 disability/symptom questions must be completed for a score to be calculated(104). To produce a score out of one hundred, the response choices are summed and averaged, producing a score out of five; this value is then transformed to a score out of 100 by subtracting one and multiplying by 25(104). Thus, an individual with the highest responses possible would have a score of 100. Normally, the DASH produces scores between 0 and 100, with a higher score indicating more disability(104).
For this study, the rating scale was recoded such that a higher score indicated more ability. Appendix A includes a list of items on the DASH.
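The scoring rule described above (average the 1-5 responses, subtract one, multiply by 25, requiring at least 27 of the 30 items to be answered) can be sketched as follows. This implements the standard DASH scoring, before the recoding used in this study:

```python
def dash_score(responses):
    """DASH disability/symptom score on a 0-100 scale.

    responses: list of up to 30 answers, each 1-5, with None for missing.
    Returns None if fewer than 27 items were completed.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < 27:
        return None  # too many missing items to produce a score
    mean_response = sum(answered) / len(answered)  # score out of 5
    return (mean_response - 1) * 25                # rescale to 0-100

print(dash_score([5] * 30))  # 100.0 (maximum disability)
print(dash_score([1] * 30))  # 0.0 (no reported disability)
```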


Procedures

Data analysis

Unidimensionality

In order to create a legitimate measure, all the items must contribute to the same construct(116). To test for unidimensionality, exploratory and confirmatory factor analyses (weighted least squares parameter estimates using a diagonal weight matrix, with standard errors and a mean- and variance-adjusted chi-square test statistic that use a full weight matrix) were conducted using Mplus software(117). Polychoric correlations were factored in both exploratory and confirmatory analyses. For the exploratory factor analysis (EFA), eigenvalues for each factor, loadings on each factor, scree plots of eigenvalues, total variance associated with the factors, and correlations between the factors were investigated. Eigenvalues represent the variances extracted by the factors. Using the Kaiser criterion, only factors with eigenvalues greater than one should be retained (i.e., those factors that extract at least as much variance as one of the original variables)(105). The scree plot shows the eigenvalues for each factor plotted in descending order; the scree test proposes that the point where the line in the plot levels off is the cutoff for the number of factors to retain(105). The greater the amount of total variance accounted for by a factor, the more evidence there is for its presence(118). High correlations between factors suggest that they may be measuring the same latent trait(118). Based on the results of the EFA (number of factors present), a confirmatory factor analysis (CFA) was conducted. The CFA examined model fit statistics including the chi-square, Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Standardized Root Mean Square Residual (SRMR), and Root Mean Square Error of Approximation (RMSEA). The total amount of variance was also reported. Goodness-of-fit statistics such as the chi-square, CFI, TLI, SRMR, and RMSEA test the model fit for a preset number of factors. With the chi-square statistic, the null hypothesis


tested is that there is no difference between the data and the model being tested, while the alternative hypothesis is that the data do not support the model. Therefore, a significant chi-square indicates that the model does not fit. However, the chi-square statistic is affected by sample size; with large samples, it is difficult to show model fit with this statistic(119). For the CFI, TLI, SRMR, and RMSEA, there is some disagreement as to what values indicate that the model fits the data. For the current study, as suggested by Hu and Bentler(120), values of CFI and TLI greater than 0.95 and values of RMSEA and SRMR below 0.06 were used. Additionally, using Rasch analysis, the extent to which items contribute to a unidimensional construct was evaluated employing the mean square standardized residual (MnSq) produced for each item on the instrument. MnSq represents observed variance divided by expected variance(121); consequently, the desired MnSq value for an item is 1.0. The acceptable criterion for unidimensionality depends on the intended purpose of the measure and the degree of rigor desired. Linacre(122) and Green(123) suggest that criteria for fit statistics depend on sample size. For sample sizes close to a thousand, reasonable ranges of MnSq fit values between 0.6 and 1.1, associated with standardized Z values < 2.0, are recommended. A low MnSq value (i.e., < 0.6) suggests that an item is failing to discriminate individuals with different levels of ability (i.e., people with different amounts of difficulty performing upper-extremity tasks) or that an item is redundant (i.e., other items on the instrument are of similar difficulty). High values (i.e., > 1.1) indicate that scores are variant or erratic, suggesting that an item does not belong with the other items on the same continuum or that the item is being misinterpreted. Items with high MnSq values represent a threat to validity and thus are given greater consideration.
Fit statistics alone should not be used to assess the dimensionality of an instrument(124, 125). Thus, results from fit and factor analyses were considered together.
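The "observed variance over expected variance" definition of MnSq can be made concrete. The sketch below computes an infit mean-square for a single dichotomous Rasch item (the DASH items are actually polytomous, and the person measures and item difficulty here are invented, so this is an illustration of the formula only):

```python
import math

def infit_mnsq(responses, abilities, difficulty):
    """Infit mean-square for one dichotomous Rasch item.

    responses:  observed 0/1 scores, one per person
    abilities:  person measures in logits
    difficulty: item difficulty in logits
    MnSq = sum of squared residuals / sum of model variances,
    i.e., observed variance over expected variance (ideal value 1.0).
    """
    num = 0.0
    den = 0.0
    for x, theta in zip(responses, abilities):
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))  # expected score
        num += (x - p) ** 2        # squared residual
        den += p * (1.0 - p)       # model (binomial) variance
    return num / den

# Hypothetical data: a perfectly ordered (Guttman) response pattern is
# more deterministic than the model expects, so MnSq falls below 1
abilities = [-1.0, 0.0, 1.0, 2.0]
responses = [0, 0, 1, 1]
print(infit_mnsq(responses, abilities, difficulty=0.5))
```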


As a further test, correlations between each item and the entire instrument (point measure correlations) are often used in determining dimensionality(126). However, according to Wright(126), there are no set rules for what constitutes an acceptable range for point measures. Negative values indicate that observed responses to the item are in opposition to the general meaning of the test, but this is all that is certain. Other authors, Allen and Yen(127), recommend that these correlations range from .30 to .70, with very high values suggesting that the item is redundant and adds no more knowledge than is conveyed by the other items.

Item difficulty

Based on theories of upper-extremity recovery, such as motor control theory(128), hypotheses were developed about the hierarchy of difficulty of the items on the DASH. For example, based on the concept from motor control theory that complex items are more difficult, doing heavy household chores (e.g., wash walls, wash floors) was hypothesized to be harder than putting on a pullover sweater. Likewise, this theory contends that more challenging activities require the use of multiple joints; thus, putting on a pullover sweater was hypothesized to be more difficult than turning a key in a lock. Item response theory methodologies provide a means to investigate the validity of a proposed hierarchy by computing item calibrations. Data were fit to a one-parameter Rasch model using the Winsteps software program(1) to calculate item difficulty parameters. Separate analyses were done utilizing admission and discharge data to determine the stability of item calibrations across the two time points.

Person-item match

To further investigate the relationship between admission item difficulties and discharge item difficulties, person-item maps were produced using Winsteps(1). These maps plot person ability and item difficulty on the same scale, illustrating how well the items are suited for the particular sample under investigation.
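The point measure correlation used as a dimensionality check above is, at its core, a Pearson correlation between responses to a single item and the corresponding person measures. A stdlib sketch with invented values:

```python
import math

def point_measure_correlation(item_scores, person_measures):
    """Pearson correlation between one item's scores and person measures."""
    n = len(item_scores)
    mx = sum(item_scores) / n
    my = sum(person_measures) / n
    cov = sum((x - mx) * (y - my)
              for x, y in zip(item_scores, person_measures))
    sx = math.sqrt(sum((x - mx) ** 2 for x in item_scores))
    sy = math.sqrt(sum((y - my) ** 2 for y in person_measures))
    return cov / (sx * sy)

# Hypothetical data: responses that rise with ability correlate positively
item = [1, 2, 2, 4, 5]
measures = [-1.2, -0.3, 0.1, 0.8, 1.5]
print(point_measure_correlation(item, measures))
```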


Differential item functioning

Differential item functioning (DIF) analysis was used to compare admission and discharge item difficulties. If DIF is found, this implies that the item is acting differently at admission than at discharge(129); if this is the case, change scores calculated from admission to discharge may be misleading. That is, if the order of difficulty of the items changes from admission to discharge, it would be analogous to the marks on a ruler moving from one ruler to the next: it would be impossible to subtract a value associated with a mark on the first ruler from the value of another mark on the second ruler and obtain a meaningful value. Many approaches are available to determine whether items are acting differently from one time point to another(130). In this study, uniform DIF was calculated using Winsteps(1); because the Rasch model was used, investigation of non-uniform DIF was not possible(130). The Winsteps program utilizes a relatively direct DIF procedure in which each item's difficulty is compared between time points using iterative t-tests(131). Different alpha levels have been suggested for this analysis(132, 133). Some authors have suggested using a critical value of 1.96 to detect differences in item estimates (alpha = .05)(132). However, using this critical value does not correct for multiple comparisons, which increases the likelihood of falsely detecting significant differences(133). Acknowledging that small differences may occur by chance, a Bonferroni adjustment was made based on the number of items on the DASH (.05/30, alpha = .0017). As suggested by Crane and Hart(133), both small DIF (alpha = .05) and large DIF (alpha = .0017) were reported. For the purposes of this study, an alpha level of .05 was considered significant, while an alpha level of .0017 was considered highly significant.
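The two significance thresholds follow directly from the Bonferroni adjustment (.05/30 ≈ .0017). A sketch of flagging items from per-item p-values (the item names and p-values below are hypothetical, not results from the study):

```python
def flag_dif(item_p_values, n_items=30, alpha=0.05):
    """Classify per-item DIF p-values as 'large', 'small', or 'none'.

    'small' DIF uses the unadjusted alpha (.05); 'large' DIF uses the
    Bonferroni-adjusted alpha (alpha / n_items, here .05/30 ~ .0017).
    """
    adjusted = alpha / n_items
    flags = {}
    for item, p in item_p_values.items():
        if p < adjusted:
            flags[item] = "large"
        elif p < alpha:
            flags[item] = "small"
        else:
            flags[item] = "none"
    return flags

# Hypothetical p-values from per-item between-timepoint t-tests
print(flag_dif({"Sleep": 0.0004, "Make a bed": 0.03, "Write": 0.40}))
# {'Sleep': 'large', 'Make a bed': 'small', 'Write': 'none'}
```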
Nevertheless, some investigators(130) contend that when a large sample is used (N > 200), a statistically significant result may have no practical meaning; thus, a 0.5 logit difference should be used to identify DIF items. DIF based


on these guidelines is also reported. Because the presence of DIF does not always indicate that an individual's ability is being incorrectly assessed(134), average person ability estimates derived from analyses of versions of the DASH with and without the DIF items were compared. DIF was also portrayed using a scatter plot of item difficulty estimates. A fit of the item difficulty calibrations within the 95% confidence interval affirms item constancy(135); items whose calibrations fell outside the 95% confidence interval were considered statistically distinct. Additionally, admission and discharge item difficulties were compared using an intraclass correlation coefficient (ICC; one-way random, 95% confidence interval, test value: 0). This type of correlation provides an indication of the reproducibility of the item difficulty calibrations(136).

Item discrimination

Item discrimination gives an indication of how responses to a particular item correspond to responses to the overall assessment(137, 138). In general, high values for item discrimination are considered good, indicating that responses to that particular item reflect a construct similar to the other items on the test(138). Zero or negative discrimination values indicate that the item does not fit well with the rest of the items(137). Winsteps(1) was used to estimate item discriminations for admission and discharge data. The values obtained were examined to ensure that none were zero or negative, which would indicate poor items. In order to investigate the effect of item discrimination on person ability estimates, mean person ability estimates calculated utilizing a one-parameter model (using Winsteps(1) software) and a two-parameter model (MLE or MAP computation, graded model, using Multilog(139) software) were compared. Maximum likelihood estimation (MLE) is a special case of maximum a posteriori (MAP) estimation(140); in MLE, the most probable estimate is calculated based on the data(140).
Because Winsteps estimates are in logits and Multilog estimates are in probits, estimates from the Multilog analysis were converted to logits using the conversion ratio 1 logit = 1.7 probits(1) before comparison.
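This conversion is a single scaling step: a scaling constant of approximately 1.7 relates the normal-ogive (probit) and logistic (logit) metrics. The sketch below multiplies probit-metric estimates by 1.7, one common form of this conversion; the estimates themselves are invented:

```python
def probits_to_logits(estimates):
    """Convert person estimates from the probit metric to logits.

    Uses the scaling constant 1.7: a logistic ogive whose argument is
    multiplied by 1.7 closely approximates the normal ogive.
    """
    return [round(1.7 * x, 3) for x in estimates]

# Hypothetical Multilog-style person estimates in probits
print(probits_to_logits([0.5, 1.0, -0.25]))  # [0.85, 1.7, -0.425]
```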


Test information

Information curves for the entire test at both admission and discharge were obtained using the Winsteps Rasch analysis program(1). Information under the Rasch one-parameter model (Winsteps) is simply the probability of passing an item times the probability of failing the item(141). The standard error is inversely proportional to information; thus, less error implies more information(141). Information is somewhat more complex with the two-parameter item response theory model (Multilog analysis) in that it is multiplied by item discrimination (i.e., probability of passing an item times probability of failing an item times item discrimination)(142). Since item information is an additive property, information for the entire test can be calculated(143). The test information function is related to the underlying trait of interest. With the Rasch model, the zero point is set at the mean item difficulty; thus, using Winsteps, the test information function is always centered around zero on the latent variable. Information affects the measurement of change in the following manner(144): at person abilities where scale information is high, smaller changes appear significant (they result in larger changes in test scores), whereas at person abilities where information is low, an individual must change more on the trait measured in order to see the same change in test score(143).

Results

Unidimensionality

Admission data EFA

The eigenvalue for the first factor was 18.40; retaining one factor explains 61.32% of the total variance. All items load relatively high on one factor, ranging from .46 to .87 (see Table 3-2). The first two factors are highly correlated with each other, r = .74. The scree plot shown in Figure 3-1 shows a steep drop-off after the mark representing the eigenvalue for the first factor. There were three factors with eigenvalues greater than one: 18.40, 1.56, and 1.54,


respectively. Loadings of each item on these three factors are shown in Table 3-3. There is a general tendency for items loading on factor one to be more gross motor, complex items requiring more movement. Items loading on factor two tend to be items requiring only hand use, while those loading on factor three tend to be more general items and symptom items.

Admission data CFA

A confirmatory factor analysis was run to test a one-factor solution. This analysis had mixed results. The chi-square was large, 3628.84 (df = 104), and highly significant, p = .000; consequently, this statistic did not support the one-factor solution. The CFI was .77, the TLI was .97, the SRMR was .07, and the RMSEA was .19. Thus, only the TLI value supports the one-factor model. However, the first factor accounted for a large proportion of the variance, 62.86%.

Discharge data EFA

Similar to the admission data, the eigenvalue for the first factor was large, 20.23; retaining one factor explains 67.44% of the total variance. All items load relatively high on one factor, ranging from .68 to .89 (see Table 3-2). Again, the first two factors were highly correlated with each other, r = .74. The scree plot shown in Figure 3-2 illustrates a steep drop-off after the mark representing the eigenvalue for the first factor. There were three factors with eigenvalues greater than one: 20.23, 1.39, and 1.16, respectively. Loadings of each item on these three factors are shown in Table 3-4. Once again, items loading on factor one tend to be more gross motor, complex items requiring more movement; items loading on factor two tend to require only hand use; and items loading on factor three tend to be more general items and symptom items.


Discharge data CFA

Also similar to the admission data, for a one-factor solution the chi-square was large, 2677.74 (df = 103), and highly significant, p = .000. The CFI was .84, the TLI was .98, the SRMR was .06, and the RMSEA was .16. Thus, the TLI and SRMR supported the one-factor model.

Rasch-derived fit statistics and point measure correlations

Table 3-5 presents Rasch-derived item infit statistics for the 30-item DASH at admission and discharge. At admission, ten items had high fit values (MnSq > 1.1 with ZSTD > 2.0): "I feel less capable, less confident or less useful because of my arm, shoulder or hand problem"; "Stiffness in your arm, shoulder or hand"; "Open a tight or new jar"; "Difficulty sleeping because of pain in your arm, shoulder or hand"; "Interfered with your normal social activities with family, friends, neighbours or group"; "Tingling (pins and needles) in your arm, shoulder, or hand"; "Sexual activities"; "Write"; "Manage transportation needs"; and "Turn a key." At discharge, the following items continued to misfit: "I feel less capable, less confident or less useful because of my arm, shoulder, or hand problem"; "Open a tight or new jar"; "Difficulty sleeping because of pain in your arm, shoulder or hand"; "Tingling (pins and needles) in your arm, shoulder, or hand"; "Sexual activities"; "Write"; "Manage transportation needs"; and "Turn a key." Additionally, the following items misfit at discharge: "Wash your back"; "Recreation activities that require little effort"; and "Use a knife to cut food." Point measure correlations for admission data ranged from .46 to .82, while point measure correlations for discharge ranged from .57 to .83 (see Table 3-6). All correlations were above .30; however, the majority were above .70 (19 above .70 at admission, 20 at discharge). Taken together, there is mixed evidence for a one-factor solution.
One might argue that the large number of misfit items (ten with admission data and eleven with discharge data) was due to the stringent criterion used in this study to judge fit (MnSq > 1.1). Additionally, some authors


have argued that factor analysis is not the optimal way to investigate items influenced by psychological factors (i.e., self-reports)(143). Thus, it was felt that there was enough support for a one-factor solution to apply item response theory methodologies, and all further analyses were conducted based on the assumption that the items reflect a single construct. The inconsistent findings from the factor analyses and fit statistics will be addressed in the discussion section.

Item Difficulty

Table 3-7 presents item difficulty calibrations at admission and discharge in order of relative challenge at admission. The most challenging items at admission (higher calibrations) are located at the top of the table and the least challenging items at admission (lower calibrations) at the bottom. Overall, item difficulties at admission and discharge were similar; the DIF analysis discussed later compares the discharge difficulties to the admission difficulties. The four items that were most difficult at admission were: "Less capable, less confident or useful"; "Recreational activities in which you take some force or impact"; "Recreational activities in which you move your arm freely"; and "Do heavy household chores." "Turn a key," "Manage transportation needs," and "Write" were easy items at admission. Middle-difficulty items included: "Open a tight or new jar lid," "Wash your back," and "Limited in work or other regular daily activities."

Person-Item Match

Figure 3-3 presents maps of both admission and discharge item difficulties plotted against person ability measures. These maps were generated using item difficulties obtained when admission and discharge items were run in separate analyses. The Xs to the left of the vertical line symbolize the person measures, with those at the top representing individuals with the most upper-extremity ability and those at the bottom representing individuals with the least.
The items are represented to the right of the vertical line, with the most challenging items located at the top and


the least challenging at the bottom. These items are located on the map at their average measure of challenge (i.e., the measure at which individuals respond "moderate difficulty"). As can be seen from these maps, the sample performed better at discharge than at admission. The sample mean fell above all the items at discharge, whereas at admission several items ("I feel less capable, less confident or less useful," "Recreational activities in which you take some force or impact," and "Recreational activities in which you move your arm freely") were above the sample mean. At discharge, 41 individuals had maximum estimated ability levels, while at admission only 2 did.

Differential Item Functioning

Table 3-7 presents the DIF analysis results. DASH items are arranged in decreasing order of admission item difficulty, from the most difficult items at the top of the table to the least difficult at the bottom. The DIF analysis compared the admission and discharge item difficulty calibrations of each item; the resulting t-values are associated with a probability level (shown beside the t-values in the table). Five items had large DIF values (p < .0017): "I feel less capable, less confident or less useful because of my arm, shoulder or hand problem"; "Difficulty sleeping because of the pain in your arm, shoulder or hand"; "Push open a heavy door"; "Tingling (pins and needles) in your arm, shoulder or hand"; and "Sexual activities." Four items had small DIF (p < .05): "Weakness in your arm, shoulder or hand"; "Limited in work or other regular daily activities as a result of your arm, shoulder, or hand problem"; "Put on a pullover sweater"; and "Make a bed."
Of these nine items, four have higher admission item difficulty than discharge item difficulty ("Difficulty sleeping because of the pain in your arm, shoulder or hand"; "Put on a pullover sweater"; "Make a bed"; and "Limited in work or other regular daily activities as a result of your arm, shoulder, or hand problem"). The remaining five display lower admission item difficulty values than discharge item difficulty values ("I feel less capable, less confident or less

PAGE 67

67 useful because of my arm, shoulder or hand problem; Push open a heavy door; Tingling (pins and needles) in your arm, shoulder or hand; Sexual activities; and Weakness in your arm, shoulder or hand). However, us ing the 0.5 logit difference guid elines, none of the items displayed significant DIF. Figure 3-4 shows a visual illu stration of DIF. The item diffi culty calibrations at admission are plotted on the x-axis against item difficulty calibrations at discharge on the y-axis. Lines represent where 95% confidence inte rval bands fall. As can be seen from the figure, among the items showing DIF, they do not fall much outside the 95% confidence interv al. Labeled items in the figure are those that displayed large DIF. Comparison of admission item difficulties to discharge item difficulties using an intraclass correlation coefficient (ICC) indicated a high reliability between the two time-points (ICC = .99). To compare the possible impact of DIF on measurement of upper-extremity condition, Rasch derived mean person ability measurements at admission and discharge were compared using two versions of the DASH: (1) the full 30-it em version and (2) a 21-item version with the items displaying significant DIF removed. Figur e 3-5 presents the admission to discharge (xaxis) mean person ability measurements (y-axis, logits) for the two ve rsions. The solid line shows the change admission to discharge of pers on ability estimates when all the items on the DASH were used (change from .90 + 1.37 at admission to 2.38 + 1.68 at discharge). The dashed line illustrates the change admission to discharge of the person ability estimates when only the 21-items that did not show DIF were used (change from .93 + 1.52 at admission to 2.49 + 1.78 at discharge). Thus, mean person abil ity measures were almost iden tical whether or not the items showing DIF were included.
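The DIF screen described above reduces to two checks per item: a t-test on the difference between time-point calibrations, and the more conservative 0.5-logit guideline. The sketch below is an illustration of that logic, not the study's actual Winsteps procedure, and the example calibrations and standard errors are hypothetical; the joint standard error formula is the usual approximation for comparing two independent Rasch calibrations.

```python
import math

def dif_flags(d_adm, se_adm, d_dc, se_dc, logit_crit=0.5, t_crit=1.96):
    """Compare one item's admission and discharge difficulty calibrations.

    Returns the contrast (logits), an approximate t statistic (contrast
    divided by the joint standard error), and two flags: statistically
    significant DIF, and substantive DIF under the 0.5-logit guideline.
    """
    contrast = d_adm - d_dc
    joint_se = math.sqrt(se_adm ** 2 + se_dc ** 2)
    t = contrast / joint_se
    return contrast, t, abs(t) > t_crit, abs(contrast) > logit_crit

# Hypothetical calibrations (logits) and standard errors for one item
contrast, t, stat_dif, substantive_dif = dif_flags(-0.80, 0.06, -0.35, 0.07)
print(stat_dif, substantive_dif)  # True False
```

Note how the example reproduces the pattern reported above: with large samples an item can be flagged by the t-test yet still fall inside the 0.5-logit band.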


Item Discrimination

Table 3-8 presents item discrimination calibrations at admission and discharge. Values for item discrimination at admission ranged from .09 to 1.38. The values at discharge ranged from .28 to 1.36. Thus, item discrimination ranges were similar but somewhat wider for admission data. To compare the possible impact of item discrimination on measurement of upper-extremity condition, Rasch-derived mean person ability measurements at admission and discharge were compared to those derived using a two-parameter model. Figure 3-6 presents the admission-to-discharge (x-axis) mean person ability measurements (y-axis, logits) for the two models. The solid line shows the admission-to-discharge change in person ability estimates when the one-parameter model was used (change from .91 ± 1.39 at admission to 2.59 ± 1.92 at discharge). The dashed line illustrates the admission-to-discharge change in person ability estimates when the two-parameter model was used (change from 1.1 ± 1.75 at admission to 2.01 ± 1.47 at discharge).

Test Information

Figure 3-7 presents the DASH information function at admission using the one-parameter model, while Figure 3-8 presents the DASH information function obtained using discharge scores with the one-parameter model. As shown by the figures, test information is nearly identical at admission and discharge, indicating that person ability corresponds with test information in a similar way at both time-points. This implies that item difficulties (which affect the calculation of information) do not change enough from admission to discharge to affect person ability measures. Additionally, these curves illustrate how information peaks when the mean of the latent variable is zero. This is also where the standard error is lowest.
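The shape of a one-parameter information function can be sketched directly. For clarity the sketch below uses the dichotomous Rasch form, I(θ) = Σ pᵢ(1 − pᵢ); the DASH actually uses a five-category rating scale, whose polytomous information function is analogous, and the item difficulties here are hypothetical.

```python
import math

def rasch_test_information(theta, item_difficulties):
    """Test information for a dichotomous one-parameter (Rasch) model:
    I(theta) = sum of p_i * (1 - p_i), where
    p_i = exp(theta - b_i) / (1 + exp(theta - b_i)).
    The standard error of measurement is 1 / sqrt(I(theta))."""
    info = 0.0
    for b in item_difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return info

# Hypothetical item difficulties (logits): information peaks near the
# centre of the difficulty distribution, where the SE is lowest.
bs = [-1.5, -0.5, 0.0, 0.5, 1.5]
print(rasch_test_information(0.0, bs) > rasch_test_information(3.0, bs))  # True
```

Because the same item difficulties produce the same curve, near-identical admission and discharge calibrations necessarily yield near-identical information functions, which is the pattern Figures 3-7 and 3-8 display.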


Discussion

The purpose of this paper was to investigate the item-level characteristics of the DASH, including unidimensionality, item difficulty, person-item match, differential item functioning across admission and discharge, item discrimination, and test information. Several findings pertaining to the item-level psychometrics of the DASH are presented. Results of exploratory and confirmatory factor analyses were inconclusive as to whether or not a one-factor solution was plausible. Eigenvalues for the first factor in both the admission and discharge EFAs were much greater than for the second factor; however, the second and third factor eigenvalues were greater than 1.0 (the criterion necessary for acceptance of the presence of a factor based on the Kaiser rule). One factor accounted for most of the variance (> 60%) for both admission and discharge data, and all items loaded high on the first factor. The first two factors were highly correlated. With the admission CFA, the only goodness-of-fit statistic that supported a one-factor model was the TLI. With the discharge CFA, the TLI and SRMR supported the one-factor model. However, goodness-of-fit statistics should never be used alone to determine dimensionality(143).

A number of items misfit (ten at admission and eleven at discharge). With less stringent criteria, such as those used with smaller sample sizes, it is expected that many fewer items would misfit. Preliminary analysis by the authors of DASH data from smaller samples, which allow a less stringent criterion to be followed, has shown only three items (at admission) and five items (at discharge) to misfit. All point measure correlations were above .30. However, the observation that many (19 at admission and 20 at discharge) were above .70 indicates that some of the items may be redundant (i.e., add no further information to the test beyond what other items already capture).
This suggests that a short form of the DASH might be just as informative as the longer version.
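The item screens just discussed amount to simple threshold checks on two statistics per item. The following sketch applies the thresholds named in the text (misfit when infit MnSq exceeds the stringent 1.1 criterion, a .30 floor and a .70 redundancy ceiling on point-measure correlations); the item names and statistics in the example are hypothetical, not the study's actual Winsteps output.

```python
def screen_items(stats, mnsq_max=1.1, ptmeasure_min=0.30, redundant_r=0.70):
    """Classify items by infit mean-square and point-measure correlation.

    `stats` maps item name -> (infit MnSq, point-measure r). MnSq above
    the criterion marks misfit; r below the floor suggests the item does
    not align with the construct; very high r hints at redundancy.
    """
    misfit, low_r, redundant = [], [], []
    for item, (mnsq, r) in stats.items():
        if mnsq > mnsq_max:
            misfit.append(item)
        if r < ptmeasure_min:
            low_r.append(item)
        elif r > redundant_r:
            redundant.append(item)
    return misfit, low_r, redundant

# Hypothetical item statistics (infit MnSq, point-measure r)
stats = {"open jar": (1.25, 0.74), "write": (0.95, 0.28), "make bed": (1.02, 0.65)}
print(screen_items(stats))  # (['open jar'], ['write'], ['open jar'])
```

Loosening `mnsq_max` (e.g., to the 1.3 or 1.4 criteria cited elsewhere in the text) is all it takes to reproduce the smaller misfit counts seen under less stringent criteria.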


Item difficulty hierarchies support the theory of motor control. Four tenets of motor control theory are: 1) complex items are more difficult, 2) environmental factors increase the challenge of items, 3) more difficult items require the use of multiple joints, and 4) isolated, abstract items are more difficult than functional tasks(128, 145, 146). Complex items were found to be more difficult. For example, doing heavy household chores (e.g., wash walls, wash floors) was harder than putting on a pullover sweater. Likewise, gardening or doing yard work was more difficult than carrying a shopping bag or briefcase. Environmental factors were shown to influence difficulty ranking. Recreational activities requiring little effort, such as card playing or knitting, were rated easier than recreational activities in which you move your arm freely, such as frisbee or badminton. Furthermore, recreational activities in which you move your arm freely were rated easier than recreational activities in which you take some force or impact through your arm, shoulder, or hand. Items requiring the use of multiple joints were more difficult. For instance, recreational activities, such as frisbee or badminton, were rated more difficult than using a knife to cut food. Also, preparing a meal was more difficult than turning a key in a lock. Finally, isolated tasks, presented without a concrete context or purpose, were rated as harder. To illustrate, carrying a heavy object was rated as harder than carrying a shopping bag.

Several items displayed significant DIF from admission to discharge.
Five showed large DIF (I feel less capable, less confident or less useful because of my arm, shoulder or hand problem; Difficulty sleeping because of the pain in your arm, shoulder or hand; Push open a heavy door; Tingling (pins and needles) in your arm, shoulder or hand; and Sexual activities) and four showed small DIF (Weakness in your arm, shoulder or hand; Limited in work or other regular daily activities as a result of your arm, shoulder, or hand problem; Put on a pullover sweater; and Make a bed). The four items that had a higher admission difficulty than discharge difficulty were: Difficulty sleeping because of pain in your arm, shoulder or hand; Putting on a pullover sweater; Making a bed; and Limited in work or other regular daily activities as a result of your arm, shoulder, or hand problem. Perhaps this is because at admission these activities were problematic, but by the conclusion of therapy they had more fully resolved than other items. That is, once symptoms of pain and limitations to daily activities were gone by the end of treatment, these were not difficult items at all, rating even easier than would have been expected based on the admission difficulty. Five items had lower admission item difficulties than discharge. These items were: I feel less capable, less confident or less useful because of my arm, shoulder or hand problem; Push open a heavy door; Tingling (pins and needles) in your arm, shoulder or hand; Sexual activities; and Weakness in your arm, shoulder or hand. This could be because individuals still having problems with these at discharge see them as more of a problem than if they are having these issues at admission. One might consider removing DIF items to create a sounder instrument. However, using the guideline of a 0.5 logit difference to identify DIF, none of the items displayed significant DIF. Additionally, mean person ability measures were not affected by the inclusion of DIF items. Furthermore, the intraclass correlation coefficient (ICC) was high, indicating reliability in measurement between the two time-points. This stability of the measure from admission to discharge is critical to the measurement of patient change.

Since none of the item discrimination values were zero or negative, all items contribute in a similar fashion to the overall test(137, 138).
Nevertheless, because these values are not equal, some might argue that analysis using a two-parameter model is necessary. Yet analysis using the one-parameter model appears to be the most parsimonious solution, since item discrimination does not appear to dramatically affect person measures, although there is more difference between the person ability measures calculated using the different models at discharge. Similar curves for test information for admission and discharge data provide further evidence for the stability of the test at the two time points.

Comparison to past literature is complicated by the lack of studies investigating item-level psychometrics of the DASH. The one study investigating item-level properties of the DASH(97), in order to create a shortened version, found only four items to misfit (weakness, tingling, sexual activities, and write). However, that study used a less stringent criterion (MnSq less than 1.3). Likewise, one study has investigated the factor structure of the DASH. Liang and colleagues(147) conducted a principal axis factor analysis. These researchers concluded that there is support for a one-factor structure. They found that the first factor had an eigenvalue of 13.51 and explained 45% of the total variance. Loadings on the first factor in that study ranged from .43 to .89. None of their findings were indicative of a two-factor solution.

There are several limitations of this study. One involves inconclusive evidence as to whether the DASH should be considered unidimensional (i.e., measuring one construct). Considering that the fit statistic criterion used was very stringent (MnSq > 1.1) and past studies have determined that goodness-of-fit statistics may be misleading(119), a unidimensional solution was considered plausible. Yet other guidelines, such as the Kaiser rule (eigenvalues greater than one), supported a three-factor solution. Examination of which items loaded highest on each of the factors in the three-factor solution separates the items into three possible constructs.
One (those that load highest on the first factor) represents more gross motor, complex activities; a second (those that load highest on the second factor) comprises solely hand activities; and a third (those that load highest on the third factor) involves more general items or symptom items. Thus, the decision to treat the DASH as unidimensional is debatable. Perhaps dividing the DASH into three separate scales with three total scores would be more meaningful. Using multiple scales, instead of one with debatable unidimensionality, might provide a clearer picture of patient progress from admission to discharge.

A second limitation involves the sample used. While the sample used in this study was large, it was a sample of convenience. Moreover, the size of the sample was much reduced once individuals who did not complete the assessments at discharge were eliminated. Although the demographics presented were comparable, there is the potential that participants who completed the discharge surveys differed on some significant variables from the original larger sample. Third, the data were collected at various clinics across the country and from patients with diverse upper-extremity diagnoses. Therefore, situations in the clinics and treatments used by therapists differed, and some treatments could have been more effective than others. While this increases the generalizability of the results, because there is no information on the treatment used in each case, it is unclear which treatments are optimal. As a final limitation, although item discrimination did not affect person ability estimates dramatically, at discharge there was a notable difference. Therefore, a thorough analysis using the two-parameter model is indicated. Additional tests of model fit are also indicated. Because item-level psychometrics is a new area of investigation for the DASH, studies should be conducted with other populations and at other time-points throughout the treatment process as well, to provide conclusive evidence as to the properties of this assessment.

Conclusion

The evidence presented in this paper suggests that the psychometrics of the DASH could be improved upon.
On the positive side, the item hierarchy is supported by motor control theory; DIF, although present, does not appear to affect the measurement of person ability; and there is some support for unidimensionality. However, there is also evidence that the DASH items may represent three different constructs: a more gross/complex construct, a hand construct, and a general/symptom construct. Also, removing DIF items might produce a sounder instrument. Due to the limited number of studies investigating the item-level psychometrics of this assessment, more studies are needed using different IRT models and with different populations and settings to confirm these findings. Perhaps, since unidimensionality is an assumption of IRT models, dividing the DASH into three different constructs would be superior methodologically, based on the highest loadings of the items on three different, conceptually logical factors in the EFA.


Table 3-1. Sample of 991 demographics

                                    Total Sample                 Sample with Intake and Discharge Data
Mean age ± SD                       47.51 ± 13.90 (N = 2441)     48.84 ± 13.88 (N = 934)
  95% Confidence Interval (CI)      47.10, 48.83                 48.18, 50.44
Gender
  Female                            1427 (57.4%)                 543 (56.6%)
  Male                              1057 (42.5%) (N = 2484)      417 (43.3%) (N = 960)
Primary body part injured
  Shoulder                          1078 (43.3%)                 496 (51.7%)
  Neck                              474 (19.1%)                  192 (20.0%)
  Hand                              334 (13.4%)                  87 (9.1%)
  Elbow                             237 (9.5%)                   79 (8.2%)
  Wrist                             255 (10.3%)                  69 (7.2%)
  Upper arm                         63 (2.5%)                    24 (2.5%)
  Forearm                           46 (1.8%) (N = 2487)         13 (1.4%) (N = 960)
Symptom acuity
  Chronic                           1253 (50.4%)                 540 (56.3%)
  Subacute                          814 (32.7%)                  302 (31.5%)
  Acute                             414 (16.6%) (N = 2481)       116 (12.1%) (N = 960)
Mean number of days treated ± SD    42.64 ± 31.33 (N = 990)      48.63 ± 32.22 (N = 535)
  95% Confidence Interval (CI)      41.08, 45.12                 46.22, 51.90
Mean number of treatment
sessions ± SD                       11.43 ± 9.06 (N = 990)       13.30 ± 9.98 (N = 535)
  95% Confidence Interval (CI)      10.87, 11.99                 12.12, 13.79


Table 3-2. Item factor loadings on first factor


Table 3-3. Admission item factor loadings on three factors after varimax rotation


Table 3-4. Discharge item factor loadings on three factors after varimax rotation


Table 3-5. Infit statistics at admission and discharge


Table 3-6. Point measure correlations at admission and discharge


Figure 3-1. DASH admission scree plot (y-axis: Eigenvalue; x-axis: Factor)


Figure 3-2. DASH discharge scree plot (y-axis: Eigenvalue; x-axis: Factor)


Table 3-7. Item difficulty estimates and differential item functioning (N = 960, **p<0.002, *p<0.05)


Figure 3-3. Admission (left) and discharge (right) person-item maps (arranged so zeros line up to allow comparison)

Note: On admission map (left): activad = Limited in your work or other regular daily activities at admission; forcead = Recreational activities in which you take some force or impact at admission; heavyad = Carry a heavy object (over 10 lbs) at admission; armfrad = Recreational activities in which you move your arm freely at admission; lesscad = I feel less capable, less confident or less useful at admission; jarad = Open a tight or new jar at admission; yardad = Garden or do yard work at admission; limitad = Limited in your work or other regular daily activities at admission; painad = Arm, shoulder or hand pain at admission; weaknad = Weakness in your arm, shoulder or hand at admission; stiffad = Stiffness in your arm, shoulder or hand at admission; choread = Do heavy household chores (e.g., wash walls, wash floors) at admission; washbad = Wash your back at admission; socacad = Interfered with your normal social activities at admission; bedad = Make a bed at admission; shelfad = Place an object on a shelf above your head at admission; shbagad = Carry a shopping bag or briefcase at admission; sleepad = Difficulty sleeping at admission; lightad = Change a lightbulb overhead at admission; doorad = Push open a heavy door at admission; blowdad = Wash or blow dry your hair at admission; cutfoad = Use a knife to cut food at admission; mealad = Prepare a meal at admission; tinglad = Tingling (pins and needles) in your arm, shoulder or hand at admission; sexuaad = Sexual activities at admission; ltlefad = Recreational activities which require little effort at admission; sweatad = Put on a pullover sweater at admission; writead = Write at admission; transad = Manage transportation needs at admission; keyad = Turn a key at admission.

On discharge map (right): activdc = Limited in your work or other regular daily activities at discharge; forcedc = Recreational activities in which you take some force or impact at discharge; heavydc = Carry a heavy object (over 10 lbs) at discharge; armfrdc = Recreational activities in which you move your arm freely at discharge; lesscdc = I feel less capable, less confident or less useful at discharge; jardc = Open a tight or new jar at discharge; yarddc = Garden or do yard work at discharge; limitdc = Limited in your work or other regular daily activities at discharge; paindc = Arm, shoulder or hand pain at discharge; weakndc = Weakness in your arm, shoulder or hand at discharge; stiffdc = Stiffness in your arm, shoulder or hand at discharge; choredc = Do heavy household chores (e.g., wash walls, wash floors) at discharge; washbdc = Wash your back at discharge; socacdc = Interfered with your normal social activities at discharge; beddc = Make a bed at discharge; shelfdc = Place an object on a shelf above your head at discharge; shbagdc = Carry a shopping bag or briefcase at discharge; sleepdc = Difficulty sleeping at discharge; lightdc = Change a lightbulb overhead at discharge; doordc = Push open a heavy door at discharge; blowddc = Wash or blow dry your hair at discharge; cutfodc = Use a knife to cut food at discharge; mealdc = Prepare a meal at discharge; tingldc = Tingling (pins and needles) in your arm, shoulder or hand at discharge; sexuadc = Sexual activities at discharge; ltlefdc = Recreational activities which require little effort at discharge; sweatdc = Put on a pullover sweater at discharge; writedc = Write at discharge; transdc = Manage transportation needs at discharge; keydc = Turn a key at discharge.


Figure 3-4. DASH admission item calibrations vs. DASH discharge item calibrations
Note: Labeled items in the figure showed large DIF in Winsteps analysis.


Figure 3-5. Mean person ability measures (y-axis, logits) at admission and discharge, with all items and with DIF items deleted


Table 3-8. Item discrimination estimates


Figure 3-6. One-parameter model versus two-parameter model mean person ability measures (y-axis, logits) at admission and discharge


Figure 3-7. Admission scale information function using one-parameter model


Figure 3-8. Discharge scale information function using one-parameter model


CHAPTER 4
CREATING A CLINICALLY USEFUL DATA COLLECTION FORM FOR THE DISABILITIES OF THE ARM, SHOULDER, AND HAND OUTCOME QUESTIONNAIRE USING THE RASCH MEASUREMENT MODEL

Introduction

Numerous assessments exist which evaluate upper-extremity function. Therapists evaluating upper-extremity ability can choose from multiple performance measures, such as the Motor Assessment Scale (MAS)(148), Fugl-Meyer Assessment (FMA)(41), and Wolf Motor Function Test (WMFT)(47). Alternatively, self-reports of daily functional ability of the upper-extremity, such as the Disabilities of the Arm, Shoulder and Hand (DASH) outcome questionnaire(57), Upper-Extremity Functional Index (UEFI)(71), or Shoulder Pain and Disability Index (SPDI)(149), may be utilized. Yet another option is to assess capacity at the impairment level using measures such as the Minnesota Rate of Manipulation Test (MRMT) or Purdue Pegboard Test(150).

Although various assessments, and even types of assessments, of the upper-extremity exist and have been shown to be reliable and valid(45, 47, 150-153), often they are not used in the clinic. This may be because results from these assessments do little to inform day-to-day clinical practice(154). When they are used, frequently it is because they are required by accrediting agencies or are necessary for reimbursement(154). In contrast to being seen as useful tools to guide therapy, clinicians often view the completion of such assessments as taking valuable time away from treatment.

One reason that assessments fail to inform clinical practice is that they do not enhance clinicians' knowledge of what activities individuals can actually perform and what tasks should be set as the next best goal for a patient. Frequently, assessment total scores are summed from patients' responses to the items on the assessment. This sum is not directly interpretable; when standing alone it is a mere number. The actual tasks that the individual is able or unable to complete are obscured when only the total score is known. By and large, assessments do not clearly quantify how well an individual will perform on specific items. Thus, they provide a vague picture of what goals should be set for a patient. In order to be useful in informing clinical decision-making (i.e., goal setting, treatment planning), an assessment must provide a directly interpretable description of a patient's abilities.

To illustrate how assessment total scores are unclear about actual ability to perform tasks, imagine a patient with a score of 70 on the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire(154). This score of 70 tells the clinician little about the problems the patient is experiencing in daily life and what goals should be set to help conquer these challenges. Can this individual carry a shopping bag without difficulty? Is opening a jar lid troublesome? Or is it only when this individual engages in recreational activities requiring force or impact to their arm that problems arise? A total score provides little information about the patient's present abilities and provides virtually no insight into his/her potential for improvement.

Moreover, when total scores are used, missing data become problematic. If several items do not apply and thus are not answered by a patient, naturally the total score will be lower. However, one cannot assume this individual is less able than someone who answered all the questions and got a higher total score. Making results comparable even when items are not answered (i.e., do not apply to a given individual) requires time for additional mathematical calculations.

The Rasch measurement model, the one-parameter item response theory model, provides a means through which assessments can be modified to more effectively inform clinical practice (i.e., goal setting and treatment planning)(155). This method involves redesigning assessments into meaningful data collection forms. Rasch analysis makes the creation of such data collection forms possible by placing an individual's ability measures and item difficulty estimates on the same continuum (i.e., calibrating the item difficulty estimates relative to the sample's ability or, put another way, arriving at difficulty estimates based on person responses). On the data collection forms, items are ordered according to their difficulties relative to each other (most difficult items at the top, followed by less difficult ones). Person ability measures are transformed to range from zero to 100. Patients circle rating scale choices for each item (ranging from "unable" to "no difficulty"), which are arranged according to where they fall in relation to the person ability scale (0-100). Someone looking at such a data collection form is able to quickly evaluate which items an individual is having challenges with and which they are not(156). This paper intends to generate and show the application of Rasch-based data collection forms using the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire.

The output from Rasch analysis that assists in the creation of these data collection forms has been termed a "keyform" by Linacre(156), who first proposed the idea. Thus, the term "keyform" will be used in this paper to refer to the raw output from the statistical program, and the term "data collection form" to refer to a more clinically useful modification of this output. To further clarify how the Rasch measurement model makes possible the creation of such a data collection form requires an explanation of several concepts that are foundational to this model. First, it must be conceivable that all the items on an assessment contribute to one idea or construct (for instance, all items relate to upper-extremity function).
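The zero-to-100 transformation mentioned above can be sketched as a linear rescaling of the logit metric. The endpoints in this sketch are hypothetical; in practice they would be chosen from the extremes of the calibrated person measure distribution (e.g., Winsteps' UIMEAN/USCALE rescaling serves the same purpose).

```python
def logits_to_0_100(measure, lo=-5.0, hi=5.0):
    """Linearly rescale a logit person measure onto a 0-100 scale.

    `lo` and `hi` are the logit values mapped to 0 and 100; results
    outside the range are clipped. Because the rescaling is linear,
    equal logit intervals remain equal intervals on the 0-100 scale,
    unlike raw ordinal sums.
    """
    scaled = (measure - lo) / (hi - lo) * 100.0
    return max(0.0, min(100.0, scaled))

print(logits_to_0_100(0.0))   # 50.0
print(logits_to_0_100(2.5))   # 75.0
```

The design point is that the 0-100 numbers on the data collection form are interval-level transformations of the Rasch measures, not summed raw scores, so they can be compared across patients who skipped different items.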
If this assumption (i.e., that all items contribute to one construct) is upheld, Rasch analysis can be used to turn ordinal data into equal-interval data, which has a unit of measurement called a logit (log-odds unit). All the items on the assessment can be seen as making up a ruler, with each item's difficulty estimate (logit measure) falling at a specific marking. Person ability estimates (also calculated in logits based on the individual's responses to the items) can be placed on the same continuum, allowing for the direct comparison of person ability to item difficulty(155). The formation of this person ability/item difficulty continuum is a second concept of importance in the creation of efficacious data collection forms. Person ability estimates indicate that someone possesses more or less of the construct being measured. For instance, on the DASH, a patient with a lower person measure would have less upper-extremity ability than one with a higher person measure. This holds regardless of whether or not all the items applied to these individuals (i.e., whether all the items were completed by both individuals).

The purpose of this study is three-fold: (1) to demonstrate, using data collected on the DASH, how Rasch methodologies can be used to generate a clinically useful data collection form; (2) to show how different-ability study participants' responses (at admission and discharge) would look on the data collection form; and (3) to demonstrate how these forms can be useful in setting patient goals. Prior to addressing these goals, analyses will be presented to determine whether two assumptions of the Rasch model are met: first, to determine as accurately as possible with the given sample size whether the required assumption of unidimensionality is met (i.e., whether all the items can be conceptualized as representing one construct or trait), and second, to determine whether items on the DASH follow a logical progression of difficulty.

Methods

Sample Characteristics

Thirty-seven participants with upper-extremity impairments were recruited from outpatient clinics in Gainesville, Florida, including the Malcom Randall Veterans Affairs Medical Center (VAMC), the Orthopedic Center at the University of Florida, the Student Health Care Center at the University of Florida, as well as other outpatient clinics in the University of Florida/Shands Healthcare system. Table 4-1 presents demographic information for these participants. Admission and discharge data were obtained from all study participants. The sample included slightly more females (54.1%) than males. The mean age was 39.0 years, ranging from 19 to 67 years. The mean number of medications the participants were currently taking was 3.3 and the mean number of surgeries was 1.5. Most participants reported that they exercise one or two times a week (43.2%), followed by exercising at least three times a week (29.7%), and seldom or never exercising (27.0%). Patient diagnoses included a wide variety of ailments such as wrist tendonitis, lateral and medial epicondylitis, basal joint arthritis, finger fractures, olecranon bursitis, humeral fracture, shoulder dislocation, Dupuytren's, and carpal tunnel syndrome.

Assessment

Disabilities of the Arm, Shoulder, and Hand (DASH) Outcome Questionnaire

The Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire was designed to measure impairment subjectively, as well as to capture limitations in activities and participation imposed by single or multiple disorders of the upper-extremity(102). It is based on the World Health Organization (WHO) Model of Health, at the time of development called the International Classification of Impairments, Disabilities, and Handicaps (ICIDH) (since revised to the International Classification of Functioning, Disability and Health (ICF))(103). The DASH is a standardized instrument and evaluates impairments and activity limitations, as well as participation restrictions in both leisure and work(103). It consists of two components: 30 disability/symptom questions and four optional high-performance sport/music or work questions(104).
Examples of items on the DASH include: Open a tight or new jar; Write; Make a bed; and Use a knife to cut food. Response choices for disability items range from 1 to 5 (1: no difficulty to 5: unable)(104). Response choices for symptom items also use a five-point scale, but vary from "none" to "extreme". Guidelines for the original DASH suggest that at least 27 of the 30 disability/symptom questions must be completed for a score to be calculated(104). General procedures for scoring the DASH involve summing the response choices and averaging them to produce a score out of five(104). This value is then transformed to a score out of 100 by subtracting one and multiplying by 25(104). For example, if the patient responded to all the questions with a 5, they would have an average score of 5. Subtracting one gives 4, and multiplying by 25 gives 100; this individual, with the highest responses possible, would have a score of 100. Normally, this is the method used to produce DASH total scores between 0 and 100, with a higher score indicating more disability(104). For this study, however, the rating scale was recoded such that a higher score indicated more ability. Thus, 1 was "unable" (for disability items) or "extreme" (for symptom items), and 5 was "no difficulty" (for disability items) or "none" (for symptom items). Rasch methods, outlined in the analysis section, were used to produce person measure scores ranging from 0 to 100. Only the 30 disability/symptom questions were used for this study (i.e., items from the four optional high-performance sport/music sections were not completed by individuals in the current sample). Appendix A includes a list of all the items on the DASH. Note that the DASH presented in the appendix is in the form most commonly used in clinics.

Procedures

Participant inclusion criteria were upper-extremity impairment and receiving treatment for the condition. Study participants were given $10 for completing the DASH at two time points (admission and discharge). This study was approved by the Institutional Review Board (IRB) at the University of Florida.
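The traditional scoring procedure described above (average the 30 responses, subtract one, multiply by 25, with at least 27 items answered) can be sketched directly; the response vectors in the example are hypothetical.

```python
def dash_score(responses):
    """Traditional DASH disability/symptom score.

    `responses` is a list of 30 entries, each 1-5 or None if unanswered.
    At least 27 of the 30 items must be answered; the answered responses
    are averaged, then shifted and scaled to 0-100, with higher scores
    indicating more disability.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < 27:
        return None  # too many missing items to score
    mean = sum(answered) / len(answered)
    return (mean - 1.0) * 25.0

# A patient answering every item "5" scores the maximum of 100.
print(dash_score([5] * 30))  # 100.0
# All "1" responses score the minimum of 0.
print(dash_score([1] * 30))  # 0.0
```

Averaging only the answered items is what the traditional rule uses to tolerate up to three missing responses; as the text notes, a Rasch person measure handles missing items more gracefully still, since calibrated item difficulties make scores comparable however many items were skipped.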


Analyses

Microsoft Access(157) and SPSS(105) were used for data management and descriptive statistics. Winsteps(1) was used for the Rasch analysis. To test the Rasch model assumption of unidimensionality, goodness-of-fit statistics were obtained. Given the small number of study participants, factor analysis was not possible with this sample; however, results of a previous factor analysis of the DASH with a much larger sample are reported to provide further information regarding the unidimensionality of the items. The difficulty of each item was calibrated and the order was evaluated based on tenets of motor control theory to assess whether the hierarchy was logical (i.e., intuitively believable). Once model assumptions were tested, a general keyform was produced, using Winsteps(1), based on the current sample data. From this keyform, a data collection form was created. Responses at admission and discharge from three patients of differing ability levels (high, medium, and low) were circled on the data collection form to illustrate how the forms can be used for informing goal setting and treatment planning.

Item infit analysis and previous factor analysis with larger sample to test Rasch measurement model fit

All of the items on a measure must contribute to a single construct or idea(116). Using Rasch analysis, the extent to which items represent a unidimensional construct is evaluated employing mean square standardized residuals (MnSq) produced for each item on the instrument. MnSq represents observed variance divided by expected variance(121). Consequently, the desired value of MnSq for an item is 1.0. The acceptable criterion depends on the intended purpose of the measure and the degree of rigor desired. For surveys using rating scales (such as the DASH), Wright and Linacre(158) suggest reasonable ranges of MnSq fit values between 0.6 and 1.4 associated with standardized Z values < 2.0.
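As a concrete illustration, the Wright and Linacre criterion can be expressed as a simple flag: an item misfits when its infit MnSq falls outside 0.6-1.4 and the standardized Z exceeds 2.0 in magnitude. The item statistics below are hypothetical values for illustration, not output from this study's analysis:

```python
def flag_misfit(infit_mnsq, zstd, low=0.6, high=1.4, z_crit=2.0):
    """Flag an item whose infit mean square falls outside the
    0.6-1.4 range suggested by Wright and Linacre, with the
    standardized Z statistic (ZSTD) exceeding 2.0 in magnitude."""
    return (infit_mnsq < low or infit_mnsq > high) and abs(zstd) > z_crit

# Hypothetical item statistics (infit MnSq, ZSTD):
items = {"Write": (1.73, 3.1), "Turn a key": (0.95, -0.4)}
for name, (mnsq, zstd) in items.items():
    print(name, "-> misfit" if flag_misfit(mnsq, zstd) else "-> acceptable")
```

Requiring both conditions mirrors the study's own rule, which treated a high MnSq as misfit only when the accompanying ZSTD was also significant.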
A low MnSq value (i.e., <0.6) suggests that an item is failing to discriminate individuals with different levels of ability


(i.e., people with different amounts of difficulty performing upper-extremity tasks) or that an item is redundant (i.e., other items on the instrument are of similar difficulty). High values (i.e., >1.4) indicate that scores are variant or erratic, suggesting that an item does not belong with the other items on the same continuum or that the item is being misinterpreted. Items with high MnSq values represent a threat to validity and thus are given greater consideration. For the current study, misfit items were considered those with infit MnSq values greater than 1.4 and standardized Z values (ZSTD) > 2.0. Separate analyses were conducted for admission and discharge data.

To further test for unidimensionality, principal components or factor analyses should also be conducted(124, 125). With the sample size in this study (N = 37), obtaining accurate results from such analyses is not possible. However, previously, with a much larger sample of DASH data (N = 991) from individuals with similar demographic characteristics, the authors conducted exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) on both admission and discharge data. Although the results of this analysis were somewhat mixed, the authors concluded that there was enough support for the unidimensionality of the DASH to proceed with the Rasch analysis (see Chapter 3).

Test of logic of difficulty hierarchy order

Based on the theory of motor control(128), hypotheses were developed about the hierarchy of difficulty of the items on the DASH. For example, based on the concept from motor control theory that complex items are more difficult, doing heavy household chores (e.g., wash walls, wash floors) was hypothesized to be harder than putting on a pullover sweater. Likewise, this theory contends that more challenging activities require the use of multiple joints. Thus, putting on a pullover sweater was hypothesized to be more difficult than turning a key in a lock.
Rasch methodologies provide a means to investigate the validity of a proposed hierarchy by computing


item calibrations. Data were fit to a one-parameter Rasch model using the Winsteps software program(1) to calculate item difficulty parameters. Separate analyses were done utilizing admission and discharge data. The resulting item difficulty order was examined to determine if it supported the basic tenets of motor control theory.

Obtaining a general keyform in Winsteps for DASH admission data

The Winsteps Rasch analysis program(1) was used to obtain a general keyform for the DASH admission data. To make this scale more clinically useful, Winsteps commands (UMEAN and USCALE) were used to convert the person measures obtained from the raw data into person measures on a scale from 0 to 100. The UMEAN and USCALE values applied to achieve a 100-point scale were 48.6 and 13.0, respectively. Additionally, to obtain person measures at discharge, item average difficulties (i.e., difficulty of responding moderate difficulty to an item) and average rating scale step estimates (i.e., the estimated value for the transition from one rating scale choice to the next, for example, from mild to moderate difficulty) were anchored at admission values. This was to make person measures obtained at discharge comparable to those obtained at admission, thus allowing for any changes in ability to be observed.

Creation of data collection form

The general keyform output from Winsteps was used as the basis for the creation of a data collection form. Patient instructions were added to the top, items were moved to the left of the response choices, and descriptions were added above the response choice numbers. A percentage person ability scale ranging from 0-100 was placed at the bottom of the form. Shading was added to improve readability of the data collection form.
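The UMEAN/USCALE conversion described above amounts to a linear rescaling of the logit person measures. The sketch below assumes the usual Winsteps convention, scaled = UMEAN + USCALE × logit; the example logit values are hypothetical:

```python
UMEAN, USCALE = 48.6, 13.0  # values applied in this study

def rescale(logit_measure):
    """Convert a Rasch person measure in logits to the 0-100
    clinical scale via a Winsteps-style linear transformation."""
    return UMEAN + USCALE * logit_measure

# Hypothetical logit measures; 0 logits lands near the middle of the scale:
for theta in (-2.0, 0.0, 2.0):
    print(theta, "->", round(rescale(theta), 1))
```

Because the transformation is linear, it changes only the units of the measures; the ordering and relative spacing of persons and items are unaffected.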
Using the admission data, three patients were chosen: a high ability patient (i.e., one with a high person measure score), a medium ability patient (i.e., one with a medium person measure score), and a low ability patient (i.e., one with a low person measure score). These individuals'


responses were circled on data collection forms to illustrate how completed forms would look. Additionally, using Winsteps(1), person fit indices were obtained for these patients. Person fit indices, similar to those for item fit, are reported in terms of mean square standardized residuals (MnSq), observed variance divided by expected variance(121, 158). As suggested by Wright and Linacre(158), high infit MnSq values were considered those greater than 1.4 with standardized Z values greater than 2.0. For misfitting individuals, triangles were placed around unexpected responses (i.e., patient responses that were either two standard deviations higher or lower than would be expected based on his/her pattern of responses).

Patient responses at both admission and discharge for each of these three patients were circled on a form to show how patient progress might be observed using the form. On the forms, the patients' person ability measures (calculated in Winsteps using the conversion to a scale out of 100) were found at the bottom of the scale. A solid line was drawn up at this point to represent where the patient's ability measure fell in relation to his/her response to each item. Dotted lines were plotted on each side of the solid line to represent two standard errors associated with the person ability measure. These dotted lines were omitted only when they went outside the scale (0-100).

Finally, two sets of data collection forms (admission and discharge for selected patients) were created with patient goals indicated: one set for a high ability patient at admission and one for a low ability patient at admission. A box was placed around potential shorter term goal activities and potential longer term goal activities on the admission form. The discharge form illustrates whether these goals had been achieved.
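The dotted two-standard-error bands described in this section can be computed directly from a person measure and its standard error, clipped at the scale limits. A minimal sketch, assuming the ± values reported alongside the person measures are standard errors:

```python
def error_band(measure, se, k=2, lo=0.0, hi=100.0):
    """Lower and upper limits of a band of k standard errors
    around a 0-100 person measure, clipped to the scale ends
    (mirroring the omission of dotted lines outside 0-100)."""
    return max(lo, measure - k * se), min(hi, measure + k * se)

# For example, a person measure of 30.5 with a standard error of 5.8:
print(error_band(30.5, 5.8))
```

A measure near either end of the scale simply has its band truncated, which is why some forms show a dotted line on only one side of the solid line.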


Results

Item Infit Analysis and Previous Factor Analysis with Larger Sample to Test Rasch Measurement Model Fit

To provide some evidence that the requirement of unidimensionality was met with this sample (despite the small sample size), infit statistics from Rasch analysis were calculated. Overall, the items worked well together, with 90% of the items having a mean infit within the acceptable criterion at admission and 83% at discharge. However, at admission 10% (3/30) of the items were not within the acceptable criterion (MnSq > 1.4; ZSTD > 2.0) and at discharge 17% (5/30) of the items were outside the acceptable values. At admission, items with high infit included: Difficulty sleeping because of the pain in your arm, shoulder or hand; Recreational activities which require little effort (e.g., cardplaying, knitting, etc.); and Write, while at discharge the items with high infit were: Arm, shoulder or hand pain when you performed a specific activity; I feel less capable, less confident or less useful because of my arm, shoulder or hand problem; Stiffness in your arm, shoulder or hand; Recreational activities in which you move your arm freely (e.g., playing frisbee, badminton, etc.); and Write. The admission items were 53%-173% more erratic than expected and the discharge items were 58%-135% more erratic than expected. See Tables 4-2 and 4-3.

Test of Logic of Difficulty Hierarchy Order

Item difficulty estimates were investigated to determine if the items followed a logical progression of upper-extremity function difficulty, which could be used as a basis for goal setting and treatment intervention. Table 4-2 presents DASH item difficulty calibrations at admission and Table 4-3 presents DASH item difficulty calibrations at discharge in the order of relative challenge at admission. The most challenging items based on admission estimates are located at the top and the least challenging at the bottom. The most difficult items at admission


were: Recreational activities in which you take some force or impact through your arm, shoulder or hand; Arm, shoulder or hand pain when you performed a specific activity; and I feel less capable, less confident or less useful because of my arm, shoulder or hand problem, while the most difficult items at discharge were: Recreational activities in which you take some force or impact through your arm, shoulder or hand; I feel less capable, less confident or less useful because of my arm, shoulder or hand problem; and Open a tight or new jar. The least challenging items at admission included: Turn a key, Manage transportation needs, and Write. The least challenging items at discharge included: Turn a key, Manage transportation needs, and Sexual activities.

Creation of Data Collection Form

From the general keyform produced by the Winsteps Rasch analysis program(1), a data collection form was created. This form is presented in Figure 4-1. Patient instructions ("Please rate your ability to do the following activities in the last week by circling the number below the appropriate response") are presented at the top of the form. Items are located at the left in order of decreasing difficulty at admission. Rating scale choices (1-5) are presented to the right of each item, with descriptions of what the ratings mean above the number responses. These rating scale choices are placed at a location relative to the person ability scale at the bottom. That is, a person with a person ability measure of 50 would be most likely to rate the item Make a bed as mild or moderate difficulty and to rate Carry a heavy object as moderate difficulty. In contrast, someone with a person ability measure of 30 would be most likely to rate the item Make a bed as severe difficulty and would most likely rate the item Carry a heavy object as unable. An individual with a person ability measure of 90, in turn, would most likely rate both items as no difficulty.
Shading was included on the data collection form to improve readability.
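The keyform logic described above, in which a person's measure predicts his or her most likely rating on each item, follows from the Rasch rating scale model: the probability of each category depends on the person measure, the item difficulty, and the rating scale step thresholds. A sketch of that computation; the difficulty and threshold values here are hypothetical, not the calibrated DASH estimates:

```python
import math

def category_probs(theta, item_difficulty, steps):
    """Rasch rating scale model: probability of each rating category
    (here, ratings 1-5) given person measure theta (logits), item
    difficulty, and step (Andrich) thresholds, all in logits."""
    # Each category's log-odds numerator is the cumulative sum of
    # (theta - difficulty - step_k) over the steps passed so far.
    logits = [0.0]
    for step in steps:
        logits.append(logits[-1] + theta - item_difficulty - step)
    expn = [math.exp(x) for x in logits]
    total = sum(expn)
    return [e / total for e in expn]

# Hypothetical calibration: five categories imply four step thresholds.
steps = [-2.0, -0.7, 0.7, 2.0]
probs = category_probs(theta=1.0, item_difficulty=0.0, steps=steps)
most_likely = probs.index(max(probs)) + 1  # rating category 1-5
```

A person well above an item's difficulty concentrates probability in the top category, which is why the keyform can place each response choice at a predictable position along the ability scale.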


Figure 4-2 presents six data collection forms that have been filled out according to responses of three different individuals at admission and discharge. Figure 4-2a shows responses of a patient who began treatment at a high ability level, Figure 4-2b illustrates responses of a patient who began treatment at a medium ability level, and Figure 4-2c contains responses of a patient who began treatment at a low ability level. The forms on the left have admission responses of these patients circled, while those on the right have discharge responses circled. Triangles are placed around unexpected responses (i.e., those where patients indicated a response that was either higher or lower than expected based on his/her other responses).

As can be seen in the first two forms in Figure 4-2a, the high ability individual entered treatment with a person measure of 71.8 ± 7.0. At this time, the individual had no difficulty with the majority of the easier items and moderate or mild difficulty on the majority of the more difficult items. At discharge, his/her person ability measure had risen to 97.9 ± 15.4. Now, this individual had no difficulty with all but three items (mild difficulty), and these three items tended to be the more difficult items. This person did not have any responses that were unexpected based on his/her pattern of responses.

The patient who began treatment with medium ability (Figure 4-2b; person measure of 51.2 ± 4.6) had a pattern of responses that was somewhat unexpected, answering that he/she was unable to perform many of the easier items. This individual had a high misfit value associated with his/her pattern of responses (MnSq = 1.93, ZSTD = 3.3). The triangles around responses to the items I feel less capable, less confident or less useful because of my arm, shoulder or hand problem; Wash back; Carry a shopping bag or brief case; and Write indicate that these responses were unexpected.
On several of the easy items, this individual had mild to no difficulty. With the middle level items, this individual generally reported having mild difficulty, and with hard


items generally reported being unable to perform the activity or able to perform the activity with mild difficulty. At discharge, this patient reported having no difficulty with all of the items except one, Arm, shoulder or hand pain when you performed a specific activity. His/her person measure at this time reached the maximum of 100 ± 26.2.

Finally, the patient who had low ability at admission (Figure 4-2c; person measure score of 30.5 ± 5.8) was having severe difficulty with five of the activities, moderate difficulty with five of the activities, and was unable to complete the rest of the activities. Three of this patient's responses were unexpected (as shown by the triangles around these responses). Write was answered by this individual as being easier than expected, as were Arm, shoulder or hand pain and Limited in your work or other regular activities as a result of your arm, shoulder, or hand problem. At discharge, this individual's person measure had increased to 80.9 ± 8.8, and he/she reported having mild difficulty with eight of the items, moderate difficulty with one item, and no difficulty with the rest of the items.

Figure 4-3 illustrates how goals might be set for patients of differing admission ability levels (a high ability patient at admission and a low ability patient at admission) using this type of data collection form. For the patient with high ability at admission (Figure 4-3a), shorter term goals might include increasing range of motion for back washing, increasing strength for carrying heavy objects, and interventions to decrease pain and tingling sensations. Longer term goals for this patient might include increasing hand strength for activities such as opening a jar lid and further decreasing pain.
Using a functional approach to treatment (such as one used in Constraint Induced Movement Therapy, CIMT), this patient might be asked to perform activities such as combing his/her hair, reaching overhead to get objects out of a cabinet, carrying objects from one side of


the room to another, and opening different sized containers. At discharge, as seen in Figure 4-3a, this patient had met all his/her shorter term goals, but was still having mild difficulty with pain (longer term goal).

Figure 4-3b illustrates goal setting for a patient with low ability at admission. This individual's responses are circled at admission on the left form and at discharge on the right form. With this individual, shorter term goals might include increasing shoulder range of motion for completing tasks such as putting on a pullover sweater, washing and blow drying his/her hair, changing a light bulb overhead, and placing an object on a shelf above his/her head. Fine motor ability might be addressed to help ease tasks such as using a knife to cut food. Perhaps longer term goals for this patient would include further increasing range of shoulder motion, so that washing his/her back and recreational activities in which you move your arm freely were easier, and increasing strength for heavy household chores, such as washing walls and floors.

Functional activities to work on with this individual might include placing various objects on shelves above his/her head to increase shoulder range of motion, practicing picking up various sized coins to improve fine motor ability, and lifting and moving heavy books or grocery bags to improve strength. At discharge, this individual had met all of these shorter and longer term goals.

Discussion

In summary, the goals of this study were to demonstrate, using data collected on the DASH, how Rasch methodologies can be used to create a clinically useful data collection form; to illustrate how different ability study participants' responses would look on the data collection form at admission and discharge; and to demonstrate how these forms can be used in setting patient goals.
Prior to addressing these goals, psychometric evidence was presented to support the use of Rasch analysis for the creation of this data collection form (i.e., that necessary


assumptions were met). The two assumptions that were tested were model fit (i.e., that all items contribute to a single underlying trait) and the logic (i.e., believability) of the hierarchy of item difficulty.

In regards to the infit analysis, with the admission data only three items misfit: Difficulty sleeping because of the pain in your arm, shoulder or hand; Recreational activities which require little effort (e.g., cardplaying, knitting, etc.); and Write. Perhaps these items misfit because many of the individuals did not have pain to the extent that it interfered with their sleeping or did not have fine motor issues that would have affected activities such as writing. Therefore, they were uncertain how to respond to these items. At discharge, five items misfit. Two of these were symptom items: Arm, shoulder or hand pain when you performed a specific activity and Stiffness in your arm, shoulder or hand. Once again, many individuals may not have been experiencing these symptoms. Another item that misfit at discharge was a more general item that may have been affected by life issues other than the patient's upper-extremity problems (I feel less capable, less confident or less useful). The small number of misfit items seems to indicate that the items could be conceived as representing one construct.

Nevertheless, some authors suggest that it is necessary for less than 5% of the items to misfit to assume unidimensionality of a measure(159). This was not the case here, since 10% (3/30) of the items misfit with the admission data and 17% (5/30) misfit with the discharge data. With the larger data set from the previous study, results of exploratory and confirmatory factor analyses showed some support for a one factor solution. Eigenvalues for the first factor in both the admission and discharge exploratory factor analyses were substantially greater than for the second factor. One factor accounted for most of the variance (> 60%) for both admission and


discharge data and all items loaded high on the first factor. The first two factors were highly correlated.

In general, the item difficulty hierarchies supported the theory of motor control. Four tenets of motor control theory are: 1) complex items are more difficult, 2) environmental factors increase the challenge of items, 3) more difficult items require the use of multiple joints, and 4) isolated, abstract items are more difficult than functional tasks(128, 145, 146). Complex items were found to be more difficult. For example, doing heavy household chores (e.g., wash walls, wash floors) was harder than putting on a pullover sweater. Likewise, gardening or doing yard work was more difficult than carrying a shopping bag or briefcase. Environmental factors were shown to influence difficulty ranking. Recreational activities requiring little effort, such as card playing or knitting, were rated easier than recreational activities in which you move your arm freely, such as frisbee or badminton. Furthermore, recreational activities in which you move your arm freely were rated easier than recreational activities in which you take some force or impact through your arm, shoulder, or hand. Items requiring the use of multiple joints were more difficult than items requiring the use of fewer joints. For instance, recreational activities, such as frisbee or badminton, were rated more difficult than using a knife to cut food. Finally, isolated tasks, presented without a concrete context or purpose, were rated as harder. To illustrate, carrying an unspecified heavy object was rated as harder than carrying a heavy shopping bag or briefcase.

The general keyform output from the Winsteps Rasch analysis program(1) was used as the basis for creating the data collection form. Creation of such a form proved to be an easy process and potentially offers several advantages over the forms used in clinics today.
Instructions were placed at the top of the form, and items (activities for patients to rate) were listed to the left of the


possible answer choices (both numbers to circle and descriptions of what the numbers mean; for example, no difficulty placed above 5). A person measure scale ranging from zero to 100 was placed at the bottom of the form.

As depicted by the forms filled out according to patients of differing admission ability levels (high, medium, and low), at a glance it is possible to get a picture of an individual's overall ability to perform functional tasks requiring the upper-extremity. The person ability measure scale at the bottom of the form aids in this estimation. On the forms presented with study participant data, a line was drawn up from the actual person measure estimate (i.e., the estimate of person ability derived for that person in Winsteps). It has been suggested that when computer software is not available to estimate this measure (as is most often the case in the clinic), a line can be drawn by visually determining where most of the responses fall(156). Such a line could be estimated even with missing responses to some questions. The downside of this is that the accuracy of such a line drawn by sight is questionable. As seen in the admission form illustrating a person of medium ability (left form in Figure 4-2b), his/her responses are somewhat inconsistent and scattered. Determining an estimate of this person's ability may be challenging due to the variability of his/her responses.

In addition, some patients' response patterns differ from what would be expected based on the obtained hierarchy(155, 156). Consider once again the medium ability individual (left form in Figure 4-2b). His/her pattern of responses at admission does not follow the hierarchy of item difficulty very well (i.e., the easiest items being rated less challenging than more difficult items). However, many of the easy items with which this patient is having a lot of difficulty require shoulder range of motion (changing a light bulb overhead, putting on a pullover sweater). In


this way, unusual patterns in the data can be diagnostic. Possibly, this individual has some type of shoulder impairment.

Another advantage of this type of data collection form is that it can be used in conjunction with impairment and performance assessments to aid goal setting and treatment planning. The forms filled out with actual patient data illustrate this benefit. Examining such a form gives a sense of what activities and issues an individual is struggling with completing. For example, on the forms created and used to demonstrate goal setting (see Figure 4-3), with the patient with high ability at admission, it is possible to see that this individual potentially has some range of motion issues (because of difficulty with the item, Wash your back), some problems with strength (because of indicated difficulty with the item, Carry a heavy object (over 10 lbs)), and some active symptoms (because of indicated difficulty with the items Tingling (pins and needles) in your arm, shoulder or hand and Difficulty sleeping because of pain in your arm, shoulder or hand). Using this information, the therapist is able to set goals that will help achieve improved range of motion and strength and decreased negative symptoms (i.e., pain and tingling). On re-evaluation, the clinician once again can see at a glance whether or not these items have improved and areas where new goals might be set (where lower numbers are still being circled). New goals for this patient might include increasing hand strength for activities such as opening a jar lid and further decreasing pain. Finally, at discharge, treatment effectiveness can be determined by whether or not the patient is circling response choices to the right of the form (4s and 5s). With the high ability patient in Figure 4-3a, all shorter term goals were met, with only one longer term goal not met (Arm, shoulder, or hand pain).
The patient with low ability at admission also provides a dramatic example of successful treatment. Looking at the admission form filled out by this patient (Figure 4-3b), one could


easily see that this individual seems to be having some difficulty with shoulder range of motion and could set shorter term goals to improve this, resulting in less difficulty with daily tasks such as putting on a pullover sweater, washing and blow drying his/her hair, changing a light bulb overhead, and placing an object on a shelf above his/her head. Additionally, a goal to improve fine motor skills might be set in order to make daily tasks such as using a knife to cut food easier. Once these goals were attained, longer term goals for this patient might include further increasing range of shoulder motion, so that washing his/her back, recreational activities in which you move your arm freely, and heavy household tasks were easier. At discharge, this patient had met all of these goals.

Several studies exist which have discussed the creation of keyforms utilizing Rasch methodologies. These studies have taken a slightly different focus than the current study. Earlier work by Linacre(156) and Kielhofner(155) focused on the keyform as a way of obtaining instantaneous measurement. These authors center their discussion on how, using the keyform without the use of computer software, a line can be drawn to represent an overall person measure for a patient and how patterns of responses can be used diagnostically. Linacre(156) created a keyform based on data on the Functional Independence Measure (FIM), while Kielhofner used data from the Occupational Performance History Interview 2nd Version (OPHI-II). Another approach in using keyforms was demonstrated by Woodbury and colleagues(160). Keyforms created from data on the Fugl-Meyer Assessment of the upper-extremity were used to validate the consistency of the pattern of responses across study participants. Using the keyform, it was possible to verify that participants of differing abilities were scoring higher on easier items and lower on harder items.
Evidence of this scoring pattern indicates that the Fugl-Meyer Assessment retains its structure when measuring subjects across the ability range.


Work by Avery and colleagues(159), using data collected on the Gross Motor Ability Estimator (GMAE), demonstrated how a computer program could be used to generate output in a form similar to that of a keyform produced by Winsteps. The item hierarchy obtained by these authors shows a clear pattern of gross motor development. For instance, the hierarchy found by these authors showed that a baby can lift its head before pulling to a sit, crawling follows attaining a sitting position, and walking and hopping come later in development. In order to obtain the keyform output described in this study, clinicians would be required to enter data collected on an assessment into a computer. However, these authors report that over 50 therapists in Ontario said they would be interested in using such a program(159). The advantage of a data collection form designed using the same methodologies, however, is that no computer entry would be required of the therapists. This study represents the first time a data collection form has been created for the DASH based on keyforms generated through Rasch methodologies. Moreover, there are no published studies on keyforms that present longitudinal data (i.e., data at both admission and discharge).

There are several limitations to this study. Although the previous study with a larger sample showed some support for the unidimensionality of the DASH, other evidence from that study was inconclusive. Perhaps, as indicated by the items loading highest on three different factors in previous work, the DASH should be split into three separate subscales. This would provide for a clearer item difficulty hierarchy and thus a more obvious progression of patient function. Factor analyses suggested the three constructs present could represent: (1) gross motor, complex items requiring more movement, (2) items requiring hand use, and (3) general/symptom items (see Chapter 3).
Conceptually, it is reasonable to assume that an individual with pain as a symptom of his/her condition would rate pain items harder than other individuals would.


Likewise, one with fine motor problems would rate items requiring hand use harder than more gross motor items. Thus, having items on one instrument that inquire about pain and other symptoms, fine motor tasks, and more gross motor tasks may in some ways confuse the item difficulty order. Possibly, then, three separate data collection forms should be created for the DASH. Nonetheless, dividing up the DASH would result in subscales with a small number of items and unknown psychometric properties.

This leads to another issue regarding the creation of a data collection form for the DASH or any other well established instrument. The reliability and validity of the original DASH have been well-established(55, 57, 95, 102, 161). If the items are reordered according to relative challenge and the rating scale modified so that higher numbers indicate more ability instead of more disability, how would this affect previously established psychometric properties of the assessment? Of concern is that the ordering of items with the most difficult at the top (read and answered first by the patient) and least difficult items at the bottom (read and answered last by the patient) might influence how the questions are perceived and answered. Such an influence of item ordering has been a well known and studied phenomenon in the psychological literature for many years(162, 163). If an individual unconsciously recognizes that the activities in question are getting easier, he/she may start to respond accordingly without critically thinking about how challenging the task really is.

A further concern is that the general keyform produced using Rasch measurement methodologies, and the difficulty hierarchies ascertained, differ slightly when produced using admission data from when produced using discharge data.
If the difficulty ordering of items on an assessment were to vary widely from admission to discharge, this could compromise the usefulness (and validity) of a data collection form produced using the methods in this study.
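One way to screen for this kind of instability is to compare the two sets of Rasch item difficulty estimates directly and flag any item whose calibration shifts by more than some tolerance (e.g., half a logit, the criterion the authors apply elsewhere in this work). The sketch below is illustrative only; the item names and logit values are invented, not the study's actual estimates.

```python
# Hedged sketch: flag items whose Rasch difficulty calibration shifts
# between admission and discharge by more than a chosen tolerance.
# Item names and logit values are hypothetical illustrations.

def unstable_items(admission, discharge, threshold=0.5):
    """Return items whose difficulty shifts by more than `threshold` logits."""
    return sorted(
        item for item in admission
        if abs(admission[item] - discharge[item]) > threshold
    )

admission = {"open a jar": -0.3, "wash walls": 1.2, "turn a key": -1.1}
discharge = {"open a jar": -0.2, "wash walls": 2.0, "turn a key": -1.0}

print(unstable_items(admission, discharge))  # → ['wash walls']
```

Items surviving such a screen can reasonably anchor a data collection form; items that do not would warrant closer inspection before being placed in a fixed hierarchy.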
However, previous analyses by the authors (see Chapter 3) showed that, with a larger sample size, DASH item difficulties from admission to discharge were stable enough not to affect person ability estimates. Additionally, in the current study, the concern of differing admission and discharge difficulty estimates was handled by obtaining person measures at discharge after anchoring on the basis of item and step difficulties at admission. Then, the admission general keyform was used to produce the data collection form. This was based on the thinking that therapists would want to see how much a patient's ability changed from the point of admission. One further problem with this anchoring method was that, since difficulties were anchored based on admission data, many of the participants ended up with maximum person ability estimates at discharge.

A final limitation of this study was the small sample size. This has already been discussed in terms of establishing the unidimensionality of the DASH. Beyond this, with such a small sample, the hierarchy arrived at and used for the creation of the data collection form may not be stable (i.e., it might vary from sample to sample). Preliminary analysis by the authors using a much larger sample of DASH data resulted in the production of a very similar keyform. Nevertheless, collection of more data, re-creation of the data collection form, and comparison to the one designed in this study is advised.

In conclusion, the positive implications of a data collection form like the one presented in this paper are numerous. First, as demonstrated by the completed forms in this study, this type of assessment form could be clinically useful to therapists, aiding in goal setting and treatment planning. Second, once such a form was adopted and widely used in clinics, it could be further used to justify reimbursement by insurance companies and by accreditation agencies.
The clear illustration of improvement shown by this type of form would make an impressive statement
about the possibilities of making functional gains from treatment. Finally, if standardized assessments were designed like the ones presented in this study and became useful tools for clinicians, this would lead to many more possibilities for comparing treatment effectiveness between therapists at different sites.

Future work comparing the psychometric properties of the created DASH data collection form to the already established reliability, validity, and responsiveness studies of the original DASH is necessary. Additionally, therapists' perceptions of the usefulness of such data collection forms would be beneficial. Perhaps clinicians might suggest further modifications that would aid the treatment planning process. Actual trials in the clinic with these forms are needed to determine whether they are useful in practice. Moreover, the creation and testing of data collection forms using Rasch methodologies with other assessments would enhance the evidence supporting the feasibility of such forms.
Table 4-1. Demographics of the sample (N = 37)

Mean age ± SD (N = 37): 39.0 ± 15.3; 95% CI = 33.4, 43.9
Gender (N = 37): Female 20 (54.1%); Male 17 (45.9%)
Mean number of medications taken ± SD (N = 36*): 3.3 ± 4.6; 95% CI = 1.6, 4.0
Mean number of surgeries ± SD (N = 35*): 1.5 ± 2.1; 95% CI = 0.8, 2.2
Exercise history (N = 37): At least three times a week 11 (29.7%); One or two times a week 16 (43.2%); Seldom or never 10 (27.0%)

*Note: Several of the totals presented in the table do not equal 37 due to missing data.
Table 4-2. Admission item fit and difficulty estimates
Table 4-3. Discharge item fit and difficulty estimates
Figure 4-1. Data collection form for the DASH (based on analysis of admission data)
(a) Figure 4-2. Data collection forms with individuals' responses (a. High ability individual at admission, b. Medium ability individual at admission, c. Low ability individual at admission). Note: Triangles indicate unexpected responses. Data collection forms are shown at admission (left) and at discharge (right).
(b) Figure 4-2. Continued. Note: Triangles indicate unexpected responses. Data collection forms are shown at admission (left) and at discharge (right).
(c) Figure 4-2. Continued. Note: Triangles indicate unexpected responses. Data collection forms are shown at admission (left) and at discharge (right).
(a) Figure 4-3. Goal setting data collection form (a. High ability individual at admission, b. Low ability individual at admission). Note: Triangles indicate unexpected responses. Data collection forms are shown at admission (left) and at discharge (right).
(b) Figure 4-3. Continued. Note: Triangles indicate unexpected responses. Data collection forms are shown at admission (left) and at discharge (right).
CHAPTER 5
CONCLUSIONS

In the rehabilitation clinics where patients with upper-extremity impairments are treated, the primary focus is on enabling improvement: not just progression at the impairment level, but advances in functional ability (i.e., changes that affect patients' daily lives). In order to ensure the achievement of this goal, assessments that are reliable, valid, and able to detect change are essential. The reliability and validity of many hand/upper-extremity assessments have been well established. However, this is not the case for the ability to detect change, a property of instruments called responsiveness (2). Recently, investigators in various health-related areas have acknowledged the importance of studying responsiveness. For instance, many studies have been conducted to assess the capacity of health-related quality of life measures to detect change (3-7). Hand therapists spend considerable time and energy attempting to bring about change, specifically to the hand/upper-extremity ailments of their patients. Thus, using assessments that are able to detect change is of foremost importance.

Outcomes of hand/upper-extremity therapy are generally evaluated by therapists using three types of measures: (1) impairment measures, (2) performance measures, and (3) self-report measures. For each of these measures, various authors have reported on reliability and validity; however, responsiveness has received less attention in the literature. Consequently, although there are a few studies examining responsiveness, the investigation is far from complete. Additionally, studies comparing the responsiveness of similar assessments have been inconclusive about which ones are superior in the ability to detect patient change (57, 71, 72). Since much of a clinician's efforts are intended to effect change in patient function, it is clear that the capacity of instruments to measure change has been understudied.
The purpose of this project was four-fold: (1) to present state-of-the-art responsiveness designs and methods, highlight problems with coefficients, and suggest a step toward increasing the existing knowledge of the measurement of change; (2) to compare the responsiveness of a well-known upper-extremity assessment, the Disabilities of the Arm, Shoulder and Hand (DASH) outcome questionnaire, to a lesser known and researched assessment, the Upper-Extremity Functional Index (UEFI); (3) to use item response theory methodologies to assess the item content of the DASH; and (4) to demonstrate how Rasch methodologies can be used to create a clinically useful data collection form, and to illustrate the benefits of such a form in setting patient goals.

Chapter 1 provided background detail on responsiveness designs and methods. Two categories of responsiveness designs, as outlined by Stratford and colleagues (73), were presented. These categories are based on how many groups are involved: 1) single-group designs and 2) multiple-group designs. Single-group designs include the before-after (no baseline) design and the before-after design with a baseline, while multiple-group designs include three types: 1) those that compare patients who are randomly assigned to receive either a previously proven effective treatment or a placebo, 2) those that compare two or more groups whose health status, based on prior evidence, is expected to change by different amounts, and 3) those that compare two groups expected to change by different amounts based on responses to an outside criterion (e.g., a global rating of change score).

There are limitations associated with each of the designs, and problems associated with using various coefficients. Perhaps the foremost concerns are associated with single-group designs.
A disadvantage associated with the before-after, no-baseline design is that if no change is detected, it is unclear whether the measure was unable to detect the change or whether
the patients did not undergo the expected change. This design is considered the weakest of the designs. With the before-after design with a baseline, a major disadvantage is that the period during which stability is measured (i.e., the period from baseline to the measurement taken just before intervention) is often shorter than the period during which change is assessed (i.e., from the measure taken just before intervention to the measure taken after intervention). Thus, this design may underestimate the magnitude of random variability that occurs over longer periods in patients whose health status is truly stable (73).

The addition of a comparison group gives multiple-group designs an advantage over single-group designs. However, each of the multiple-group designs also has limitations. One disadvantage when comparing patients who receive either the intervention or a placebo involves the difficulty of finding an intervention that is known to be effective (i.e., an intervention that is highly likely to result in changes to the treatment group). In a similar way, the major disadvantage when comparing groups expected to change by different amounts is that it is often difficult to find groups who meet this criterion (i.e., often there are not two groups of patients available whose conditions are expected to change by different amounts). When comparing two groups that have been defined by some external standard (e.g., global rating of change scores), the soundness of the study may be compromised for two reasons. First, patients complete both the measure under study (e.g., the Disabilities of the Arm, Shoulder, and Hand outcome questionnaire) and the criterion rating (e.g., a global rating of change). Thus, the criterion measure is not independent of the measure under study (i.e., a patient who responds to the questions on the measure indicating they have changed is also likely to respond to the criterion measure in the same manner).
Second, when a global rating of change measure is used as the external criterion, evidence suggests that patients have difficulty recalling their initial state and,
consequently, may be inaccurate when assessing the amount of change that has taken place (75, 164, 165). Although an outside criterion provides the advantage of being a benchmark by which to assess whether change has occurred, if the outside criterion is not a genuine measure of change, the results will not provide a good representation of how much change has actually occurred.

In addition to the limitations related to specific designs, a problem with measuring the property of responsiveness involves the discrepant results obtained when different coefficients are used. That is, when several similar instruments are compared to determine which assessment is the most responsive, the ranking of these instruments often differs depending on which design and which coefficient is used to calculate responsiveness (72, 76, 77). For instance, Wright and Young (72) found different rankings of five different assessments when using five different responsiveness coefficients (Guyatt's Responsiveness Index, standardized response mean, relative efficiency statistic, effect size, and correlation). Thus, an important question to answer before beginning a responsiveness study is: what is the most appropriate responsiveness design and coefficient to use when calculating responsiveness? Since no standardized method exists for calculating responsiveness, and multiple possible responsiveness coefficients can be computed (with conflicting results), there is a need for clarification as to which designs require the use of which coefficients. Erroneously, many researchers calculate an array of different coefficients (78). Since the formulas for each of the coefficients differ, the indices can lead to conflicting results. A more theoretically sound approach is to determine which responsiveness coefficient is most appropriate for the study design and sample.
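To make the coefficient distinctions concrete, the two most widely used single-group indices differ only in their denominators: the effect size divides mean change by the standard deviation of the baseline scores, while the standardized response mean divides it by the standard deviation of the change scores. A minimal Python sketch; the DASH-style scores are invented for illustration only.

```python
# Hedged sketch of two single-group responsiveness coefficients.
# ES  = mean improvement / SD of baseline scores
# SRM = mean improvement / SD of improvement scores
from statistics import mean, stdev

def effect_size(pre, post):
    """ES: mean improvement divided by the SD of the baseline scores."""
    improvement = [a - b for a, b in zip(pre, post)]  # DASH: lower = better
    return mean(improvement) / stdev(pre)

def standardized_response_mean(pre, post):
    """SRM: mean improvement divided by the SD of the improvement scores."""
    improvement = [a - b for a, b in zip(pre, post)]
    return mean(improvement) / stdev(improvement)

pre  = [30, 42, 55, 38, 47]   # invented admission scores
post = [28, 27, 50, 39, 38]   # invented discharge scores

print(round(effect_size(pre, post), 2))                 # → 0.64
print(round(standardized_response_mean(pre, post), 2))  # → 0.96
```

The same mean improvement yields different magnitudes under the two formulas, which is one concrete reason the coefficient rankings reported in the literature conflict.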
If the design involves a single group, one of the coefficients appropriate for single-group studies should be utilized (effect size (ES),
standardized response mean (SRM), paired t value, or Guyatt's Responsiveness Index (GRI)). Alternatively, if the design includes more than one group, one should choose from those coefficients appropriate for use with multiple-group designs (Guyatt's Responsiveness Index (GRI), t value for independent change scores, analysis of variance (ANOVA) of change scores, Norman's S-repeat, Norman's S-ancova, area under the receiver operating characteristic (ROC) curve, or correlation). Also, with multiple-group designs, one should consider the amount of change expected of the groups; that is, what differences in change are expected between placebo and treatment groups, acute and chronic groups, or groups divided based on an outside criterion. Considering these two elements can eliminate the possibility of using several of the coefficients, allowing a more planned, structured approach. However, it should be noted that even when the coefficients appropriate for the given number of groups and sample characteristics are used, there may be discrepancies in the calculations of different coefficients due to variations in the formulas. Because hand/upper-extremity therapists' foremost concern is inciting change in the functional ability of their patients, more studies leading to higher standards of responsiveness (equal to the standards of reliability and validity) should be mandated.

Chapter 2 presented the comparison of the responsiveness of the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire, a well-known upper-extremity assessment, to the Upper-Extremity Functional Index (UEFI), a lesser known and researched assessment. The results of this comparison indicated that neither instrument has a clear advantage over the other when measuring responsiveness. The DASH questionnaire and the UEFI measure patient change in upper-extremity condition very similarly.
A within-subjects ANOVA revealed a significant difference in person measure change between the two assessments. However, areas under the ROC curves were almost identical for the two assessments (.67 and .65), as were
correlations between global ratings and the change scores (r = 0.33 and 0.35). Given that the responsiveness calculations for both assessments are very similar, there appears to be no real advantage of one instrument over the other in detecting change. Thus, if time is an issue, the shorter UEFI is perhaps a better choice, even though it is a lesser known instrument. However, if comparative outcome data are needed, or communication of outcomes with other therapists is desired, the DASH may be preferred, since it is the more widely used and studied.

One concern highlighted in the comparison of the responsiveness of the DASH to the UEFI centered on the use of a patient-reported global rating of change as a 'gold standard.' The correlations between the global rating and the person measure change scores were only r = 0.33 for the DASH and r = 0.35 for the UEFI. Additionally, the ROC curve was really more of a line, indicating low probabilities of specificity and sensitivity (i.e., a 50% chance). Thus, there is a problem with not having a true gold standard with which to compare DASH and UEFI change scores. In essence, in this study both the assessment and the 'gold standard' were patient-reported outcomes. This study tends to imply that patients are not accurate in reporting the amount of change that has occurred in their functional ability. Moreover, the wording of the global rating of change question used in this study ('Please rate on a scale from -7 to +7 how much you think your condition has changed since your first therapy session. -7 indicates that your condition is much worse, while +7 indicates that your condition is much better. Please fill in the circle above your answer choice.') may have influenced how individuals responded.

Even when an appropriate external criterion has been established, there are other important decisions that need to be addressed.
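As context, the ROC areas reported above (.67 and .65) have a simple probabilistic reading: the area under the curve equals the probability that a randomly selected 'improved' patient (per the dichotomized global rating) has a larger change score than a randomly selected 'not improved' patient, with ties counted as half. A hedged Python sketch with invented data:

```python
# Hedged sketch: rank-based area under the ROC curve for change scores
# against a dichotomized external criterion ("improved" vs "not improved").
# The change scores and group labels below are invented for illustration.

def auc(change_scores, improved):
    """Probability an improved patient out-scores a non-improved one."""
    pos = [s for s, g in zip(change_scores, improved) if g]
    neg = [s for s, g in zip(change_scores, improved) if not g]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores   = [12, 3, 25, 4, 1, 18, 5]
improved = [True, False, True, True, False, True, False]

print(round(auc(scores, improved), 3))  # → 0.917
```

An AUC near 0.5, as was effectively observed in this study, means the change scores barely separate the two criterion-defined groups, which is exactly the 'gold standard' problem discussed above.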
One involves considering sensitivity and specificity in determining a cutoff score for patients who have changed versus those who have not changed. This was demonstrated through the ROC curves generated in this study. The issue involves a
trade-off between specificity and sensitivity. A larger total change score on the DASH or UEFI (a more stringent criterion) yields smaller sensitivity and greater specificity; conversely, a smaller total change score (a less stringent criterion) yields greater sensitivity and smaller specificity. Clinicians who use assessment change scores need to make a compromise of sorts when deciding what type of error they are willing to make. By accepting a lower total score change as a cutoff for dividing improved and not-improved groups, clinicians increase their chance of assuming a patient has changed when they have not. Conversely, using a greater total score change increases the chance of assuming a patient has not changed, and of continuing to treat the condition after functional ability has returned to the desired state. Portney and Watkins (112) state that those who use a screening tool must decide what levels of sensitivity and specificity are acceptable. Sensitivity is more important when the risk associated with missing a diagnosis is high, as in the case of life-threatening disease or a debilitating deformity; specificity is more important when the cost or risks associated with further intervention are substantial. In the case of assessing change resulting from rehabilitation, sensitivity may be chosen over specificity, since it could be argued that it is better to continue to treat a patient longer than actually needed than to discharge them too early.

Chapter 3 investigated the item-level psychometrics and factor structure of the Disabilities of the Arm, Shoulder, and Hand (DASH) outcome questionnaire. Results of exploratory factor analyses (EFAs) and confirmatory factor analyses (CFAs) were inconclusive as to whether a one-factor solution was plausible. One factor accounted for most of the variance (> 60%) for both admission and discharge data, with the next two factors each adding only approximately five percent more variance.
All items loaded highly on the first factor, and the first two factors were highly correlated. Eigenvalues for the first factor in both the admission and discharge EFAs
were much greater than for the second factor; however, the second and third factor eigenvalues were greater than 1.0 (the criterion necessary for acceptance of the presence of a factor based on the Kaiser rule). Examination of which items loaded highest on each of the factors in a three-factor solution separates the items into three possible constructs: one (those that load highest on the first factor) representing more gross motor, complex activities; a second (those that load highest on the second factor) comprising solely hand activities; and a third (those that load highest on the third factor) involving more general functioning or symptom items. Thus, the decision to treat the DASH as unidimensional is debatable. Perhaps dividing the DASH into three separate scales with three total scores would be more meaningful. Using multiple scales, instead of one multidimensional scale, might be more informative for treatment (e.g., when specifically treating a symptom) and provide a clearer picture of patient progress from admission to discharge.

CFAs with both admission and discharge data were conducted to test a one-factor solution. The results revealed that with the admission data, the only goodness-of-fit statistic that supported the model was the Tucker-Lewis Index (TLI). With the discharge data, the TLI and the Standardized Root Mean Square Residual (SRMR) supported the one-factor solution. A number of items misfit (ten at admission and eleven at discharge). However, the criterion used with such a large sample (close to one thousand) is very stringent (MnSq fit values between 0.6 and 1.1 (122, 123)). Moreover, DASH item point-measure correlations (i.e., correlations between each item and the entire instrument) were all above .30. Thus, the evidence is mixed regarding the dimensionality of the DASH.

Maps of the person-item match revealed that the sample performed better at discharge than at admission.
The sample mean fell above all the items at discharge, whereas at admission several items (I feel less capable, less confident or less useful, Recreational activities in which
you take some force or impact, and Recreational activities in which you move your arm freely) were above the sample mean. At discharge, 41 individuals had maximum estimated ability levels, in contrast to only two at admission.

Item difficulty hierarchies support the theory of motor control. Four tenets of motor control theory are: 1) complex items are more difficult, 2) environmental factors increase the challenge of items, 3) more difficult items require the use of multiple joints, and 4) isolated, abstract items are more difficult than functional tasks (128, 145, 146). Complex items were found to be more difficult. For example, doing heavy household chores (e.g., wash walls, wash floors) was harder than putting on a pullover sweater. Likewise, gardening or doing yard work was more difficult than carrying a shopping bag or briefcase. Environmental factors were shown to influence difficulty ranking. Recreational activities requiring little effort, such as card playing or knitting, were rated easier than recreational activities in which you move your arm freely, such as frisbee or badminton. Furthermore, recreational activities in which you move your arm freely were rated easier than recreational activities in which you take some force or impact through your arm, shoulder, or hand. Items requiring the use of multiple joints were more difficult. For instance, recreational activities such as frisbee or badminton were rated more difficult than using a knife to cut food. Also, preparing a meal was more difficult than turning a key in a lock. Finally, tasks presented out of context or without a clear purpose were rated as harder. To illustrate, carrying a heavy object was rated as harder than carrying a shopping bag.

There was evidence that item difficulties were sufficiently stable from admission to discharge. Nevertheless, several items displayed significant differential item functioning (DIF) from admission to discharge.
Five items had large DIF values (p < .0017): I feel less capable, less
confident or less useful because of my arm, shoulder or hand problem; Difficulty sleeping because of the pain in your arm, shoulder or hand; Push open a heavy door; Tingling (pins and needles) in your arm, shoulder or hand; and Sexual activities. Four items had small DIF (p < .05): Weakness in your arm, shoulder or hand; Limited in work or other regular daily activities as a result of your arm, shoulder, or hand problem; Put on a pullover sweater; and Make a bed. However, none of the item calibrations differed by 0.5 logits from admission to discharge, and mean person ability measures were not affected by the inclusion of DIF items. Furthermore, the intraclass correlation coefficient (ICC) was high, indicating reliability in measurement between the two time points. Similar test information curves for admission and discharge data provide further evidence for the stability of the test at the two time points. This stability of the measure from admission to discharge is important to the measurement of patient change.

A further item characteristic of importance is item discrimination. This gives an indication of how responses to a particular item correspond to responses to the overall assessment (137, 138). In general, high values for item discrimination are considered good, indicating that responses to that particular item reflect a construct similar to that of the other items on the test (138). Zero or negative discrimination values indicate that the item does not fit well with the rest of the items (137). Winsteps (1) was used to estimate item discrimination for admission and discharge data. Since none of the item discrimination values were zero or negative, all items contribute in a similar fashion to the overall test (137, 138).

Item discrimination estimates are used in the calculation of item difficulty and person ability with the two-parameter item response theory model.
However, item discrimination is not used in these calculations with the one-parameter item response model. With the one-parameter
item response model, it is assumed that item discrimination is negligible, and thus it is set at one for all items. Since item discrimination values were not equal in this study, some might argue that analysis using a two-parameter model is necessary. Yet analysis using the one-parameter model appears to be the most parsimonious solution, since comparison of person measures calculated using both models revealed little difference in these measures.

Finally, Chapter 4 demonstrated the potential for creating clinically useful data collection forms through the use of Rasch methodologies. The general keyform output from the Winsteps Rasch analysis program (1) was used as the basis for creating the data collection form. Creation of such a form proved to be a relatively easy process, and the form offers several advantages over data collection forms used in clinics today. On the data collection form, items are listed in hierarchical difficulty order. This helps to reveal patterns of scoring (i.e., doing better on easier items than on more difficult items). A person measure scale ranging from zero to 100 is placed at the bottom. A line can be drawn through the point on the form where most of the responses fall (156). The corresponding person measure at the bottom can help to determine the overall ability of the patient. Such a line can be estimated even with missing responses to some items. As depicted by forms filled out according to patients of differing admission ability levels (high, medium, and low), at a glance it is possible to estimate an individual's overall ability to perform functional tasks requiring the upper extremity. Furthermore, these forms can be used in goal setting and treatment planning. By observing the location where the individual starts to have mild to moderate difficulty with a substantial number of items, the clinician can determine what activities should be set as short- and long-term goals for the patient.
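The zero-to-100 person measure scale mentioned above is, in the usual Rasch reporting convention, a linear rescaling of the logit metric. The sketch below assumes a hypothetical logit range; the endpoints are invented, chosen only to illustrate the arithmetic, not taken from the study's calibration.

```python
# Hedged sketch: linear rescaling of Rasch logit person measures onto a
# 0-100 scale. The logit endpoints (lo, hi) are hypothetical; in practice
# they would come from the instrument's minimum and maximum observable
# measures.

def rescale(logit, lo=-6.0, hi=6.0):
    """Map a logit measure onto a 0-100 scale (assumes lo <= logit <= hi)."""
    return 100 * (logit - lo) / (hi - lo)

print(rescale(0.0))   # mid-range ability → 50.0
print(rescale(3.0))   # → 75.0
```

Because the transformation is linear, equal-interval properties of the logit metric carry over to the 0-100 scale, which is what makes change scores on the form directly comparable across the range.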
The positive implications of this type of data collection form are numerous. First, this type of assessment form could be useful to therapists, aiding in goal setting and treatment planning.
The location of the circled responses on the form, indicating that the patient is starting to have mild to moderate difficulty with a substantial number of items, provides information on what activities should be set as goals for the patient. Second, such a form could be used to justify reimbursement by insurance companies and by accreditation agencies. The illustration of patterns of functional improvement shown by this type of form could make an impressive statement about the possibilities of making functional gains from treatment. Finally, with the advantages provided by these forms, clinicians may be more likely to adopt standardized assessment in practice. Adoption of standardized assessments in clinical practice could lead to more possibilities for comparison of treatment effectiveness between different types of interventions and across different clinical sites.

In conclusion, the goals of this project were to present responsiveness designs and methods, to compare the responsiveness of the DASH and the UEFI, to use item response theory methodologies to assess the DASH, and to demonstrate how a clinically useful data collection form can be designed. Clearly, the study of responsiveness in clinical assessments is lacking. This may be due in part to the need for clarity about correct methods, designs, and details involved in the study of change in patient ability. Furthermore, the item characteristics of many upper-extremity assessments have not been investigated. Information about item characteristics may lead to further understanding of responsiveness. Finally, the use of standardized assessments in clinics is lacking due to their inability to inform the treatment process. Chapter 4 attempted to illustrate a method to design a more clinically useful assessment. The hope of this project was to move upper-extremity assessment a small step toward becoming more clinically meaningful.
APPENDIX A
ITEMS ON THE DISABILITIES OF THE ARM, SHOULDER, AND HAND (DASH) OUTCOME QUESTIONNAIRE

The Disabilities of the Arm, Shoulder, and Hand (DASH) Outcome Questionnaire
APPENDIX B
ITEMS ON THE UPPER-EXTREMITY FUNCTIONAL INDEX (UEFI)

The Upper-Extremity Functional Index (UEFI)

We are interested in knowing whether you are having any difficulty at all with the activities listed below because of your upper limb problem for which you are currently seeking attention. Please provide an answer for each activity.

Today, do you or would you have any difficulty with: (Circle one number on each line.)

Each activity is rated on a five-point scale: 0 = Extreme difficulty or unable to perform activity; 1 = Quite a bit of difficulty; 2 = Moderate difficulty; 3 = A little bit of difficulty; 4 = No difficulty.

1. Any of your usual work, household or school activities
2. Your usual hobbies, recreational or sporting activities
3. Lifting a bag of groceries to waist level
4. Lifting a bag of groceries above your head
5. Grooming your hair
6. Preparing food (e.g., peeling, cutting)
7. Pushing up on your hands (e.g., from bathtub or chair)
8. Driving
9. Vacuuming, sweeping or raking
10. Dressing
11. Doing up buttons
12. Using tools or appliances
13. Opening doors
14. Cleaning
15. Tying or lacing shoes
16. Sleeping
17. Laundering clothes (e.g., washing, ironing, folding)
18. Opening a jar
19. Throwing a ball
20. Carrying a small suitcase with your affected limb
APPENDIX C
GLOBAL RATING OF CHANGE

Please rate on a scale from -7 to +7 how much you think your condition has changed since your first therapy session. -7 indicates that your condition is much worse, while +7 indicates that your condition is much better. Please fill in the circle above your answer choice.

-7 -6 -5 -4 -3 -2 -1 0 +1 +2 +3 +4 +5 +6 +7
WORSE                                                BETTER


152 159. Avery LM, Russell DJ, Raina PS, Walter SD, Rosenbaum PL. Rasch analysis of the Gross Motor Function Measure: validating the as sumptions of the Rasch model to create an interval-level measure. Arch Phys Med Rehabil. 2003 May;84(5):697-705. 160. Woodbury M, Velozo C, Richards L, Duncan P, Studenski S, Lai S. Dimensionality and Construct Validity of the Fugl-Meyer Assessment of the Upper Extremity. Archives of Physical Medicine and Rehabilitation. 2007;88:715-23. 161. Schuind FA, Mouraux D, Robert C, Brassinne E, Remy P, Salvia P, et al. Functional and outcome evaluation of the hand and wrist. Hand Clin. 2003 Aug;19(3):361-9. 162. Stone L. Social desirability and order of item presentation in the MMPI. Psychological Reports. 1965;17(2):518. 163. Guertin W. The effect of instructions a nd item order on the arithmetic subtest of the Wechsler-Bellevue. Journal of Genetic Psychology. 1954;85:79-83. 164. Kairiss EW, Miranker WL. Cortical memory dynamics. Biol Cybern. 1998 Apr;78(4):277-92. 165. Hayes JA, Black NA, Jenkinson C, Young JD Rowan KM, Daly K, et al. Outcome measures for adult critical care : a systematic review. Health Technol Assess. 2000;4(24):1-111.


BIOGRAPHICAL SKETCH

Leigh Lehman graduated from Boiling Springs High School (Boiling Springs, SC) in 1993. She completed her Bachelor of Science degree in psychology and biology at the University of South Carolina Upstate (Spartanburg, SC) in 1998. In 2005, she received a Master of Health Science degree in occupational therapy from the University of Florida (UF; Gainesville, FL). She graduated with a Ph.D. in rehabilitation science from UF in 2008.