
Testing the Accuracy of Linking Healthcare Data across the Continuum of Care



TESTING THE ACCURACY OF LINKING HEALTHCARE DATA ACROSS THE CONTINUUM OF CARE

By
KATHERINE L. BYERS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2004


ACKNOWLEDGMENTS

I would like to thank the VA Office of Academic Affairs, Washington, DC, Pre-Doctoral Associated Health Rehabilitation Research Fellowship Program, and the VA HSR&D/RR&D Rehabilitation Outcomes Research Center (RORC) of Excellence, Gainesville, Florida, for funding this study. Within the VA, I would especially like to thank Dr. Maude Rittman for acting as my fellowship’s program director and for her continual support and encouragement throughout the process. Additional thanks go to Dr. Christa Hojlo, Chief of Nursing Home Care, for her assistance in obtaining MDS data and to Mr. Clifford Marshall, Rehabilitation Planning Specialist, for his assistance in obtaining FIM data. I extend my gratitude to my doctoral advisor and VA Fellowship Preceptor, Dr. Craig Velozo, who also acted as my mentor and the chairperson of my supervisory committee. His guidance and support throughout this process have been invaluable. Furthermore, I am appreciative of the support provided by the other members of my committee, including the cochair, Dr. Ronald Spitznagel, Dr. Elizabeth Swett, and Dr. Anne Seraphine. Special thanks go to Dr. Richard Smith, an expert in Rasch analysis, who analyzed the original data. Other student members of Dr. Velozo’s research team have been invaluable in the completion of this dissertation, and I owe them a debt of gratitude. This is especially true of Ms. Inga Wang, who has worked closely on this project. Finally, my family has been a source of continual support throughout this process, and I would like to thank them for their tireless encouragement.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 REVIEW OF THE LITERATURE
   Measuring Outcomes in Rehabilitation
   The FIM Instrument Used in Inpatient Rehabilitation
   The MDS Instrument in Skilled Nursing Facilities
   Administrative Solution: Adopting a Single Instrument for Measuring Outcomes in Post Acute Care
   Potential Measurement Solution: Linking Instruments
   The Use of Linking Techniques
   Linking of Measures in Healthcare

3 METHODOLOGY
   Introduction
   Source of the Data
   Sample
   FIM and MDS Motor Items
   Procedures Involved in the Creation of the FIM/MDS Conversion Table
   Statistically Testing the Accuracy of the FIM-MDS Conversion Table

4 RESULTS
   Statistical Analyses
   Statistical Results at the Level of the Individual
   Statistical Results at the Group Level
   Discrepancies in the Dataset

5 DISCUSSION
   Summary of Results
   Implications for Future Research
   Conclusion

REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

Table
1-1 Comparison of FIM to MDS ADL/motor items
3-1 Comparison of FIM™ to MDS ADL/motor items
3-2 FIM scoring criteria
3-3 MDS scoring criteria
3-4 FIM-MDS score conversion
3-5 FIM-MDS conversion table
3-6 Effect size
3-7 Rating scale conversion
3-8 Similar FIM and MDS items
4-1 Four moments of the distributions


LIST OF FIGURES

Figure
4-1 Score difference between the FIMa and FIMc
4-2 Score difference between the MDSa and MDSc
4-3 Distribution of FIM actual scores
4-4 Distribution of FIM converted scores
4-5 Distribution of MDS actual scores
4-6 Distribution of MDS converted scores
4-7 Scatterplot of the FIMa and FIMc scores
4-8 Scatterplot of actual MDS and converted MDS scores


ABSTRACT

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TESTING THE ACCURACY OF LINKING HEALTHCARE DATA ACROSS THE CONTINUUM OF CARE

By
Katherine L. Byers
August 2004

Chair: Craig A. Velozo
Major Department: Rehabilitation Science

The purpose of this project was to test the accuracy of a conversion table designed to transform a score on the physical ability component of the Functional Independence Measure™ (FIM) to its corresponding score on the Minimum Data Set (MDS) and vice versa. The records of 2,297 VA patients with scores on both the FIM and MDS, which were completed within 7 days of one another between July 2002 and June 2003, were obtained from the VA's Austin Automation Center (AAC). The FIM-MDS conversion table, generated from an independent sample using Rasch measurement techniques, was then used to transform actual scores on the FIM and MDS to their corresponding converted scores. The equivalence of the actual and converted score distributions was assessed by examining their means and variances. It was hypothesized that 75% of the actual and converted scores on the FIM and the MDS would be within five points of one another. Effect size was determined, as was the percent of subjects having actual and converted FIM and MDS scores within five points of one another.


Twenty-four percent of the FIM and 37% of the MDS actual and converted scores were within five points of one another, respectively, and therefore fell short of the standard set at 75% for the conversion to be considered accurate. Yet, the effect size for the conversion of both FIM and MDS scores was .2, demonstrating an 85.3% overlap between the two score distributions. The correlation between the FIM actual and converted scores was .724, while the correlation between the MDS actual and converted scores was .745. While the development of a FIM-MDS translation table appears promising, the results of this study do not provide strong enough evidence that this first attempt at a FIM-MDS conversion table has produced an instrument accurate enough to convert scores within a clinical setting.
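The three accuracy criteria named in this abstract can be illustrated with a short sketch. It is not the analysis code used in the study: the function names are hypothetical, "within five points" is assumed to be inclusive, the effect size is assumed to be Cohen's d with a pooled standard deviation, and the overlap figure is assumed to derive from Cohen's U1 nonoverlap statistic for two normal distributions. Under that last assumption, d = .2 gives an overlap of roughly 85%, which is consistent with the figure reported above.

```python
import math
from statistics import NormalDist, mean, stdev

def percent_within(actual, converted, tolerance=5):
    """Percent of paired scores whose absolute difference is <= tolerance."""
    hits = sum(1 for a, c in zip(actual, converted) if abs(a - c) <= tolerance)
    return 100.0 * hits / len(actual)

def cohens_d(x, y):
    """Standardized mean difference using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)

def overlap_from_d(d):
    """Distribution overlap as 1 - U1, where U1 is Cohen's nonoverlap statistic.

    Assumes two normal distributions with equal variance and equal size.
    """
    u2 = NormalDist().cdf(abs(d) / 2.0)
    u1 = (2.0 * u2 - 1.0) / u2
    return 1.0 - u1

# Made-up scores, for illustration only.
actual_scores = [10, 20, 30, 40]
converted_scores = [12, 26, 30, 35]
print(percent_within(actual_scores, converted_scores))
print(round(overlap_from_d(0.2), 3))
```

A criterion such as "75% within five points" is a statement about individual score pairs, while the effect size and overlap describe the two distributions as groups, which is why the two kinds of result can disagree.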


CHAPTER 1
INTRODUCTION

Over the past few decades, the total number of people receiving rehabilitation services in the United States has grown, while the provision of such services has extended beyond acute rehabilitation settings. This growth in demand has been fueled in part by changes in population demographics: the number of individuals over the age of 65 has increased, and this trend is expected to continue for many years into the future (Cornman & Kingson, 1996; Gutheil, 1996). According to the U.S. Department of Health and Human Services (2002), 35.6 million people, or 12.3% of the U.S. population, were over the age of 65 in 2002. Since 1900, this percentage had more than tripled (4.1% in 1900 to 12.3% in 2002). By the year 2030, the older population will more than double again to about 70 million people or 20% of the total population. Similarly, the proportion of U.S. veterans over age 65 is projected to increase from 26% in 1990 to 46% in 2020 (Veterans Administration, 2001). A greater number of elderly people in our society is associated with an increased demand for rehabilitation services. Dillingham, Pezzin, and MacKenzie (2003) reported that aging is accompanied by an increased risk of diminished health status and a greater likelihood of requiring rehabilitation services. As noted in the Centers for Disease Control (CDC) Health, United States, 2001 (2002), the prevalence of both chronic conditions and activity limitations increases with age, with health-related limitation in mobility or self-care increasing fourfold between ages 65 to 74 and age 85 or older. In 1997, more than half of the older population (54.5%) reported having at least one


disability, with more than a third (37.7%) reporting at least one severe disability (U.S. Department of Health and Human Services, 2002). There has also been an increase in the number of patients being admitted to postacute care (PAC) settings after discharge from acute care hospitals (Iwanenko, Fiedler, & Granger, 1999; Johnson, Kramer, Lin, Kowalsky, & Steiner, 2000). Iwanenko et al. noted that between 1991 and 1997, the number of patients admitted annually to PAC settings rose from 12,468 to 49,844. Stineman (2001) noted that the most significant challenge to current medicine was likely the care of people with chronic, incurable diseases and injuries. The U.S. healthcare system provides a variety of routes to recovery from physical injuries, ailments, or impediments. PAC settings, also called subacute care or transitional care settings, are a type of short-term care program provided by many long-term care facilities and hospitals. Treatment in such settings may include rehabilitation services, specialized care for certain conditions (such as stroke and diabetes), and postsurgical care and other services associated with the transition between the hospital and home. Residents on these units often have been hospitalized recently and typically have complicated medical needs. The goal of subacute care is to discharge residents to their homes or to a lower level of care. Current PAC settings include comprehensive inpatient rehabilitation units attached to acute care or freestanding hospitals, skilled nursing facilities (SNFs), outpatient rehabilitation facilities, freestanding outpatient clinics, and home health care services. Each person with a potentially disabling impairment has a unique care trajectory that may include sequential admission to more than one PAC setting.
An example of the possible variations is that a person diagnosed with stroke, after being discharged from an acute care hospital, is then admitted to an inpatient rehabilitation program before being released to home with outpatient services. Yet,


another individual with the same diagnosis is discharged from the acute hospital setting directly into a skilled nursing facility. SNFs were established under the 1965 Medicare legislation and are certified by Medicare to provide 24-hour nursing care and rehabilitation services in addition to other medical services. SNF-based rehabilitation units have become a rapidly growing segment of the rehabilitation continuum over the past decade, as policy makers have searched for less costly delivery systems for rehabilitation. While inpatient treatment provides a full complement of professionals practicing in a hospital setting, it is one of the most costly of the rehabilitation services (Keith, Wilson, & Gutierrez, 1995). SNFs, on the other hand, have lower costs, mainly because construction, regulatory, and staff requirements are less stringent than they are in hospitals (Keith et al.). As a result, SNF-based rehabilitation has been used increasingly as a substitute for traditional inpatient care (Keith et al.). Additionally, many older patients do not meet the Medicare requirements to receive inpatient rehabilitation services, which include being able to tolerate three hours of therapy on a daily basis, fitting within one of the required diagnostic mixes, and being able to make significant progress over a fairly short length of time (Keith et al.). In such cases, SNF settings have become an appropriate alternative. In rehabilitation programs, a patient’s functional enhancement is the primary goal (DeJong, 2001). The ability to evaluate a patient’s status is central to rehabilitation efforts, for example, to track a patient’s recovery, to determine the effectiveness of treatment, or to estimate resource use (Penta, 2004).
It is well documented that one’s ability to function physically is an important component of a patient’s self-report of health status (Haley, McHorney, & Ware, 1994; Hart, 2000; McHorney, Haley, & Ware, 1997; Raczek et al., 1998; Segal, Heinemann, Schall, & Wright, 1997). Since the 1950s,


functional status measures have served as a means to monitor outcomes within medical centers. Yet to this day, there is no clear and commonly accepted definition of function or a clear delineation between instruments that assess functional outcomes and those that evaluate other health concepts. As a result, the ability to compare one instrument measuring functional status to another can be fraught with complications. Currently in the U.S., two distinct instruments are used to monitor functional outcomes in inpatient rehabilitation settings and SNFs. Traditional rehabilitation facilities have almost uniformly adopted the Functional Independence Measure (FIM™)¹ as a means of monitoring patients’ functional ability. The FIM instrument provides a measure of disability and was put into operation beginning in 1989 (Granger, 1998). Today, it is one of the most widely used instruments that assess the quality of daily living activities in persons with disabilities (Granger, Hamilton, & Sherwin, 1986). The Veterans Health Administration (VHA) rehabilitation services include the FIM in its Functional Status and Outcomes Database (FSOD), which has been operational since 1997 (Veterans Health Administration [VHA], 2000). Its use is mandated by VHA Directive 2000-016, “Medical Rehabilitation Outcomes for Stroke, Traumatic Brain Injury, and Lower Extremity Amputation Patients,” which requires every VHA medical center to assess functional status and enter these data into the FSOD in order to measure and track rehabilitation outcomes on all new stroke, lower extremity amputee, and traumatic brain injury (TBI) patients (VHA, 2000).
Presently, the FSOD has not been linked with other data sources that would allow patients to be monitored as they progress across the continuum of care (e.g., from rehabilitation facilities to skilled nursing facilities or from rehabilitation facilities to home health care).

¹ FIM™ is a trademark of the Uniform Data System for Medical Rehabilitation, a division of U. B. Foundation Activities, Inc.


While the FIM is the “gold standard” for measuring functional outcomes in rehabilitation settings, the Minimum Data Set (MDS) of the nursing home Resident Assessment Instrument (RAI, 1991) is used universally for monitoring rehabilitation outcomes in SNFs. The MDS was developed in response to a 1986 Institute of Medicine study of the quality of care in nursing homes that called for improvements in nursing home quality and more patient-centered care (Morris et al., 1990). The federal Omnibus Budget Reconciliation Act of 1987 (OBRA 87) required all U.S. nursing homes to implement the Resident Assessment Instrument, whose core is the Minimum Data Set (MDS) (Rantz et al., 1999). The MDS consists of 284 items designed to assess the cognitive, behavioral, functional, and medical status of nursing home residents (Hawes et al., 1995; Teresi & Homes, 1992). Nursing homes are a critical environment for tracking the health care status of elderly veterans. In fiscal year 2001, a total of 89,056 veterans were treated in nursing homes, with an average daily census of 33,670 (Catalogue of Federal Domestic Assistance, 2002). By 2003, it is projected that 111,953 patients will be treated in nursing homes with an average daily census of 35,132 (Catalogue of Federal Domestic Assistance). In 1995, there were at least 1.5 million nursing home residents in facilities participating in the Medicare or Medicaid programs (Hawes et al., 1995). Within the VHA, the reduction of acute rehabilitation beds from 1,150 five years ago to 617 in 2003 further increases the likelihood that veterans could receive their post-acute rehabilitation care in nursing homes (C. Johnson, personal communication, September 29, 2002). A key to improving services for patients treated in PAC settings is to develop effective and efficient methods for tracking and evaluating functional status changes across rehabilitation and skilled nursing facilities. Through the use of a single instrument


in these settings, a patient may progress from one to the other while maintaining a functional assessment score that could easily be tracked and compared between settings. Such a tool would benefit patients, as it would facilitate an increased continuity of care between settings. It would also allow for the direct comparison of rehabilitation outcomes between settings, along with resource utilization and costs. Latham and Haley (2003) note that a clear need exists for an instrument that can accurately assess patients’ functional ability as they move through the health care system. As stated in Buchanan, Andres, Haley, Paddock, and Zaslavsky (2003), “Providers, payers, and consumers would all benefit from comparable measures of functional status and rehabilitation outcomes across multiple care settings to facilitate equitable payment and to monitor the quality and efficiency of care delivery” (p. 45). To date, there has been only one published attempt, by Williams, Lee, Fries, and Warren (1997), to link the FIM to the MDS. Yet, other studies have linked other measures of global functioning (e.g., Fisher, Eubanks, & Marier, 1997; Fisher, Harvey, & Kilgore, 1995; Fisher, Harvey, Taylor, Kilgore, & Kelly, 1995; Segal, Heinemann, Schall & Wright, 1997; Smith & Taylor, 2004; Tennant & Young, 1997). The dilemma of having multiple yet incompatible instruments measuring the same construct has been confronted and successfully overcome in the physical sciences. Take, for example, the manner in which we measure distance. Currently, in the United States, we have two competing systems of measuring length, namely the metric system and the standard system of measurement. Surprisingly, success has not come through attempts to convert entirely from one system to the other despite the obvious benefits in doing so. Instead, we continue to utilize simple strategies that allow us to convert a measure on one scale to its corresponding measure on the other.
Similarly, we routinely convert readings between Celsius and Fahrenheit with a simple conversion table when


measuring temperature. Thus, one could say that a precedent has been set for the manner in which we have successfully reconciled competing systems of quantifying what are essentially abstract concepts. An analogous attempt in health care would be to develop a system of converting scores between the physical functioning components of the FIM and the MDS so that a score on one instrument could be translated into its equivalent score on the other. The hypothesis is that the items included in these two instruments are subsets of items along an ADL/motor construct. Table 1-1 presents a comparison of the ADL/motor items of the FIM and the MDS.

Table 1-1. Comparison of FIM to MDS ADL/motor items

FIM Items
   Eating
   Grooming
   Bathing
   Dressing-Upper Body
   Dressing-Lower Body
   Toileting
   Bladder Management
   Bowel Management
   Bed, Chair, Wheelchair (Transfer)
   Toilet (Transfer)
   Tub, Shower (Transfer)
   Walk/Wheelchair
   Stairs

MDS Items
   Eating
   Bed Mobility
   Personal Hygiene
   Bathing
   Dressing
   Toilet Use
   Bladder Continence*
   Bowel Continence*
   Transfer
   Walk in Room
   Walk in Corridor
   Locomotion on Unit
   Locomotion off Unit

FIM Rating Scale
   7 Complete Independence (Timely, Safely)
   6 Modified Independence (Device)
   5 Supervision
   4 Minimal Assist (Subject = 75%+)
   3 Moderate Assist (Subject = 50%+)
   2 Maximal Assist (Subject = 25%+)
   1 Total Assist (Subject = 0%+)

MDS Rating Scale (exceptions noted below)
   0 Independent
   1 Supervision
   2 Limited Assistance
   3 Extensive Assistance
   4 Total Dependence
   8 Activity did not occur during the entire 7-day period

*Bladder and Bowel Continence in the MDS also has a separate rating scale: 0-Usually Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent


Similarities are immediately evident between the two instruments. Both instruments include items for eating, dressing, toileting, bowel and bladder functioning, as well as the ability to transfer and to walk. Differences include an item for climbing stairs on the FIM that is not part of the MDS. A mathematical framework is needed in order to convert a score on one instrument to its corresponding score on the other. Item Response Theory (IRT) measurement models have been rapidly gaining popularity over classical test theory (CTT) for analyzing instruments used in healthcare and rehabilitation (Douglas, 1999; Hambleton, 2000; Hays, Morales, & Reise, 2000; Linacre, Heinemann, Wright, Granger, & Hamilton, 1994; McHorney, 1997; Prieto, Alonso, Lamarca, & Wright, 1998; Silverstein, Fisher, Kilgore, Harley, & Harvey, 1992; Velozo, Magalhaes, Pan, & Leiter, 1995). IRT comprises a set of generalized linear models and their associated statistical procedures that connect a subject’s responses to test items to that subject’s location on the latent trait being tested (Mellenbergh, 1994). In order to create a link between scores on the FIM and MDS, it is hypothesized that Rasch analysis can be used to place the items from both instruments on the same linear continuum (Fisher, Harvey, Taylor, et al., 1995). A precedent for linking instruments in such a manner has been established in the fields of education and psychological measurement. It is the purpose of this dissertation to evaluate the accuracy of a FIM-MDS conversion table that has been created through the use of Rasch analysis. The technical procedures of test equating used in the educational applications of Rasch’s probabilistic models were transferred to the cocalibration of these two functional assessment instruments. An accurate conversion table between the FIM and the MDS would allow


for the studies needed to examine the outcomes for persons receiving rehabilitation services in different care settings. This would, in effect, eliminate the need to institute massive changes in measurement procedures across rehabilitation settings. Functional status information could then be used to track changes and follow a patient’s progress across PAC settings and not only monitor but also compare quality of care and rehabilitation outcomes in different settings (National Committee on Vital and Health Statistics, 2003).
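The idea of placing two instruments on one linear continuum and reading a conversion table from it can be sketched in code. The following is a minimal illustration under stated assumptions, not the procedure used to build the FIM-MDS table: it assumes an Andrich rating scale form of the Rasch model, it assumes both instruments are scored so that higher categories mean more independence, and every item difficulty, threshold, and instrument name in it is hypothetical. A raw score on the source instrument is mapped to the ability level at which that score is expected, and the expected score on the target instrument at the same ability level is then rounded to give the converted score.

```python
import math

# Hypothetical co-calibrated item parameters (logits) on one shared scale.
INSTRUMENT_A = {"difficulties": [-1.0, -0.5, 0.0, 0.5, 1.0],
                "thresholds": [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5],  # 7 categories
                "lowest_category": 1}
INSTRUMENT_B = {"difficulties": [-0.8, 0.0, 0.8],
                "thresholds": [-1.5, -0.5, 0.5, 1.5],  # 5 categories
                "lowest_category": 0}

def category_probabilities(theta, difficulty, thresholds):
    """Rating scale model: probability of each response category for one item."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + theta - difficulty - tau)
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_raw_score(theta, instrument):
    """Expected total raw score at ability theta (test characteristic curve)."""
    score = 0.0
    for difficulty in instrument["difficulties"]:
        probs = category_probabilities(theta, difficulty, instrument["thresholds"])
        score += sum((instrument["lowest_category"] + k) * p
                     for k, p in enumerate(probs))
    return score

def ability_for_raw_score(raw, instrument, low=-10.0, high=10.0):
    """Invert the monotone test characteristic curve by bisection.

    Extreme raw scores have no finite estimate, so they settle at the bounds.
    """
    for _ in range(60):
        mid = (low + high) / 2.0
        if expected_raw_score(mid, instrument) < raw:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

def convert(raw, source, target):
    """Convert a raw score on the source instrument to the target instrument."""
    theta = ability_for_raw_score(raw, source)
    return round(expected_raw_score(theta, target))

# Conversion table for every possible raw score on the hypothetical instrument A.
conversion_table = {raw: convert(raw, INSTRUMENT_A, INSTRUMENT_B)
                    for raw in range(5, 36)}
print(conversion_table)
```

Because the expected raw score rises monotonically with ability, the resulting table is order-preserving: a higher score on one instrument never converts to a lower score on the other.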


CHAPTER 2
REVIEW OF THE LITERATURE

Measuring Outcomes in Rehabilitation

The effectiveness of rehabilitation services is gauged by the restoration and maximization of patient functioning. Functional status in this context has been defined as reflecting “an individual’s ability to carry out activities of daily living (ADLs) and to participate in various life situations and in society” (Jette, Haley, & Ni, 2003, p. 1). Therefore, the assessment of functional status is a method for describing abilities and activities in order to measure an individual’s use of the variety of skills included in performing the tasks necessary to daily living, vocational pursuits, social interactions, leisure activities, and other required behaviors (Granger, 1998). ADL measures have been used to determine a patient’s level of disability, to establish whether one qualifies for certain types of healthcare services, and to document outcomes of rehabilitation services. The focus of the earliest standardized assessments of function, developed over 50 years ago, was on the basic ADLs, which consist of self-care activities such as bathing, grooming, dressing, and walking. Two of the first functional status measures used in rehabilitation were the Katz and Barthel indexes, whose items consisted solely of basic ADL tasks (Cohen & Marino, 2000; Latham & Haley, 2003). Then, “with changing societal expectations, the advent of brain injury rehabilitation, and the independent living movements, medical outcome research began to explore means of documenting social and cognitive-based behaviors as part of rehabilitation outcomes”


(Latham & Haley, 2003, p. 85). The result of this was the development of instruments such as the Functional Independence Measure (FIM) and the Minimum Data Set (MDS).

The FIM Instrument Used in Inpatient Rehabilitation

In U.S. inpatient rehabilitation settings, the Uniform Data System for Medical Rehabilitation (UDSMR) is the most widely used clinical database for assessing rehabilitation outcomes (Fiedler & Granger, 1997; Granger & Hamilton, 1993). The FIM is the core functional status measure of the UDSMR and was developed to establish a uniform standard for the assessment of functional status during medical rehabilitation (Granger, Hamilton, Keith, Zielezny & Sherwin, 1986). The FIM incorporates concepts and items from previous functional assessment instruments, such as the Katz Index of ADL, the PULSES profile, the Kenny Self-Care Evaluation, and the Barthel Index (Hall et al., 1993). The FIM system was developed by a national task force, the Task Force to Develop a National Uniform Data System for Medical Rehabilitation (UDSMR), cosponsored by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation, to rate the severity of patient disability and the outcomes of medical rehabilitation (Hamilton et al., 1987). The original work of this task force was expanded by the Department of Rehabilitation Medicine at the State University of New York at Buffalo. Since 1987, it has been the mission of the UDSMR to measure medical rehabilitation outcomes across the continuum of care, in both time and settings (Granger, 1999). The UDSMR maintains a national data repository for research purposes of three million case records from 1,400 facilities around the world (Granger, 1999). The FIM is administered in most inpatient rehabilitation facilities within three days of admission and prior to discharge (Granger, Hamilton & Sherwin, 1986).
The scale accounts for a patient’s level of independence, amount of assistance needed, use of


adaptive or assistive devices, and the percentage of a given task completed successfully. This instrument comprises 18 items with a seven-level response scale of independent performance in self-care, sphincter control, mobility, locomotion, communication, and social cognition (Granger & Hamilton, 1993). As such, it contains items representing three constructs: ADL, mobility, and continence (Granger, Hamilton, & Sherwin). Thirteen of the 18 FIM items (related to functional ability) can be further divided into three more specific subscores rating activities of daily living (ADLs), sphincter management, and mobility (Stineman, Jette, Fiedler & Granger, 1997). A FIM score on each item ranges from 1 (Total Assistance) to 7 (Complete Independence). Thus, a total FIM score ranges from 18 to 126. Hamilton (1989) noted that, “Because each item is scaled on the basis of functional independence, it is expected that the total score (with each item appropriately weighted) will correlate with the burden of care for the disabled person” (p. 862). Psychometric studies of the FIM instrument support its use for research purposes. One of the strengths of the FIM is that it has undergone many methodological evaluations, in which it has demonstrated good psychometric properties (Dodds, Martin, Stolov & Deyo, 1993). Dodds et al. noted a high internal consistency (Cronbach’s alpha of .93 at admission and .95 at discharge), indicating that the FIM is a reliable instrument. Extensive investigations of the FIM’s reliability and validity have provided evidence of its interrater and test-retest reliability (Ottenbacher, Hsu, Granger & Fiedler, 1996), internal consistency (Stineman et al., 1996; Stineman et al., 1997), concurrent validity (Granger, Cotter, Hamilton & Fiedler, 1993; Oczkowski & Barreco, 1993), and predictive validity (Heinemann, Linacre, Wright, Hamilton & Granger, 1994; Oczkowski & Barreco). Ottenbacher et al.
(1996) performed a meta-analysis of 11 studies that revealed the median interrater reliability for the total FIM was .95 and the test-retest and


equivalence reliability was .95 and .92, respectively. Stineman et al. (1996) showed in a study of 93,829 rehabilitation inpatients that a factor analysis of the FIM instrument supported the identification of ADL/motor and cognitive/communication dimensions across 20 impairment categories. Additionally, the stability of the FIM motor scores was demonstrated in several studies (Linacre et al., 1994; Wright, Linacre & Heinemann, 1993). Linacre (1998) confirmed the multidimensional structure of the FIM by means of Rasch analysis followed by factor analysis of standardized residuals and demonstrated the divergence of the five cognitively-oriented items from the 13 motor-oriented items.

The MDS Instrument in Skilled Nursing Facilities

After receiving inpatient rehabilitation services, patients may be transferred into a SNF, where the FIM instrument is no longer used as a measure of functional ability. Instead, the Minimum Data Set (MDS) is utilized in this role. Since 1990, HCFA (Health Care Financing Administration, now known as the Centers for Medicare and Medicaid Services [CMS]) has required that the MDS be administered to all SNF residents. The MDS is a comprehensive assessment of over 85 health status elements organized into measurement categories. An MDS score on each item ranges from 0 (Independent) to 4 (Total Dependence). The definitions of these ratings are provided on the rating form. Full MDS assessments are required at least annually and when there is a significant change in a patient’s condition (such as a stroke resulting in hemiparesis). A minimum of approximately 1.2 million residents are assessed annually with the full MDS, with 3.6 million briefer updates and an unknown additional number of complete MDS reassessments because of major changes in residents before an annual MDS is due (Casten, Lawton, Parmelee & Cleban, 1998).
While the MDS shares ADL content with the FIM, it is targeted exclusively to SNFs, making comparisons across rehabilitation settings difficult (Latham & Haley, 2003).
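The item scales described above imply different raw score ranges for the two instruments, which is one reason their totals cannot be compared directly. The short sketch below works through the arithmetic. The helper name is hypothetical, and the MDS line is a simplifying assumption: it treats 13 ADL/motor items as all scored 0 to 4, ignoring the "activity did not occur" code and the separate continence scale.

```python
def raw_score_range(n_items, lowest, highest):
    """Minimum and maximum total raw score for items sharing one rating scale."""
    return n_items * lowest, n_items * highest

fim_total_range = raw_score_range(18, 1, 7)   # all 18 FIM items
fim_motor_range = raw_score_range(13, 1, 7)   # the 13 motor-oriented FIM items
mds_motor_range = raw_score_range(13, 0, 4)   # assumed: 13 MDS items scored 0-4

print(fim_total_range, fim_motor_range, mds_motor_range)
```

The scales also run in opposite directions: a higher FIM rating indicates more independence, whereas a higher MDS rating indicates more dependence.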

Although the MDS has not been studied as extensively as the FIM, research suggests that it too has adequate reliability and validity for use in research studies. Early research studies by Hawes et al. (1995) showed that MDS items had intraclass correlations of .7 or higher in key areas of functional status, such as ADL and cognition. Sixty-three percent of the items achieved reliability coefficients of .6 or higher and 89% achieved .4 or higher (Hawes et al., 1995). Morris et al. (1994) showed that the seven cognitive items (short-term memory, long-term memory, decision making, and four categories of memory recall) showed an internal reliability of .83-.88. While it has been suggested that the above psychometric findings on the MDS are “inflated” because the instrument was administered by research staff (Stineman & Maislin, 2000), Gruber-Baldini, Zimmerman, Mortimore & Magaziner (2000) showed that when the MDS is administered by clinical staff, cognitive items (MDS-COGS) and cognitive performance scale items (MDS-CPS) correlate moderately well with the Mini Mental Status Exam (MMSE) (R=-0.65 and -0.68, respectively) and the Psychogeriatric Dependency Rating Scale (PGDRS) Orientation scale (R=0.63 and 0.66, respectively). The internal reliability of the MDS-COGS was .85 and that of the MDS-CPS (without the comatose items) was .80. Confirmatory factor analysis studies, derived from clinical and administrative databases of the MDS, confirmed all MDS domain clusters except social quality (Casten et al., 1998).

Administrative Solution: Adopting a Single Instrument for Measuring Outcomes in Post Acute Care

The existence of “competing” instruments in PAC settings (i.e., the FIM in rehabilitation facilities and the MDS in SNFs) has led to considerable debate over which of these instruments should be the basis for a post-acute care Prospective Payment

System (PPS) in the private sector (Centers for Medicare and Medicaid Services [CMS], 2001; DeJong, 2001). The demands for a PPS for rehabilitation facilities prompted the HCFA in 1998 to develop the Minimum Data Set for Post-Acute Care (MDS-PAC), an instrument similar in design to the MDS, intended to address the needs of subacute facilities, rehabilitation facilities, and long-term care hospital patients (CMS, 2002). One of the intended purposes of the MDS-PAC was to provide CMS with a tool by which it could monitor the quality of health care services across post-acute settings (Granger, Hamilton, Keith, et al., 1986). Originally, items similar to those found in the MDS for rehabilitation were intended to be used in this new instrument. Yet, upon its completion, it consisted of more than 400 items, and it lacked one-to-one correspondence with the FIM (DeJong, 2001, p. 567). Martin Grabois, MD, President of the American Congress of Rehabilitation Medicine (ACRM), stated in a letter in 2001 that the ACRM strongly opposed implementation of the MDS-PAC and felt it was premature to use it as a quality-monitoring tool in rehabilitation (DeJong, 2001). This challenge led CMS to change its decision from using the MDS-PAC to using the Inpatient Rehabilitation Facility Patient Assessment Instrument (IRF-PAI) as the basis for the post-acute PPS. The IRF-PAI includes the FIM ADL/motor and cognition/communication items. The IRF-PAI was mandated for implementation in rehabilitation facilities on January 1, 2002 (CMS, 2001). While the final decision to use the IRF-PAI has tremendous economic benefits (e.g., rehabilitation facilities not having to convert from the FIM to an MDS-based instrument), it does not facilitate monitoring functional outcomes as patients cross from rehabilitation to skilled nursing facilities. That is, a FIM-based outcome instrument will continue to be used in rehabilitation facilities, while an MDS-based instrument will be

used in nursing home facilities. A drawback to having different functional outcome measures across these two health-care settings is “test dependency.” Data gathered on one instrument cannot be compared to similar data gathered on an alternate instrument. Wilkerson and Johnston (1997) noted that the absence of a single, standardized instrument used in both rehabilitation facilities and SNFs was a fundamental barrier for U.S. policymakers. Without such an instrument, these policymakers were hampered in their ability to fulfill the emerging health policy mandate to monitor the quality and outcomes of services for patients in PAC settings. As patients were transferred across these settings, researchers, managers, and clinicians were unable to easily and accurately track functional status changes. The ability to measure a patient’s functional status is important not only within a particular rehabilitation setting, but also along the continuum of care. It is difficult to accurately and precisely compare and contrast the magnitude of gains made through rehabilitation efforts in various therapeutic PAC settings when such improvements are evaluated on incongruous instruments based on divergent scales of measurement. As Granger (1998) contends, there has unfortunately been too little effort in addressing assessment and management of persons with disablement across the continuum of care. As a result, the outcomes of therapeutic interventions cannot easily and accurately be compared to determine their relative effectiveness. This measurement of rehabilitation outcomes is an integral component of a major controversy in rehabilitation over the best PAC setting in which to provide care. Optimizing a patient’s recovery and minimizing costs are the two variables most often considered in this debate. Inpatient rehabilitation units typically provide the highest level of services, although often at the greatest costs. On the other

hand, skilled nursing facilities generally do not provide the same extent and intensity of therapeutic interventions, although at a reduced cost. The ability to directly compare rehabilitation outcomes between, for example, an inpatient rehabilitation setting and a skilled nursing facility would enhance our understanding of where patients benefit most and from what interventions. In January of 2000, Sally Kaplan, Ph.D., of the Medicare Payment Advisory Commission (MedPAC) told the Subcommittee on Populations,

    We strongly believe that it would be extremely useful, to say the least, to have standardization of functional status measures at least in post-acute care so that if similar patients are treated in different post-acute settings, or if patients are treated in successive post-acute care settings, that we would have a means of measuring them. ... It would expand the utility of regularly collected information. (National Committee on Vital and Health Statistics, 2003)

Potential Measurement Solution: Linking Instruments

There is more than one way to coordinate rehabilitation services along the continuum of care. For example, one might institute the use of the same instrument to measure functional ability in all rehabilitation settings so that the results would be directly comparable. In this manner, the patient’s progress can be easily tracked across settings and services. Unfortunately, attempts to use the FIM across settings have met with limited success, as there are sizeable obstacles to implementing such a plan. Already, there has been a huge monetary investment in our current attempts to measure rehabilitation outcomes. To figuratively “throw out” these current instruments and replace them with one new and improved universal measure would likely be cost-prohibitive (Cohen & Marino, 2000). There would also likely be significant resistance to implementing such a plan.
In our capitalistic economy, private fortunes are often tied up in maintaining the status quo, which makes such resistance all the more likely. Furthermore,

significant costs would be incurred in training staff in a new system, as well as in the implementation of a new database. The predicament now facing rehabilitation services, that of having multiple yet incompatible instruments attempting to quantify a single construct, is not a new one. It has been successfully overcome in other areas, for example, within the basic sciences. Major scientific advances have been possible in part because the instruments used to measure a construct were standardized and, in some cases, linked to other similar instruments. An example of this is the history of attempts to measure the construct of temperature. What began as a human sensation of “hot” and “cold” evolved into the field of thermometrics, the measurement of temperature (Bond & Fox, 2001). In A.D. 180, Galen mixed equal quantities of ice and boiling water to establish a “neutral” point for a seven-point scale having three levels of warmth and three levels of coldness (Bond & Fox). Then, in the 17th century, Santorio of Padua used a tube of air inverted in a container of water so that the water level rose and fell with temperature changes. He calibrated the scale by marking the water levels at the temperature of flame and ice (Bond & Fox). Our current mode of measuring temperature uses mercury in a closed tube. Even with a single mechanism by which we measure temperature, we use two competing scales, namely, Celsius and Fahrenheit. Both of these scales merely set two known temperature points (ice and boiling points) and divide the interval between them into equal units. These two independently developed scales have been linked, so a “score” on one scale can easily be converted to a score on the other scale. A problem in developing measures in the human sciences is that, “we are clearly dealing with abstractions (e.g., perceived social support, cognitive ability, and self-esteem), so we need to construct measures of abstractions, using equal units, so that we

can make inferences about constructs rather than remain at the level of describing our data” (Bond & Fox, 2001, p. 4). Yet, it appears hopeless to construct models of human behavior since behavior seems to be so unpredictable. What we can do is estimate the probability of a behavior taking place. We need to build a model, “more like models in modern physics—models which are indeterministic, where chance plays a decisive role” (Rasch, 1960, p. 11). What is then being described is the probability of a behavior occurring, that is, the relative frequency of an event occurring. We can say that the probability of something occurring is equal, for example, to 50 percent. An alternative solution to this problem would be to use the same scale of measurement as the basis of each instrument. Linking can be referred to as the development, “of a common metric in IRT by transforming a set of item parameter estimates from one metric onto another, base metric” (Hart & Wright, 2002, p. 2). McHorney (1997) pointed out, “The development of a shared language that goes beyond specific items to location on an ability scale would provide users tremendous flexibility in building and maintaining an outcomes capacity within and across different databases” (p. 749). The use of a shared language across rehabilitation settings would allow all services along the continuum of care to be interrelated and coordinated. In this manner, the outcomes, as well as the efficient utilization of resources, could be maximized. Dorans and Holland (2000) write, “The comparability of measurement made in differing circumstances by different methods and investigators is a fundamental precondition for all of science” (p. 281). The development of the same scale of measurement may be achieved through the utilization of Rasch analysis, a one-parameter Item Response Theory (IRT) model. Georg Rasch, a Danish mathematician who examined psychological measurement problems in the 1950s and 1960s, surmised that the

relationship between a person’s ability and an item’s difficulty can be modeled as a probabilistic function. As a person’s ability level increases, the probability of passing an item also increases (Fox & Jones, 1998). The Rasch model specifies exactly how to convert observed counts into linear (and ratio) measures (Wright & Linacre, 1989). IRT, then, is both a theoretical framework and a collection of quantitative techniques used to construct tests, scale responses, and equate scores. It consists of models, each designed to describe a functional relationship between an examinee’s ability and the characteristics (or parameters) of the items on a test (McHorney & Cohen, 2000). Performing a Rasch analysis on the FIM would address a problem that exists with the interpretation of the FIM scores. It has been noted that a change of 10 raw score points at the extremes of the FIM range is equivalent to four times as much change on the linear scale as a change of 10 raw score points at the center of the FIM range (Linacre et al., 1994). Thus, the raw-score improvement made by a person with moderate deficits, which places the patient in the center of the scale, will appear to be much greater than that made by a person who is closer to the end of the scale, even when the actual improvement might otherwise be seen as equivalent. Linacre et al. (1994) cite this as one example of why a conversion from raw scores to linear measures is so essential to quantifying changes in a patient’s status more appropriately. Performing a Rasch analysis on the FIM would be one way to accomplish this.

The Use of Linking Techniques

Educators in the United States have been involved in the process of equating and linking instruments through a variety of statistical procedures for more than 40 years. Kolen (2001) notes that the first pages of the first issue of the Journal of Educational Measurement, published in 1964, were on the subject of linking.
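The nonlinearity of raw scores described above can be illustrated with a minimal sketch of the dichotomous Rasch model. The item difficulties below are hypothetical values chosen for illustration, not FIM calibrations, and the FIM itself uses a seven-category rating scale rather than pass/fail items, so this is a simplification.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of passing a dichotomous item under the Rasch model.

    Both arguments are expressed in logits on the same linear scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def expected_raw_score(ability, difficulties):
    """Expected number of items passed by a person of the given ability."""
    return sum(rasch_probability(ability, d) for d in difficulties)

# Hypothetical item difficulties in logits (assumed for illustration only).
ITEM_DIFFICULTIES = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

# The same one-logit gain yields very different raw-score gains
# depending on where the person sits on the scale.
center_gain = (expected_raw_score(0.5, ITEM_DIFFICULTIES)
               - expected_raw_score(-0.5, ITEM_DIFFICULTIES))
extreme_gain = (expected_raw_score(4.0, ITEM_DIFFICULTIES)
                - expected_raw_score(3.0, ITEM_DIFFICULTIES))
print(f"raw-score gain near the center:  {center_gain:.2f}")
print(f"raw-score gain near the extreme: {extreme_gain:.2f}")
```

With these assumed difficulties, a one-logit improvement near the center of the scale produces a larger raw-score gain than the same improvement near the extreme, which is the reason a conversion from raw scores to linear measures is recommended.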

Linking is a scaling method used to achieve comparability of scores, to the extent possible, between tests of different frameworks and test specifications (Muraki, Hombo & Lee, 2000). Linking is distinguished from test equating, which involves making statistical adjustments to scores from alternative forms of an instrument to account for small differences in the difficulty of the test items on each form (Kolen, 2001). The term test equating is traditionally used to refer to the case of linking in which two or more forms of a test have been constructed according to the same specifications, such as equal difficulty, reliability, and validity, and constructed for the same purpose (Muraki et al., 2000). The most basic of the equating methods is linear equating, which assumes that the two tests to be equated differ only in means and standard deviations. In IRT, linking is referred to as developing a common metric by transforming a set of item parameter estimates from one metric to another, base metric (Kim & Cohen, 2002). The process of linking consists of an anchoring design and a transformation method. The anchoring design ensures that there will be a basis for comparison between the item calibrations on the two instruments (Vale, 1986). The linking transformation refers to the equation used to put the item parameters on a common scale of measurement. “These processes rest on two assumptions: (a) the different instruments being linked measure the same underlying construct; and (b) the linking sample represents the population for which the test is intended” (McHorney, 2002, p. 386). The cocalibration of instruments purported to measure a common construct is simply an extension of test equating, item banking, and partial credit principles that have been in use in education for decades (Choppin, 1968; Fisher, Harvey, Taylor, et al., 1995; Wright, 1984).
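As a minimal sketch of the two ideas above, the following shows linear equating (matching means and standard deviations) and a simple mean-shift linking constant for Rasch item calibrations. The scores, the anchor-item difficulties, and the function names are hypothetical; the mean shift is one common choice of linking transformation for Rasch calibrations and is not necessarily the one used in the studies cited.

```python
import statistics

def linear_equate(score_x, scores_x, scores_y):
    """Map a score on form X to the form Y scale by matching mean and SD."""
    mean_x = statistics.mean(scores_x)
    mean_y = statistics.mean(scores_y)
    sd_x = statistics.pstdev(scores_x)
    sd_y = statistics.pstdev(scores_y)
    return mean_y + (sd_y / sd_x) * (score_x - mean_x)

def rasch_link_shift(anchor_base, anchor_new):
    """Constant to add to new-metric calibrations to place them on the base metric.

    Assumes both lists hold difficulties (in logits) of the same anchor items,
    estimated separately on each metric.
    """
    return statistics.mean(anchor_base) - statistics.mean(anchor_new)

# Hypothetical total scores from one group on two forms (illustration only).
FORM_X = [10, 12, 14, 16, 18]
FORM_Y = [20, 25, 30, 35, 40]

print(linear_equate(16, FORM_X, FORM_Y))
print(rasch_link_shift([0.5, 1.0], [0.0, 0.5]))
```

Linear equating works on observed score distributions, whereas the shift operates on item parameters, which is the sense in which IRT linking places calibrations on a common metric.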
Routine applications of Rasch measurement models in the development and equating of instruments are performed by companies such as the Psychological

Corporation, school districts in Portland, Phoenix, Chicago, and New York, and medical school admissions and certification boards, including the National Board of Medical Examiners, the American Society of Clinical Pathologists, the American Dental Association, and the American Council of State Boards of Nursing (Fisher, Harvey, Taylor, et al., 1995).

Linking of Measures in Healthcare

“Linking studies are in their infancy in health status assessment and functional health assessment in rehabilitation” (McHorney, 2002, p. 389). It has only been in the last 14 years that linking has been used in health status and rehabilitation assessment (McHorney). In this setting, the use of linking techniques has been discussed (Fisher, Harvey & Kilgore, 1995; Fisher, Harvey, Taylor et al., 1995; McHorney) and linking studies have been conducted (Chang & Cella, 1997; Fisher, 1997; Fisher, Harvey, Taylor et al.; McHorney & Cohen, 2000). One of the first published applications of IRT to functional health assessments tested the unidimensionality and reproducibility of the 10-item Physical Functioning Scale (Ware, 2003). Examples of using IRT, specifically Rasch analysis, to link health care instruments have appeared in the literature over the last six years. These include linking the FIM and SF-36 (Segal et al., 1997), the FIM and the Barthel Index (Tennant & Young, 1997), the FIM and the Patient Evaluation Conference System (PECS), and the SF-36 and the LSU HIS Physical Functioning scale (Fisher, Eubanks & Marier, 1997). Additionally, Badia, Prieto, Roset, Diez-Perez, & Herdman (2002) attempted to develop a short Osteoporosis-Specific Quality of Life Questionnaire based on the assemblage (equating) of the items of two existing questionnaires through the Rasch mathematical model.

McHorney (2002) states, “Measurement specialists who serve rehabilitation medicine and other specialties are at the cusp of a paradigm shift away from sizable reliance on classical test methods to broader use of IRT methods” (p. 390). Rasch measurement is becoming the preferred method among rehabilitation professionals for developing functional assessments (Ottenbacher et al., 1996). Hays et al. (2000), McHorney (2002), Ware (2003), and Cella and Chang (2000), along with many others in the social and medical sciences in the last 30 years, have described IRT as a more promising framework for designing tests (Hambleton, 2000). Features of IRT, such as sample independence and test independence, account for this growing popularity over classical test theory (CTT) (Douglas, 1999; Linacre et al., 1994; McHorney, 1997; Prieto et al., 1998; Hambleton, 2000; Silverstein et al., 1992; Velozo et al., 1995). The measurement units of IRT have interval properties, in contrast to the ordinal raw scores used in CTT. Scores that have interval properties can be analyzed appropriately using parametric statistics, while such analyses may be inappropriate on ordinal data. Additionally, logit measures may remove measurement bias at the extreme ends of the measured construct, while extreme raw scores are biased by nature and may underestimate the magnitude of a difference or change score at the extremes (Cella & Chang, 2000). Another drawback of CTT is that tests developed in this manner are sample dependent. This means that items may look difficult when they are administered to examinees at the low end of the score continuum and, conversely, the same items look easy to those examinees at the high end of the score continuum. Thus, the item statistics are dependent upon the ability level of the subject sample and have little value when measuring subjects of a different ability level.
Similarly, the problem of test dependency can be defined as one where the person statistics are dependent upon the difficulty of the

test. If one changes the difficulty of the items in the tests, the two scores are no longer comparable. There is some indication that IRT estimates of health outcomes are more responsive to changes in health status over time. McHorney et al. (1997) found that the sensitivity of the SF-36 physical functioning scale to differences in disease severity was greater for Rasch model-based scoring than it was for simple summated scoring. Fisher (1997) states,

    as it becomes increasingly clear that the accountability of educators, psychologists, health care providers, and other professionals cannot remain tied to scale-dependent indicators of unknown or low statistical sufficiency, the practicality, scientific rigor, and mathematical beauty of scale-free measurement will become more widely appreciated. (p. 93)

Hays et al. (2000) predict IRT methods will be used in health outcome measurement on a rapidly increasing basis in the 21st century. Two mathematical models that are appropriate for linking functional outcome measures are (a) the one-parameter IRT model (the Rasch model), which solves for person ability through a single item parameter, item difficulty, and (b) the two-parameter model, which solves for person ability through two item parameters, item difficulty and item discrimination. There is fervent debate over which model should be employed for psychometric analysis and linking instruments. The debate centers on whether a scientific model should be made to fit the data (two-parameter model) or the data to fit the model (one-parameter model) (Wright, 1997). There is also the issue of whether item discrimination should be held constant across items (Rasch model) or allowed to vary between items (two-parameter model) (McHorney, 2002). While several studies indicate item discrimination is not constant across functional status items (McHorney, 2002; McHorney & Cohen, 2000; Spector &

Fleishman, 1998), for pragmatic reasons, namely the availability of relatively small sample sizes of patients with linked FIM and MDS data (n = 450), we are choosing the Rasch model because of its simplicity and robustness under conditions of heterogeneous item discrimination and small samples (De Gruijter, 1986; Kolen & Brennan, 1995). The Rasch model has been shown to produce stable linking with sample sizes of 300-400 (Kolen & Brennan, 1995; Skaggs & Lissitz, 1986). Rasch analysis can be used to link healthcare inventories that measure the same construct. By linking inventories in this manner, one can improve the usefulness of both measures through
• Refining the rating scale.
• Identifying the items that form a unidimensional construct.
• Verifying the expected difficulty hierarchy of the items.
• Providing a means of converting scores between the two measures.
• Matching the ADL measures to specific descriptions provided by the scale.
The Rasch theory stipulates that a respondent’s probability of answering an item correctly is dependent only on two factors: the respondent’s ability and the characteristics of the item (Hambleton, Swaminathan & Rogers, 1991). Rasch analysis has the ability to uniquely link a person’s ability to an item’s difficulty level (Velozo & Peterson, 2001). Thus, a score on an instrument can be directly linked to the descriptive content of the instrument (Velozo & Peterson, 2001). The examiner is able to describe precisely a person’s level of ability based on the score they receive. In many other cases, a score on a test is uninterpretable in terms of a meaningful description of the level of ability it represents. You may still be able to say with a reasonable level of confidence that someone has more or less ability than someone else, but you still do not have a clear description of the precise level of ability that person possesses.
Furthermore, Rasch analysis allows for the ranking of items so that all items on a scale can be put on a continuum from least challenging to most challenging.
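The distinction between the one- and two-parameter models, and the ordering of items along a difficulty continuum, can be sketched as follows. The item names and logit values are hypothetical and are not calibrations from this study; holding discrimination at 1.0 is the assumption that reduces the two-parameter form to the Rasch form.

```python
import math

def irt_probability(ability, difficulty, discrimination=1.0):
    """Probability of success on a dichotomous item.

    With discrimination fixed at 1.0 for every item this is the Rasch
    (one-parameter) model; letting it vary by item gives the
    two-parameter model.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

def rank_items_by_difficulty(item_difficulties):
    """Return item names ordered from least to most challenging."""
    return sorted(item_difficulties, key=item_difficulties.get)

# Hypothetical calibrations in logits (assumed for illustration only).
HYPOTHETICAL_ITEMS = {"stairs": 1.8, "eating": -1.2, "dressing": 0.1}

print(rank_items_by_difficulty(HYPOTHETICAL_ITEMS))
# Same ability and difficulty, different discrimination:
print(irt_probability(1.0, 0.0))                       # Rasch form
print(irt_probability(1.0, 0.0, discrimination=2.0))   # two-parameter form
```

Because the Rasch form uses a single discrimination value, item characteristic curves never cross, and the difficulty ordering of the items is therefore the same for every person along the continuum.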

There has been only one published study attempting to link the FIM to the MDS. Williams et al. (1997) compared scores on FIM and rescaled MDS ADL and cognitive items [referred to as the “Pseudo-FIM(E)”] on 173 rehabilitation patients admitted to six nursing homes. The matching and rescaling of the MDS was accomplished through an expert panel, with the panel judging that 8 out of 13 FIM items had a corresponding MDS item. Intraclass correlation between the FIM and rescaled MDS was .81, although the mean calibration of 6 of the 8 FIM items differed statistically from the rescaled MDS items. While this initial attempt at linking the FIM and the MDS was encouraging, the methodology and statistical approach of the study had considerable limitations. For example, expert-panel rescaling of the MDS can be challenged due to the lack of adequate empirical support (i.e., a different panel of experts could develop a different FIM-MDS matching and rescaling; Velozo, Kielhofner & Lai, 1999). Fisher, Harvey, Taylor, et al. (1995) were the first to use common-sample equating to link two global measures of functional ability, the FIM and PECS (Patient Evaluation and Conference System). Using the methodology described above, they showed that the 13 FIM and 22 PECS ADL/motor items could be scaled together in a 35-item instrument. The authors found that separate FIM and PECS measures for 54 rehabilitation patients correlated .91 with each other and correlated .94 with the cocalibrated values produced by Rasch analysis. Furthermore, these authors demonstrated that either instrument’s ratings were easily and quickly converted into the other via a table that used a common unit of measurement, which they referred to as the “rehabit.” This common unit of measurement allows for the translation of scores from one instrument to another. Since the results of Rasch analysis are sample-free, these tables/algorithms can be used for all future and past instrument-to-instrument score conversions.
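The idea of a conversion table built on a common unit of measurement can be sketched as follows. This is a simplified illustration, assuming dichotomous items whose difficulties have already been cocalibrated on one logit scale; the instruments discussed here use polytomous rating scales, and all values and function names below are hypothetical.

```python
import math

def expected_score(ability, difficulties):
    """Expected raw score on a set of dichotomous Rasch items."""
    return sum(1.0 / (1.0 + math.exp(-(ability - d))) for d in difficulties)

def ability_for_raw_score(raw, difficulties, low=-10.0, high=10.0):
    """Find, by bisection, the ability whose expected score equals `raw`.

    Valid for interior raw scores only (0 < raw < number of items);
    extreme scores have no finite Rasch measure.
    """
    for _ in range(60):
        mid = (low + high) / 2.0
        if expected_score(mid, difficulties) < raw:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

def conversion_table(difficulties_a, difficulties_b):
    """Map each interior raw score on instrument A to a raw score on B."""
    table = {}
    for raw_a in range(1, len(difficulties_a)):
        ability = ability_for_raw_score(raw_a, difficulties_a)
        table[raw_a] = round(expected_score(ability, difficulties_b))
    return table

# Hypothetical cocalibrated item difficulties in logits (illustration only).
INSTRUMENT_A = [-1.5, -0.5, 0.0, 0.5, 1.5]
INSTRUMENT_B = [-2.0, -1.0, 0.0, 1.0, 2.0, 2.5]

print(conversion_table(INSTRUMENT_A, INSTRUMENT_B))
```

Each raw score on the first instrument is translated to a position on the shared logit scale, and that position is then expressed as the expected raw score on the second instrument, which is the role the common unit plays in a conversion table.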

More recently, Fisher et al. (1997) replicated their previous study using common-sample equating to link two self-report instruments: the 10 physical function items of the Medical Outcomes Study (MOS) SF-36 (the PF-10) and the Louisiana State University Health Status Instrument. Difficulty estimates for a subset of similar items from the two instruments correlated at .95, again indicating that the items from the two scales were working together to measure the same construct. McHorney and Cohen (2000) applied a two-parameter IRT model to 206 physical functioning items (through 71 common items across samples) and, in a similar study, linked 39 physical functioning items (through 16 common items) from three modules of the Asset and Health Dynamics Among the Oldest Old (AHEAD) study. Both studies demonstrated successful linking of item banks through sets of common items, allowing placement of all items on a common metric. Then in 2003, Jette et al. conducted a one-parameter Rasch partial credit analysis for the entire item pool of the FIM, MDS, OASIS (Outcome and Assessment Information Set for Home Health Care), and the PF-10 (the physical functioning scale of the SF-36) items to develop an overall functional ability scale. These authors noted that the MDS instrument covered content from the middle portion of the functional ability continuum, with less content coverage on the low and high ends, while the FIM instrument covered a relatively small portion along the middle to upper end of this continuum. The above studies represent encouraging evidence that while physical function is presently measured with many different instruments, it need not be tied to any particular instrument. These studies support the idea that there can be a universal use of common units for the measurement of functional ability. As a result, the creation of a translation table between the FIM and the MDS should be possible for the measurement of functional ability in PAC settings.
Such a table would help avoid unnecessary

duplication of efforts when patients transfer from one PAC setting to another where different instruments are used to measure functional ability. When patients enter a new setting, their scores on the new instrument can be readily determined. The creation of a universal measure of functional ability in rehabilitation would create the following conditions:
• It allows one to accurately and precisely evaluate the effectiveness of treatment procedures.
• It allows one to accurately and precisely evaluate the effectiveness of the program, thus increasing the program’s efficiency.
• Measurements of progress are used to justify reimbursement for services.
(Merbitz, Morris & Grip, 1989)

In an unpublished study, Velozo (2004) performed a Rasch analysis to develop a conversion table that linked scores on the FIM to analogous scores on the MDS. The Winsteps program (Linacre & Wright, 2001), applying the Rasch partial credit model, was used to calibrate item difficulties based on the linked FIM and MDS scores. The development of this conversion table was based on a sample of 254 VHA patients who had completed both the FIM and the MDS within 7 days of one another. The decision to restrict the number of days between the completion of the FIM and MDS was based on the need to minimize the impact any possible change in the patient's condition would have on the scores. The use of 7 days as the criterion was based on the research lab's clinical judgment. The physical ability items from both instruments were placed on the same linear continuum and from this, a FIM-MDS conversion table was produced. The purpose of this current study is to test the accuracy of the conversion table developed by Velozo (2004). A new sample of records from 2,297 patients with linked FIM and MDS scores collected between July 2003 and June 2004 was obtained from the VA’s Austin

Automation Center databases. Using the conversion table developed by Velozo, the new FIM scores are converted to MDS scores, and new MDS scores to FIM scores. The converted-MDS (MDSc) scores are then statistically compared to the actual MDS scores and the converted-FIM (FIMc) scores to the actual FIM scores. From these comparisons, a determination of the accuracy of the FIM-MDS conversion table will be made.
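A comparison of converted and actual scores of the kind described above might be sketched as follows. The statistics shown (Pearson correlation and mean absolute difference) are common choices assumed here for illustration; they are not presented as the specific tests used in this study, and the scores are invented.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def mean_absolute_difference(actual, converted):
    """Average size of the gap between converted and actual scores."""
    return sum(abs(c - a) for a, c in zip(actual, converted)) / len(actual)

# Hypothetical actual and converted scores for five patients (illustration only).
ACTUAL_MDS = [12, 20, 27, 33, 41]
CONVERTED_MDS = [14, 19, 29, 31, 42]

print(round(pearson_r(ACTUAL_MDS, CONVERTED_MDS), 3))
print(mean_absolute_difference(ACTUAL_MDS, CONVERTED_MDS))
```

A high correlation alone does not establish accuracy, since converted scores could be systematically shifted; pairing it with a difference measure addresses both agreement in ordering and agreement in magnitude.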

CHAPTER 3
METHODOLOGY

Introduction

The purpose of this study was to test the accuracy of a FIM-MDS conversion table that was designed to transform a score on the physical component of the FIM to its corresponding score on the MDS and vice versa. The methods utilized to investigate the accuracy of the FIM-MDS conversion table are described in the following sections of this chapter:
• Source of the Data
• Sample
• FIM and MDS Motoric Items
• Procedures Involved in the Creation of the FIM-MDS Conversion Table
• Statistically Testing the Accuracy of the FIM-MDS Conversion Table
This study was approved by the University of Florida’s Institutional Review Board for the protection of human subjects, as well as the Veteran’s Administration’s Subcommittee on Clinical Investigations. This study also obtained a HIPAA Waiver of Authorization.

Source of the Data

The FSOD and MDS data reside in two separate databases at the VA’s Austin Automation Center (AAC) (Veterans Health Administration, 2004). Upon consultation with Dr. Hojlo, Chief of the VA Nursing Home Care, the most accurate VA-MDS data were available starting in June of 2002. Therefore the data extractions were based on data collected from June 2002 to May 2003. Data from both databases were downloaded and merged on the basis of social security numbers, using the database software Microsoft Access.
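The merge described above was performed in Microsoft Access; the following is only an illustrative sketch of the same kind of record linkage in Python. The field names, the patient identifiers, and the rule of keeping the closest-dated pair per patient are assumptions made for the example, and the seven-day window reflects the inclusion criterion applied to matched records in this study.

```python
from datetime import date

def link_records(fim_records, mds_records, max_days=7):
    """Pair FIM and MDS records for the same patient within a date window.

    Returns {patient_id: (fim_record, mds_record, gap_in_days)}, keeping
    the closest-dated pair when a patient has more than one match
    (an assumed rule for handling duplicates).
    """
    mds_by_patient = {}
    for record in mds_records:
        mds_by_patient.setdefault(record["patient_id"], []).append(record)

    linked = {}
    for fim in fim_records:
        for mds in mds_by_patient.get(fim["patient_id"], []):
            gap = abs((fim["date"] - mds["date"]).days)
            if gap > max_days:
                continue
            current = linked.get(fim["patient_id"])
            if current is None or gap < current[2]:
                linked[fim["patient_id"]] = (fim, mds, gap)
    return linked

# Hypothetical records with made-up identifiers (illustration only).
FIM_RECORDS = [
    {"patient_id": "A01", "date": date(2002, 7, 1), "total": 60},
    {"patient_id": "B02", "date": date(2002, 8, 15), "total": 45},
]
MDS_RECORDS = [
    {"patient_id": "A01", "date": date(2002, 7, 5), "total": 18},
    {"patient_id": "A01", "date": date(2002, 7, 2), "total": 20},
    {"patient_id": "B02", "date": date(2002, 9, 30), "total": 30},
]

for patient_id, (fim, mds, gap) in link_records(FIM_RECORDS, MDS_RECORDS).items():
    print(patient_id, fim["total"], mds["total"], gap)
```

Indexing one data set by patient identifier before scanning the other avoids comparing every FIM record with every MDS record, which matters when each source holds tens of thousands of records.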

A single-group design was used, as both inventories were completed by the same group of subjects. Because the same population completed both inventories, population invariance and symmetry exist. This eliminates concerns that might otherwise arise over differences caused by variability in the composition of the sample population.

Inclusion Criteria of Subjects

Subjects included in this study were those who were part of the VA’s FSOD and MDS databases, who had FIM and MDS scores completed no more than seven days apart between June 2002 and May 2003, and who had data on all items included in the development of the FIM-MDS conversion table. The decision to restrict the amount of time that elapsed between the administration of the FIM and MDS was based on the need to minimize the impact possible change(s) in a patient’s condition might have on the resulting FIM and MDS scores.

Inclusion of Women and Minorities

Inclusion criteria were for males and females and all ethnic groups, as they occurred in the VA’s FSOD and MDS databases.

Sample

The records of 57,237 patients who underwent a FIM evaluation and 69,954 subjects who had MDS scores in a VA post-acute care setting between June 2002 and May 2003 were obtained from two separate databases housed at the VA’s AAC. The linking of these records, based on patient social security number and no more than 7 days between FIM and MDS test dates, resulted in 2,521 matches. These data were then cleaned to exclude duplicate records of the same subject with more than one match of test dates, as well as those records that included missing or invalid scores (i.e., ratings other than acceptable) for items on either the FIM or the MDS. This last exclusion was made to ensure that total

scores were compared to total scores, which is the basis of the FIM-MDS conversion table. The result was 2,297 unique subjects with linked FIM and MDS scores.

The age of subjects ranged from 19 to 89+ years. Of those subjects between the ages of 19 and 89, 50.7% were under the age of 70, 31.2% were in their 70s, and 15.2% were in their 80s. Only 1.5% of the sample was over the age of 89. The majority of the sample was Caucasian (73%), with 20% being African American and 5% Hispanic. Ninety-six percent of the sample was male, and 44% were married. The days between the administration of the FIM and the MDS ranged from 0 to 7, with a mean of 5 (1.9) days. Thirty-five percent (1,531) of the subjects had a diagnosis of stroke, 23% (525) had lower extremity orthopedic problems, and 12% (271) had lower extremity amputations. The remainder of the sample consisted of subjects with a variety of impairments.

FIM and MDS Motor Items

Table 3-1 is a list and comparison of the FIM and MDS motor items included in this analysis. There are nine pairs of items between the FIM and MDS that are considered to represent the same or nearly the same activity in this study. These pairs include eating, grooming/personal hygiene, bathing, dressing, toileting, bowel management, bladder management, transferring, and walking (Table 3-1). Items included in only one instrument and not the other are bed mobility and stair use.

While both the FIM and MDS instruments include an item for eating, the FIM requires a higher skill level in order to achieve the highest rating. This is because the FIM does not permit finger feeding, nor does it allow people eating through adaptive means to achieve the highest rating. There are also similar grooming and hygiene items on the two instruments. The term “bathing” for both instruments connotes a full body bath, to

Table 3-1. Comparison of FIM™ to MDS ADL/motor items

FIM™ items:
  Eating
  Grooming
  Bathing
  Dressing-upper body
  Dressing-lower body
  Toileting
  Bladder management
  Bowel management
  Bed, chair, wheelchair (transfer)
  Toilet (transfer)
  Tub, shower (transfer)
  Walk/wheelchair
  Stairs

MDS items:
  Eating
  Bed mobility
  Personal hygiene
  Bathing*
  Dressing
  Toilet use
  Bladder continence**
  Bowel continence**
  Transfer
  Walk in room
  Walk in corridor
  Locomotion on unit
  Locomotion off unit

FIM™ rating scale:
  7  complete independence (timely, safely)
  6  modified independence (device)
  5  supervision
  4  minimal assist (subject = 75%+)
  3  moderate assist (subject = 50%+)
  2  maximal assist (subject = 25%+)
  1  total assist (subject = 0%+)

MDS rating scale (exceptions noted below):
  0  independent
  1  supervision
  2  limited assistance
  3  extensive assistance
  4  total dependence
  8  activity did not occur during the entire 7-day period

*Bathing in the MDS has a separate rating scale: 0-Independent, 1-Supervision, 2-Physical help limited to transfer only, 3-Physical help in part of bathing activity, 4-Total dependence, 8-Activity itself did not occur during entire 7 days.
**Bladder and Bowel Continence in the MDS also has a separate rating scale: 0-Usually Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent

exclude one’s face and hands. Yet, the MDS also incorporates bathtub or shower transfers as part of bathing, while the FIM has a separate item for tub and shower transfers. The MDS has one item for dressing, while the FIM divides the task into dressing, upper body and dressing, lower body. In this study, the FIM item for dressing,

lower body was matched with the MDS item for dressing since, based on the lab’s clinical judgment, dressing the lower body would be considered a more difficult task than dressing the upper body. This more difficult aspect of the ADL is incorporated in the one MDS item for dressing. The FIM item for toileting is matched with the MDS item for toilet use, as they have similar definitions, although the MDS includes toilet transfer in the task while the FIM has a separate item for transfer. The MDS item for transfers is then matched with the FIM item for transfers: bed, chair, and wheelchair. The bowel and bladder control items on the FIM are matched with the bowel and bladder continence items on the MDS.

The FIM item for walk/wheelchair addresses one’s ability to walk or use a wheelchair safely on a level surface, while the MDS has four items for walking: walk in room, walk in corridor, locomotion on unit, and locomotion off unit. Although not included in the definition of the FIM item for walk/wheelchair, “150 feet is specified as the performance criterion in the clarification of the rating scale” (Rogers, Gwinn & Holm, 2001, p. 6). Therefore, the FIM item for walk/wheelchair was matched to the MDS item for locomotion off unit. Furthermore, the FIM incorporates safety into the definition of many of its items, such as grooming, bathing, dressing, transfers, toileting skills, walking and wheelchair mobility, while the MDS does not (Rogers, Gwinn & Holm, 2001).

The FIM and MDS have different response scales on which the physical functioning items are scored. A clear distinction in the administration of these two assessments is that the items of the FIM are scored at the time of the assessment, while ratings on the MDS are based on observed performance over a 7-day period. Furthermore, the FIM items have seven response levels, while the MDS has a range of

five. The FIM scoring criteria are shown in Table 3-2. The MDS scoring criteria are shown in Table 3-3. While the FIM motor items assess the percent of effort that is provided to the patient to accomplish a task, the MDS measures the number of times during a 7-day time period a patient required a certain level of assistance to perform a task.

Table 3-2. FIM scoring criteria

7  Complete independence: All of the tasks described as making up the activity are typically performed safely, without modification, assistive devices, or aid and within a reasonable amount of time.
6  Modified independence: One or more of the following may be true: the activity requires an assistive device; the activity takes more than reasonable time, or there are safety (risk) considerations.
5  Supervision (standby prompting): Supervision or setup. Subject requires no more help than standby, cuing or coaxing, without physical contact, or, helper sets up needed items or applies orthoses.
4  Minimal assist (minimal prompting): Subject requires no more help than touching, and expends 75% or more of the effort.
3  Moderate assistance (moderate prompting): Subject requires more help than touching, or expends half (50%) or more (up to 75%) of the effort.
2  Maximal assistance (maximal prompting): Subject expends less than 50% of the effort, but at least 25%.
1  Total assistance: Subject expends less than 25% of the effort.

(Evans, 2002)

Table 3-3. MDS scoring criteria

0  Independence: No help or staff oversight, OR staff help/oversight provided only one or two times during the last seven days.
1  Supervision: Oversight, encouragement, or cueing provided three or more times during last 7 days, OR supervision (3 or more times) plus physical assistance provided, but only one or two times during the last 7 days.
2  Limited Assistance: Resident highly involved in activity, received physical help in guided maneuvering of limbs or other nonweight-bearing assistance on three or more occasions, OR limited assistance (3 or more times), plus one weight-bearing support provided, but for only one or two times during the last 7 days.
3  Extensive Assistance: While the resident performed part of activity over last seven days, help of following type(s) was performed three or more times:
   -- Weight-bearing support provided three or more times;
   -- Full staff performance of activity (3 or more times) during part (but not all) of last 7 days.
4  Total Dependence: Full staff performance of the activity during the entire 7-day period. There is complete nonparticipation by the resident in all aspects of the ADL definition task. If staff performed the activity for the resident during the entire observation period, but the resident performed part of the activity himself/herself, it would not be coded as a “4” (Total Dependence).

(CMS, 2003)

Procedures Involved in the Creation of the FIM/MDS Conversion Table

For the purposes of creating a FIM-MDS conversion table, Velozo (2004) obtained linked FIM and MDS scores from the records of 254 subjects. The linking of instruments using IRT methodologies is generally dependent on item calibrations, which are the “difficulty” measures of the items. In essence, item calibrations serve as the markings on the conversion ruler. Rasch analysis of the FIM and MDS converts a patient’s responses on the instrument items to a measure of ADL/motoric function.
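To make the idea of an item calibration more concrete, the sketch below uses the simplest member of the Rasch family, the dichotomous model, in which the probability of succeeding on an item depends only on the difference between the person measure and the item calibration, both expressed in logits. This is a simplification offered for illustration: the conversion table in this study came from a partial credit model run in Winsteps, which extends the same idea to multi-level rating scales. The function name and the sample values are hypothetical.

```python
import math


def rasch_probability(person_measure, item_calibration):
    """Probability of success on a dichotomous item under the Rasch model.

    Both arguments are in logits. A harder item has a higher calibration,
    and a more able person has a higher measure.
    """
    return 1.0 / (1.0 + math.exp(-(person_measure - item_calibration)))


# Illustrative values only: one person measure against three item
# calibrations placed on the same logit "ruler".
person = 0.5
easy_item, matched_item, hard_item = -1.0, 0.5, 2.0

p_easy = rasch_probability(person, easy_item)
p_matched = rasch_probability(person, matched_item)
p_hard = rasch_probability(person, hard_item)
```

When the person measure equals the item calibration, the probability of success is exactly one half, which is what allows items and persons to be read off the same scale.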

Prior to performing the Rasch analysis, several steps were taken so that the FIM and MDS rating scales were conceptually consistent. One inconsistency between the FIM and MDS is that the MDS includes a rating for “activity did not occur.” Using a procedure adapted by Jette, Haley, and Ni (2003), this MDS rating was recoded as part of the “total dependence” rating. The rationale underlying this decision was that a likely explanation for an activity not occurring was that the activity could not be performed (Buchanan, Andres, Haley, Paddock & Zaslavsky, 2002; Jette et al., 2003).

Other inconsistencies between the FIM and MDS are that the rating scales progress in different directions and have different ranges (i.e., from 1 to 7 for the FIM and from 4 to 0 for the MDS). In order to adjust for these differences, the MDS scale was rescored and rescaled to match the rating scale used in the FIM. For example, a “4” on the MDS, which represents total dependence, was recoded as a “1” to match Total Assistance on the FIM and a “0” on the MDS, which represents “Independence,” was recoded to a “7” to match the rating for “Complete Independence” on the FIM (Table 3-4).

Table 3-4. FIM-MDS score conversion

MDS                     Score conversion   FIM
Independent             0 to 7             Complete Independence
Supervision             1 to 5             Supervision
Limited Assistance      2 to 4             Minimal Assistance
Extensive Assistance    3 to 2             Maximal Assistance
Total Dependence        4 to 1             Total Assistance

Following the rescoring of the MDS rating scale, the next step in creating the FIM-MDS conversion table was to run a Rasch partial credit model analysis on the linked FIM and MDS scores, using Winsteps (Linacre & Wright, 2000). This combined analysis placed the FIM and MDS items and rating-scale calibrations on the same linear scale with the same local origin. That is, FIM item and rating-scale calibrations became

“linked” to MDS item and rating-scale calibrations. This also provided cocalibrated item and rating-scale values, which were then used as anchors in separate FIM and MDS analyses. Both the anchored FIM and MDS analyses generated output tables that associated total FIM and MDS raw scores with a common logit scale. These analyses resulted in a conversion table whereby total FIM raw scores could be translated into total MDS raw scores and vice versa (Table 3-5).

Converting a score from the FIM to the MDS and vice versa is as simple as locating a score under either the FIM or MDS column (Table 3-5) and reading across the adjacent logit column to find the equivalent score on the alternate instrument. It is hypothesized that a score of 16 on the MDS represents the same amount of functional independence as a score of 39 on the FIM. Similarly, a score of 58 on the FIM represents the same amount of functional independence as does an MDS score of 29 (Table 3-5).

Nine of the FIM items have corresponding items on the MDS that address similar areas of physical functioning. These items include eating, grooming/personal hygiene, bathing, dressing, toileting, bowel management, bladder management, transferring, and walking. After performing the Rasch analysis on the FIM and MDS total scores to link the two measures, the resulting correlation between the similar FIM and MDS items was .822 at p<.01. Similar person measures between scores on the FIM and the MDS correlated at .703.

Statistically Testing the Accuracy of the FIM-MDS Conversion Table

The purpose of this research study was to test the accuracy of the FIM-MDS conversion table. In order to accomplish this, a second, independent sample of VA patients with scores on both the FIM and MDS, completed within 7 days of one another, was obtained from the VA’s Austin Automation Center. Before submitting this dataset to

Table 3-5. FIM-MDS conversion table

FIM   logit   MDS     FIM   logit   MDS     FIM   logit   MDS
13    -3.80   52      39    -0.53   36      65     0.41   18
14    -2.77   51      40    -0.49   35      66     0.46   17
15    -2.26   51      41    -0.46   34      67     0.50   16
16    -1.99   50      42    -0.42   33      68     0.55   15
17    -1.81   50      43    -0.39   33      69     0.60   15
18    -1.67   49      44    -0.35   32      70     0.64   14
19    -1.56   48      45    -0.32   32      71     0.69   13
20    -1.46   48      46    -0.29   31      72     0.75   12
21    -1.38   47      47    -0.25   30      73     0.80   11
22    -1.31   47      48    -0.22   30      74     0.86   11
23    -1.24   46      49    -0.18   29      75     0.92   10
24    -1.18   45      50    -0.15   28      76     0.98    9
25    -1.12   44      51    -0.12   28      77     1.04    9
26    -1.07   44      52    -0.08   27      78     1.11    8
27    -1.02   43      53    -0.05   27      79     1.19    7
28    -0.97   42      54    -0.01   26      80     1.27    6
29    -0.92   42      55     0.02   25      81     1.35    6
30    -0.88   41      56     0.06   24      82     1.45    5
31    -0.83   40      57     0.10   24      83     1.55    5
32    -0.79   40      58     0.13   23      84     1.67    4
33    -0.75   39      59     0.17   22      85     1.80    3
34    -0.71   39      60     0.21   21      86     1.96    3
35    -0.67   38      61     0.25   21      87     2.15    2
36    -0.64   37      62     0.29   20      88     2.40    2
37    -0.60   37      63     0.33   19      89     2.76    1
38    -0.56   36      64     0.37   18      90     3.38    0
                                            91     4.52    0

the FIM-MDS conversion table, the same adjustments made to the MDS rating scale when creating the conversion table were applied to the current dataset. Using the Statistical Package for the Social Sciences (SPSS), version 12.0 for Windows, the MDS rating for “activity did not occur” was recoded as “total dependence” following the scoring protocol of the FIM. Then, the MDS rating scale was also recoded and rescaled to match the rating scale of the FIM. The FIM-MDS conversion table was then used to convert the second sample of actual FIM scores, designated as FIMa, to converted MDS

scores, designated MDSc. In the same manner, MDSa scores were converted to FIMc scores. Then, the actual scores on the FIM and MDS were compared to their corresponding converted scores to determine how similar the actual and converted scores were.

The goal of equivalence testing is to demonstrate that two or more conditions are statistically the same (Stegner, Bostrom & Greenfield, 1996). In this type of testing, one reverses the roles of the null and alternative hypotheses and then, by testing a set of these reversed hypotheses, demonstrates equivalence at a predetermined significance level, just as when demonstrating a difference between groups (Stegner et al., 1996). The equivalence methodology is an application of bioequivalence principles recently proposed in pharmacokinetics and biopharmaceutics. The idea is to "prove" statistically that two drugs or formulations are equivalent (Berger & Hsu, 1996; U.S. FDA, 1997, 1999). This methodology was used in the current study in order to compare the statistical equivalence of FIMa and FIMc scores, as well as MDSa and MDSc scores.

It was hypothesized that a minimum of 75% of the actual and converted scores should be within 5 points of one another in order for the conversion to be considered accurate. If fewer than 75% of the scores met this criterion, then the conversion table would not be considered accurate enough for use in a clinical setting. A difference of 5 points was employed, as Forrest, Schwam, and Cohen (2002) found that "each 5-point decrement in the FIM score correlated with the need for about one hour per day of help in mobility, basic activities of daily living, and instrumental activities of daily living" (p. 57). Yet, Granger et al. (1993) indicated that while no recommendations existed for what constituted a clinically significant change on the FIM, a 10-point improvement decreased by almost 50% the time required to care for

a group of stroke patients in the community. For this study, a clinically significant difference in scores was set at the more rigorous 5-point increment.

In order to apply a more precise analysis to the determination of the level of accuracy of the FIM-MDS conversion table, techniques described in Dorans and Lawrence (1990) were utilized. In their study, Dorans and Lawrence tested the score equivalence of nearly identical editions of the Scholastic Aptitude Test (SAT). These two versions comprised the same test questions, but the order in which the test was administered differed. In one test situation, the 40-item verbal section of the SAT might precede the 45-item verbal section. In the other situation, the 45-item verbal section might precede the 40-item verbal section. The test was administered to what were presumed to be statistically equivalent groups of examinees. Linear equating techniques were used to equate one version of the test to the other version, and the accuracy testing of the resulting scores was accomplished by checking whether the identity transformation fell within a reasonable confidence interval placed around that equating function (Dorans & Lawrence, 1990). The difference between the equating function and the identity transformation was calculated, and then that difference was divided by the standard error of the equating function. If the resulting ratio fell within a bandwidth of plus or minus two, then the equating function was considered to be within sampling error of the identity function.

Kolen and Brennan (1995) indicated that in order for equating of two measures to be successful, the four moments of the distribution should be statistically equivalent. Therefore, the four moments of the distribution, including the means, standard deviations, skewness, and kurtosis, were also calculated and compared.
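The two checks described above can be sketched in a few lines of Python. The first function forms the ratio of a difference to its standard error and asks whether it lies within plus or minus two; treating the boundary value as inside the band is an assumption. The second computes the four moments using plain population formulas, which is also an assumption: statistical packages such as SPSS apply small-sample adjustments, so their skewness and kurtosis values will differ slightly. All names and sample scores are hypothetical.

```python
import math


def within_sampling_error(equated, identity, standard_error, band=2.0):
    """True if (equated - identity) / standard_error lies within +/- band."""
    ratio = (equated - identity) / standard_error
    return abs(ratio) <= band


def four_moments(scores):
    """Mean, standard deviation, skewness and excess kurtosis of a sample.

    Uses unadjusted population formulas (divide by n), an assumption made
    here for simplicity.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


# Made-up values, for illustration only.
close_enough = within_sampling_error(equated=51.5, identity=50.0, standard_error=1.0)
too_far = within_sampling_error(equated=53.0, identity=50.0, standard_error=1.0)
moments = four_moments([1, 2, 3, 4, 5])
```

Comparing the dictionaries returned for two score distributions, moment by moment, mirrors the group-level comparison described in the text.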

Correlations between the actual and converted scores on both the FIM and MDS were also determined, as were effect sizes. Effect sizes give a clear indication of the amount of difference that exists between two score distributions. The standard of effect sizes as noted in Cohen (1988) was used to determine the percent of overlap that existed between the converted and actual scores (Table 3-6).

Table 3-6. Effect size

Cohen's Standard   Effect Size   Percentile Standing   Percent of Overlap
Large              0.8           79                     52.6%
                   0.7           76                     57.0%
                   0.6           73                     61.8%
Medium             0.5           69                     69.0%
                   0.4           66                     72.6%
                   0.3           62                     78.7%
Small              0.2           58                     85.3%
                   0.1           54                     92.3%
                   0.0           50                    100.0%

(Adapted from Cohen, 1988)

Additionally, an analysis of the data was performed to obtain an understanding of how similar the raw scores on similar items between the FIM and MDS were. A factor that might negatively impact the accuracy of the conversion table would be the presence of large differences in the scores for individuals on similar FIM and MDS items. In order to systematically test for the level of disparate scores present in the current dataset, the ratings on the MDS were converted to their closest corresponding ratings on the FIM, as shown in Table 3-7.

Table 3-7. Rating scale conversion

MDS                     Score conversion   FIM
Independent             0 to 7             Complete Independence
Supervision             1 to 5             Supervision
Limited Assistance      2 to 4             Minimal Assistance
Extensive Assistance    3 to 2             Maximal Assistance
Total Dependence        4 to 1             Total Assistance
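The rating conversion in Table 3-7 amounts to a small lookup, sketched below in Python. The mapping values are taken from the table, and the handling of the MDS code 8 (“activity did not occur”) follows the recoding to total dependence described earlier in this chapter. The function and constant names are hypothetical, and applying one mapping to every item, including those with separate MDS rating scales, is an assumption of the sketch.

```python
# MDS rating -> closest FIM rating, as listed in Table 3-7.
MDS_TO_FIM_RATING = {0: 7, 1: 5, 2: 4, 3: 2, 4: 1}

# MDS code for "activity did not occur", treated as total dependence.
ACTIVITY_DID_NOT_OCCUR = 8


def recode_mds_rating(mds_rating):
    """Convert a single MDS item rating to the FIM rating scale."""
    if mds_rating == ACTIVITY_DID_NOT_OCCUR:
        mds_rating = 4  # recode as total dependence before converting
    return MDS_TO_FIM_RATING[mds_rating]


# Made-up ratings for one resident, for illustration only.
mds_item_ratings = [0, 1, 2, 3, 4, 8]
recoded = [recode_mds_rating(r) for r in mds_item_ratings]
```

Because the MDS has five levels and the FIM has seven, the FIM ratings 6 and 3 are never produced by this conversion.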

For example, the score for independence on the MDS, “0,” was converted to a “7” to correspond with the FIM score for complete independence. Similarly, an MDS score of “3,” indicating extensive assistance, was converted to a score of “2” for maximal assistance. This rating conversion was accomplished on the nine FIM and nine MDS items that most closely matched (Table 3-8).

Table 3-8. Similar FIM and MDS items

FIM™ items                           MDS items
Eating                               Eating
Grooming                             Personal hygiene
Bathing                              Bathing
Dressing-lower body                  Dressing
Toileting                            Toilet use
Bladder management                   Bladder continence
Bowel management                     Bowel continence
Bed, chair, wheelchair (transfer)    Transfer
Walk/wheelchair                      Locomotion off unit

It was hypothesized that a difference of four points on the rating scale represented an important difference in a person’s functional ability. For instance, a score of “1” on the FIM indicates ‘Total Assistance,’ where the patient exerts less than 25% of the effort required in performing a task. A distance of four points away from the rating of “1” would be a score of “5,” which represents ‘supervision,’ indicating that the subject requires no more help than standby assistance to complete a task. The criterion of looking at only matched scores with a difference of four or more points is rigorous in that this situation only occurs between the following three score categories: 1-5; 1-7; and 2-7. Selecting only those subjects who had a difference of four or more points between scores on similar items identifies ratings that are clinically different on similar items.
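A minimal Python sketch of this item-level check follows. It counts, for one subject, how many of the nine similar item pairs in Table 3-8 differ by four or more rating points once the MDS ratings have been placed on the FIM scale. The item pairs come from the table. The function name, the dictionary layout, and the sample ratings are hypothetical, and the MDS ratings are assumed to have been recoded already.

```python
# (FIM item, MDS item) pairs from Table 3-8.
SIMILAR_ITEMS = [
    ("Eating", "Eating"),
    ("Grooming", "Personal hygiene"),
    ("Bathing", "Bathing"),
    ("Dressing-lower body", "Dressing"),
    ("Toileting", "Toilet use"),
    ("Bladder management", "Bladder continence"),
    ("Bowel management", "Bowel continence"),
    ("Bed, chair, wheelchair (transfer)", "Transfer"),
    ("Walk/wheelchair", "Locomotion off unit"),
]


def count_large_discrepancies(fim_ratings, mds_ratings_on_fim_scale, threshold=4):
    """Number of similar item pairs whose ratings differ by >= threshold."""
    count = 0
    for fim_item, mds_item in SIMILAR_ITEMS:
        gap = abs(fim_ratings[fim_item] - mds_ratings_on_fim_scale[mds_item])
        if gap >= threshold:
            count += 1
    return count


# Made-up subject: every item rated 5 on the FIM; the recoded MDS ratings
# agree except for eating (1) and transfer (7).
fim = {fim_item: 5 for fim_item, _ in SIMILAR_ITEMS}
mds = {mds_item: 5 for _, mds_item in SIMILAR_ITEMS}
mds["Eating"] = 1     # gap of 4 points, counted
mds["Transfer"] = 7   # gap of 2 points, not counted
n_discrepant = count_large_discrepancies(fim, mds)
```

Running the same count over every subject gives the number of subjects with at least one, or with several, clinically different ratings on similar items.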

CHAPTER 4
RESULTS

Statistical Analyses

The FIM-MDS conversion table was used to transform FIMa scores, obtained from the records of 2,297 subjects, to MDSc scores and MDSa scores to FIMc scores. The converted scores were then analyzed statistically, first at the individual level and then at the group level, as a means of determining the level of accuracy of the FIM-MDS conversion table. In order for the conversion table to be considered accurate for use in clinical settings, it was hypothesized that 75% of the actual and converted scores on the FIM and the MDS should be no more than five points apart. Next, techniques used by Dorans and Lawrence (1990) to test the accuracy of an attempt to equate nearly identical editions of the Scholastic Aptitude Test (SAT) were applied to this dataset. These procedures were used to determine whether the converted scores on the FIM and MDS fell within a reasonable confidence interval placed around the actual FIM and MDS scores. The difference between the actual and converted scores was calculated, and then that difference was divided by the standard error of the actual scores. If the resulting ratio fell within a bandwidth of plus or minus two, then the converted scores were considered to be within sampling error of the actual scores.

When testing the accuracy of the FIM-MDS conversion table from the perspective of group scores, the equivalence of the four moments of the distributions (i.e., the means, standard deviations, skewness, and kurtosis) was compared. Next, the

amount of overlap that existed between the actual and converted scores on the FIM and MDS was determined by calculating effect sizes. Correlations between the actual and converted scores on the FIM and MDS were also calculated, and those results are also displayed graphically in scatter plots.

Statistical Results at the Level of the Individual

Of the 2,297 subjects analyzed, 25% (574) of the sample had FIMa and FIMc scores that fell within five points of one another, with the difference in scores ranging from 0 to 71 points (Figure 4-1). For the MDS, 37% (850) of the sample had actual and converted scores within no more than five points of one another, with the difference in scores ranging from 0 to 48 (Figure 4-2). These percentages fell well short of the 75% standard for both the FIM and the MDS. For comparison purposes, the percentage of subjects with actual and converted FIM and MDS scores within 10 points of one another was also calculated. On the FIM, 45.5% (1,045) of the subjects had actual and converted scores within 10 points, and 64% (1,470) of the subjects had MDS scores within 10 points. Even when this more lenient criterion was used, the results continued to fall short of the 75% standard.

The equivalence of the actual and converted FIM scores and the actual and converted MDS scores was also evaluated using the test of equivalence employed by Dorans & Lawrence (1990). For the FIMa vs. FIMc scores, 8.4% of the conversions met the criterion for equivalence, and for the MDSa and MDSc scores, 6.4% met this criterion.

Statistical Results at the Group Level

Cooper (1989) concluded that in order for the equating of scores to be successful, all four moments of the distribution should be similar. The four moments of the

Figure 4-1. Score difference between the FIMa and FIMc. [Histogram: number of subjects by number of points by which the FIMa and FIMc differ. Mean = 13.5, Std. Dev = 10.38, N = 2,297.]

Figure 4-2. Score difference between the MDSa and MDSc. [Histogram: number of subjects by number of points by which the MDSa and MDSc differ. Mean = 9.1, Std. Dev = 7.05, N = 2,297.]
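The individual-level criterion behind these score-difference distributions can be sketched as follows: compute the share of subjects whose actual and converted scores lie within a chosen number of points of one another and compare it with the 75% standard. The function name and the sample scores are hypothetical, and counting a difference of exactly five points as "within" is an assumption.

```python
def share_within(actual, converted, tolerance=5):
    """Proportion of paired scores that differ by no more than tolerance."""
    if len(actual) != len(converted):
        raise ValueError("actual and converted must be the same length")
    hits = sum(1 for a, c in zip(actual, converted) if abs(a - c) <= tolerance)
    return hits / len(actual)


REQUIRED_SHARE = 0.75  # accuracy standard hypothesized in this study

# Made-up paired scores, for illustration only.
actual_scores = [40, 55, 62, 70]
converted_scores = [44, 66, 60, 85]

share_5 = share_within(actual_scores, converted_scores, tolerance=5)
share_10 = share_within(actual_scores, converted_scores, tolerance=10)
meets_standard = share_5 >= REQUIRED_SHARE
```

In this made-up example half of the pairs fall within five points, so the 75% standard is not met; widening the tolerance to ten points does not change the result because the remaining differences are 11 and 15 points.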

distribution, the mean, standard deviation, skewness, and kurtosis of the actual and converted scores on the FIM and MDS are displayed in Table 4-1.

Table 4-1. Four moments of the distributions

                          FIMa     FIMc     MDSa     MDSc
N  Valid                  2297     2297     2297     2297
   Missing                   0        0        0        0
Mean                     52.37    60.35    31.86    25.86
Std. Error of Mean        .432     .409     .280     .295
Median                   54.00    62.00    33.00    26.00
Std. Deviation           20.69    19.60    13.43    14.12
Skewness                 -.143    -.388    -.406    -.041
Std. Error of Skewness    .051     .051     .051     .051
Kurtosis                 -.855    -.573    -.630    -.901
Std. Error of Kurtosis    .102     .102     .102     .102

The mean of the FIMc was eight points greater than the mean of the FIMa, while the mean of the MDSa exceeded the mean of the MDSc by six points. The standard deviations of the actual and converted FIM scores (20.69 and 19.60) differed by about one point, and those of the two MDS scores differed by less than one point (13.43 for the MDSa and 14.12 for the MDSc, as listed in Table 4-1). Since the means of the FIMa and FIMc differed by only eight points, these scores fell well within one standard deviation of each other. Similarly, the means of the actual and converted MDS differed by six points and also fell within one standard deviation of each other. Therefore, the first two moments of the distributions, the mean and standard deviation, were equivalent.

Since the mean score can be highly influenced by outliers, the median scores for the distributions were also reported. The difference in the medians of the FIMa and FIMc score distributions was eight points, just as it had been with the difference in the means. The medians of the MDSa and MDSc differed by seven points. The medians for all four of the score distributions exceeded their respective means, indicating the presence of a negative skew to the score distributions. Normal distributions produce a

skewness statistic of about zero. A skewness or kurtosis value of two standard errors or greater, regardless of sign, likely deviates from a normal score distribution to a significant degree (Brown, 1997). Two times the standard error of skewness for all four distributions was .102 (2 × .051), and two times the standard error of kurtosis was .204 (2 × .102), again for all four score distributions. Therefore, the distributions of the actual FIM and MDS instruments had a significant negative skew, indicating that the subjects measured on these two inventories were generally more able than the tests were difficult. The distribution of the FIMc was also negatively skewed to a significant degree, yet the MDSc did not have a significant skew value, and the distribution would be considered symmetrical in this regard. The distributions of both the actual and converted scores on the FIM and MDS all had negative kurtosis values, indicating that these distributions were flatter than what one would expect and differed from normal to a significant degree.

The results of this conversion revealed a substantial overlap between the distributions of the actual and converted FIM scores (Figures 4-3 and 4-4), as well as between the actual and converted MDS scores (Figures 4-5 and 4-6). An effect size of .2 demonstrated an 85.3% overlap of the distributions for the actual and converted scores in each case.

Correlations between the actual and converted scores were calculated and revealed a .724 correlation at p<.01 between the FIM scores and a correlation of .745 at p<.01 between the MDS scores. Scatter plots of the actual and converted scores for the FIM (Figure 4-7) and for the MDS (Figure 4-8) graphically demonstrate the level of correlation between the actual and converted scores on the two instruments. Figure 4-5 demonstrates a ceiling effect exists for the actual FIM scores, while Figure 4-6 demonstrates a ceiling effect for the converted MDS scores.
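The two-standard-error rule used above can be checked directly against the skewness and kurtosis values and their standard errors listed in Table 4-1. The Python sketch below does so; the function and variable names are hypothetical, and only the numbers come from the table.

```python
def departs_from_normal(statistic, standard_error, multiple=2.0):
    """True if |statistic| is at least `multiple` standard errors from zero."""
    return abs(statistic) >= multiple * standard_error


# Values copied from Table 4-1.
SE_SKEWNESS = 0.051
SE_KURTOSIS = 0.102
skewness = {"FIMa": -0.143, "FIMc": -0.388, "MDSa": -0.406, "MDSc": -0.041}
kurtosis = {"FIMa": -0.855, "FIMc": -0.573, "MDSa": -0.630, "MDSc": -0.901}

significant_skew = {
    name: departs_from_normal(value, SE_SKEWNESS)
    for name, value in skewness.items()
}
significant_kurtosis = {
    name: departs_from_normal(value, SE_KURTOSIS)
    for name, value in kurtosis.items()
}
```

With these inputs the skewness of the FIMa, FIMc, and MDSa exceeds two standard errors while that of the MDSc does not, and all four kurtosis values exceed two standard errors.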

Figure 4-3. Distribution of FIM actual scores. [Histogram of total FIM actual scores.]

Figure 4-4. Distribution of FIM converted scores. [Histogram of total FIM converted scores.]

Figure 4-5. Distribution of MDS actual scores. [Histogram of total MDS actual scores.]

Figure 4-6. Distribution of MDS converted scores. [Histogram of total MDS converted scores.]

Figure 4-7. Scatterplot of the FIMa and FIMc scores. [Axes: actual FIM scores and converted FIM scores, each from 0 to 100.]

Figure 4-8. Scatterplot of actual MDS and converted MDS scores. [Axes: actual MDS scores and converted MDS scores, each from -10 to 60.]

Discrepancies in the Dataset

Of the 2,297 subjects in the dataset used in this study, 51% (1,163) had at least one of the similar items in which there was a difference of four or more rating points between scores on the FIM and the MDS. Three percent (76) of the subjects had four or more of the nine similar items with score differences of four or more rating points. Two percent (2) of the 109 subjects with FIM and MDS scores recorded on the same day had more than four similar items with score differences of four or more rating points.

CHAPTER 5
DISCUSSION

Summary of Results

The results of this study testing the accuracy of the FIM-MDS conversion table were mixed, as those statistics at the group level tended to support the accuracy of the conversion, while those at the individual level did not. The findings that support a conclusion of equivalency between the actual and converted scores on both the FIM and MDS included an 85.3% overlap between the respective score distributions, as well as a correlation of .724 for the FIM and .745 for the MDS. The means of the actual and converted FIM scores were well within one standard deviation of each other, as were the means of the actual and converted MDS scores. The results of the two one-sided test procedures supported this conclusion of equivalency between the respective FIM and MDS means, as well.

On the other hand, only 25% (574) of the sample had FIMa and FIMc scores within 5 points of one another, and 37% (850) had MDSa and MDSc scores within that range. Those percentages fell well short of the hypothesized 75% of the sample having actual and converted scores that were no more than 5 points apart. If the standard for an accurate conversion system were lowered to allow for up to a 10-point difference between actual and converted scores, then 45% (1,045) of the sample would have FIM scores and 64% (1,470) would have MDS scores within that range. While these percentages were closer to the 75% criterion, they still fell short. The presence of a

negative skew for all four of the score distributions was an indication that the subjects’ ability levels were higher than the difficulty levels of the inventories. All of the score distributions except for the MDSc demonstrated a skewness value that resulted in a significant departure from normal. The level of kurtosis for all four distributions also deviated from normal to a significant degree. When using the very rigorous procedures described by Dorans and Lawrence (1990) to determine equivalency, only 8.4% of the scores on the FIMa and FIMc and 6.4% of the MDSa and MDSc scores met this criterion. A ceiling effect in the distribution of the FIMc and the MDSa scores was present. Submitting the FIM and MDS scores to the FIM-MDS conversion table resulted in an inflation of FIM scores and a deflation of MDS scores.

Taking all of the above information into consideration, the FIM-MDS conversion table passed less stringent standards for equivalency, generally at the group level, but failed when focusing on statistics at the individual level. Since other attempts at creating conversion tables between instruments used in rehabilitation have not gone further to test the accuracy of those conversions, no direct comparisons to other research findings can be made.

An understanding of the psychometrics of the FIM-MDS conversion table is important when interpreting the results of this study. The conversion table was developed from a dataset of 253 subjects with FIM and MDS scores occurring within 7 days of one another. Similar FIM and MDS items used in the development of the conversion table had a correlation of .82 at p<.01, and there was a correlation of .70 at p<.01 between similar person measures. These correlations are not as strong as those reported in the study by Fisher et al. (1997). Fisher et al. found a correlation of .95 between difficulty estimates for a subset of similar items on the PF10 and the PFS, both of which are self-report measures.
The lower correlations for the FIM-MDS conversion table for similar item and similar person measures may be explained by differences in design between this and previous studies. Fisher et al. used a convenience sample of 285 patients who were waiting for appointments in a public hospital general medicine clinic. These patients were asked to complete the PF10 and the PFS inventories while they waited to see a doctor and, therefore, no opportunity existed for a change in physical condition to take place between the completion of the two surveys. There was also no possibility for different raters to score the two instruments on the same subject, since the “rater” in both cases was the patient. Fisher et al. also removed the least consistent cases from the analysis, meaning that cases with the highest outfit statistics were not included in creating the conversion table.

The correlations found in the current study were also not as strong as those reported by Fisher, Harvey, Taylor, et al. (1995), who obtained a correlation of similar person measures of .91 between the instruments used in the development of the Rehabits’ translation scale. The stronger correlation obtained by Fisher, Harvey, Taylor, et al. may be the result of differences in the design of that study, as compared to the current one. Fisher and colleagues used the results of 54 consecutive patients admitted to a freestanding rehabilitation hospital, who were rated on both the FIM and the PECS at admission and discharge. Thus, “rater” variability was controlled, and there was no possibility for physical changes to take place in the patient’s condition between the administrations of the two inventories upon admission and then again upon discharge, because the inventories in each case were completed at the same time.

Studies by both Fisher, Harvey, Taylor, et al. (1995) and Fisher et al. (1997) that used cocalibration equating measures to link healthcare instruments did not test the correlations between actual and converted scores.
Therefore, it is difficult to clearly define the significance of a correlation of .72 for the FIMa and FIMc and a correlation of .75 for the MDSa and MDSc. If this were a classical test-retest reliability study, these correlations would be considered low. The question left for the current study is whether better results could be obtained by creating a conversion system based on more accurate data. Yet an attempt to use more “controlled” data collection methodologies would limit the applicability of the study in clinical settings.

The limited accuracy of the FIM-MDS conversion table may be a result of problems with the sample upon which it was based. This conversion system was developed from a dataset of 253 subjects with FIM and MDS scores occurring within 7 days of one another. It may be that a larger dataset is necessary to create a highly accurate conversion system. Furthermore, a significant limitation of the dataset used in this study is the presence of scores on similar FIM and MDS items for the same subject that differ markedly. It can be argued that differences in the definitions of similar items between the FIM and MDS led to variability in scores. For example, the definition of “eating” on the MDS focuses “on the intake of nourishment by any means, regardless of skill, and includes the use of alternate forms of obtaining nourishment, such as tube feeding” (Rogers et al., 2001, pp. 7-8). Yet eating on the FIM is limited to the use of suitable utensils to bring the food to the mouth (Rogers et al.). The MDS item for bathing includes bathtub or shower transfers, while the FIM bathing item does not. In addition, the MDS toileting item includes the ability to transfer to and from the toilet, while the FIM item for toileting does not. Furthermore, the FIM incorporates safety into the definition of many of its items, such as grooming, bathing, dressing, transfers, toileting skills, walking and wheelchair mobility, while the MDS does not (Rogers et al.).
Thus, one person could conceivably obtain a very different score on two of the similar items between the FIM and MDS. Alternatively, the fact that up to a 7-day difference between the FIM and MDS assessments was allowed may have also contributed to the discrepancies that were seen between similar items for the same subject. While the decision to restrict the number of days between the completion of the FIM and MDS was based on the need to minimize the impact any possible change in the patient’s condition would have on their scores, it is likely that even this restriction did not go far enough to eliminate this source of error in the study. Thus, some of the discrepancies in scores that were seen between similar items may be due to a change in the subject’s physical condition.

A related explanation for the problems encountered in developing a highly accurate conversion table is the existence of discrepancies between similar categories in the rating scales of the FIM and the MDS. A FIM score of “1” refers to “Total Assistance,” in which the patient puts forth less than 25% of the effort necessary to perform a task. The corresponding score on the MDS is a “4” for “Total Dependence.” This score is defined as “full staff performance of the activity during entire 7-day period. There is complete nonparticipation by the resident in all aspects of the ADL definition task.” A noticeable difference exists between these two rating definitions. On the FIM, the patient may exert up to a quarter of the effort needed to perform the task, while on the MDS the patient does not participate at all in the activity. Instead, the staff is required to perform the full activity. It may reasonably be argued that there is a clinically significant difference between a patient who can exert up to 25% of the effort required to complete a task and one who cannot exert any effort at all.
Similarly, a score of “2” on the FIM refers to “Maximal Assistance,” in which the patient puts forth less than 50% of the effort necessary to do a task, but at least 25% of the effort. The corresponding score on the MDS of “1” or “Supervision” is defined as “Oversight encouraged or cuing provided three or more times during the last 7 days OR supervision (three or more times) plus physical assistance provided, but only one or two times during the last 7 days.” Once again, a significant difference in meaning is evident between these two categories.

A further limitation of this study is that there may be a selection bias, in that patients who have scores on both the FIM and MDS may not be typical of the patient population in general. Additionally, it is likely that the characteristics of this veteran population are not representative of the population for whom the FIM and MDS are intended. Most notably, 98% of the study population is male. Yet the great majority of people in nursing homes, where the MDS is used, are female. Similarly, the ethnic makeup of this study population is also likely not an accurate reflection of the makeup of the population that most frequently uses the FIM and MDS. The unique aspects of the subjects in this study may limit the generalizability of the findings.

Implications for Future Research

The results of this study lead one to consider other possible situations in which the linking of instruments may be effective. There is evidence to suggest that the creation of a conversion table based on two self-report instruments would have a higher degree of accuracy (Fisher et al., 1997). One possible reason for the higher degree of accuracy is that rater bias is not an issue, as the same “rater” (i.e., the subject or their proxy) would complete both instruments. Additionally, the research study could be set up so that the subject completed both instruments at the same time. In this manner, it would not be possible for a change in the subject’s physical condition to take place, and it is hypothesized that it would be unlikely for a subject to interpret similar items between instruments differently.
Thus, these two sources of error, rater bias and the possibility that there has been a change in the subject’s physical condition between the administrations of the two instruments, would be eliminated and the conditions required for the creation of a conversion table would be optimized. The use of self-reports in a clinical setting and/or in a research study would also have an economic advantage, as it would reduce the amount of time a trained therapist would need to be involved in the task.

Conclusion

Maximizing outcomes in rehabilitation, while streamlining the process of providing highly effective and coordinated services, will continue to be a goal of rehabilitation for years to come. Efforts to increase the continuity of care between PAC settings and to improve the effectiveness of rehabilitation services will be pursued on all levels. This research focused on determining the accuracy of one such effort, namely a means of creating an easily implemented and highly effective tool for converting the scores from the physical component items on the FIM to those on the MDS and vice versa. The results of this study suggest that scores derived from the FIM-MDS conversion table should, at best, be considered only as rough estimates of similar scores on the two instruments. At the conclusion of this study, the question still remains as to whether the FIM and MDS instruments can measure physical functioning on a common unit of measurement and whether a highly accurate conversion table can be developed so that a patient’s gains in physical functioning can be tracked from inpatient rehabilitation settings to skilled nursing facilities. It may be that pursuing research in alternative directions, such as using these linking techniques to create a conversion table between self-report instruments of functional ability, will provide a solution.
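
The contrast drawn in this chapter between group-level and individual-level accuracy can be illustrated with a small sketch. The Python code below is not the analysis performed in this study. It uses made-up score pairs, a hypothetical equivalence margin of 5 points, and a normal approximation to the two one-sided tests, all of which are assumptions chosen only for illustration. The function names `share_within` and `tost_paired` are likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def share_within(actual, converted, tolerance):
    """Fraction of paired scores whose absolute difference is <= tolerance."""
    pairs = list(zip(actual, converted))
    hits = sum(1 for a, c in pairs if abs(a - c) <= tolerance)
    return hits / len(pairs)


def tost_paired(actual, converted, margin, alpha=0.05):
    """Two one-sided tests on the mean paired difference.

    Assumption: a normal approximation replaces the t distribution, which is
    reasonable only for large samples. Returns (p_lower, p_upper, equivalent).
    """
    diffs = [a - c for a, c in zip(actual, converted)]
    se = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist()
    # H0: mean difference <= -margin (rejected when the statistic is large).
    p_lower = 1.0 - z.cdf((mean(diffs) + margin) / se)
    # H0: mean difference >= +margin (rejected when the statistic is small).
    p_upper = z.cdf((mean(diffs) - margin) / se)
    return p_lower, p_upper, max(p_lower, p_upper) < alpha


# Made-up data (NOT the study data): differences cancel out on average,
# but most individual pairs are more than 5 points apart.
base_actual = [30, 42, 55, 61, 48, 70, 66, 52]
base_diff = [8, -8, 12, -12, 3, -3, 7, -7]
actual = base_actual * 25
converted = [a - d for a, d in zip(actual, base_diff * 25)]

print(share_within(actual, converted, 5))    # 0.25
print(share_within(actual, converted, 10))   # 0.75
print(tost_paired(actual, converted, margin=5)[2])  # True
```

In this constructed example the mean difference is zero, so the two one-sided tests indicate equivalence at the group level, yet only a quarter of the pairs fall within 5 points of each other. A pattern of this kind is one way in which a conversion can look acceptable for groups while remaining a rough estimate for any individual patient.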

REFERENCES

Anderson, S., & Hauck, W. W. (1990). Consideration of individual equivalence. Journal of Pharmacokinetics and Biopharmaceutics, 18, 259-273.
Andrich, D. (2004). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42 (Suppl. 1), 1-7.
Badia, X., Prieto, L., Roset, M., Diez-Perez, A., & Herdman, M. (2002). Development of a short osteoporosis quality of life questionnaire by equating items from two existing instruments. Journal of Clinical Epidemiology, 55 (1), 32-40.
Berger, R., & Hsu, J. (1996). Bioequivalence trials, intersection-union tests, and equivalence confidence sets. Statistical Science, 11, 283-319.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Erlbaum.
Bond, L., Moss, P., & Carr, P. (1996). Fairness in large-scale performance assessment. In G. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 117-140). Washington, DC: National Center for Education Statistics.
Brown, J. D. (1997). Skewness and kurtosis. Shiken: JALT Testing and Evaluation SIG Newsletter, 1 (1), 16-18.
Buchanan, J. L., Andres, P. L., Haley, S. M., Paddock, S. M., & Zaslavsky, A. M. (2003). An assessment tool translation study. Health Care Financing Review, 24 (3), 45-60.
Casten, R. C., Lawton, M. P., Parmelee, P. A., & Cleban, M. H. (1998). Psychometric characteristics of the Minimum Data Set I: Confirmatory factor analysis. Journal of the American Geriatric Society, 46, 726-736.
Catalogue of Federal Domestic Assistance. (2002). Veterans nursing home care. Retrieved September 12, 2002, from http://www.cfda.gov/public/viewprog.asp?progid=777
Cella, D., & Chang, C. (2000). A discussion of item response theory and its applications in health status research. Medical Care, 38 (9, Suppl. 2), 66-72.

Centers for Disease Control. (2002). Health, the United States, 2001. Retrieved March 9, 2002, from http://www.cdc.gov/nchs/hus.htm
Centers for Medicare and Medicaid Services. (2001, October-November). Presentation for the inpatient rehabilitation facility-patient assessment instrument. Retrieved September 12, 2002, from http://www.cms.hhs.gov/providers/irfpps/day1%5Firfpai.pdf
Centers for Medicare and Medicaid Services. (2002). Long-term care minimum data set. Retrieved September 23, 2002, from http://cms.hhs.gov/medicaid/mds20/default.asp
Centers for Medicare and Medicaid Services. (2003). RAI version 2.0 manual. Retrieved June 24, 2004, from http://www.cms.hhs.gov/quality/mds20/raich3.pdf
Chang, C., & Cella, D. (1997). Equating health-related quality of life instruments in applied oncology settings. Physical Medicine and Rehabilitation: State of the Art Reviews, 11 (2), 397-406.
Choppin, B. (1968). An item bank using sample free calibration. Nature, 219, 870-872.
Cohen, M. E., & Marino, R. J. (2000). The tools of disability outcomes research functional status measures. Archives of Physical Medicine and Rehabilitation, 81 (Suppl. 2), S21-S29.
Cook, L., & Peterson, N. (1987). Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement, 11, 225-244.
Cornman, J. M., & Kingson, E. R. (1996). Trends, issues, perspectives, and values for the aging of the baby boom cohorts. The Gerontologist, 36 (1), 15-26.
De Gruijter, D. N. M. (1986). Small N does not always justify Rasch model. Applied Psychological Measurement, 10, 187-194.
DeJong, G. (2001). Open letter from ACRM to HCFA on proposed medicare PPS. A letter prepared for the President of the American Congress of Rehabilitation Medicine under the auspices of ACRM's Research Policy and Legislation Committee and the Committee's PPS Workgroup. Archives of Physical Medicine and Rehabilitation, 82, 567-569.
Dillingham, T. R., Pezzin, L. E., & MacKenzie, E. J. (2003). Discharge destination after dysvascular lower-limb amputations. Archives of Physical Medicine and Rehabilitation, 84 (11), 1662-1668.

Dodds, T. A., Martin, D. P., Stolov, W. C., & Deyo, R. A. (1993). A validation of the Functional Independence Measurement and its performance among rehabilitation inpatients. Archives of Physical Medicine and Rehabilitation, 74, 531-536.
Douglas, J. (1999). Item response models for longitudinal quality of life data in clinical trials. Statistics in Medicine, 18, 2917-2931.
Evans, C. T. (2002). Functional independence measure. Retrieved June 24, 2004, from http://www.sci-queri.research.med.va.gov/fim.htm
Fiedler, R. C., & Granger, C. V. (1997). Uniform data system for medical rehabilitation: Report of first admissions for 1995. American Journal of Physical Medicine and Rehabilitation, 76 (1), 76-81.
Fisher, A. G. (1993). The assessment of IADL motor skills: An application of many-faceted Rasch analysis. American Journal of Occupational Therapy, 47 (4), 319-329.
Fisher, W. P. (1997). Physical disability construct convergence across instruments: Toward a universal metric. Journal of Outcome Measurement, 1 (2), 87-113.
Fisher, W. P., Eubanks, R. L., & Marier, R. L. (1997). Equating the MOS SF36 and the LSU H.S.I. physical functioning scales. Journal of Outcome Measurement, 1 (4), 329-362.
Fisher, W. P., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5, 3-25.
Fisher, W. P., Harvey, R. F., Taylor, P., Kilgore, K. M., & Kelly, C. K. (1995). Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76, 113-122.
Forrest, G., Schwam, A., & Cohen, E. (2002). Time of care by patients discharged from a rehabilitation unit. American Journal of Physical Medicine and Rehabilitation, 81 (1), 57-62.
Fox, C. M., & Jones, J. A. (1998). Uses of Rasch modeling in counseling psychology research. Journal of Counseling Psychology, 45, 30-45.
Functional independence measure. (2001). Retrieved June 16, 2004, from http://www.medfriendly.com/functionalindependencemeasure.html#range
Granger, C. V. (1998). The emerging science of functional assessment: Our tool for outcomes analysis. Archives of Physical Medicine and Rehabilitation, 79, 235-240.

Granger, C. V. (1999). Continuum of care: Measuring medical rehabilitation outcomes. Journal of the Institute of Objective Measurement, 2, 60-62.
Granger, C. V., Cotter, A. C., & Hamilton, B. B. (1990). Functional assessment scales: A study of persons with multiple sclerosis. Archives of Physical Medicine and Rehabilitation, 71, 870-875.
Granger, C. V., Cotter, A. C., Hamilton, B. B., & Fiedler, R. C. (1993). Functional assessment scales: A study of persons after stroke. Archives of Physical Medicine and Rehabilitation, 74, 133-138.
Granger, C. V., & Hamilton, B. B. (1993). The Uniform Data System for medical rehabilitation report of first admissions for 1991. American Journal of Physical Medicine and Rehabilitation, 72 (1), 33-38.
Granger, C. V., Hamilton, B. B., Keith, R. A., Zielezny, M., & Sherwin, F. S. (1986). Advances in functional assessment for medical rehabilitation. In C. B. Lewis (Ed.), Topics in geriatric rehabilitation (Vol. 1, pp. 59-74). Baltimore, MD: Aspen.
Granger, C. V., Hamilton, B. B., & Sherwin, F. S. (1986). Guide for use of the Uniform Data Set for medical rehabilitation. Buffalo, NY: Buffalo General Hospital.
Gruber-Baldini, A. L., Zimmerman, S. I., Mortimore, E., & Magaziner, J. (2000). The validity of the Minimum Data Set in measuring the cognitive impairment of persons admitted to nursing homes. Journal of the American Geriatric Society, 48, 1734-1736.
Gutheil, I. A. (1996). Introduction. The many faces of aging: Challenges for the future. Gerontologist, 36 (1), 13-14.
Haertel, E. H., & Lynn, R. L. (1996). Comparability. In G. W. Phillips (Ed.), Technical issues in large-scale performance assessment (pp. 59-78). Washington, DC: National Center for Educational Statistics.
Haley, S. M., McHorney, C. A., & Ware, J. E. (1994). Evaluation of the MOS SF-36 physical functioning scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item scale. Journal of Clinical Epidemiology, 47, 671-684.
Hall, K. M., Hamilton, B. B., Gordon, W. A., & Zasler, N. D. (1993). Characteristics and comparisons of functional assessment indices: Disability rating scale, functional independence measures, and functional assessment measure. Journal of Head Trauma Rehabilitation, 8 (2), 60-74.
Hambleton, R. K. (2000). Response to Hays et al. and McHorney and Cohen: Emergence of item response modeling in instrument development and data analysis. Medical Care, 38 (9, Suppl. 2), 60-65.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Thousand Oaks, CA: Sage.
Hamilton, B. B. (1989). Totaled functional score can be valid [Letter to the editor]. Archives of Physical Medicine and Rehabilitation, 70, 861-862.
Hamilton, B. B., Granger, C. V., Sherwin, F. S., Zielezny, M., & Tashman, J. S. (1987). A uniform national data system for medical rehabilitation. In M. J. Fuhrer (Ed.), Rehabilitation outcomes: Analysis and measurement (pp. 137-147). Baltimore: Brooks.
Hart, D. L. (2000). Assessment of unidimensionality of physical functioning in patients receiving therapy in acute, orthopedic outpatient centers. Journal of Outcome Measurement, 4, 413-430.
Hart, D. L., & Wright, B. D. (2002). Development of an index of physical functional health status in rehabilitation. Archives of Physical Medicine and Rehabilitation, 83, 655-665.
Hawes, C., Morris, J. N., Phillips, C. D., Mor, V., Fries, B. E., & Nonemaker, S. (1995). Reliability estimates for the Minimum Data Set for nursing home resident assessment and care screening (MDS). The Gerontologist, 35 (2), 172-178.
Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health outcomes measurement in the 21st century. Medical Care, 38 (9), 28-42.
Health Care Financing Administration. (1998). Minimum Data Set, 2.0. Washington, DC: U.S. Government Printing Office.
Heinemann, A. W., Kirk, P., Hastie, B. A., Senik, P., Hamilton, B. B., Linacre, J. M., Wright, B. D., & Granger, C. V. (1997). Relationships between disability measures and nursing effort during medical rehabilitation for patients with traumatic brain and spinal cord injury. Archives of Physical Medicine and Rehabilitation, 78, 143-149.
Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., & Granger, C. V. (1993). Relationships between impairment and physical disability as measured by the Functional Independence Measure. Archives of Physical Medicine and Rehabilitation, 74, 566-573.
Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., & Granger, C. V. (1994). Prediction of rehabilitation outcomes with disability measures. Archives of Physical Medicine and Rehabilitation, 75, 133-143.
Hyslop, T., Hsuan, F., & Holder, D. J. (2000). A small sample confidence interval approach to assess individual bioequivalence. Statistics in Medicine, 19, 2885-2897.

Iwanenko, W., Fiedler, R. C., & Granger, C. V. (1999). Uniform data system for medical rehabilitation: Report of first admissions to subacute rehabilitation for 1995, 1996, and 1997. American Journal of Physical Medicine and Rehabilitation, 78 (4), 384-388.
Jette, A. M., Haley, S. M., & Ni, P. (2003). Comparison of functional status tools used in post-acute care. Health Care Financing Review, 24 (3), 1-12.
Johnson, M. F., Kramer, A. M., Lin, M. K., Kowalsky, J. C., & Steiner, J. F. (2000). Outcomes of older persons receiving rehabilitation for medical and surgical conditions compared with hip fracture and stroke. American Journal of the Geriatric Society, 48 (11), 1389-1397.
Keith, R. A., Wilson, D. B., & Gutierrez, P. (1995). Acute and subacute rehabilitation for stroke: A comparison. Archives of Physical Medicine and Rehabilitation, 76 (6), 495-500.
Kim, S., & Cohen, A. S. (2002). A comparison of linking and concurrent calibration under the graded response model. Applied Psychological Measurement, 26 (1), 25-41.
Kolen, M. J. (2001). Linking assessments effectively: Purpose and design. Educational Measurement: Issues and Practice, 20 (1), 5-9.
Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York: Springer-Verlag.
Latham, N. K., & Haley, S. M. (2003). Measuring functional outcomes across postacute care: Current challenges and future directions. Critical Reviews in Physical and Rehabilitation Medicine, 15 (2), 83-98.
Li, Y. H., & Lissitz, R. W. (2000). An evaluation of the accuracy of multidimensional IRT linking. Applied Psychological Measurement, 24 (2), 115-138.
Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2 (3), 266-283.
Linacre, J. M., Heinemann, A. W., Wright, B. D., Granger, C. V., & Hamilton, B. B. (1994). The structure and stability of the Functional Independence Measure. Archives of Physical Medicine and Rehabilitation, 75, 127-132.
Linacre, J. M., & Wright, B. D. (2000). WINSTEPS (Version 3.17) [Rasch model computer software]. Chicago: MESA.
McHorney, C. A. (1997). Generic health measurement: Past accomplishments and a measurement paradigm for the 21st century. Annals of Internal Medicine, 127, 743-750.

McHorney, C. A. (2002). Use of item response theory to link 3 modules of functional status items from the Asset and Health Dynamics Among the Oldest Old Study. Archives of Physical Medicine and Rehabilitation, 83, 383-394.
McHorney, C. A., & Cohen, A. S. (2000). Equating health status measures with item response theory, illustrations with functional status items. Medical Care, 38 (9, Suppl. 2), 43-59.
McHorney, C. A., Haley, S. M., & Ware, J. E. (1997). Evaluation of the MOS SF-36 physical functional scale (PF-10). II: Comparison of relative precision using Likert and Rasch scoring methods. Journal of Clinical Epidemiology, 50 (4), 451-461.
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 15, 300-307.
Merbitz, C., Morris, J., & Grip, J. C. (1989). Ordinal scales and foundation of misinference. Archives of Physical Medicine and Rehabilitation, 70, 308-312.
Mislevy, R. J. (1992). Linking educational assessments: Concepts, issues, methods, and prospects (Policy issue perspective). Princeton, NJ: Educational Testing Service.
Morris, J., Hawes, C., Fries, B. E., Phillips, C. D., Mor, V., Katz, S., Murphy, K., Drugovich, M. L., & Friedlob, A. S. (1990). Designing the national resident assessment instrument for nursing homes. Gerontologist, 30, 293-302.
Morris, J. N., Fries, B. E., Mehr, D. R., Hawes, C., Phillips, C. D., Mor, V., & Lipsitz, L. A. (1994). MDS cognitive performance scale. Journal of Gerontology: Medical Sciences, 49 (4), M124-M182.
Muraki, E., Hombo, C. M., & Lee, Y. W. (2000). Equating and linking of performance assessments. Applied Psychological Measurement, 24 (4), 325-337.
National Committee on Vital and Health Statistics. (2003). Classifying and reporting functional status. Retrieved April 9, 2004, from http://ncvhs.hhs.gov/010617rp.pdf
Oczkowski, W. J., & Barreca, S. (1993). The Functional Independence Measure: Its use to identify rehabilitation needs in stroke survivors. Archives of Physical Medicine and Rehabilitation, 74, 1291-1294.
Ottenbacher, K. J., Hsu, Y., Granger, C. V., & Fiedler, R. C. (1996). The reliability of the Functional Independence Measure: A quantitative review. Archives of Physical Medicine and Rehabilitation, 77, 1226-1232.

Penta, M. (2004). Evaluation in rehabilitation: Perspectives with the Rasch model. Retrieved July 2, 2004, from http://www.uzleuven.be/UZRoot/files/webeditor/RaschPerspectivesForRehabilitationAbstract.pdf
Prieto, L., Alonso, J., Lamarca, R., & Wright, B. R. (1998). Rasch measurement for reducing the items of the Nottingham Health Profile. Journal of Outcome Measurement, 2, 258-301.
Raczek, A. E., Ware, J. E., Bjorner, J. B., Gandek, B., Haley, S. M., Aaronson, N. K., Apolone, G., Beck, P., Brazier, J. E., Bullinger, M., & Sullivan, M. (1998). Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries: Results from the IQOLA project. Journal of Clinical Epidemiology, 51 (11), 1203-1214.
Rantz, M. J., Zwygart-Stauffacher, M., Popejoy, L. L., Mehr, D. R., Grando, V. T., Wipke-Tevis, D. D., Hicks, L. L., & Conn, V. S. (1999). The Minimum Data Set: No longer just for clinical assessment. Annals of Long-Term Care, 7, 354-360.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Resident assessment instrument training manual and resource guide. (1991). Natick, MA: Eliot Press.
Rogers, J. C., Gwinn, S. M., & Holm, M. B. (2001). Comparing activities of daily living assessment instruments: FIM, MDS, OASIS, MDS-PAC. Physical and Occupational Therapy in Geriatrics, 18 (3), 1-25.
Segal, M. E., Heinemann, A. W., Schall, R. R., & Wright, B. D. (1997). Rasch analysis of a brief physical ability scale for long-term outcomes of stroke. Physical Medicine and Rehabilitation State of the Art Reviews, 11, 385-396.
Silverstein, B., Fisher, W. P., Kilgore, K. M., Harley, J. P., & Harvey, R. F. (1992). Applying psychometric criteria to functional assessment in medical rehabilitation: II. Defining interval measures. Archives of Physical Medicine and Rehabilitation, 73, 507-518.
Skaggs, G., & Lissitz, R. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56, 495-529.
Spector, W. D., & Fleishman, J. A. (1998). Combining activities of daily living with instrumental activities of daily living to measure functional disability. Journal of Gerontology, 53, S46-S57.

Stegner, B. L., Bostrom, A. G., & Greenfield, T. K. (1996). Equivalence testing for use in psychosocial and services research: An introduction with examples. Evaluation and Program Planning, 19 (3), 193-198.
Stineman, M. G. (2001). The story of functional-related groups: Please, first do no harm. Archives of Physical Medicine and Rehabilitation, 82 (4), 553-557.
Stineman, M. G., Jette, A., Fiedler, R. C., & Granger, C. V. (1997). Impairment-specific dimensions within the Functional Independence Measure. Archives of Physical Medicine and Rehabilitation, 78, 636-643.
Stineman, M. G., & Maislin, G. (2000). Clinical, epidemiological, and policy implications of minimum data set validity (Editorial). Journal of the American Geriatric Society, 48 (12), 1734-1736.
Stineman, M. G., Shea, J. A., Jette, A., Tassoni, C. J., Ottenbacher, K. J., Fiedler, R. C., & Granger, C. V. (1996). The Functional Independence Measure: Tests of scaling assumptions, structure, and reliability across 20 diverse impairment categories. Archives of Physical Medicine and Rehabilitation, 77, 1101-1108.
Tennant, A., & Young, C. (1997). Coma to community: Continuity in measurement. In R. M. Smith (Ed.), Outcome measurement: Physical medicine and rehabilitation state of the art reviews (pp. 376-384). Philadelphia: Hanley & Belfus.
Teresi, J. A., & Holmes, D. (1992). Should MDS data be used for research? [Editorial]. Gerontologist, 32, 148-149.
Uniform data systems for medical rehabilitation. (1997). Retrieved June 10, 2003, from http://www.udsmr.org/history.htm
U.S. Department of Health and Human Services. (2002). A profile of older Americans. Retrieved July 30, 2003, from http://www.aoa.gov/prof/Statistics/profile/highlights.asp
U.S. Food and Drug Administration. (1997). Guidance for industry: In vivo bioequivalence studies based in population and individual bioequivalence approaches. Rockville, MD: U.S. Department of Health and Human Services.
U.S. Food and Drug Administration. (1999). Draft guidance on average, population, and individual approaches to establishing bioequivalence. Rockville, MD: U.S. Department of Health and Human Services.
Vale, C. D. (1986). Linking item parameters onto a common scale. Applied Psychological Measurement, 10 (4), 333-344.
Velozo, C. A. (2004). Translating measures across the continuum of care: Creating a crosswalk between the FIM and MDS. Manuscript in preparation.

Velozo, C. A., Kielhofner, G., & Lai, J. (1999). The use of Rasch analysis to produce scale-free measurement of functional ability. American Journal of Occupational Therapy, 53, 83-90.
Velozo, C. A., Magalhaes, L., Pan, A., & Leiter, P. (1995). Differences in functional scale discrimination at admission and discharge: Rasch analysis of the Level of Rehabilitation Scale-III (LORS-III). Archives of Physical Medicine and Rehabilitation, 76 (8), 705-712.
Velozo, C. A., & Peterson, E. W. (2001). Developing meaningful fear of falling measures for community dwelling elderly. American Journal of Physical Medicine and Rehabilitation, 80 (9), 662-673.
Veterans Administration. (2001). IMPACTS 2000. Retrieved December 12, 2002, from http://www.va.gov/resdev/prt/impacts2000.pdf
Veterans Health Administration. (2000). Medical rehabilitation outcomes for stroke, traumatic brain injury, and lower extremity amputation patients. Retrieved September 12, 2002, from http://www.va.gov/publ/direc/health/direct/12000016.pdf
Veterans Health Administration. (2002). Austin Automation Center. Retrieved September 12, 2002, from http://www.aac.va.gov/
Wang, W., & Hwang, J. T. (2001). A nearly unbiased test for individual bioequivalence problems using probability criteria. Journal of Statistical Planning and Inference, 99, 41-58.
Ware, J. E. (2003). Conceptualization and measurement of health-related quality of life: Comments on an evolving field. Archives of Physical Medicine and Rehabilitation, 84 (Suppl. 2), S43-S51.
Wilkerson, D., & Johnston, M. (1997). Clinical program monitoring systems: Current capability and future directions. In M. Fuhrer (Ed.), Assessing medical rehabilitation practices: The promise of outcomes research (pp. 275-305). Baltimore, MD: Paul H. Brookes.
Williams, B. C., Lee, Y., Fries, B. E., & Warren, R. L. (1997). Predicting patient scores between the Functional Independence Measure and the Minimum Data Set: Development and performance of a FIM-MDS "crosswalk." Archives of Physical Medicine and Rehabilitation, 78, 48-54.
Wright, B. D. (1984). Item banks: What, why, how. Journal of Educational Measurement, 21, 331-345.
Wright, B. D. (1997). A history of social science measurement. Retrieved October 6, 2002, from http://www.rasch.org/memo62.htm

PAGE 78

70 Wright, B. D., & Linacre, J. M. (1989). Observations are always ordinal: Measurements, however, must be interval. Archives of Physical Medicine and Rehabilitation, 70, 857-860. Wright, B. D., Linacre, J. M., & Heineman, A. W. (1993). Measuring functional status in rehabilitation. Physical Medicine and Rehabilitation State of the Art Reviews, 4, 475-491.

BIOGRAPHICAL SKETCH

Katherine L. Byers, MHS, CRC, CVE, is a doctoral candidate in the rehabilitation science doctoral (RSD) program at the University of Florida, College of Public Health and Health Professions. Ms. Byers received a Bachelor of Arts degree in behavioral sciences from Rice University in Houston, Texas in 1989. She then completed a Master of Health Science (MHS) degree in rehabilitation counseling at the University of Florida in 1991 and subsequently obtained certifications as both a rehabilitation counselor and as a vocational evaluator. Over a period of 9 years, Ms. Byers worked in positions of increasing responsibility in the field of rehabilitation before entering the University of Florida's rehabilitation science doctoral program in January of 2000. While completing the requirements of the degree, Ms. Byers was employed as a research assistant and then as a program coordinator for Dr. Craig Velozo, an assistant professor in the Department of Occupational Therapy at the University of Florida. Accomplishments during Ms. Byers' doctoral career include winning the 2002 John Muthard Research Award from the University of Florida's College of Health Professions, Department of Rehabilitation Counseling. She also was selected to make a poster presentation at the Third National Rehabilitation Research and Development Meeting in Washington, DC, in 2002, and at the 2004 ACRM-ASNR Joint Conference.


Permanent Link: http://ufdc.ufl.edu/UFE0005961/00001

Material Information

Title: Testing the Accuracy of Linking Healthcare Data across the Continuum of Care
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0005961:00001














TESTING THE ACCURACY OF LINKING HEALTHCARE DATA
ACROSS THE CONTINUUM OF CARE
















By

KATHERINE L. BYERS


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2004















ACKNOWLEDGMENTS

I would like to thank the VA Office of Academic Affairs, Washington, DC, Pre-

Doctoral Associated Health Rehabilitation Research Fellowship Program, and the VA

HSR&D/RR&D Rehabilitation Outcomes Research Center (RORC) of Excellence,

Gainesville, Florida, for funding this study. Within the VA, I would like to especially

thank Dr. Maude Rittman for acting as my fellowship's program director and for her

continual support and encouragement throughout the process. Additional thanks go to

Dr. Christa Hojlo, Chief of Nursing Home Care, for her assistance in obtaining MDS data

and to Mr. Clifford Marshall, Rehabilitation Planning Specialist, for his assistance in

obtaining FIM data.

I extend my gratitude to my doctoral advisor and VA Fellowship Preceptor, Dr.

Craig Velozo, who also acted as my mentor and the chairperson of my supervisory

committee. His guidance and support throughout this process have been invaluable.

Furthermore, I am appreciative of the support provided by the other members of my

committee, including the cochair, Dr. Ronald Spitznagel, Dr. Elizabeth Swett and

Dr. Anne Seraphine. Special thanks go to Dr. Richard Smith, an expert in Rasch

analysis, who analyzed the original data.

Other student members of Dr. Velozo's research team have been invaluable in the

completion of this dissertation, and I owe them a debt of gratitude. This is especially true

of Ms. Inga Wang, who has worked closely on this project. And finally, my family has

been a source of continual support throughout this process, and I would like to thank

them for their tireless encouragement.















TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION

2 REVIEW OF THE LITERATURE

   Measuring Outcomes in Rehabilitation
   The FIM Instrument Used in Inpatient Rehabilitation
   The MDS Instrument in Skilled Nursing Facilities
   Administrative Solution: Adopting a Single Instrument for Measuring Outcomes in Post Acute Care
   Potential Measurement Solution: Linking Instruments
   The Use of Linking Techniques
   Linking of Measures in Healthcare

3 METHODOLOGY

   Introduction
   Source of the Data
   Sample
   FIM and MDS Motor Items
   Procedures Involved in the Creation of the FIM/MDS Conversion Table
   Statistically Testing the Accuracy of the FIM-MDS Conversion Table

4 RESULTS

   Statistical Analyses
   Statistical Results at the Level of the Individual
   Statistical Results at the Group Level
   Discrepancies in the Dataset

5 DISCUSSION

   Summary of Results
   Implications for Future Research
   Conclusion

REFERENCES

BIOGRAPHICAL SKETCH



























































LIST OF TABLES

Table

1-1 Comparison of FIM to MDS ADL/motor items

3-1 Comparison of FIM™ to MDS ADL/motor items

3-2 FIM scoring criteria

3-3 MDS scoring criteria

3-4 FIM-MDS score conversion

3-5 FIM-MDS conversion table

3-6 Effect size

3-7 Rating scale conversion

3-8 Similar FIM and MDS items

4-1 Four moments of the distributions















LIST OF FIGURES

Figure

4-1 Score difference between the FIMa and FIMc

4-2 Score difference between the MDSa and MDSc

4-3 Distribution of FIM actual scores

4-4 Distribution of FIM converted scores

4-5 Distribution of MDS actual scores

4-6 Distribution of MDS converted scores

4-7 Scatterplot of the FIMa and FIMc scores

4-8 Scatterplot of actual MDS and converted MDS scores















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

TESTING THE ACCURACY OF LINKING HEALTHCARE DATA
ACROSS THE CONTINUUM OF CARE

By

Katherine L. Byers

August 2004

Chair: Craig A. Velozo
Major Department: Rehabilitation Science

The purpose of this project was to test the accuracy of a conversion table designed

to transform a score on the physical ability component of the Functional Independence

Measure™ (FIM) to its corresponding score on the Minimum Data Set (MDS) and vice

versa. The records of 2,297 VA patients with scores on both the FIM and MDS, which

were completed within 7 days of one another between July 2002 and June 2003, were

obtained from the VA's Austin Automation Center (AAC). The FIM-MDS conversion

table, generated from an independent sample using Rasch measurement techniques, was

then used to transform actual scores on the FIM and MDS to their corresponding

converted scores. The equivalence of the two score distributions was

determined by examining their means and variances. It was hypothesized that 75% of the

actual and converted scores on the FIM and the MDS would be within five points of one

another. Effect size was determined, as was the percent of subjects having actual and

converted FIM and MDS scores within five points of one another.









Twenty-four percent of the FIM and 37% of the MDS actual and converted scores

were within five points of one another, respectively, and therefore fell short of the

standard set at 75% for the conversion to be considered accurate. Yet, the effect size for

the conversion of both FIM and MDS scores was .2, demonstrating an 85.3% overlap

between the two score distributions. The correlation between the FIM actual and

converted scores was .724 while the correlation between the MDS scores was .745.

While the development of a FIM-MDS translation table appears promising, the

results of this study do not provide strong enough evidence to support the premise that

this first attempt at creating a FIM-MDS conversion table has resulted in an instrument

that would provide an accurate means of converting scores within a clinical setting.
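
The agreement statistics summarized in this abstract can be illustrated with a short sketch. The Python below is not the analysis code used in the study. The function names are hypothetical, "within five points" is assumed to mean an absolute difference of at most five, and the overlap figure is assumed to derive from Cohen's U1 nonoverlap measure for two normal distributions with equal variance. Under those assumptions, an effect size of .2 gives an overlap of roughly 85%, in line with the 85.3% reported above.

```python
from statistics import NormalDist, mean, stdev


def percent_within(actual, converted, tolerance=5):
    """Percent of paired scores whose absolute difference is <= tolerance.

    The "<=" boundary is an assumption; the abstract does not say whether
    a difference of exactly five points counts as a match.
    """
    hits = sum(abs(a - c) <= tolerance for a, c in zip(actual, converted))
    return 100.0 * hits / len(actual)


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / pooled_var ** 0.5


def overlap_percent(d):
    """Percent overlap (100 minus Cohen's U1) for two equal-variance normals."""
    p = NormalDist().cdf(abs(d) / 2)
    u1 = (2 * p - 1) / p  # proportion of the combined area that does not overlap
    return 100.0 * (1 - u1)
```

With paired lists of actual and converted scores, `percent_within` gives the individual-level criterion, while `cohens_d` and `overlap_percent` give the group-level comparison.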















CHAPTER 1
INTRODUCTION

Over the past few decades, the total number of people receiving rehabilitation

services in the United States has grown while the provision of such services has extended

into settings outside of acute rehabilitation settings. This growth in demand has been

fueled in part by changes in the population demographics, as the number of individuals

over the age of 65 has increased, and this trend is expected to continue for many years

into the future (Cornman & Kingson, 1996; Gutheil, 1996). According to the U.S.

Department of Health and Human Services (2002), 35.6 million people or 12.3% of the

U.S. population was over the age of 65 in 2002. Since 1900, this percentage has

tripled (from 4.1% in 1900 to 12.3% in 2002). By the year 2030, the older population

will more than double again to about 70 million people or 20% of the total population.

Similarly, the proportion of U.S. veterans over age 65 is projected to increase from 26%

in 1990 to 46% in 2020 (Veterans Administration, 2001).

A greater number of elderly people in our society is associated with an increased

demand for rehabilitation services. Dillingham, Pezzin, and MacKenzie (2003) reported

that aging is accompanied by an increased risk of diminished health status and a greater

likelihood of requiring rehabilitation services. As noted in the Centers for Disease

Control (CDC) report Health, United States, 2001 (2002), the prevalence of both chronic

conditions and activity limitations increases with age, with health-related limitations in

mobility or self-care increasing fourfold between ages 65 to 74 and ages 85 or older. In

1997, more than half of the older population (54.5%) reported having at least one







disability, with more than a third (37.7%) reporting at least one severe disability (U.S.

Department of Health and Human Services, 2002). There has also been an increase in

the number of patients being admitted to postacute care (PAC) settings after discharge

from acute care hospitals (Iwanenko, Fiedler, & Granger, 1999; Johnson, Kramer, Lin,

Kowalsky, & Steiner, 2000). Iwanenko et al. note that between 1991 and 1997, the

number of patients admitted annually to PAC settings rose from 12,468 to 49,844.

Stineman (2001) noted that the most significant challenge to current medicine was likely

the care of people with chronic, incurable diseases and injuries.

The U.S. healthcare system provides a variety of routes to recovery from physical

injuries, ailments, or impediments. PAC settings, also called subacute care or transitional

care settings, are a type of short-term care program provided by many long-term care

facilities and hospitals. Treatment in such settings may include rehabilitation services,

specialized care for certain conditions (such as stroke and diabetes), and postsurgical care

and other services associated with the transition between the hospital and home.

Residents on these units often have been hospitalized recently and typically have

complicated medical needs. The goal of subacute care is to discharge residents to their

homes or to a lower level of care. Current PAC settings include comprehensive inpatient

rehabilitation units attached to acute care or freestanding hospitals, skilled nursing

facilities (SNFs), outpatient rehabilitation facilities, freestanding outpatient clinics, and

home health care services. Each person with a potentially disabling impairment has a

unique care trajectory that may include sequential admission to more than one PAC

setting. For example, one person diagnosed with stroke may be admitted to an inpatient

rehabilitation program after being discharged from an acute care hospital and then be

released to home with outpatient services, whereas









another individual with the same diagnosis is discharged from the acute hospital setting

directly into a skilled nursing facility.

SNFs were established under the 1965 Medicare legislation and are certified by

Medicare to provide 24-hour nursing care and rehabilitation services in addition to other

medical services. SNF-based rehabilitation units have become a rapidly growing

segment of the rehabilitation continuum over the past decade, as policy makers have

searched for less costly delivery systems for rehabilitation. While inpatient treatment

provides a full complement of professionals practicing in a hospital setting, it is one of

the most costly of the rehabilitation services (Keith, Wilson, & Gutierrez, 1995). SNFs,

on the other hand, have lower costs, mainly because construction, regulatory and staff

requirements are less stringent than they are in hospitals (Keith et al.). As a result, SNF-

based rehabilitation has been used increasingly as a substitute for traditional inpatient

care (Keith et al.). Additionally, many older patients do not meet the Medicare

requirements to receive inpatient rehabilitation services, which include being able to

tolerate three hours of therapy on a daily basis, fitting within one of the required

diagnostic mixes, as well as being able to make significant progress over a fairly short

length of time (Keith et al.). In such cases, SNF settings have become an appropriate

alternative.

In rehabilitation programs, a patient's functional enhancement is the primary goal

(DeJong, 2001). The ability to evaluate a patient's status is central to rehabilitation

efforts, for example, to track a patient's recovery, to determine the effectiveness of

treatment, or to estimate resource use (Penta, 2004). It is well documented that a patient's

ability to function physically is an important component of self-reported

health status (Haley, McHorney, & Ware, 1994; Hart, 2000; McHorney, Haley, & Ware,

1997; Raczek et al., 1998; Segal, Heinemann, Schall, & Wright, 1997). Since the 1950s,









functional status measures have served as a means to monitor outcomes within medical

centers. Yet to this day, there is no clear and commonly accepted definition of function

or a clear delineation between instruments that assess functional outcomes and those that

evaluate other health concepts. As a result, the ability to compare one instrument

measuring functional status to another can be fraught with complications.

Currently in the U.S., two distinct instruments are used to monitor functional

outcomes in inpatient rehabilitation settings and SNFs. Traditional rehabilitation

facilities have almost uniformly adopted the Functional Independence Measure (FIM™)¹

as a means of monitoring patients' functional ability. The FIM instrument provides a

measure of disability and was put into operation beginning in 1989 (Granger, 1998).

Today, it is one of the most widely used instruments that assess the quality of daily living

activities in persons with disabilities (Granger, Hamilton, & Sherwin, 1986).

The Veterans Health Administration (VHA) rehabilitation services include the

FIM in its Functional Status and Outcomes Database (FSOD), which has been

operational since 1997 (Veterans Health Administration [VHA], 2000). It is mandated

for use by the VHA Directive 2000-016, "Medical Rehabilitation Outcomes for Stroke,

Traumatic Brain Injury, and Lower Extremity Amputation Patients," which requires

every VHA medical center to assess functional status and enter this data into the FSOD

in order to measure and track rehabilitation outcomes on all new stroke, lower extremity

amputee, and traumatic brain injury (TBI) patients (VHA, 2000). Presently, the FSOD

has not been linked with other data sources that would allow patients to be monitored as

they progress across the continuum of care (e.g., from rehabilitation facilities to skilled

nursing facilities or from rehabilitation facilities to home health care).



¹FIM™ is a trademark of the Uniform Data System for Medical Rehabilitation, a
division of U. B. Foundation Activities, Inc.









While the FIM is the "gold standard" for measuring functional outcomes in

rehabilitation settings, the Minimum Data Set (MDS) of the nursing home Resident

Assessment Instrument (RAI, 1991) is used universally for monitoring rehabilitation

outcomes in SNFs. The MDS was developed in response to a 1986 Institute of Medicine

study of the quality of care in nursing homes that called for improvements in nursing

home quality and more patient-centered care (Morris et al., 1990). The federal Omnibus

Budget Reconciliation Act of 1987 (OBRA 87) mandated all U.S. nursing homes to

implement the Resident Assessment Instrument, whose core is the Minimum Data Set

(MDS) (Rantz et al., 1999). The MDS consists of 284 items designed to assess the

cognitive, behavioral, functional, and medical status of nursing home residents (Hawes et

al., 1995; Teresi & Holmes, 1992).

Nursing homes are a critical environment for tracking the health care status of

elderly veterans. In fiscal year 2001, there were a total of 89,056 veterans treated in

nursing homes with an average daily census of 33,670 (Catalogue of Federal Domestic

Assistance, 2002). By 2003, it is projected that 111,953 patients will be treated in

nursing homes with an average daily census of 35,132 (Catalogue of Federal Domestic

Assistance). In 1995, there were at least 1.5 million nursing home residents who resided in

facilities participating in the Medicare or Medicaid programs (Hawes et al., 1995).

Within the VHA, the reduction of acute rehabilitation beds from 1,150 five years ago to

617 in 2003 further increases the likelihood that veterans could receive their post-acute

rehabilitation care in nursing homes (C. Johnson, personal communication, September

29, 2002).

A key to improving services for patients treated in PAC settings is to develop

effective and efficient methods for tracking and evaluating functional status changes

across rehabilitation and skilled nursing facilities. Through the use of a single instrument









in these settings, a patient may progress from one to the other, while maintaining a

functional assessment score that could easily be tracked and compared between settings.

Such a tool would benefit patients, as it would facilitate an increased continuity of care

between settings. It would also allow for the direct comparison of rehabilitation

outcomes between settings, along with resource utilization and costs. Latham and Haley

(2003) note that a clear need exists for an instrument that can accurately assess patients'

functional ability as they move through the health care system. As stated in Buchanan,

Andres, Haley, Paddock, and Zaslavsky (2003), "Providers, payers, and consumers

would all benefit from comparable measures of functional status and rehabilitation

outcomes across multiple care settings to facilitate equitable payment and to monitor the

quality and efficiency of care delivery" (p. 45). To date, there has been only one

published attempt by Williams, Lee, Fries, and Warren (1997) to link the FIM to the

MDS. Yet, other studies have linked other measures of global functioning (e.g., Fisher,

Eubanks, & Marier, 1997; Fisher, Harvey, & Kilgore, 1995; Fisher, Harvey, Taylor,

Kilgore, & Kelly, 1995; Segal, Heinemann, Schall & Wright, 1997; Smith & Taylor,

2004; Tennant & Young, 1997).

The dilemma of having multiple yet incompatible instruments measuring the

same construct has been confronted and successfully overcome in the physical sciences.

Take, for example, the manner in which we measure distance. Currently, in the United

States, we have two competing systems of measuring length, namely the metric system

and the standard system of measurement. Surprisingly, success has not come through

attempts to convert entirely from one system to the other despite the obvious benefits in

doing so. Instead, we continue to utilize simple strategies that allow us to convert a

measure on one scale to its corresponding measure on the other. Similarly, we routinely

convert readings between Celsius and Fahrenheit with a simple conversion table when







measuring temperature. Thus, one could say that a precedent has been set for the manner

in which we have successfully reconciled competing systems of quantifying what are

essentially abstract concepts.

An analogous attempt in health care would be to develop a system of converting

scores between the physical functioning components of the FIM and the MDS so that a

score on one instrument could be translated into its equivalent score on the other. The

hypothesis is that the items included in these two instruments are subsets of items along

an ADL/motor construct. Table 1-1 presents a comparison of the ADL/motor items of

the FIM and the MDS.

Table 1-1. Comparison of FIM to MDS ADL/motor items

FIM Items                                   MDS Items
Eating                                      Eating
                                            Bed Mobility
Grooming                                    Personal Hygiene
Bathing                                     Bathing
Dressing-Upper Body                         Dressing
Dressing-Lower Body
Toileting                                   Toilet Use
Bladder Management                          Bladder Continence*
Bowel Management                            Bowel Continence*
Bed, Chair, Wheelchair (Transfer)           Transfer
Toilet (Transfer)
Tub, Shower (Transfer)
Walk/wheel Chair                            Walk in Room
Stairs                                      Walk in Corridor
                                            Locomotion on Unit
                                            Locomotion off Unit

FIM Rating Scale                            MDS Rating Scale (exceptions noted below)
7 Complete Independence (Timely, Safely)    0 Independent
6 Modified Independence (Device)            1 Supervision
5 Supervision                               2 Limited Assistance
4 Minimal Assist (Subject = 75%+)           3 Extensive Assistance
3 Moderate Assist (Subject = 50%+)          4 Total Dependence
2 Maximal Assist (Subject = 25%+)           8 Activity did not occur during the entire 7-day period
1 Total Assist (Subject = 0%+)

*Bladder and Bowel Continence in the MDS also has a separate rating scale: 0-Usually
Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent










Similarities are immediately evident between the two instruments. Both instruments

include items for eating, dressing, toileting, bowel and bladder functioning, as well as the

ability to transfer and to walk. Differences include an item for climbing stairs on the

FIM that is not part of the MDS.

A mathematical framework is needed in order to convert a score on one instrument

to its corresponding score on the other. Item Response Theory (IRT) measurement

models have been rapidly gaining popularity over classical test theory (CTT) for

analyzing instruments used in healthcare and rehabilitation (Douglas, 1999; Hambleton,

2000; Hays, Morales, & Reise, 2000; Linacre, Heinemann, Wright, Granger, & Hamilton,

1994; McHorney, 1997; Prieto, Alonso, Lamarca, & Wright, 1998; Silverstein, Fisher,

Kilgore, Harley, & Harvey, 1992; Velozo, Magalhaes, Pan, & Leiter, 1995). IRT is

comprised of a set of generalized linear models and their associated statistical procedures

that connect a subject's response to test items to that subject's location on the latent trait

being tested (Mellenbergh, 1994). In order to create a link between scores on the FIM

and MDS, it is hypothesized that Rasch analysis can be used to place the items from both

instruments on the same linear continuum (Fisher, Harvey, Taylor, et al., 1995). A

precedent for linking instruments in such a manner has been established in the fields of

education and psychological measurement.
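
As a point of reference, the simplest (dichotomous) form of the Rasch model can be sketched in a few lines of Python. This is an illustration only. The FIM and MDS use multi-level rating scales, which call for a polytomous extension of the model, and the item difficulties and function names below are hypothetical values chosen for the example, not estimates from this study. The sketch shows why placing items from two instruments on one logit scale permits conversion: a single ability value implies an expected raw score on each instrument.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of passing an item under the dichotomous Rasch model.

    Both arguments are in logits; the probability depends only on their difference.
    """
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def expected_raw_score(ability, difficulties):
    """Expected number of items passed at a given ability level."""
    return sum(rasch_probability(ability, b) for b in difficulties)


# Hypothetical item difficulties (logits) for two instruments that have been
# calibrated onto one common scale.
INSTRUMENT_A = [-1.5, -0.5, 0.0, 0.8, 1.6]
INSTRUMENT_B = [-1.0, 0.2, 1.1]


def conversion_rows(abilities):
    """Pair the expected raw scores on both instruments at each ability level."""
    return [
        (theta,
         expected_raw_score(theta, INSTRUMENT_A),
         expected_raw_score(theta, INSTRUMENT_B))
        for theta in abilities
    ]
```

Calling `conversion_rows([-2, -1, 0, 1, 2])` yields one row per ability level. Reading across a row gives the score on one instrument that corresponds to a score on the other, which is the idea behind a conversion table.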

It is the purpose of this dissertation to evaluate the accuracy of a FIM-MDS

conversion table that has been created through the use of Rasch analysis. The technical

procedures of test equating used in the educational applications of Rasch's probabilistic

models were transferred to the cocalibration of these two functional assessment

instruments. An accurate conversion table between the FIM and the MDS would allow







for studies to take place that are necessary to examine the outcomes for persons receiving

rehabilitation services in different care settings. This would, in effect, eliminate the need

to institute massive changes in measurement procedures across rehabilitation settings.

Functional status information could then be used to track changes and follow a patient's

progress across PAC settings and not only monitor but compare quality of care and

rehabilitation outcomes in different settings (National Committee on Vital and Health

Statistics, 2003).















CHAPTER 2
REVIEW OF THE LITERATURE

Measuring Outcomes in Rehabilitation

The effectiveness of rehabilitation services is gauged by the restoration and

maximization of patient functioning. Functional status in this context has been defined

as reflecting, "an individual's ability to carry out activities of daily living (ADLs) and to

participate in various life situations and in society" (Jette, Haley, & Ni, 2003, p. 1).

Therefore, the assessment of functional status is a method for describing abilities and

activities in order to measure an individual's use of the variety of skills included in

performing the tasks necessary to daily living, vocational pursuits, social interactions,

leisure activities, and other required behaviors (Granger, 1998). ADL measures have

been used to determine a patient's level of disability, whether one qualifies for certain

types of healthcare services, and to document outcomes of rehabilitation services.

The focus of the earliest standardized assessments of function, developed over 50

years ago, was on the basic ADLs, which consist of self-care activities, such as bathing,

grooming, dressing, and walking. Two of the first functional status measures used in

rehabilitation were the Katz and Barthel indexes whose items were comprised solely of

basic ADL tasks (Cohen & Marino, 2000; Latham & Haley, 2003). Then, "with

changing societal expectations, the advent of brain injury rehabilitation, and the

independent living movements, medical outcome research began to explore means of

documenting social and cognitive-based behaviors as part of rehabilitation outcomes"









(Latham & Haley, 2003, p. 85). The result of this was the development of instruments,

such as the Functional Independence Measure (FIM) and the Minimum Data Set (MDS).

The FIM Instrument Used in Inpatient Rehabilitation

In U.S. inpatient rehabilitation settings, the Uniform Data System for Medical

Rehabilitation (UDSMR) is the most widely used clinical database for assessing

rehabilitation outcomes (Fiedler & Granger, 1997; Granger & Hamilton, 1993). The FIM

is the core functional status measure of the UDSMR and was developed to establish a

uniform standard for the assessment of functional status during medical rehabilitation

(Granger, Hamilton, Keith, Zielezny & Sherwin, 1986). The FIM incorporates concepts

and items from previous functional assessment instruments, such as the Katz Index of

ADL, the PULSES profile, the Kenny Self-Care Evaluation, and the Barthel Index (Hall

et al., 1993). The FIM system was developed by a national task force cosponsored by the

American Congress of Rehabilitation Medicine and the American Academy of Physical

Medicine and Rehabilitation Task Force to Develop a National Uniform Data System for

Medical Rehabilitation (UDSMR) to rate the severity of patient disability and the

outcomes of medical rehabilitation (Hamilton et al., 1987). The original work of this task

force was expanded by the Department of Rehabilitation Medicine at the State University

of New York at Buffalo. Since 1987, it has been the mission of the UDSMR to measure

medical rehabilitation outcomes across the continuum of care, in both time and settings

(Granger, 1999). The UDSMR maintains a national data repository for research purposes

of three million case records from 1,400 facilities around the world (Granger, 1999).

The FIM is administered in most inpatient rehabilitation facilities within three

days of admission and prior to discharge (Granger, Hamilton & Sherwin, 1986). The

scale accounts for a patient's level of independence, amount of assistance needed, use of







adaptive or assistive devices, and the percentage of a given task completed successfully.

This instrument is comprised of 18 items with a seven-level response scale of

independent performance in self-care, sphincter control, mobility, locomotion,

communication and social cognition (Granger & Hamilton, 1993). As such, it contains

items representing three constructs: ADL, mobility, and continence (Granger, Hamilton,

& Sherwin). Thirteen of the 18 FIM items (related to functional ability) can be further

divided into three more specific subscores rating activities of daily living (ADLs),

sphincter management, and mobility (Stineman, Jette, Fiedler & Granger, 1997). A FIM

score on each item ranges from 1 (Total Assistance) to 7 (Complete Independence).

Thus, a total FIM score ranges from 18 to 126. Hamilton (1989) noted that, "Because

each item is scaled on the basis of functional independence, it is expected that the total

score (with each item appropriately weighted) will correlate with the burden of care for

the disabled person" (p. 862).
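The arithmetic behind the 18 to 126 range can be sketched as follows. This is a minimal illustration that assumes a simple unweighted sum of the 18 item ratings; the function name and validation rules are hypothetical and are not part of the FIM documentation.

```python
# Illustrative sketch: an unweighted total over 18 items rated 1-7.
FIM_ITEM_COUNT = 18
MIN_RATING, MAX_RATING = 1, 7  # 1 = Total Assistance, 7 = Complete Independence


def fim_total(ratings):
    """Sum 18 item ratings after checking that each lies on the 1-7 scale."""
    if len(ratings) != FIM_ITEM_COUNT:
        raise ValueError("expected 18 item ratings")
    if any(not (MIN_RATING <= r <= MAX_RATING) for r in ratings):
        raise ValueError("each rating must be between 1 and 7")
    return sum(ratings)


# The two extremes reproduce the range stated in the text.
lowest = fim_total([MIN_RATING] * FIM_ITEM_COUNT)
highest = fim_total([MAX_RATING] * FIM_ITEM_COUNT)
```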

Psychometric studies of the FIM instrument support its use for research purposes.

One of the strengths of the FIM is that it has undergone so many methodological

evaluations, in which it has demonstrated good psychometrics (Dodds, Martin, Stolov &

Deyo, 1993). Dodds et al. noted a high internal consistency (Cronbach's coefficient alpha of

.93 at admission and .95 at discharge) that demonstrated that the FIM is a reliable

instrument. Extensive investigations of the FIM's reliability and validity have provided

evidence of its interrater and test-retest reliability (Ottenbacher, Hsu, Granger & Fiedler,

1996), internal consistency (Stineman et al., 1996; Stineman et al., 1997), concurrent

validity (Granger, Cotter, Hamilton & Fiedler, 1993; Oczkowski & Barreco, 1993) and

predictive validity (Heinemann, Linacre, Wright, Hamilton & Granger, 1994; Oczkowski

& Barreco). Ottenbacher et al. (1996) performed a meta-analysis of 11 studies that

revealed the median inter-rater reliability for the total FIM was .95 and the test-retest and

equivalence reliabilities were .95 and .92, respectively. Stineman et al. (1996) showed in a

study of 93,829 rehabilitation inpatients that a factor analysis of the FIM instrument

supported the identification of ADL/motor and cognitive/communication dimensions

across 20 impairment categories. Additionally, the stability of the FIM motor scores was

demonstrated in several studies (Linacre et al., 1994; Wright, Linacre & Heinemann,

1993). Linacre (1998) confirmed the multidimensional structure of the FIM by means of

Rasch analysis followed by factor analysis of standardized residuals and demonstrated

the divergence of the five cognitively-oriented items from the 13 motor-oriented items.
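The internal consistency coefficients cited above are Cronbach's alpha values. As a rough sketch of how such a coefficient is computed, the following uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The ratings are invented for illustration and are not FIM data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one list of k item ratings per respondent."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: five respondents, three items on a 1-7 scale.
ratings = [[7, 6, 7], [5, 5, 4], [3, 2, 3], [6, 6, 5], [1, 2, 1]]
alpha = cronbach_alpha(ratings)
```

Items that rise and fall together across respondents push alpha toward 1, which is the pattern the cited FIM studies report.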

The MDS Instrument in Skilled Nursing Facilities

After receiving inpatient rehabilitation services, patients may be transferred into a

SNF, where the FIM instrument is no longer used as a measure of functional ability.

Instead, the Minimum Data Set (MDS) is utilized in this role. Since 1990, HCFA (Health

Care Financing Administration, now known as the Centers for Medicare and Medicaid

Services [CMS]) has required that the MDS be administered to all SNF residents. The

MDS is a comprehensive assessment of over 85 health status elements organized into

measurement categories. An MDS score on each item ranges from 0 (Independent) to 4

(Total Dependence). The definitions of these ratings are provided on the rating form.

Full MDS assessments are required at least annually and when there is a significant

change in a patient's condition (such as a stroke, resulting in hemiparesis). A minimum

of approximately 1.2 million residents are assessed annually with the full MDS, with 3.6

million briefer updates and an unknown additional number of complete MDS

reassessments because of major changes in residents before an annual MDS is due

(Casten, Lawton, Parmelee & Cleban, 1998). While the MDS shares ADL content with

the FIM, it is targeted exclusively to SNFs, making comparisons across rehabilitation

settings difficult (Latham & Haley, 2003).









Although not as extensively studied as the FIM, research on the MDS suggests

that it too has adequate reliability and validity for use in research studies. Early research

studies by Hawes et al. (1995) showed that MDS items had intraclass correlations of .7 or

higher in key areas of functional status, such as ADL and cognition. Sixty-three percent

of the items achieved reliability coefficients of .6 or higher and 89% achieved .4 or

higher (Hawes et al., 1995). Morris et al. (1994) showed that the seven cognitive items

(short-term memory, long-term memory, decision making, and four categories of

memory recall) showed an internal reliability of .83-.88.

While it has been suggested that the above psychometric findings on the MDS

are "inflated" because the MDS was administered by research staff (Stineman & Maislin, 2000),

Gruber-Baldini, Zimmerman, Mortimore & Magaziner (2000) showed that when the

MDS is administered by clinical staff, cognitive items (MDS-COGS) and cognitive

performance scale items (MDS-CPS) correlate moderately well with the Mini Mental

Status Exam (MMSE) (R = -0.65 and -0.68, respectively) and the Psychogeriatric

Dependency Rating Scale (PGDRS) Orientation scale (R = 0.63 and 0.66, respectively).

The internal reliability of the MDS-COGS was .85 and that of the MDS-CPS (without the

comatose items) was .80. Confirmatory factor analysis studies, derived from clinical

and administrative databases of the MDS, confirmed all MDS domain clusters except

social quality (Casten et al., 1998).

Administrative Solution: Adopting a Single Instrument for
Measuring Outcomes in Post Acute Care

The existence of "competing" instruments in PAC settings (i.e., the FIM in

rehabilitation facilities and the MDS in SNFs) has led to considerable debate over which

of these instruments should be the basis for a post-acute care Prospective Payment







System (PPS) in the private sector (Centers for Medicare and Medicaid Services [CMS],

2001; DeJong, 2001). The demands for a PPS for rehabilitation facilities prompted the

HCFA in 1998 to develop the Minimum Data Set for Post-Acute Care (MDS-PAC), an

instrument similar in design to the MDS, intended to address the needs of subacute

facilities, rehabilitation facilities, and long-term care hospital patients (CMS, 2002). One

of the intended purposes of the MDS-PAC was to provide CMS with a tool by which it

could monitor the quality of health care services across post-acute settings (Granger,

Hamilton, Keith, et al., 1986). Originally, items similar to those found in the MDS for

rehabilitation were intended to be used in this new instrument. Yet, upon its completion,

it consisted of more than 400 items, and it lacked one-to-one correspondence with the

FIM (DeJong, 2001, p. 567). Martin Grabois, MD, President of the American Congress

of Rehabilitation Medicine (ACRM) stipulated in a letter in 2001 that the ACRM

strongly opposed implementation of the MDS-PAC and felt it was premature to use it as

a quality-monitoring tool in rehabilitation (DeJong, 2001). This challenge resulted in a

change in CMS's decision from using the MDS-PAC to using the Inpatient Rehabilitation

Facility Patient Assessment Instrument (IRF-PAI) as the basis for the post-acute PPS.

The IRF-PAI includes the FIM ADL/motor and cognition/communication items. This

IRF-PAI was mandated for implementation in rehabilitation facilities on January 1, 2002

(CMS, 2001).

While the final decision to use the IRF-PAI has tremendous economic benefits

(e.g., rehabilitation facilities not having to convert from the FIM to an MDS-based

instrument), it does not facilitate monitoring functional outcomes as patients cross from

rehabilitation to skilled nursing facilities. That is, a FIM-based outcome instrument will

continue to be used in rehabilitation facilities, while an MDS-based instrument will be









used in nursing home facilities. A drawback to having different functional outcome

measures across these two health-care settings is "test dependency." Data gathered on

one instrument cannot be compared to similar data gathered on an alternate instrument.

Wilkerson and Johnston (1997) noted that the absence of a single, standardized

instrument used in both rehabilitation facilities and SNFs was a fundamental barrier for

U.S. policymakers. Without such an instrument, these policymakers were hampered in

their ability to fulfill the emerging health policy mandate to monitor the quality and

outcomes of services for patients in PAC settings. As patients were transferred across

these settings, researchers, managers, and clinicians were unable to easily and accurately

track functional status changes.

The ability to measure a patient's functional status is not only important within a

particular rehabilitation setting, but also along the continuum of care. It is difficult to

accurately and precisely compare and contrast the magnitude of gains made through

rehabilitation efforts in various therapeutic PAC settings when such improvements are

evaluated on incongruous instruments based on divergent scales of measurement. As

Granger (1998) contends, there has been unfortunately too little effort in addressing

assessment and management of persons with disablement across the continuum of care.

As a result, the outcomes of therapeutic interventions cannot easily and accurately be

compared to determine their relative effectiveness.

This measurement of rehabilitation outcomes is an integral component of a major

controversy in rehabilitation over the best PAC setting in which to provide care.

Optimizing a patient's recovery and, at the same time, minimizing costs are the two

variables most often considered in this debate. Inpatient rehabilitation units typically

provide the highest level of services, although often at the greatest costs. On the other







hand, skilled nursing facilities generally do not provide the same extent and intensity of

therapeutic interventions, although at a reduced cost. The ability to directly compare

rehabilitation outcomes between, for example, an inpatient rehabilitation setting and a

skilled nursing facility would enhance our understanding of where patients benefit most

and from what interventions. In January of 2000, Sally Kaplan, Ph.D. of the Medicare

Payment Advisory Commission (MedPAC) told the Subcommittee on Populations,

We strongly believe that it would be extremely useful, to say the least, to have
standardization of functional status measures at least in post-acute care so that if
similar patients are treated in different post-acute settings, or if patients are
treated in successive post-acute care settings, that we would have a means of
measuring them. ... It would expand the utility of regularly collected
information. (National Committee on Vital and Health Statistics, 2003).

Potential Measurement Solution: Linking Instruments

There is more than one way to coordinate rehabilitation services along the

continuum of care. For example, one might institute the use of the same instrument to

measure functional ability in all rehabilitation settings so that the results would be

directly comparable. In this manner, the patient's progress can be easily tracked across

settings and services. Unfortunately, attempts to use the FIM across settings have met

with limited success as there are sizeable obstacles to implementing such a plan.

Already, there has been a huge monetary investment in our current attempts to measure

rehabilitation outcomes. To figuratively "throw out" these current instruments and

replace them with one new and improved universal measure would likely be

cost-prohibitive (Cohen & Marino, 2000). There would also likely be significant

resistance to implementing such a plan, since in our capitalistic economy private

fortunes are often tied up in maintaining the status quo. Furthermore,









significant costs would be incurred in training staff in a new system, as well as in the

implementation of a new database.

The nature of the predicament now facing rehabilitation services, that of having

multiple yet incompatible instruments attempting to quantify a single construct, is not a

new one. It has been successfully overcome in other areas, for example, within the basic

sciences. Major scientific advances have been possible in part because the instruments

used to measure a construct were standardized and, in some cases, linked to other similar

instruments. An example of this is the historical attempts to measure the construct of

temperature. What began as a human sensation of "hot" and "cold" evolved into the field

of thermometrics, the measurement of temperature (Bond & Fox, 2001). In A.D. 180,

Galen mixed equal quantities of ice and boiling water to establish a "neutral" point for a

seven-point scale having three levels of warmth and three levels of coldness (Bond &

Fox). Then, in the 17th century, Santorio of Padua used a tube of air inverted in a

container of water so that the water level rose and fell with temperature changes. He

calibrated the scale by marking the water levels at the temperature of flame and ice

(Bond & Fox). Our current mode of measuring temperature uses mercury in a closed

tube. Even with a single mechanism by which we measure temperature, we use two

competing scales, namely, Celsius and Fahrenheit. Both of these scales merely set two

known temperature points (ice and boiling points) and simply divided the scale into equal

units. These two independently developed scales have been linked, so a "score" on one

scale can easily be converted to a score on the other scale.
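The Celsius and Fahrenheit conversion is the simplest case of two linked scales: because both divide the same underlying quantity into equal units, a linear transformation maps one onto the other. A minimal sketch (the function names are illustrative):

```python
def celsius_to_fahrenheit(celsius):
    """Ice point 0 C maps to 32 F; boiling point 100 C maps to 212 F."""
    return celsius * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(fahrenheit):
    """Inverse of the transformation above."""
    return (fahrenheit - 32.0) * 5.0 / 9.0
```

The two fixed points (ice and boiling) determine both the slope and the intercept of the transformation, which is why no further calibration is needed once they are agreed upon.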

A problem in developing measures in the human sciences is that, "we are clearly

dealing with abstractions (e.g., perceived social support, cognitive ability, and self-

esteem), so we need to construct measures of abstractions, using equal units, so that we









can make inferences about constructs rather than remain at the level of describing our

data" (Bond & Fox, 2001, p. 4). Yet, it appears hopeless to construct models of human

behavior since behavior seems to be so unpredictable. What we can do is estimate the

probability of a behavior taking place. We need to build a model, "more like models in

modern physics-models which are indeterministic, where chance plays a decisive role"

(Rasch, 1960, p. 11). What is then being described is the possibility of a behavior

occurring, that is, the relative frequency of an event occurring. We can say that the

probability of something occurring is equal, for example, to 50 percent.

An alternative solution to this problem would be to use the same scale of

measurement as the basis of each instrument. Linking can be referred to as the

development, "of a common metric in IRT by transforming a set of item parameter

estimates from one metric onto another, base metric" (Hart & Wright, 2002, p. 2).

McHorney (1997) pointed out, "The development of a shared language that goes beyond

specific items to location on an ability scale would provide users tremendous flexibility

in building and maintaining an outcomes capacity within and across different databases"

(p. 749). The use of a shared language across rehabilitation settings would allow all

services along the continuum of care to be interrelated and coordinated. In this manner,

the outcomes, as well as the efficient utilization of resources, could be maximized.

Doran and Holland (2000) write, "The comparability of measurement made in

differing circumstances by different methods and investigators is a fundamental

precondition for all of science" (p. 281). The development of the same scale of

measurement may be achieved through the utilization of Rasch analysis, a one-parameter

Item Response Theory (IRT) model. Georg Rasch, a Danish mathematician who examined

psychological measurement problems in the 1950s and 1960s, surmised that the









relationship between a person's ability and an item's difficulty can be modeled as a

probabilistic function. As a person's ability level increases, the probability of passing an

item also increases (Fox & Jones, 1998). The Rasch model specifies exactly how to

convert observed counts into linear (and ratio) measures (Wright & Linacre, 1989). IRT,

then, is both a theoretical framework and a collection of quantitative techniques used to

construct tests, scale responses, and equate scores. It consists of models, each designed

to describe a functional relationship between an examinee's ability and the characteristics

(or parameters) of the items on a test (McHorney & Cohen, 2000).
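The probabilistic relationship described above can be written, for a dichotomous (pass/fail) item, as P(pass) = exp(B - D) / (1 + exp(B - D)), where B is the person's ability and D is the item's difficulty, both expressed in logits. The sketch below illustrates this standard dichotomous form only; the function name is an assumption introduced here, and rating-scale instruments require polytomous extensions of the model.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of passing a dichotomous item under the Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

When ability equals difficulty the probability is exactly 50 percent, and it rises toward 1 as ability exceeds difficulty.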

Performing a Rasch analysis on the FIM would address a problem that exists with

the interpretation of the FIM scores. It has been noted that a change of 10 raw score

points at the extremes of the FIM range is equivalent to four times as much change on the

linear scale as a change of 10 raw score points at the center of the FIM range (Linacre et

al., 1994). Thus, the improvement made by a person with moderate deficits, placing the

patient in the center of the scale, will appear to be much greater than that of a person who is

closer to the end of the scale even when the actual improvement might otherwise be

seen as equivalent. Linacre et al. (1994) cite this as one example of why a conversion

from raw scores to linear measures is so essential to quantifying changes in a patient's

status more appropriately. Performing a Rasch analysis on the FIM would be one way to

accomplish this.
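The nonlinearity of raw scores can be illustrated with a simplified sketch. It assumes a hypothetical test of 20 dichotomous Rasch items with evenly spaced difficulties (the FIM itself uses seven-level ratings, so the numbers here are not FIM values) and asks how far ability must move, in logits, to produce the same two-point raw score gain at the center and near the top of the score range.

```python
import math


def expected_raw_score(ability, difficulties):
    """Expected number of items passed at a given ability (in logits)."""
    return sum(1.0 / (1.0 + math.exp(-(ability - d))) for d in difficulties)


def ability_for_raw_score(raw, difficulties, lo=-10.0, hi=10.0):
    """Invert the expected-score curve by bisection (it is strictly increasing)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical item difficulties, evenly spaced from -2 to +2 logits.
difficulties = [-2.0 + 4.0 * i / 19 for i in range(20)]

# Logit change needed for the same 2-point raw gain at two locations.
center_change = (ability_for_raw_score(11, difficulties)
                 - ability_for_raw_score(9, difficulties))
extreme_change = (ability_for_raw_score(19, difficulties)
                  - ability_for_raw_score(17, difficulties))
```

Under these assumptions the gain near the ceiling corresponds to a larger change on the linear scale than the same raw gain at the center, which is the pattern Linacre et al. (1994) describe for the FIM.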

The Use of Linking Techniques

Educators in the United States have been involved in the process of equating and

linking instruments through a variety of statistical procedures for more than 40 years.

Kolen (2001) notes that the first pages of the first issue of the Journal of Educational

Measurement published in 1964 were on the subject of linking.







Linking is a scaling method used to achieve comparability of scores, to the extent

possible, between tests of different frameworks and test specifications (Muraki, Hombo

& Lee, 2000). Linking is distinguished from test equating, which involves making

statistical adjustments to scores from alternative forms of an instrument to account for

small differences in the difficulty of the test items on each form (Kolen, 2001). The

term, test equating, is traditionally used to refer to the case of linking when two or more

forms of a test have been constructed according to the same specifications, such as equal

difficulty, reliability, and validity and constructed for the same purpose (Muraki et al.,

2000). The most basic of the equating methods is linear equating, which assumes that the

two tests to be equated differ only in means and standard deviations.
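Under that assumption, linear equating sets standardized scores on the two forms equal, giving y = mean_Y + (sd_Y / sd_X)(x - mean_X). The following sketch uses invented score distributions; the function and variable names are illustrative.

```python
from statistics import mean, stdev


def linear_equate(x, form_x_scores, form_y_scores):
    """Map a Form X score onto the Form Y scale by matching means and SDs."""
    slope = stdev(form_y_scores) / stdev(form_x_scores)
    return mean(form_y_scores) + slope * (x - mean(form_x_scores))


# Hypothetical score distributions on two forms of a test.
form_x = [10, 12, 14, 16, 18]
form_y = [20, 25, 30, 35, 40]
equated = linear_equate(16, form_x, form_y)
```

A score one standard deviation above the Form X mean is mapped to the score one standard deviation above the Form Y mean.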

In IRT, linking is referred to as developing a common metric by transforming a

set of item parameter estimates from one metric to another, base metric (Kim & Cohen,

2002). The process of linking consists of an anchoring design and a transformation

method. The anchoring design ensures that there will be a basis for comparison between

the item calibrations on the two instruments (Vale, 1986). The linking transformation

refers to the equation used to put the item parameters on a common scale of

measurement. "These processes rest on two assumptions: (a) the different instruments

being linked measure the same underlying construct; and (b) the linking sample

represents the population for which the test is intended" (McHorney, 2002, p. 386).
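Because the Rasch model holds item discrimination constant, two calibrations of the same items expressed in the same logit unit differ only by an additive constant, so one simple transformation is a mean shift computed from the anchor items. The sketch below assumes a common-item anchoring design; the item labels and calibration values are hypothetical, and other transformation methods exist.

```python
from statistics import mean


def link_to_base(new_calibrations, base_calibrations, anchor_items):
    """Shift a new set of item difficulties onto the base metric.

    The shift is the difference between the anchor items' mean difficulty
    on the base metric and their mean difficulty in the new calibration.
    """
    base_mean = mean([base_calibrations[item] for item in anchor_items])
    new_mean = mean([new_calibrations[item] for item in anchor_items])
    shift = base_mean - new_mean
    return {item: d + shift for item, d in new_calibrations.items()}


# Hypothetical calibrations in logits; a1-a3 appear in both calibrations.
base = {"a1": -1.0, "a2": 0.0, "a3": 1.0}
new = {"a1": -0.5, "a2": 0.5, "a3": 1.5, "n1": 2.0}
linked = link_to_base(new, base, ["a1", "a2", "a3"])
```

After the shift, the non-anchor item "n1" is expressed on the base metric and can be compared directly with the base items.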

The cocalibration of instruments purported to measure a common construct is

simply an extension of test equating, item banking, and partial credit principles that have

been in use in education for decades (Choppin, 1968; Fisher, Harvey, Taylor, et al., 1995;

Wright, 1984). Routine applications of Rasch measurement models in the development

and equating of instruments are performed by companies such as the Psychological







Corporation, school districts in Portland, Phoenix, Chicago, and New York, and medical

school admissions and certification boards, including the National Board of Medical

Examiners, the American Society of Clinical Pathologists, the American Dental

Association, and the American Council of State Boards of Nursing (Fisher, Harvey, Taylor, et

al., 1995).

Linking of Measures in Healthcare

"Linking studies are in their infancy in health status assessment and functional

health assessment in rehabilitation" (McHorney, 2002, p. 389). It has only been in the last

14 years that linking has been used in health status and rehabilitation assessment

(McHorney). In this setting, the use of linking techniques has been discussed (Fisher,

Harvey & Kilgore, 1995; Fisher, Harvey, Taylor et al., 1995; McHorney) and linking

studies have been conducted (Chang & Cella, 1997; Fisher, 1997; Fisher, Harvey, Taylor

et al.; McHorney & Cohen, 2000). One of the first published applications of IRT to

functional health assessments tested the unidimensionality and reproducibility of the 10-

item Physical Functioning Scale (Ware, 2003). Examples of using IRT, specifically

Rasch analysis, to link health care instruments have appeared in the literature over the

last six years. These include linking the FIM and SF-36 (Segal et al., 1997), the FIM

and the Barthel Index (Tennant & Young, 1997), the FIM and the Patient Evaluation

Conference System (PECS), and the SF-36 and the LSU HSI Physical Functioning scale

(Fisher, Eubanks & Marier, 1997). Additionally, Badia, Prieto, Roset, Diez-Perez, &

Herdman (2002) attempted to develop a short Osteoporosis-Specific Quality of Life

Questionnaire based on the assemblage (equating) of the items of two existing

questionnaires through the Rasch mathematical model.









McHorney (2002) states, "Measurement specialists who serve rehabilitation

medicine and other specialties are at the cusp of a paradigm shift away from sizable

reliance on classical test methods to broader use of IRT methods" (p. 390). Rasch

measurement is becoming the preferred method among rehabilitation professionals for

developing functional assessments (Ottenbacher et al.,

1996). Hays et al. (2000), McHorney (2002), Ware (2003), and Cella and Chang (2000),

along with many others in the social and medical sciences in the last 30 years, have

described IRT as a more promising framework for designing tests (Hambleton, 2000).

Features of IRT, such as sample independence and test independence, account for this

growing popularity over classical test theory (CTT) (Douglas, 1999; Linacre et al., 1994;

McHorney, 1997; Prieto et al., 1998; Hambleton, 2000; Silverstein et al., 1992; Velozo et

al., 1995). The measurement units of IRT have interval properties, whereas the raw

scores used in CTT are ordinal. Scores that have interval properties can be analyzed appropriately

using parametric statistics, while such analyses may be inappropriate on ordinal data.

Additionally, logit measures may remove measurement bias at the extreme ends of the

measured construct, while extreme raw scores are biased by nature and may

underestimate the magnitude of a difference or change score at the extremes (Cella &

Chang, 2000). Another drawback of CTT is that tests developed in this manner are

sample dependent. This means that items may look difficult when they are administered

to examinees at the low end of the score continuum, while the same items may look

easy when administered to examinees at the high end of the score continuum. Thus, the item statistics

are dependent upon the ability level of the subject sample and have little value when

measuring subjects of a different ability level. Similarly, the problem of test dependency

can be defined as one where the person statistics are dependent upon the difficulty of the









test. If one changes the difficulty of the items in the tests, the two scores are no longer

comparable.
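Sample dependence can be illustrated with a small sketch. It takes a single item of fixed difficulty and computes the classical item statistic (the expected proportion of examinees passing, often called the item p-value) in two hypothetical samples. The Rasch response function is used here only as a convenient way to generate expected pass rates, and the ability values are invented.

```python
import math


def pass_probability(ability, difficulty):
    """Dichotomous Rasch response probability (abilities in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_p_value(abilities, difficulty):
    """Classical item p-value: expected proportion of the sample passing."""
    return sum(pass_probability(a, difficulty) for a in abilities) / len(abilities)


item_difficulty = 0.0               # the same item in both samples
low_group = [-2.0, -1.5, -1.0]      # hypothetical low-ability sample
high_group = [1.0, 1.5, 2.0]        # hypothetical high-ability sample

p_low = expected_p_value(low_group, item_difficulty)
p_high = expected_p_value(high_group, item_difficulty)
```

The item looks hard in the first sample and easy in the second even though its difficulty parameter never changed, which is the sense in which classical item statistics depend on the sample.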

There is some indication that IRT estimates of health outcomes are more

responsive to changes in health status over time. McHorney et al. (1997) found that the

sensitivity of the SF-36 physical functioning scale to differences in disease severity was

greater for a Rasch model-based scoring than it was for simple summated scoring. Fisher

(1997) states,

as it becomes increasingly clear that the accountability of educators,
psychologists, health care providers, and other professionals cannot remain tied to
scale-dependent indicators of unknown or low statistical sufficiency, the
practicality, scientific rigor, and mathematical beauty of scale-free measurement
will become more widely appreciated. (p. 93)

Hays et al. (2000) predict IRT methods will be used in health outcome measurement on a

rapidly increasing basis in the 21st century.

Two mathematical models that are appropriate for linking functional outcome

measures are (a) the one-parameter IRT model (the Rasch model), which solves for

person ability through a single item parameter, item difficulty, and (b) the two-parameter

model, which solves for person ability through two item parameters, item difficulty and item

discrimination. There is fervent debate over which model should be employed for

psychometric analysis and linking instruments. The debate ranges from whether a

scientific model should be made to fit the data (two-parameter model) or the data to fit

the model (one-parameter model) (Wright, 1997). There is also the issue of whether item

discrimination should be held constant across items (Rasch model) or allowed to vary

between items (two-parameter model) (McHorney, 2002).
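The difference between the two models can be seen in the response function itself. In the dichotomous two-parameter logistic form, P = 1 / (1 + exp(-a(B - D))), each item has its own discrimination a; fixing a at a common value (1 in this sketch) yields the Rasch form. The function name and parameter values below are illustrative assumptions.

```python
import math


def two_parameter_probability(ability, difficulty, discrimination=1.0):
    """Two-parameter logistic response function.

    With discrimination held at 1.0 for every item, this reduces to the
    dichotomous Rasch model.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Same ability and difficulty, different item discriminations.
steep = two_parameter_probability(1.0, 0.0, discrimination=2.0)
flat = two_parameter_probability(1.0, 0.0, discrimination=0.5)
rasch = two_parameter_probability(1.0, 0.0)
```

A more discriminating item separates persons above and below its difficulty more sharply, so item characteristic curves can cross under the two-parameter model but not under the Rasch model.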

While several studies indicate item discrimination is not constant across

functional status items (McHorney, 2002; McHorney & Cohen, 2000; Spector &









Fleishman, 1998), for pragmatic reasons, namely the availability of relatively small

sample sizes of patients with linked FIM and MDS data (n = 450), we are choosing the

Rasch model because of its simplicity and robustness under conditions of heterogeneous

item discrimination and small samples (De Gruijter, 1986; Kolen & Brennan, 1995). The

Rasch model has been shown to produce stable linking with sample sizes of 300-400

(Kolen & Brennan, 1995; Skaggs & Lissitz, 1986).

Rasch analysis can be used to link healthcare inventories that measure the same

construct. By linking inventories in this manner, one can improve the usefulness of both

measures through

* Refining the rating scale.
* Identifying the items that form a unidimensional construct.
* Verifying the expected difficulty hierarchy of the items.
* Providing for a means of converting scores between the two measures.
* Matching the ADL measures to specific descriptions provided by the scale.

The Rasch theory stipulates that a respondent's probability of answering an item

correctly is dependent only on two factors: the respondent's ability and the

characteristics of the item (Hambleton, Swaminathan & Rogers, 1991). Rasch analysis

has the ability to uniquely link a person's ability to an item's difficulty level (Velozo &

Peterson, 2001). Thus, a score on an instrument can be directly linked to the descriptive

content of the instrument (Velozo & Peterson, 2001). The examiner is able to describe

precisely a person's level of ability based on the score they receive. In many other cases,

a score on a test is uninterpretable in terms of a meaningful description of the level of

ability it represents. You may still be able to say with a reasonable level of confidence

that someone has more or less ability than someone else, but you still do not have a clear

description of the precise level of ability that person possesses. Furthermore, Rasch

analysis allows for the ranking of items so that all items on a scale can be put on a

continuum from least challenging to most challenging.







There has been only one published study attempting to link the FIM to the MDS.

Williams et al. (1997) compared scores on FIM and rescaled MDS ADL and cognitive

items [referred to as the "Pseudo-FIM(E)"] on 173 rehabilitation patients admitted to six

nursing homes. The matching and rescaling of the MDS was accomplished through an

expert panel, with the panel judging that 8 out of 13 FIM items had a corresponding

MDS item. Intraclass correlation between the FIM and rescaled MDS was .81, although

the mean calibration of 6 of the 8 FIM items differed statistically from the rescaled MDS

items. While this initial attempt at linking the FIM and the MDS was encouraging, the

methodology and statistical approach of the study had considerable limitations. For

example, expert-panel rescaling of the MDS can be challenged due to the lack of

adequate empirical support (i.e., a different panel of experts could develop a different

FIM-MDS matching and rescaling; Velozo, Kielhofner & Lai, 1999).

Fisher, Harvey, Taylor, et al. (1995) were the first to use common-sample

equating to link two global measures of functional ability, the FIM and PECS (Patient

Evaluation and Conference System). Using the methodology described above, they

showed that the 13 FIM and 22 PECS ADL/motor items could be scaled together in a 35-

item instrument. The authors found that separate FIM and PECS measures for 54

rehabilitation patients correlated .91 with each other and correlated .94 with the

cocalibrated values produced by Rasch analysis. Furthermore, these authors

demonstrated that either instrument's ratings were easily and quickly converted into the

other via a table that used a common unit of measurement, which they referred to as the

"rehabit." This common unit of measurement allows for the translation of scores from

one instrument to another. Since the results of Rasch analysis are sample-free, these

tables/algorithms can be used for all future and past instrument-to-instrument score

conversions.
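A conversion table of this kind can be sketched as a lookup through the common metric: each raw score on one instrument is assigned a measure on the shared scale, and the score on the other instrument whose measure lies closest is reported. The tables below are invented for illustration and do not reproduce the published FIM-PECS values; the function name is hypothetical.

```python
def convert_score(raw_a, a_to_measure, b_to_measure):
    """Convert a raw score on instrument A to the nearest score on instrument B.

    Both dictionaries map raw scores to measures on the same common metric.
    """
    target = a_to_measure[raw_a]
    return min(b_to_measure, key=lambda raw_b: abs(b_to_measure[raw_b] - target))


# Hypothetical raw-score-to-measure tables (common metric in logits).
instrument_a = {10: -2.0, 20: 0.0, 30: 2.0}
instrument_b = {5: -2.1, 6: -0.9, 7: 0.1, 8: 1.2, 9: 2.2}

converted = convert_score(20, instrument_a, instrument_b)
```

Because the lookup picks the nearest tabulated measure, some rounding error is unavoidable whenever the two instruments do not have scores at exactly the same locations on the common metric.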







More recently, Fisher et al. (1997) replicated their previous study using common-

sample equating to link two self-report instruments: the 10 physical function items of the

Medical Outcomes Study (MOS) SF-36 (the PF-10) and the Louisiana State University

Health Status Instrument. Difficulty estimates for a subset of similar items from the two

instruments correlated at .95, again indicating that the items from the two scales were

working together to measure the same construct. McHorney and Cohen (2000) applied a

two-parameter IRT model to 206 physical functioning items (through 71 common items

across samples); in a similar study, 39 physical functioning items were linked

(through 16 common items) from three modules of the Asset and Health Dynamics

Among the Oldest Old (AHEAD) study. Both studies demonstrated successful linking of

item banks through sets of common items, allowing placement of all items on a common

metric. Then in 2003, Jette et al. conducted a one-parameter Rasch partial credit analysis

for the entire item pool of the FIM, MDS, OASIS (Outcome and Assessment Information

Set for Home Health Care), and the PF-10 (the physical functioning scale of the SF-36)

items to develop an overall functional ability scale. These authors noted that the MDS

instrument covered content from the mid portion of the functional ability continuum with

less content coverage on the low and high ends while the FIM instrument covered a

relatively small portion along the middle to upper end of this continuum.

The above studies represent encouraging evidence that while physical function is

presently measured with many different instruments, it need not be tied to any particular

instrument. These studies support the idea that there can be a universal use of common

units for the measurement of functional ability. As a result, the creation of a translation

table between the FIM and the MDS should be possible for the measurement of

functional ability in PAC settings. Such a table would help avoid unnecessary









duplication of efforts when patients transfer from one PAC setting to another where

different instruments are used to measure functional ability. When patients enter a new

setting, their scores on the new instrument can be readily determined. The creation of a

universal measure of functional ability in rehabilitation would create the following

conditions:

* It allows one to accurately and precisely evaluate the effectiveness of treatment
procedures.

* It allows one to accurately and precisely evaluate the effectiveness of the
program, thus increasing the program's efficiency.

* Measurements of progress are used to justify reimbursement for services.
(Merbitz, Morris & Grip, 1989)

In an unpublished study, Velozo (2004) performed a Rasch analysis to develop a

conversion table that linked scores on the FIM to analogous scores on the MDS. The

Rasch partial credit model Winsteps program (Linacre & Wright, 2001) was used to

calibrate item difficulties based on the linked FIM and MDS scores. The development of

this conversion table was based on a sample of 254 VHA patients who had completed

both the FIM and the MDS within 7 days of one another. The decision to restrict the

number of days between the completion of the FIM and MDS was based on the need to

minimize the impact any possible change in the patient's condition would have on the

scores. The use of 7 days as the criterion was based on the research lab's clinical

judgment. The physical ability items from both instruments were placed on the same

linear continuum and from this, a FIM-MDS conversion table was produced. The

purpose of this current study is to test the accuracy of the conversion table developed by

Velozo (2004). A new sample of records from 2,297 patients with linked FIM and MDS

scores collected between July 2003 and June 2004 was obtained from the VA's Austin








Automation Center databases. Using the conversion table developed by Velozo, the new

FIM scores are converted to MDS scores, and new MDS scores to FIM scores. The

converted-MDS (MDSc) scores are then statistically compared to the actual MDS scores

and the converted-FIM (FIMc) scores to the actual FIM scores. From these comparisons,

a determination of the accuracy of the FIM-MDS conversion table will be made.















CHAPTER 3
METHODOLOGY

Introduction

The purpose of this study was to test the accuracy of a FIM-MDS conversion

table that was designed to transform a score on the physical component of the FIM to its

corresponding score on the MDS and vice versa. The methods utilized to investigate the

accuracy of the FIM-MDS conversion table are described in the following sections of this

chapter:

* Source of the Data
* Sample
* FIM and MDS Motoric Items
* Procedures Involved in the Creation of the FIM-MDS Conversion Table
* Statistically Testing the Accuracy of the FIM-MDS Conversion Table

This study was approved by the University of Florida's Institutional Review

Board for the protection of human subjects, as well as the Veterans Administration's

Subcommittee on Clinical Investigations. This study also obtained a HIPAA Waiver of

Authorization.

Source of the Data

The FSOD and MDS data reside in two separate databases at the VA's Austin

Automation Center (AAC) (Veterans Health Administration, 2004). According to

Dr. Hojlo, Chief of VA Nursing Home Care, the most accurate VA-MDS data

were available starting in June of 2002. Therefore, the data extractions were based on

data collected from June 2002 to May 2003. Data from both databases were downloaded

and merged on the basis of social security numbers, using the database software

Microsoft Access.








A single-group design was used, as both inventories were completed by the same

group of subjects. Because the same population completed both inventories, population

invariance and symmetry exist. This eliminates concerns that might otherwise arise over

differences caused by variability in the composition of the sample population.

Inclusion Criteria of Subjects

Subjects included in this study were those who were part of the VA's FSOD and

MDS databases, who had FIM and MDS scores completed no more than seven days apart

between June 2002 and May 2003 and who had data on all items included in the

development of the FIM-MDS conversion table. The decision to restrict the amount of

time that elapsed between the administration of the FIM and MDS was based on the need

to minimize the impact that possible changes in a patient's condition might have on the

resulting FIM and MDS scores.

Inclusion of Women and Minorities

Inclusion criteria were for males and females and all ethnic groups, as they

occurred in the VA's FSOD and MDS databases.

Sample

The records of 57,237 patients who underwent a FIM evaluation and 69,954

subjects who had MDS scores in a VA post-acute care setting between June 2002 and May

2003 were obtained from two separate databases housed at the VA's AAC. The linking of

these records, based on patient social security number and no more than 7 days between

FIM and MDS test dates, resulted in 2,521 matches. These data were then cleaned to exclude

duplicate records of the same subject with more than one match of test dates, as well as

those records that included missing or invalid scores (i.e., ratings other than acceptable) for

items on either the FIM or the MDS. This last exclusion was made to ensure that total








scores were compared to total scores, which is the basis of the FIM-MDS conversion table.

The result was 2,297 unique subjects with linked FIM and MDS scores.
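The linking and cleaning steps described above can be illustrated with a minimal sketch in Python. It is not the procedure that was actually run in Microsoft Access. The function name `link_records`, the toy records, and the rule of keeping the match with the smallest gap when a subject has more than one match are assumptions made for illustration, since the text does not state which duplicate match was retained.

```python
from datetime import date


def link_records(fim_records, mds_records, max_days=7):
    """Pair FIM and MDS assessments of the same patient made within max_days.

    Each record is a (patient_id, assessment_date) tuple. Returns a dict
    mapping patient_id to (gap_in_days, fim_date, mds_date).
    """
    # Index MDS assessment dates by patient identifier.
    mds_by_patient = {}
    for patient_id, mds_date in mds_records:
        mds_by_patient.setdefault(patient_id, []).append(mds_date)

    best = {}
    for patient_id, fim_date in fim_records:
        for mds_date in mds_by_patient.get(patient_id, []):
            gap = abs((fim_date - mds_date).days)
            if gap > max_days:
                continue  # assessments too far apart to be linked
            # Assumption: keep one match per patient (the smallest gap),
            # so duplicate matches for the same subject are excluded.
            if patient_id not in best or gap < best[patient_id][0]:
                best[patient_id] = (gap, fim_date, mds_date)
    return best


# Hypothetical toy data: patient "A" has two candidate matches, "B" has none.
fim = [("A", date(2002, 6, 10)), ("A", date(2002, 9, 1)), ("B", date(2002, 7, 1))]
mds = [("A", date(2002, 6, 14)), ("A", date(2002, 9, 2)), ("B", date(2002, 8, 1))]
matches = link_records(fim, mds)
print(matches)
```

In this toy example only patient "A" is retained, with the pair of assessments that are one day apart.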

The age of subjects ranged from 19-89+ years. Of those subjects between the

ages of 19 and 89, 50.7% were under the age of 70, 31.2% were in their 70's and 15.2%

were in their 80's. Only 1.5% of the sample was over the age of 89. The majority of the

sample was Caucasian at 73% with 20% being African American and 5% Hispanic.

Ninety-six percent of the sample was male and 44% were married. The days between the

administration of the FIM and the MDS ranged from 0-7 days with a mean of 5 (SD = 1.9)

days. Thirty-five percent (1,531) of the subjects had a diagnosis of stroke, 23% (525)

had lower extremity orthopedic problems, and 12% (271) had lower extremity

amputations. The remainder of the sample consisted of subjects with a variety of

impairments.

FIM and MDS Motor Items

Table 3-1 is a list and comparison of the FIM and MDS motor items included in

this analysis. There are nine pairs of items between the FIM and MDS that are

considered to represent the same or nearly the same activity in this study. These pairs

include eating, grooming/personal hygiene, bathing, dressing, toileting, bowel

management, bladder management, transferring, and walking (Table 3-1). Items

included in only one instrument and not the other are bed mobility and stair use. While

both the FIM and MDS instruments include an item for eating, the FIM requires a higher

skill level in order to achieve the highest rating. This is because the FIM does not permit

finger feeding, nor does it allow people eating through adaptive means to achieve the

highest rating. There are also similar grooming and hygiene items on the two

instruments. The term "bathing" for both instruments connotes a full-body bath,

excluding one's face and hands. Yet, the MDS also incorporates bathtub or shower

transfers as part of bathing, while the FIM has a separate item for tub and shower

transfers. The MDS has one item for dressing, while the FIM divides the task into

dressing, upper body and dressing, lower body. In this study, the FIM item for dressing,

lower body was matched with the MDS item for dressing since, based on the lab's

clinical judgment, dressing the lower body would be considered a more difficult task than

dressing the upper body. This more difficult aspect of the ADL is incorporated in the one

MDS item for dressing. The FIM item for toileting was matched with the MDS item for

toilet use, as they have similar definitions, although the MDS includes toilet transfer in

the task while the FIM has a separate item for transfer. The MDS item for transfers was

then matched with the FIM item for transfers: bed, chair, and wheelchair. The bowel and

bladder control items on the FIM were matched with the bowel and bladder continence

items on the MDS. The FIM item for walk/wheelchair addresses one's ability to walk or

use a wheelchair safely on a level surface, while the MDS has four items for walking,

including walk in room, walk in corridor, locomotion on unit and locomotion off unit.

Although not included in the definition of the FIM item for walk/wheelchair, "150 feet is

specified as the performance criterion in the clarification of the rating scale" (Rogers,

Gwinn & Holm, 2001, p. 6). Therefore, the FIM item for walk/wheelchair was matched

to the MDS item for locomotion off unit. Furthermore, the FIM incorporates safety into

the definition of many of its items, such as grooming, bathing, dressing, transfers,

toileting skills, walking and wheelchair mobility, while the MDS does not (Rogers,

Gwinn & Holm, 2001).

Table 3-1. Comparison of FIM™ to MDS ADL/motor items

FIM™ Items                           MDS Items
Eating                               Eating
                                     Bed mobility
Grooming                             Personal hygiene
Bathing                              Bathing*
Dressing-upper body                  Dressing
Dressing-lower body
Toileting                            Toilet use
Bladder management                   Bladder continence**
Bowel management                     Bowel continence**
Bed, chair, wheelchair (transfer)    Transfer
Toilet (transfer)
Tub, shower (transfer)
Walk/wheelchair                      Walk in room
                                     Walk in corridor
                                     Locomotion on unit
                                     Locomotion off unit
Stairs

FIM™ rating scale                          MDS rating scale (exceptions noted below)
7 complete independence (timely, safely)   0 independent
6 modified independence (device)           1 supervision
5 supervision                              2 limited assistance
4 minimal assist (subject = 75%+)          3 extensive assistance
3 moderate assist (subject = 50%+)         4 total dependence
2 maximal assist (subject = 25%+)          8 activity did not occur during the
1 total assist (subject = 0%+)               entire 7-day period

* Bathing in the MDS has a separate rating scale: 0-Independent, 1-Supervision, 2-
Physical help limited to transfer only, 3-Physical help in part of bathing activity, 4-
Total dependence, 8-Activity itself did not occur during entire 7 days.
** Bladder and Bowel Continence in the MDS also has a separate rating scale: 0-Usually
Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent

The FIM and MDS have different response scales on which the physical

functioning items are scored. A clear distinction in the administration of these two

assessments is that the items of the FIM are scored at the time of the assessment, while

ratings on the MDS are based on observed performance over a 7-day period.

Furthermore, the FIM items have seven response levels, while the MDS has a range of









five. The FIM scoring criteria are shown in Table 3-2. The MDS scoring criteria are

shown in Table 3-3.

While the FIM motor items assess the percent of effort that is provided to the

patient to accomplish a task, the MDS measures the number of times during a 7-day time

period a patient required a certain level of assistance to perform a task.

Table 3-2. FIM scoring criteria
7 Complete independence      All of the tasks described as making up the activity are
                             typically performed safely, without modification,
                             assistive devices, or aid and within a reasonable
                             amount of time.
6 Modified independence      One or more of the following may be true: the activity
                             requires an assistive device; the activity takes more
                             than reasonable time, or there are safety (risk)
                             considerations.
5 Supervision                Supervision or Setup-Subject requires no more help
  (Standby prompting)        than standby, cuing or coaxing, without physical
                             contact, or, helper sets up needed items or applies
                             orthoses.
4 Minimal assist             Subject requires no more help than touching, and
  (Minimal prompting)        expends 75% or more of the effort.
3 Moderate assistance        Subject requires more help than touching, or expends
  (Moderate prompting)       half (50%) or more (up to 75%) of the effort.
2 Maximal assistance         Subject expends less than 50% of the effort, but at
  (Maximal prompting)        least 25%.
1 Total assistance           Subject expends less than 25% of the effort.
(Evans, 2002)









Table 3-3. MDS scoring criteria
0 Independence No help or staff oversight OR Staff help/oversight
provided only one or two times during the last seven
days.
1 Supervision Oversight, encouragement, or cueing provided three or
more times during last 7 days -OR- Supervision (3 or
more times) plus physical assistance provided, but
only one or two times during the last 7 days.
2 Limited Assistance Resident highly involved in activity, received physical
help in guided maneuvering of limbs or other
nonweight-bearing assistance on three or more occasions
-OR- limited assistance (3 or more times), plus one
weight-bearing support provided, but for only one or
two times during the last 7 days.
3 Extensive Assistance While the resident performed part of activity over last
seven days, help of following type(s) was performed
three or more times:
--Weight-bearing support provided three or more times;
--Full staff performance of activity (3 or more times)
during part (but not all) of last 7 days.
4 Total Dependence Full staff performance of the activity during the entire
7-day period. There is complete nonparticipation by
the resident in all aspects of the ADL definition task.
If staff performed the activity for the resident during
the entire observation period, but the resident
performed part of the activity himself/herself, it would
not be coded as a "4" (Total Dependence).
(CMS, 2003).

Procedures Involved in the Creation of the FIM-MDS Conversion Table

For the purposes of creating a FIM-MDS conversion table, Velozo (2004)

obtained linked FIM and MDS scores from the records of 254 subjects. The linking of

instruments using IRT methodologies is generally dependent on item calibrations, which

are the "difficulty" measures of the items. In essence, item calibrations serve as the

markings on the conversion ruler. Rasch analysis of the FIM and MDS converts a

patient's responses on the instrument items to a measure of ADL/motoric function.








Prior to performing the Rasch analysis, several steps were taken so that the FIM

and MDS rating scales were conceptually consistent. One inconsistency between the

FIM and MDS is that the MDS includes a rating for "activity did not occur." Using a

procedure adapted by Jette, Haley, and Ni (2003), this MDS rating was recoded as part of

the "total dependence" rating. The rationale underlying this decision was that a likely

explanation for an activity not occurring was that the activity could not be performed

(Buchanan, Andres, Haley, Paddock & Zaslavsky, 2002; Jette et al., 2003). Other

inconsistencies between the FIM and MDS are that the rating scales progress in different

directions and have different ranges (i.e., from 1 to 7 for the FIM and from 4 to 0 for the

MDS). In order to adjust for these differences, the MDS scale was rescored and rescaled

to match the rating scale used in the FIM. For example, a "4" on the MDS, which

represents total dependence, was recoded as a "1" to match Total Assistance on the FIM

and a "0" on the MDS, which represents "Independence," was recoded to a "7" to match

the rating for "Complete Independence" on the FIM (Table 3-4).

Table 3-4. FIM-MDS score conversion
MDS                    Score Conversion   FIM
Independent            0 to 7             Complete Independence
Supervision            1 to 5             Supervision
Limited Assistance     2 to 4             Minimal Assistance
Extensive Assistance   3 to 2             Maximal Assistance
Total Dependence       4 to 1             Total Assistance
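
The rescoring in Table 3-4, together with the treatment of "activity did not occur" as total dependence, amounts to a simple lookup. The following Python sketch is illustrative only; the names `MDS_TO_FIM_SCALE`, `recode_mds_item`, and `recode_mds_record` are hypothetical, and the sketch assumes the same mapping is applied to every item, which the text does not confirm for the bathing and continence items that use separate MDS rating scales.

```python
# MDS rating -> rating on the FIM scale, following Table 3-4.
# The MDS code 8 ("activity did not occur") is treated as total dependence.
MDS_TO_FIM_SCALE = {0: 7, 1: 5, 2: 4, 3: 2, 4: 1, 8: 1}


def recode_mds_item(rating):
    """Return the FIM-scale equivalent of a single MDS item rating."""
    if rating not in MDS_TO_FIM_SCALE:
        raise ValueError(f"invalid MDS rating: {rating!r}")
    return MDS_TO_FIM_SCALE[rating]


def recode_mds_record(ratings):
    """Recode every item rating in one subject's MDS record."""
    return [recode_mds_item(r) for r in ratings]


# Hypothetical record with five item ratings.
recoded = recode_mds_record([0, 2, 8, 4, 1])
print(recoded)
```

Raising an error on unexpected codes mirrors the exclusion of records with invalid ratings described in the Sample section.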

Following the rescoring of the MDS rating scale, the next step in creating the

FIM-MDS conversion table was to run a Rasch partial credit model analysis on the linked

FIM and MDS scores, using Winsteps (Linacre & Wright, 2000). This combined

analysis placed the FIM and MDS items and rating-scale calibrations on the same linear

scale with the same local origin. That is, FIM item and rating-scale calibrations became








"linked" to MDS item and rating-scale calibrations. This also provided cocalibrated item

and rating-scale values, which were then used as anchors in separate FIM and MDS

analyses. Both the anchored FIM and MDS analyses generated output tables that

associated total FIM and MDS raw scores with a common logit scale. These analyses

resulted in a conversion table whereby total FIM raw scores could be translated into total

MDS raw scores and vice versa (Table 3-5).

Converting a score from the FIM to the MDS and vice versa is as simple as

locating a score under either the FIM or MDS column (Table 3-5) and reading across the

adjacent logit column to find the equivalent score on the alternate instrument. It is

hypothesized that a score of 16 on the MDS represents the same amount of functional

independence as a score of 39 on the FIM. Similarly, a score of 58 on the FIM represents

the same amount of functional independence as does an MDS score of 29 (Table 3-5).
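
The table lookup described above can be sketched in Python as a nearest-logit match between two raw-score-to-logit tables. Only the FIM logits below are taken from the conversion table; the MDS logits are hypothetical placeholders chosen so that the two worked examples in the text (MDS 16 with FIM 39, and FIM 58 with MDS 29) are reproduced, and the function name `convert` and the nearest-logit rule are assumptions rather than the documented procedure.

```python
# Excerpt of FIM raw score -> logit values from the conversion table.
FIM_LOGITS = {38: -0.56, 39: -0.53, 40: -0.49, 57: 0.10, 58: 0.13, 59: 0.17}

# Hypothetical MDS raw score -> logit values (placeholders for illustration).
MDS_LOGITS = {15: -0.60, 16: -0.52, 17: -0.45, 28: 0.08, 29: 0.14, 30: 0.20}


def convert(score, source_logits, target_logits):
    """Map a raw score to the target raw score with the closest logit."""
    logit = source_logits[score]
    return min(target_logits, key=lambda s: abs(target_logits[s] - logit))


print(convert(39, FIM_LOGITS, MDS_LOGITS))  # FIM 39 to its MDS equivalent
print(convert(29, MDS_LOGITS, FIM_LOGITS))  # MDS 29 to its FIM equivalent
```

Because both instruments are expressed on the same logit scale, the same function serves for conversion in either direction.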

Nine of the FIM items have corresponding items on the MDS that address similar

areas of physical functioning. These items include eating, grooming/personal hygiene,

bathing, dressing, toileting, bowel management, bladder management, transferring, and

walking. After performing the Rasch analysis on the FIM and MDS total scores to link

the two measures, the resulting correlation between the similar FIM and MDS items was

.822 at p<.01. Similar person measures between scores on the FIM and the MDS

correlated at .703.

Statistically Testing the Accuracy of the FIM-MDS Conversion Table

The purpose of this research study was to test the accuracy of the FIM-MDS

conversion table. In order to accomplish this, a second, independent sample of VA

patients with scores on both the FIM and MDS, completed within 7 days of one another,

was obtained from the VA's Austin Automation Center. Before submitting this dataset to

the FIM-MDS conversion table, the same adjustments made to the MDS rating scale

when creating the conversion table were applied to the current dataset. Using the

Statistical Package for Social Sciences (SPSS), version 12.0 for Windows, the MDS

rating for "activity did not occur" was recoded as "total dependence" following the

scoring protocol of the FIM. Then, the MDS rating scale was recoded and rescaled also

to match the rating scale of the FIM. The FIM-MDS conversion table was then used to

convert the second sample of actual FIM scores, designated as FIMa, to converted MDS

scores, designated MDSc. In the same manner, MDSa scores were converted to FIMc

scores. Then, the actual scores on the FIM and MDS were compared to their

corresponding converted scores to determine how similar the actual and converted scores

were.

Table 3-5. FIM-MDS conversion table
FIM   logit   MDS     FIM   logit   MDS     FIM   logit   MDS
13    -3.80           39    -0.53           65    0.41
14    -2.77           40    -0.49           66    0.46
15    -2.26           41    -0.46           67    0.50
16    -1.99           42    -0.42           68    0.55
17    -1.81           43    -0.39           69    0.60
18    -1.67           44    -0.35           70    0.64
19    -1.56           45    -0.32           71    0.69
20    -1.46           46    -0.29           72    0.75
21    -1.38           47    -0.25           73    0.80
22    -1.31           48    -0.22           74    0.86
23    -1.24           49    -0.18           75    0.92
24    -1.18           50    -0.15           76    0.98
25    -1.12           51    -0.12           77    1.04
26    -1.07           52    -0.08           78    1.11
27    -1.02           53    -0.05           79    1.19
28    -0.97           54    -0.01           80    1.27
29    -0.92           55    0.02            81    1.35
30    -0.88           56    0.06            82    1.45
31    -0.83           57    0.10            83    1.55
32    -0.79           58    0.13            84    1.67
33    -0.75           59    0.17            85    1.80
34    -0.71           60    0.21            86    1.96
35    -0.67           61    0.25            87    2.15
36    -0.64           62    0.29            88    2.40
37    -0.60           63    0.33            89    2.76
38    -0.56           64    0.37            90    3.38
                                            91    4.52

The goal of equivalence testing is to demonstrate that two or more conditions are

statistically the same (Stegner, Bostrom & Greenfield, 1996). In this type of testing, one

reverses the role of the null and alternative hypotheses and then by testing a set of these

reversed hypotheses, demonstrates equivalence with a predetermined significance level

just as when demonstrating a difference between groups (Stegner et al.). The

equivalence methodology is a simple application of bioequivalence principles proposed

in Pharmacokinetics and Biopharmaceutics recently. The idea is to "prove" statistically

that two drugs or formulations are equivalent (Berger & Hsu, 1996; U.S. FDA, 1997,

1999). This methodology was used in the current study in order to compare the statistical

equivalence of FIMa and FIMc scores, as well as MDSa and MDSc scores. It was

hypothesized that a minimum of 75% of the actual and converted scores should be within

5 points of one another in order for the conversion to be considered accurate. If fewer than

75% were, then the conversion table would not be considered accurate enough for use

in a clinical setting. A difference of 5 points was employed, as Forrest, Schwam, and

Cohen (2002) found that "each 5-point decrement in the FIM score correlated with the

need for about one hour per day of help in mobility, basic activities of daily living, and

instrumental activities of daily living" (p. 57). Yet, Granger et al. (1993) indicated that

while no recommendations existed for what constituted a clinically significant change on

the FIM, a 10-point improvement decreased by almost 50% the time required to care for









a group of stroke patients in the community. For this study, a clinically significant

difference in scores was set at the more rigorous 5-point increment.
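
The individual-level accuracy criterion (at least 75% of actual and converted scores within 5 points of one another) can be expressed as a short Python sketch. The function names and the toy scores are hypothetical; the comparison is inclusive, treating a difference of exactly 5 points as meeting the criterion, which follows the wording "no more than five points apart" used in the Results chapter.

```python
def proportion_within(actual, converted, tolerance=5):
    """Share of subjects whose actual and converted scores differ by <= tolerance."""
    if len(actual) != len(converted) or not actual:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for a, c in zip(actual, converted) if abs(a - c) <= tolerance)
    return hits / len(actual)


def meets_accuracy_standard(actual, converted, tolerance=5, required=0.75):
    """True when the required share of subjects falls within the tolerance."""
    return proportion_within(actual, converted, tolerance) >= required


# Hypothetical actual and converted scores for four subjects.
actual_scores = [52, 60, 33, 71]
converted_scores = [55, 72, 30, 70]
share = proportion_within(actual_scores, converted_scores)
print(share, meets_accuracy_standard(actual_scores, converted_scores))
```

Passing `tolerance=10` gives the more lenient 10-point comparison without changing the rest of the calculation.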

In order to apply a more precise analysis to the determination of the level of

accuracy of the FIM-MDS conversion table, techniques as described in Dorans and

Lawrence (1990) were utilized. In their study, Dorans and Lawrence tested the score

equivalence of nearly identical editions of the Scholastic Aptitude Test (SAT). These

two versions were comprised of the same test questions, but the order in which the test

was administered differed. In one test situation, the 40-item verbal section of the SAT

might precede the 45-item verbal section. In the other situation, the 45-item verbal

section might precede the 40-item verbal section. The test was administered to what was

presumed to be statistically equivalent groups of examinees. Linear equating techniques

were used to equate one version of the test to the other version and the accuracy testing

of the resulting scores was accomplished by checking whether the identity transformation

fell within a reasonable confidence interval placed around that equating function (Dorans

& Lawrence). The difference between the equating function and the identity

transformation was calculated and then that difference was divided by the standard error

of the equating function. If the resulting ratio fell within a bandwidth of plus or minus

two, then the equating function was considered to be within sampling error of the identity

function.
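
The ratio test described above can be sketched in Python as follows. This is a simplified illustration rather than the exact computation of Dorans and Lawrence (1990); the function names and the standard error values in the example are hypothetical.

```python
def standardized_difference(converted, actual, standard_error):
    """Difference between converted and actual score, in standard-error units."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return (converted - actual) / standard_error


def within_sampling_error(converted, actual, standard_error, band=2.0):
    """True when the ratio falls within plus or minus `band`."""
    return abs(standardized_difference(converted, actual, standard_error)) <= band


# Hypothetical scores with an assumed standard error of 2.0 points.
print(within_sampling_error(58, 55, 2.0))  # ratio of 1.5
print(within_sampling_error(70, 55, 2.0))  # ratio of 7.5
```

With a band of plus or minus two, the first pair would be considered within sampling error and the second would not.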

Kolen and Brennan (1995) indicated that in order for equating of two measures to

be successful, the four moments of the distribution should be statistically equivalent.

Therefore, the four moments of the distribution, including the means, standard

deviations, skewness, and kurtosis were also calculated and compared.
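
As an illustration of the four moments compared here, the following Python sketch computes them for a list of scores. It uses simple moment-based formulas for skewness and excess kurtosis, which is an assumption: statistical packages such as SPSS typically report bias-adjusted versions, so values from this sketch may differ slightly from package output for small samples. The function name `four_moments` and the toy data are hypothetical.

```python
import math


def four_moments(scores):
    """Return (mean, sample SD, skewness, excess kurtosis) of a score list."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are required")
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    if m2 == 0:
        raise ValueError("scores have no variability")
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    sd = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return mean, sd, skewness, kurtosis


# Hypothetical symmetric toy data: skewness is zero, kurtosis is negative.
mean, sd, skewness, kurtosis = four_moments([1, 2, 3, 4, 5])
print(mean, sd, skewness, kurtosis)
```

Applying the same function to the actual and the converted score distributions gives the paired values that are compared at the group level.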








Correlations between the actual and converted scores on both the FIM and MDS

were also determined, as were effect sizes. Effect sizes give a clear indication of the

amount of difference that exists between two score distributions. The standard of effect

sizes as noted in Cohen (1988) was used to determine the percent of overlap that existed

between the converted and actual scores (Table 3-6).

Table 3-6. Effect size
Cohen's Standard   Effect Size   Percentile Standing   Percent of Overlap
Large              0.8           79                    52.6%
                   0.7           76                    57.0%
                   0.6           73                    61.8%
Medium             0.5           69                    69.0%
                   0.4           66                    72.6%
                   0.3           62                    78.7%
Small              0.2           58                    85.3%
                   0.1           54                    92.3%
                   0.0           50                    100.0%
(Adapted from Cohen, 1988)

Additionally, an analysis on the data was performed to obtain an understanding of

how similar the raw scores on similar items between the FIM and MDS were. A factor

that might negatively impact the accuracy of the conversion table would be the presence

of large differences in the scores for individuals on similar FIM and MDS items. In order

to systematically test for the level of disparate scores present in the current dataset, the

ratings on the MDS were converted to their closest corresponding ratings on the FIM, as

shown in Table 3-7.

Table 3-7. Rating scale conversion
MDS                    Score Conversion   FIM
Independent            0 to 7             Complete Independence
Supervision            1 to 5             Supervision
Limited Assistance     2 to 4             Minimal Assistance
Extensive Assistance   3 to 2             Maximal Assistance
Total Dependence       4 to 1             Total Assistance








For example, the score for independence on the MDS, "0," was converted to a "7"

to correspond with the FIM score for complete independence. Similarly, an MDS score

of "3," indicating extensive assistance, was converted to a score of "2" for maximal

assistance. This rating conversion was accomplished on the nine FIM and nine MDS

items that most closely matched (Table 3-8).

Table 3-8. Similar FIM and MDS items
FIM™ items                           MDS items
Eating                               Eating
Grooming                             Personal hygiene
Bathing                              Bathing
Dressing-lower body                  Dressing
Toileting                            Toilet use
Bladder management                   Bladder continence
Bowel management                     Bowel continence
Bed, chair, wheelchair (transfer)    Transfer
Walk/wheelchair                      Locomotion off unit


It was hypothesized that a difference of four points on the rating scale represented

an important difference in a person's functional ability. For instance, a score of "1" on the

FIM indicates 'Total Assistance,' where the patient exerts less than 25% of the effort

required in performing a task. A distance of four points away from the rating of "1"

would be a score of "5," which represents 'Supervision,' indicating that the subject

requires no more help than standby assistance to complete a task. The criterion of looking

only at matched scores with a difference of four or more points is rigorous in that this

situation occurs only between the following three score categories: 1-5; 1-7; and 2-7.

Selecting only those subjects who had a difference of four or more points between

scores on similar items identifies ratings that are clinically different on

similar items.















CHAPTER 4
RESULTS

Statistical Analyses

The FIM-MDS conversion table was used to transform FIMa scores, obtained

from the records of 2,297 subjects, to MDSc scores and MDSa scores to FIMc scores.

The converted scores were then analyzed statistically, first at the individual level and

then at the group level, as a means of determining the level of accuracy of the FIM-MDS

conversion table. In order for the conversion table to be considered accurate for use in

clinical settings, it was hypothesized that 75% of the actual and converted scores on the

FIM and the MDS should be no more than five points apart. Next, techniques used by

Dorans and Lawrence (1990) to test the accuracy of an attempt to equate nearly identical

editions of the Scholastic Aptitude Test (SAT) were applied to this dataset. These

procedures were used to determine whether the converted scores on the FIM and MDS

fell within a reasonable confidence interval placed around the actual FIM and MDS

scores. The difference between the actual and converted scores was calculated and then

that difference was divided by the standard error of the actual scores. If the resulting

ratio fell within a bandwidth of plus or minus two, then the converted scores were

considered to be within sampling error of the actual scores.

When testing the accuracy of the FIM-MDS conversion table from the

perspective of group scores, the equivalence of the four moments of the distributions

(i.e., the means, standard deviations, skewness, and kurtosis) was examined. Next, the









amount of overlap that existed between the actual and converted scores on the FIM and

MDS was determined by calculating effect sizes. Correlations between the actual and

converted scores on the FIM and MDS were also calculated, and those results are

displayed graphically in scatter plots.

Statistical Results at the Level of the Individual

Of the 2,297 subjects analyzed, 25% (574) of the sample had FIMa and FIMc

scores that fell within five points of one another with the difference in scores ranging

from 0 to 71 points (Figure 4-1). For the MDS, 37% (850) of the sample had actual and

converted scores within no more than five points of one another with the difference in

scores ranging from 0 to 48 (Figure 4-2). These percentages fell well short of the 75%

standard for both the FIM and the MDS.

For comparison purposes, the percentage of subjects with actual and converted

FIM and MDS scores within 10 points of one another was also calculated. On the FIM,

45.5% (1,045) of the subjects had actual and converted scores within 10 points and 64%

(1,470) of the subjects had MDS scores within 10 points. Even when this more lenient

criterion was used, the results continued to fall short of the 75% standard.

The equivalence of the actual and converted FIM scores and the actual and

converted MDS scores was also evaluated using the test of equivalence employed by

Dorans & Lawrence (1990). For the FIMa vs. FIMc scores, 8.4% of the conversions met

the criterion for equivalence and for the MDSa and MDSc scores, 6.4% met this

criterion.

Figure 4-1. Score difference between the FIMa and FIMc
(Histogram; x-axis: Number of Points by which the FIMa and FIMc Differ; Std. Dev = 10.38, Mean = 13.5, N = 2,297)

Figure 4-2. Score difference between the MDSa and MDSc
(Histogram; x-axis: Number of Points by which the MDSa and MDSc Differ; Std. Dev = 7.05, Mean = 9.1, N = 2,297)

Statistical Results at the Group Level

Cooper (1989) concluded that in order for the equating of scores to be successful,

all four moments of the distribution should be similar. The four moments of the

distribution, the mean, standard deviation, skewness, and kurtosis of the actual and

converted scores on the FIM and MDS are displayed in Table 4-1.

Table 4-1. Four moments of the distributions
                          FIMa     FIMc     MDSa     MDSc
N  Valid                  2297     2297     2297     2297
   Missing                0        0        0        0
Mean                      52.37    60.35    31.86    25.86
Std. Error of Mean        .432     .409     .280     .295
Median                    54.00    62.00    33.00    26.00
Std. Deviation            20.69    19.60    13.43    14.12
Skewness                  -.143    -.388    -.406    -.041
Std. Error of Skewness    .051     .051     .051     .051
Kurtosis                  -.855    -.573    -.630    -.901
Std. Error of Kurtosis    .102     .102     .102     .102

The mean of the FIMc was eight points greater than the mean of the FIMa while

the mean of the MDSa exceeded the mean of the MDSc by six points. The standard

deviations of the actual and converted FIM scores (20.69 and 19.60) were within one

point of each other, and for the two MDS scores, there was slightly more than a one-point

difference at 13.43 for the MDSa and 12.12 for the MDSc. Since the means of the FIMa

and FIMc differed by only eight points, these scores fell well within one standard

deviation of each other. Similarly, the means of the actual and converted MDS differed

by six points and also fell within one standard deviation of each other. Therefore, the

first two moments of the distributions, the mean and standard deviation, were equivalent.

Since the mean score can be highly influenced by outliers, the median scores for

the distributions were also reported. The difference in the medians of the FIMa and

FIMc score distributions was eight points, just as it had been with the difference in the

means. The medians of the MDSa and MDSc differed by seven points. The medians for

all four of the score distributions exceeded their respective means, indicating the

presence of a negative skew to the score distributions. Normal distributions produce a








skewness statistic of about zero. A skewness or kurtosis value of two standard errors or

greater, regardless of sign, likely deviates from a normal score distribution to a

significant degree (Brown, 1997). The standard error of skewness for all four

distributions was .051 and the standard error of kurtosis was .102 (Table 4-1), so the

two-standard-error thresholds were .102 and .204, respectively. Therefore, the distributions of the actual FIM and MDS

instruments had a significant negative skew, indicating that the subjects measured on

these two inventories were generally more able than the tests were difficult. The

distribution of the FIMc was also negatively skewed to a significant degree, yet the

MDSc did not have a significant skew value and the distribution would be considered

symmetrical in this regard. The distributions of both the actual and converted scores on

the FIM and MDS all had negative kurtosis values, indicating that these distributions

were flatter than what one would expect and differed from normal to a significant degree.
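As an illustration of the two-standard-error rule described above, the following Python sketch computes the standard errors of skewness and kurtosis from the sample size and flags a statistic as departing from normal. The closed-form standard-error formulas are the conventional small-sample ones and are an assumption here, since the text does not state how the values were obtained; the function names are hypothetical.

```python
import math


def se_skewness(n):
    """Conventional small-sample standard error of skewness (assumed formula)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Conventional small-sample standard error of kurtosis (assumed formula)."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def departs_from_normal(statistic, standard_error):
    """Apply the rule of thumb: |statistic| of two standard errors or more."""
    return abs(statistic) >= 2.0 * standard_error


n_subjects = 2297  # sample size reported for all four distributions
ses = se_skewness(n_subjects)
sek = se_kurtosis(n_subjects)

# -0.855 is one of the kurtosis values listed in the descriptive table.
kurtosis_flag = departs_from_normal(-0.855, sek)
```

With N = 2,297 these formulas give approximately .051 and .102, matching the tabled standard errors, so the corresponding thresholds are about .102 for skewness and .204 for kurtosis.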

The results of this conversion revealed a substantial overlap between the

distributions of the actual and converted FIM scores (Figures 4-3 and 4-4), as well as

between the actual and converted MDS scores (Figures 4-5 and 4-6). An effect size of .2

demonstrated an 85.3% overlap of the distributions for the actual and converted scores in

each case.
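The relationship between an effect size and the overlap of two distributions can be sketched as follows. This assumes that the reported overlap is the complement of Cohen's U1 nonoverlap statistic for two normal distributions with equal variances, which the text does not state explicitly; `percent_overlap` is a hypothetical name.

```python
from statistics import NormalDist


def percent_overlap(d):
    """Percent overlap implied by effect size d, as 100 * (1 - Cohen's U1).

    Assumes two normal distributions with equal variances.
    """
    u2 = NormalDist().cdf(abs(d) / 2.0)  # Cohen's U2
    u1 = (2.0 * u2 - 1.0) / u2           # Cohen's U1 (proportion of nonoverlap)
    return 100.0 * (1.0 - u1)


overlap_small_effect = percent_overlap(0.2)
```

For an effect size of .2 the function returns roughly 85%, in line with the figure reported above; a small difference from 85.3% may reflect rounding in tabled values.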

Correlations between the actual and converted scores were calculated and

revealed a .724 correlation at p<.01 between the FIM scores and a correlation of .745 at

p<.01 between the MDS scores. Scatter plots of the actual and converted scores for the

FIM (Figure 4-7) and for the MDS (Figure 4-8) graphically demonstrate the level of

correlation between the actual and converted scores on the two instruments. Figure 4-5

demonstrates a ceiling effect exists for the actual FIM scores, while Figure 4-6

demonstrates a ceiling effect for the converted MDS scores.





























[Histogram of total FIM actual scores: Std. Dev = 20.69, Mean = 52.4, N = 2,297]

Figure 4-3. Distribution of FIM actual scores


[Histogram of total FIM converted scores: Std. Dev = 19.59, Mean = 60.4, N = 2,297]

Figure 4-4. Distribution of FIM converted scores

























[Histogram of total MDS actual scores: Std. Dev = 13.43, Mean = 31.9, N = 2,297]

Figure 4-5. Distribution of MDS actual scores


[Histogram of total MDS converted scores: Std. Dev = 14.12, Mean = 25.9, N = 2,297]

Figure 4-6. Distribution of MDS converted scores













[Scatterplot: actual FIM scores on the horizontal axis against converted FIM scores]

Figure 4-7. Scatterplot of the FIMa and FIMc scores


[Scatterplot: actual MDS scores on the horizontal axis against converted MDS scores]

Figure 4-8. Scatterplot of actual MDS and converted MDS scores








Discrepancies in the Dataset

Of the 2,297 subjects in the dataset used in this study, 51% (1,163) had at least

one of the nine similar items on which the FIM and MDS scores differed by four or more

rating points. Three percent (76) of the subjects had four or

more of the nine similar items with score differences of four or more rating points. Two

percent (2) of the 109 subjects with FIM and MDS scores recorded on the same day had

more than four similar items with score differences of four or more rating points.
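The discrepancy counts above can be illustrated with a minimal Python sketch. It assumes that each subject's similar FIM and MDS items have already been recoded onto a common rating scale so that a difference in rating points is meaningful; the recoding itself is not shown, and the function names and the sample ratings are hypothetical, not data from the study.

```python
def count_discrepant_items(fim_items, mds_items, threshold=4):
    """Count similar items whose ratings differ by `threshold` points or more."""
    return sum(1 for f, m in zip(fim_items, mds_items) if abs(f - m) >= threshold)


def discrepancy_summary(subjects, threshold=4, min_items=4):
    """Summarize discrepancies across subjects.

    `subjects` is a list of (fim_items, mds_items) pairs, one pair per subject.
    """
    counts = [count_discrepant_items(f, m, threshold) for f, m in subjects]
    return {
        "n": len(counts),
        "at_least_one": sum(1 for c in counts if c >= 1),
        "at_least_min_items": sum(1 for c in counts if c >= min_items),
    }


# Made-up ratings for three subjects on nine similar items (common scale assumed).
example_subjects = [
    ([7, 7, 6, 5, 4, 3, 2, 1, 1], [7, 6, 6, 5, 4, 3, 2, 1, 1]),  # no discrepancy
    ([7, 7, 6, 5, 4, 3, 2, 1, 1], [2, 7, 6, 5, 4, 3, 2, 1, 1]),  # one item
    ([7, 7, 7, 7, 4, 3, 2, 1, 1], [1, 2, 3, 3, 4, 3, 2, 1, 1]),  # four items
]
summary = discrepancy_summary(example_subjects)
```

Dividing each count by `n` gives percentages of the kind reported in this section.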















CHAPTER 5
DISCUSSION

Summary of Results

The results of this study testing the accuracy of the FIM-MDS conversion table

were mixed, as those statistics at the group level tended to support the accuracy of the

conversion, while those at the individual level did not. The findings that support a

conclusion of equivalency between the actual and converted scores on both the FIM and

MDS included an 85.3% overlap between the respective score distributions, as well as a

correlation of .724 for the FIM and .745 for the MDS. The means of the actual and

converted FIM scores were well within one standard deviation of each other, as were the

means of the actual and converted MDS scores. The results of the two one-sided test

procedures supported this conclusion of equivalency between the respective FIM and

MDS means, as well.

On the other hand, only 25% (574) of the sample had FIMa and FIMc scores

within 5 points of one another and 37% (850) had MDSa and MDSc scores within that

range. Those percentages fell well short of the hypothesized 75% of the sample having

actual and converted scores that were no more than 5 points apart. If the standard for an

accurate conversion system were lowered to allow for up to a 10-point difference

between actual and converted scores, then 45% (1,045) of the sample would have FIM

scores and 64% (1,470) would have MDS scores within that range. While these

percentages were closer to the 75% criterion, they still fell short. The presence of a









negative skew for all four of the score distributions was an indication that the subjects'

ability levels were higher than the difficulty levels of the inventories. All of the score

distributions except for the MDSc demonstrated a skewness value that resulted in a

significant departure from normal. The level of kurtosis for all four distributions also

deviated from normal to a significant degree. When using the very rigorous procedures

described by Dorans and Lawrence (1990) to determine equivalency, only 8.4% of the

scores on the FIMa and FIMc and 6.4% of the MDSa and MDSc scores met this criterion.

A ceiling effect in the distribution of the FIMc and the MDSa scores was present.

Submitting the FIM and MDS scores to the FIM-MDS conversion table resulted in an

inflation of FIM scores and a deflation of MDS scores. Taking all of the above

information into consideration, the FIM-MDS conversion table passed less stringent

standards for equivalency, generally at the group level, but failed when focusing on

statistics at the individual level. Since other attempts at creating conversion tables

between instruments used in rehabilitation have not gone further to test the accuracy of

those conversions, no direct comparisons to other research findings can be made.
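The individual-level criterion discussed in the paragraph above, the share of subjects whose actual and converted scores are no more than a given number of points apart, can be sketched in Python as follows. The function name and the example scores are hypothetical; only the counts in the final lines come from the text.

```python
def share_within(actual, converted, tolerance):
    """Proportion of subjects whose actual and converted scores differ by
    no more than `tolerance` points (inclusive)."""
    pairs = list(zip(actual, converted))
    hits = sum(1 for a, c in pairs if abs(a - c) <= tolerance)
    return hits / len(pairs)


# Made-up scores for four subjects.
actual_scores = [50, 60, 70, 80]
converted_scores = [53, 70, 71, 60]
within_5 = share_within(actual_scores, converted_scores, 5)
within_10 = share_within(actual_scores, converted_scores, 10)

# Counts reported in the text, out of 2,297 subjects, as whole percentages.
fim_within_5_pct = round(100 * 574 / 2297)
mds_within_5_pct = round(100 * 850 / 2297)
```

Comparing the resulting proportion with the hypothesized 75% is then a single comparison, for example `within_5 >= 0.75`.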

An understanding of the psychometrics of the FIM-MDS conversion table is

important when interpreting the results of this study. The conversion table was

developed from a dataset of 253 subjects with FIM and MDS scores occurring within 7

days of one another. Similar FIM and MDS items used in the development of the

conversion table had a correlation of .82 at p<.01, and there was a correlation of .70 at

p<.01 between similar person measures. These correlations are not as strong as those

reported in the study by Fisher et al. (1997). Fisher et al. found a correlation of .95

between difficulty estimates for a subset of similar items on the PF10 and the PFS, both

of which are self-report measures. The lower correlations for the FIM-MDS conversion








table for similar item and similar person measures may be explained by differences in the

design between this and previous studies. Fisher et al. used a convenience sample of 285

patients who were waiting for appointments in a public hospital general medicine clinic.

These patients were asked to complete the PF10 and the PFS inventories while they

waited to see a doctor and, therefore, no opportunity existed for a change in physical

condition to take place between the completion of the two surveys. There was also no

possibility for different raters to score the two instruments on the same subject since the

"rater" in both cases was the patient. Fisher et al. also removed the least consistent cases

from the analysis, meaning cases with the highest outfit statistics were not included in

creating the conversion table.

The correlations found in the current study were also not as strong as those

reported by Fisher, Harvey, Taylor, et al. (1995), who obtained a correlation of similar

person measures of .91 between the instruments used in the development of the Rehabits'

translation scale. The stronger correlation obtained by Fisher, Harvey, Taylor, et al. may

be the result of differences in the design of the study, as compared to the current one.

Fisher and colleagues used the results of 54 consecutive patients admitted to a free-

standing rehabilitation hospital, who were rated on both the FIM and the PECS at

admission and discharge. Thus, "rater" variability was controlled, and there was no

possibility for physical changes to take place in the patient's condition between the

administrations of the two inventories upon admission and then again upon discharge

because the inventories in each case were completed at the same time.

Studies by both Fisher, Harvey, Taylor, et al. (1995) and Fisher et al. (1997) that

used cocalibration equating measures to link healthcare instruments did not test the

correlations between actual and converted scores. Therefore, it is difficult to clearly








define the significance of a correlation of .72 for the FIMa and FIMc and a correlation of

.75 for the MDSa and MDSc. If this were a classical test-retest reliability study, these

correlations would be considered low. The question left for the current study is whether

better results could be obtained by creating a conversion system based on more accurate

data. Yet, an attempt to use more controlled data collection methodologies would limit

the applicability of the study in clinical settings.

The limited accuracy of the FIM-MDS conversion table may be a result of

problems with the sample upon which it was based. This conversion system was

developed from a dataset of 253 subjects with FIM and MDS scores occurring within

seven days of one another. It may be that a larger dataset is necessary to create a highly

accurate conversion system. Furthermore, a significant limitation to the dataset used in

this study is the presence of scores on similar FIM and MDS items for the same subject

that differ markedly. It can be argued that differences in the definitions of similar items

between the FIM and MDS led to variability in scores. For example, the definition of

"eating" on the MDS focuses "on the intake of nourishment by any means, regardless of

skill, and includes the use of alternate forms of obtaining nourishment, such as tube

feeding" (Rogers et al., 2001, pp. 7-8). Yet, eating on the FIM is limited to the use of

suitable utensils to bring the food to the mouth (Rogers et al.). The MDS item for

bathing includes bathtub or shower transfers while the FIM bathing item does not. And,

the MDS toileting item includes the ability to transfer to and from the toilet while the

FIM item for toileting does not. Furthermore, the FIM incorporates safety into the

definition of many of its items, such as grooming, bathing, dressing, transfers, toileting

skills, walking and wheelchair mobility, while the MDS does not (Rogers et al.). Thus,

one person could conceivably obtain a very different score on two of the similar items









between the FIM and MDS. Alternatively, the fact that the FIM and MDS assessments

were allowed to fall up to 7 days apart may also have contributed to the

presence of discrepancies that were seen between similar items for the same subject.

While the decision to restrict the number of days between the completion of the FIM and

MDS was based on the need to minimize the impact any possible change in the patient's

condition would have on their scores, it is likely that even this restriction did not go far

enough to eliminate this source of error in the study. Thus, some of the discrepancies in

scores that were seen between similar items may be due to a change in the subject's

physical condition.

A related explanation for the problems encountered in developing a highly

accurate conversion table is the existence of discrepancies between similar categories in

the rating scales of the FIM and the MDS. A FIM score of "1" refers to "Total

Assistance," in which the patient puts forth less than 25% of the effort necessary to

perform a task. The corresponding score on the MDS is a "4" for "Total Dependence."

This score is defined as, "full staff performance of the activity during entire 7-day period.

There is complete nonparticipation by the resident in all aspects of the ADL definition

task." A noticeable difference exists between these two rating definitions. On the FIM,

the patient may exert up to a quarter of the effort needed to perform the test, while on the

MDS the patient does not participate at all in the activity. Instead, the staff is required to

perform the full activity. It may reasonably be argued that there is a clinically

significant difference between a patient who can exert up to 25% of the effort required to

complete a task and one who cannot exert any effort at all. Similarly, a score of "2" on

the FIM refers to "Maximal Assistance," in which the patient puts forth less than 50% of

the effort necessary to do a task, but at least 25% of the effort. The corresponding score









on the MDS of "1" or "Supervision" is defined as "Oversight encouraged or cuing

provided three or more times during the last 7 days OR supervision (three or more times)

plus physical assistance provided, but only one or two times during the last 7 days." Once

again, a significant difference in meaning is evident between these two categories.

A further limitation of this study is that there may be a selection bias in that

patients who have scores on both the FIM and MDS may not be typical of the patient

population in general. Additionally, it is likely that characteristics of this veteran

population may not be representative of the population for whom the FIM and MDS are

intended. Most notably, 98% of the study population is male. Yet, the great majority

of people in nursing homes, where the MDS is used, are female. Similarly, the ethnic

makeup of this study population is also likely not an accurate reflection of the makeup

of the population who most frequently use the FIM and MDS. The unique aspects of the

subjects in this study may limit the generalizability of the findings.

Implications for Future Research

The results of this study lead one to consider other possible situations in which

the linking of instruments may be effective. There is evidence to suggest that the

creation of a conversion table based on two self-report instruments would have a higher

degree of accuracy (Fisher et al., 1997). One possible reason for the higher degree of

accuracy is that rater bias is not an issue, as the same "rater" (i.e., the subject or their

proxy) would complete both instruments. Additionally, the research study could be set

up so that the subject completed both instruments at the same time. In this manner, it

would not be possible for a change in the subject's physical condition to take place and it

is hypothesized that it would be unlikely for a subject to interpret similar items between

instruments differently. Thus, these two sources of error, rater bias and the possibility









that there has been a change in the subject's physical condition between the

administrations of the two instruments, would be eliminated and the conditions required

for the creation of a conversion table would be optimized. The use of self-reports in a

clinical setting and/or in a research study would also have an economic advantage, as it

would reduce the amount of time a trained therapist would need to be involved in the

task.

Conclusion

Maximizing outcomes in rehabilitation, while streamlining the process of

providing highly effective and coordinated services, will continue to be a goal of

rehabilitation for years to come. Efforts to increase the continuity of care between PAC

settings and to improve the effectiveness of rehabilitation services will be pursued on all

levels. This research focused on determining the accuracy of one such effort, namely a

means of creating an easily implemented and highly effective tool for converting the

score from the physical component items on the FIM to those on the MDS and vice

versa. The results of this study suggest that scores derived from the FIM-MDS

conversion table should, at best, only be considered as rough estimates of similar scores

on the two instruments. At the conclusion of this study, the question still remains as to

whether the FIM and MDS instruments can measure physical functioning on a common

unit of measurement and whether a highly accurate conversion table can be developed so

that a patient's gains in physical functioning can be tracked from inpatient rehabilitation

settings to skilled nursing facilities. It may be that pursuing research in alternative

directions, such as using these linking techniques to create a conversion table between

self-report instruments of functional ability, will provide a solution.















REFERENCES


Anderson, S., & Hauck, W. W. (1990). Consideration of individual equivalence. Journal
of Pharmacokinetics and Biopharmaceutics, 18, 259-273.

Andrich, D. (2004). Controversy and the Rasch model: A characteristic of incompatible
paradigms? Medical Care, 42 (Suppl. 1), 1-7.

Badia, X., Prieto, L., Roset, M., Diez-Perez, A., & Herdman, M. (2002). Development of
a short osteoporosis quality of life questionnaire by equating items from two
existing instruments. Journal of Clinical Epidemiology, 55(1), 32-40.

Berger, R., & Hsu, J. (1996). Bioequivalence trials, intersection-union tests, and
equivalence confidence sets. Statistical Science, 11, 283-319.

Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental
measurement in the human sciences. Mahwah, NJ: Erlbaum.

Bond, L., Moss, P., & Carr, P. (1996). Fairness in large-scale performance assessment. In
G. Phillips (Ed.), Technical issues in large-scale performance assessment
(pp. 117-140). Washington, DC: National Center for Education Statistics.

Brown, J. D. (1997). Skewness and kurtosis. Shiken: JALT Testing and Evaluation SIG
Newsletter, 1(1), 16-18.

Buchanan, J. L., Andres, P. L., Haley, S. M., Paddock, S. M., & Zaslavsky, A. M. (2003).
An assessment tool translation study. Health Care Financing Review, 24(3),
45-60.

Casten, R. C., Lawton, M. P., Parmelee, P. A., & Cleban, M. H. (1998). Psychometric
characteristics of the Minimum Data Set I: Confirmatory factor analysis. Journal
of the American Geriatric Society, 46, 726-736.

Catalogue of Federal Domestic Assistance. (2002). Veterans nursing home care.
Retrieved September 12, 2002, from
http://www.cfda.gov/public/viewprog.asp?progid=777

Cella, D., & Chang, C. (2000). A discussion of item response theory and its applications
in health status research. Medical Care, 38(9, Suppl. 2), 66-72.









Centers for Disease Control. (2002). Health, the United States, 2001. Retrieved March 9,
2002, from http://www.cdc.gov/nchs/hus.htm

Centers for Medicare and Medicaid Services. (2001, October-November). Presentation
for the inpatient rehabilitation facility-patient assessment instrument. Retrieved
September 12, 2002, from
http://www.cms.hhs.gov/providers/irfpps/dayl%5Firfpai.pdf

Centers for Medicare and Medicaid Services. (2002). Long-term care minimum data set.
Retrieved September 23, 2002, from
http://cms.hhs.gov/medicaid/mds20/default.asp

Centers for Medicare and Medicaid Services. (2003). RAI version 2.0 manual. Retrieved
June 24, 2004, from http://www.cms.hhs.gov/quality/mds20/raich3.pdf

Chang, C., & Cella, D. (1997). Equating health-related quality of life instruments in
applied oncology settings. Physical Medicine and Rehabilitation: State of the Art
Reviews, 11(2), 397-406.

Choppin, B. (1968). An item bank using sample free calibration. Nature, 219, 870-872.

Cohen, M. E., & Marino, R. J. (2000). The tools of disability outcomes research
functional status measures. Archives of Physical Medicine and Rehabilitation,
81(Suppl. 2), S21-S29.

Cook, L., & Peterson, N. (1987). Problems related to the use of conventional and item
response theory equating methods in less than optimal circumstances. Applied
Psychological Measurement, 11, 225-244.

Cornman, J. M., & Kingson, E. R. (1996). Trends, issues, perspectives, and values for the
aging of the baby boom cohorts. The Gerontologist, 36(1), 15-26.

De Gruijter, D. N. M. (1986). Small N does not always justify Rasch model. Applied
Psychological Measurement, 10, 187-194.

DeJong, G. (2001). Open letter from ACRM to HCFA on proposed medicare PPS. A
letter prepared for the President of the American Congress of Rehabilitation
Medicine under the auspices of ACRM's Research Policy and Legislation
Committee and the Committee's PPS Workgroup. Archives of Physical Medicine
and Rehabilitation, 82, 567-569.

Dillingham, T. R., Pezzin, L. E., & MacKenzie, E. J. (2003). Discharge destination after
dysvascular lower-limb amputations. Archives of Physical Medicine and
Rehabilitation, 84(11), 1662-1668.









Dodds, T. A., Martin, D. P., Stolov, W. C., & Deyo, R. A. (1993). A validation of the
Functional Independence Measurement and its performance among rehabilitation
inpatients. Archives of Physical Medicine and Rehabilitation, 74, 531-536.

Douglas, J. (1999). Item response models for longitudinal quality of life data in clinical
trials. Statistics in Medicine, 18, 2917-2931.

Evans, C. T. (2002). Functional independence measure. Retrieved June 24, 2004, from
http://www.sci-queri.research.med.va.gov/fim.htm

Fiedler, R. C., & Granger, C. V. (1997). Uniform data system for medical rehabilitation:
Report of first admissions for 1995. American Journal of Physical Medicine and
Rehabilitation, 76(1), 76-81.

Fisher, A. G. (1993). The assessment of IADL motor skills: An application of many-
faceted Rasch analysis. American Journal of Occupational Therapy, 47(4),
319-329.

Fisher, W. P. (1997). Physical disability construct convergence across instruments:
Toward a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Eubanks, R. L., & Marier, R. L. (1997). Equating the MOS SF36 and the
LSU H.S.I. physical functioning scales. Journal of Outcome Measurement, 1(4),
329-362.

Fisher, W. P., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional
assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5,
3-25.

Fisher, W. P., Harvey, R. F., Taylor, P., Kilgore, K. M., & Kelly, C. K. (1995). Rehabits:
A common language of functional assessment. Archives of Physical Medicine and
Rehabilitation, 76, 113-122.

Forrest, G., Schwam, A., & Cohen, E. (2002). Time of care by patients discharged from a
rehabilitation unit. American Journal of Physical Medicine and Rehabilitation,
81(1), 57-62.

Fox, C. M., & Jones, J. A. (1998). Uses of Rasch modeling in counseling psychology
research. Journal of Counseling Psychology, 45, 30-45.

Functional independence measure. (2001). Retrieved June 16, 2004, from
http://www.medfriendly.com/functionalindependencemeasure.html#range

Granger, C. V. (1998). The emerging science of functional assessment: Our tool for
outcomes analysis. Archives of Physical Medicine and Rehabilitation, 79,
235-240.









Granger, C. V. (1999). Continuum of care: Measuring medical rehabilitation outcomes.
Journal of the Institute of Objective Measurement, 2, 60-62.

Granger, C. V., Cotter, A. C., & Hamilton, B. B. (1990). Functional assessment scales: A
study of persons with multiple sclerosis. Archives of Physical Medicine and
Rehabilitation, 71, 870-875.

Granger, C. V., Cotter, A. C., Hamilton, B. B., & Fiedler, R. C. (1993). Functional
assessment scales: A study of persons after stroke. Archives of Physical Medicine
and Rehabilitation, 74, 133-138.

Granger, C. V., & Hamilton, B. B. (1993). The Uniform Data System for medical
rehabilitation report of first admissions for 1991. American Journal of Physical
Medicine and Rehabilitation, 72(1), 33-38.

Granger, C. V., Hamilton, B. B., Keith, R. A., Zielezny, M., & Sherwin, F. S. (1986).
Advances in functional assessment for medical rehabilitation. In C. B. Lewis
(Ed.), Topics in geriatric rehabilitation (Vol. 1, pp. 59-74). Baltimore, MD:
Aspen.

Granger, C. V., Hamilton, B. B., & Sherwin, F. S. (1986). Guide for use of the Uniform
Data Set for medical rehabilitation. Buffalo, NY: Buffalo General Hospital.

Gruber-Baldini, A. L., Zimmerman, S. I., Mortimore, E., & Magaziner, J. (2000). The
validity of the Minimum Data Set in measuring the cognitive impairment of
persons admitted to nursing homes. Journal of the American Geriatric Society,
48, 1734-1736.

Gutheil, I. A. (1996). Introduction. The many faces of aging: Challenges for the future.
Gerontologist, 36(1), 13-14.

Haertel, E. H., & Linn, R. L. (1996). Comparability. In G. W. Phillips (Ed.), Technical
issues in large-scale performance assessment (pp. 59-78). Washington, DC:
National Center for Educational Statistics.

Haley, S. M., McHorney, C. A., & Ware, J. E. (1994). Evaluation of the MOS SF-36
physical functioning scale (PF-10): I. Unidimensionality and reproducibility of
the Rasch item scale. Journal of Clinical Epidemiology, 47, 671-684.

Hall, K. M., Hamilton, B. B., Gordon, W. A., & Zasler, N. D. (1993). Characteristics and
comparisons of functional assessment indices: Disability rating scale, functional
independence measures, and functional assessment measure. Journal of Head
Trauma Rehabilitation, 8(2), 60-74.

Hambleton, R. K. (2000). Response to Hays et al. and McHorney and Cohen: Emergence
of item response modeling in instrument development and data analysis. Medical
Care, 38(9, Suppl. 2), 60-65.









Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item
response theory. Thousand Oaks, CA: Sage.

Hamilton, B. B. (1989). Totaled functional score can be valid [Letter to the editor].
Archives of Physical Medicine and Rehabilitation, 70, 861-862.

Hamilton, B. B., Granger, C. V., Sherwin, F. S., Zielezny, M., & Tashman, J. S. (1987).
A uniform national data system for medical rehabilitation. In M. J. Fuhrer (Ed.),
Rehabilitation outcomes: Analysis and measurement (pp. 137-147). Baltimore:
Brooks.

Hart, D. L. (2000). Assessment of unidimensionality of physical functioning in patients
receiving therapy in acute, orthopedic outpatient centers. Journal of Outcome
Measurement, 4, 413-430.

Hart, D. L., & Wright, B. D. (2002). Development of an index of physical functional
health status in rehabilitation. Archives of Physical Medicine and Rehabilitation,
83, 655-665.

Hawes, C., Morris, J. N., Phillips, C. D., Mor, V., Fries, B. E., & Nonemaker, S. (1995).
Reliability estimates for the Minimum Data Set for nursing home resident
assessment and care screening (MDS). The Gerontologist, 35(2), 172-178.

Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health
outcomes measurement in the 21st century. Medical Care, 38(9), 28-42.

Health Care Financing Administration. (1998). Minimum Data Set, 2.0 Washington, DC:
U.S. Government Printing Office.

Heinemann, A. W., Kirk, P., Hastie, B. A., Senik, P., Hamilton, B. B., Linacre, J. M.,
Wright, B. D., & Granger, C. V. (1997). Relationships between disability
measures and nursing effort during medical rehabilitation for patients with
traumatic brain and spinal cord injury. Archives of Physical Medicine and
Rehabilitation, 78, 143-149.

Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., & Granger, C. V.
(1993). Relationships between impairment and physical disability as measured by
the Functional Independence Measure. Archives of Physical Medicine and
Rehabilitation, 74, 566-573.

Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., & Granger, C. V.
(1994). Prediction of rehabilitation outcomes with disability measures. Archives
of Physical Medicine and Rehabilitation, 75, 133-143.

Hyslop, T., Hsuan, F., & Holder, D. J. (2000). A small sample confidence interval
approach to assess individual bioequivalence. Statistics in Medicine, 19,
2885-2897.









Iwanenko, W., Fiedler, R. C., & Granger, C. V. (1999). Uniform data system for medical
rehabilitation-Report of first admissions to subacute rehabilitation for 1995,
1996, and 1997. American Journal of Physical Medicine and Rehabilitation,
78(4), 384-388.

Jette, A. M., Haley, S. M., & Ni, P. (2003). Comparison of functional status tools used in
post-acute care. Health Care Financing Review, 24(3), 1-12.

Johnson, M. F., Kramer, A. M., Lin, M. K., Kowalsky, J. C., & Steiner, J. F. (2000).
Outcomes of older persons receiving rehabilitation for medical and surgical
conditions compared with hip fracture and stroke. American Journal of the
Geriatric Society, 48(11), 1389-1397.

Keith, R. A., Wilson, D. B., & Gutierrez, P. (1995). Acute and subacute rehabilitation for
stroke-A comparison. Archives of Physical Medicine and Rehabilitation, 76(6),
495-500.

Kim, S., & Cohen, A. S. (2002). A comparison of linking and concurrent calibration
under the graded response model. Applied Psychological Measurement, 26(1),
25-41.

Kolen, M. J. (2001). Linking assessments effectively: Purpose and design. Educational
Measurement: Issues and Practice, 20(1), 5-9.

Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York:
Springer-Verlag.

Latham, N. K., & Haley, S. M. (2003). Measuring functional outcomes across postacute
care: Current challenges and future directions. Critical Reviews in Physical and
Rehabilitation Medicine, 15(2), 83-98.

Li, Y. H., & Lissitz, R. W. (2000). An evaluation of the accuracy of multidimensional
IRT linking. Applied Psychological Measurement, 24(2), 115-138.

Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works
best? Journal of Outcome Measurement, 2(3), 266-283.

Linacre, J. M., Heinemann, A. W., Wright, B. D., Granger, C. V., & Hamilton, B. B.
(1994). The structure and stability of the Functional Independence Measure.
Archives of Physical Medicine and Rehabilitation, 75, 127-132.

Linacre, J. M., & Wright, B. D. (2000). WINSTEPS (Version 3.17) [Rasch model
computer software]. Chicago: MESA.

McHorney, C. A. (1997). Generic health measurement: Past accomplishments and a
measurement paradigm for the 21st century. Annals of Internal Medicine, 127,
743-750.









McHorney, C. A. (2002). Use of item response theory to link 3 modules of functional
status items from the Asset and Health Dynamics Among the Oldest Old Study.
Archives of Physical Medicine and Rehabilitation, 83, 383-394.

McHorney, C. A., & Cohen, A. S. (2000). Equating health status measures with item
response theory, illustrations with functional status items. Medical Care, 38(9,
Suppl. 2), 43-59.

McHorney, C. A., Haley, S. M., & Ware, J. E. (1997). Evaluation of the MOS SF-36
physical functional scale (PF-10). II: Comparison of relative precision using
Likert and Rasch scoring methods. Journal of Clinical Epidemiology, 50(4),
451-461.

Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological
Bulletin, 15, 300-307.

Merbitz, C., Morris, J., & Grip, J. C. (1989). Ordinal scales and foundation of
misinference. Archives of Physical Medicine and Rehabilitation, 70, 308-312.

Mislevy, R. J. (1992). Linking educational assessments: Concepts, issues, methods, and
prospects (Policy issue perspective). Princeton, NJ: Educational Testing Service.

Morris, J., Hawes, C., Fries, B. E., Phillips, C. D., Mor, V., Katz, S., Murphy, K.,
Drugovich, M. L., & Friedlob, A. S. (1990). Designing the national resident
assessment instrument for nursing homes. Gerontologist, 30, 293-302.

Morris, J. N., Fries, B. E., Mehr, D. R., Hawes, C., Phillips, C. D., Mor, V., & Lipsitz, L.
A. (1994). MDS cognitive performance scale. Journal of Gerontology: Medical
Sciences, 49(4), M124-M182.

Muraki, E., Hombo, C. M., & Lee, Y. W. (2000). Equating and linking of performance
assessments. Applied Psychological Measurement, 24(4), 325-337.

National Committee on Vital and Health Statistics. (2003). Classifying and reporting
functional status. Retrieved April 9, 2004, from
http://ncvhs.hhs.gov/010617rp.pdf

Oczkowski, W. J., & Barreca, S. (1993). The Functional Independence Measure: Its use
to identify rehabilitation needs in stroke survivors. Archives of Physical Medicine
and Rehabilitation, 74, 1291-1294.

Ottenbacher, K. J., Hsu, Y., Granger, C. V., & Fiedler, R. C. (1996). The reliability of the
Functional Independence Measure: A quantitative review. Archives of Physical
Medicine and Rehabilitation, 77, 1226-1232.

Penta, M. (2004). Evaluation in rehabilitation: Perspectives with the Rasch model.
Retrieved July 2, 2004, from
http://www.uzleuven.be/UZRoot/files/webeditor/RaschPerspectivesForRehabilitationAbstract.pdf

Prieto, L., Alonso, J., Lamarca, R., & Wright, B. D. (1998). Rasch measurement for
reducing the items of the Nottingham Health Profile. Journal of Outcome
Measurement, 2, 258-301.

Raczek, A. E., Ware, J. E., Bjorner, J. B., Gandek, B., Haley, S. M., Aaronson, N. K.,
Apolone, G., Beck, P., Brazier, J. E., Bullinger, M., & Sullivan, M. (1998).
Comparison of Rasch and summated rating scales constructed from SF-36
physical functioning items in seven countries: Results from the IQOLA project.
Journal of Clinical Epidemiology, 51(11), 1203-1214.

Rantz, M. J., Zwygart-Stauffacher, M., Popejoy, L. L., Mehr, D. R., Grando, V. T.,
Wipke-Tevis, D. D., Hicks, L. L., & Conn, V. S. (1999). The Minimum Data Set:
No longer just for clinical assessment. Annals of Long-Term Care, 7, 354-360.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.
Chicago: University of Chicago Press.

Resident assessment instrument training manual and resource guide. (1991). Natick,
MA: Eliot Press.

Rogers, J. C., Gwinn, S. M., & Holm, M. B. (2001). Comparing activities of daily living
assessment instruments: FIM, MDS, OASIS, MDS-PAC. Physical and
Occupational Therapy in Geriatrics, 18(3), 1-25.

Segal, M. E., Heinemann, A. W., Schall, R. R., & Wright, B. D. (1997). Rasch analysis
of a brief physical ability scale for long-term outcomes of stroke. Physical
Medicine and Rehabilitation State of the Art Reviews, 11, 385-396.

Silverstein, B., Fisher, W. P., Kilgore, K. M., Harley, J. P., & Harvey, R. F. (1992).
Applying psychometric criteria to functional assessment in medical rehabilitation:
II. Defining interval measures. Archives of Physical Medicine and Rehabilitation,
73, 507-518.

Skaggs, G., & Lissitz, R. (1986). IRT test equating: Relevant issues and a review of
recent research. Review of Educational Research, 56, 495-529.

Spector, W. D., & Fleishman, J. A. (1998). Combining activities of daily living with
instrumental activities of daily living to measure functional disability. Journal of
Gerontology, 53, S46-S57.

Stegner, B. L., Bostrom, A. G., & Greenfield, T. K. (1996). Equivalence testing for use in
psychosocial and services research: An introduction with examples. Evaluation
and Program Planning, 19(3), 193-198.

Stineman, M. G. (2001). The story of functional-related groups-Please, first do no
harm. Archives of Physical Medicine and Rehabilitation, 82(4), 553-557.

Stineman, M. G., Jette, A., Fiedler, R. C., & Granger, C. V. (1997). Impairment-specific
dimensions within the Functional Independence Measure. Archives of Physical
Medicine and Rehabilitation, 78, 636-643.

Stineman, M. G., & Maislin, G. (2000). Clinical, epidemiological, and policy
implications of minimum data set validity (Editorial). Journal of the American
Geriatrics Society, 48(12), 1734-1736.

Stineman, M. G., Shea, J. A., Jette, A., Tassoni, C. J., Ottenbacher, K. J., Fiedler, R. C.,
& Granger, C. V. (1996). The Functional Independence Measure: Tests of scaling
assumptions, structure, and reliability across 20 diverse impairment categories.
Archives of Physical Medicine and Rehabilitation, 77, 1101-1108.

Tennant, A., & Young, C. (1997). Coma to community: Continuity in measurement. In
R. M. Smith (Ed.), Outcome measurement: Physical medicine and rehabilitation
state of the art reviews (pp. 376-384). Philadelphia: Hanley & Belfus.

Teresi, J. A., & Holmes, D. (1992). Should MDS data be used for research? [Editorial].
Gerontologist, 32, 148-149.

Uniform data systems for medical rehabilitation. (1997). Retrieved June 10, 2003, from
http://www.udsmr.org/history.htm

U.S. Department of Health and Human Services. (2002). A profile of older Americans.
Retrieved July 30, 2003, from
http://www.aoa.gov/prof/Statistics/profile/highlights.asp

U.S. Food and Drug Administration. (1997). Guidance for industry: In vivo
bioequivalence studies based on population and individual bioequivalence
approaches. Rockville, MD: U.S. Department of Health and Human Services.

U.S. Food and Drug Administration. (1999). Draft guidance on average, population, and
individual approaches to establishing bioequivalence. Rockville, MD: U.S.
Department of Health and Human Services.

Vale, C. D. (1986). Linking item parameters onto a common scale. Applied
Psychological Measurement, 10(4), 333-344.

Velozo, C. A. (2004). Translating measures across the continuum of care: Creating a
crosswalk between the FIM and MDS. Manuscript in preparation.

Velozo, C. A., Kielhofner, G., & Lai, J. (1999). The use of Rasch analysis to produce
scale-free measurement of functional ability. American Journal of Occupational
Therapy, 53, 83-90.

Velozo, C. A., Magalhaes, L., Pan, A., & Leiter, P. (1995). Differences in functional
scale discrimination at admission and discharge: Rasch analysis of the Level of
Rehabilitation Scale-III (LORS-III). Archives of Physical Medicine and
Rehabilitation, 76(8), 705-712.

Velozo, C. A., & Peterson, E. W. (2001). Developing meaningful fear of falling measures
for community dwelling elderly. American Journal of Physical Medicine and
Rehabilitation, 80(9), 662-673.

Veterans Administration. (2001). IMPACTS 2000. Retrieved December 12, 2002, from
http://www.va.gov/resdev/prt/impacts2000.pdf

Veterans Health Administration. (2000). Medical rehabilitation outcomes for stroke,
traumatic brain injury, and lower extremity amputation patients. Retrieved
September 12, 2002, from
http://www.va.gov/publ/direc/health/direct/12000016.pdf

Veterans Health Administration. (2002). Austin Automation Center. Retrieved September
12, 2002, from http://www.aac.va.gov/

Wang, W., & Hwang, J. T. (2001). A nearly unbiased test for individual bioequivalence
problems using probability criteria. Journal of Statistical Planning and Inference,
99, 41-58.

Ware, J. E. (2003). Conceptualization and measurement of health-related quality of life:
Comments on an evolving field. Archives of Physical Medicine and
Rehabilitation, 84(Suppl. 2), S43-S51.

Wilkerson, D., & Johnston, M. (1997). Clinical program monitoring systems: Current
capability and future directions. In M. Fuhrer (Ed.), Assessing medical
rehabilitation practices: The promise of outcomes research (pp. 275-305).
Baltimore, MD: Paul H. Brookes.

Williams, B. C., Lee, Y., Fries, B. E., & Warren, R. L. (1997). Predicting patient scores
between the Functional Independence Measure and the Minimum Data Set:
Development and performance of a FIM-MDS "crosswalk." Archives of Physical
Medicine and Rehabilitation, 78, 48-54.

Wright, B. D. (1984). Item banks: What, why, how. Journal of Educational
Measurement, 21, 331-345.

Wright, B. D. (1997). A history of social science measurement. Retrieved October 6,
2002, from http://www.rasch.org/memo62.htm

Wright, B. D., & Linacre, J. M. (1989). Observations are always ordinal: Measurements,
however, must be interval. Archives of Physical Medicine and Rehabilitation, 70,
857-860.

Wright, B. D., Linacre, J. M., & Heinemann, A. W. (1993). Measuring functional status in
rehabilitation. Physical Medicine and Rehabilitation State of the Art Reviews, 4,
475-491.

BIOGRAPHICAL SKETCH

Katherine L. Byers, MHS, CRC, CVE, is a doctoral candidate in the rehabilitation
science doctoral (RSD) program at the University of Florida, College of Public Health
and Health Professions. Ms. Byers received a Bachelor of Arts degree in behavioral
sciences from Rice University in Houston, Texas, in 1989. She then completed a Master
of Health Science (MHS) degree in rehabilitation counseling at the University of Florida
in 1991 and subsequently obtained certifications as both a rehabilitation counselor and
a vocational evaluator. Over a period of 9 years, Ms. Byers worked in positions of
increasing responsibility in the field of rehabilitation before entering the University of
Florida's rehabilitation science doctoral program in January 2000. While completing
the requirements of the degree, Ms. Byers was employed as a research assistant and then
as a program coordinator for Dr. Craig Velozo, an assistant professor in the Department
of Occupational Therapy at the University of Florida. Accomplishments during Ms.
Byers' doctoral career include winning the 2002 John Muthard Research Award from the
University of Florida's College of Health Professions, Department of Rehabilitation
Counseling. She also was selected to make a poster presentation at the Third National
Rehabilitation Research and Development Meeting in Washington, DC, in 2002, and at
the 2004 ACRM-ASNR Joint Conference.