TESTING THE ACCURACY OF LINKING HEALTHCARE DATA
ACROSS THE CONTINUUM OF CARE
KATHERINE L. BYERS
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
ACKNOWLEDGMENTS
I would like to thank the VA Office of Academic Affairs, Washington, DC, Pre-
Doctoral Associated Health Rehabilitation Research Fellowship Program, and the VA
HSR&D/RR&D Rehabilitation Outcomes Research Center (RORC) of Excellence,
Gainesville, Florida, for funding this study. Within the VA, I would like to especially
thank Dr. Maude Rittman for acting as my fellowship's program director and for her
continual support and encouragement throughout the process. Additional thanks go to
Dr. Christa Hojlo, Chief of Nursing Home Care, for her assistance in obtaining MDS data
and to Mr. Clifford Marshall, Rehabilitation Planning Specialist, for his assistance in
obtaining FIM data.
I extend my gratitude to my doctoral advisor and VA Fellowship Preceptor, Dr.
Craig Velozo, who also acted as my mentor and the chairperson of my supervisory
committee. His guidance and support throughout this process have been invaluable.
Furthermore, I am appreciative of the support provided by the other members of my
committee, including the cochair, Dr. Ronald Spitznagel, Dr. Elizabeth Swett and
Dr. Anne Seraphine. Special thanks go to Dr. Richard Smith, an expert in Rasch
analysis, who analyzed the original data.
Other student members of Dr. Velozo's research team have been invaluable in the
completion of this dissertation, and I owe them a debt of gratitude. This is especially true
of Ms. Inga Wang, who has worked closely on this project. And finally, my family has
been a source of continual support throughout this process, and I would like to thank
them for their tireless encouragement.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1 INTRODUCTION
2 REVIEW OF THE LITERATURE
    Measuring Outcomes in Rehabilitation
    The FIM Instrument Used in Inpatient Rehabilitation
    The MDS Instrument in Skilled Nursing Facilities
    Administrative Solution: Adopting a Single Instrument for Measuring Outcomes in Post Acute Care
    Potential Measurement Solution: Linking Instruments
    The Use of Linking Techniques
    Linking of Measures in Healthcare
3 METHODOLOGY
    Introduction
    Source of the Data
    Sample
    FIM and MDS Motor Items
    Procedures Involved in the Creation of the FIM/MDS Conversion Table
    Statistically Testing the Accuracy of the FIM-MDS Conversion Table
4 RESULTS
    Statistical Analyses
    Statistical Results at the Level of the Individual
    Statistical Results at the Group Level
    Discrepancies in the Dataset
5 DISCUSSION
    Summary of Results
    Implications for Future Research
    Conclusion
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
1-1 Comparison of FIM to MDS ADL/motor items
3-1 Comparison of FIM™ to MDS ADL/motor items
3-2 FIM scoring criteria
3-3 MDS scoring criteria
3-4 FIM-MDS score conversion
3-5 FIM-MDS conversion table
3-6 Effect size
3-7 Rating scale conversion
3-8 Similar FIM and MDS items
4-1 Four moments of the distributions
LIST OF FIGURES
4-1 Score difference between the FIMa and FIMc
4-2 Score difference between the MDSa and MDSc
4-3 Distribution of FIM actual scores
4-4 Distribution of FIM converted scores
4-5 Distribution of MDS actual scores
4-6 Distribution of MDS converted scores
4-7 Scatterplot of the FIMa and FIMc scores
4-8 Scatterplot of actual MDS and converted MDS scores
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
TESTING THE ACCURACY OF LINKING HEALTHCARE DATA
ACROSS THE CONTINUUM OF CARE
Katherine L. Byers
Chair: Craig A. Velozo
Major Department: Rehabilitation Science
The purpose of this project was to test the accuracy of a conversion table designed
to transform a score on the physical ability component of the Functional Independence
Measure™ (FIM) to its corresponding score on the Minimum Data Set (MDS) and vice
versa. The records of 2,297 VA patients with scores on both the FIM and MDS, which
were completed within 7 days of one another between July 2002 and June 2003, were
obtained from the VA's Austin Automation Center (AAC). The FIM-MDS conversion
table, generated from an independent sample using Rasch measurement techniques, was
then used to transform actual scores on the FIM and MDS to their corresponding
converted scores. The equivalence of the two score distributions was
assessed by comparing their means and variances. It was hypothesized that 75% of the
actual and converted scores on the FIM and the MDS would be within five points of one
another. Effect size was determined, as was the percent of subjects having actual and
converted FIM and MDS scores within five points of one another.
Only 24% of the FIM and 37% of the MDS actual and converted scores
were within five points of one another, and both conversions therefore fell short of the
standard set at 75% for the conversion to be considered accurate. Yet, the effect size for
the conversion of both FIM and MDS scores was .2, demonstrating an 85.3% overlap
between the two score distributions. The correlation between the FIM actual and
converted scores was .724 while the correlation between the MDS scores was .745.
While the development of a FIM-MDS translation table appears promising, the
results of this study do not provide strong enough evidence to support the premise that
this first attempt at creating a FIM-MDS conversion table has resulted in an instrument
that would provide an accurate means of converting scores within a clinical setting.
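
The accuracy checks summarized in this abstract (percentage of paired scores within five points, effect size, distribution overlap, and correlation) can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names and the sample scores are hypothetical, the effect size is assumed to be a pooled-standard-deviation Cohen's d, and the overlap figure is assumed to derive from Cohen's U1 nonoverlap index, under which a d of .2 corresponds to roughly 85% overlap.

```python
import math
from statistics import mean, stdev


def percent_within(actual, converted, tolerance=5):
    """Percentage of paired scores whose absolute difference is <= tolerance."""
    hits = sum(1 for a, c in zip(actual, converted) if abs(a - c) <= tolerance)
    return 100.0 * hits / len(actual)


def cohens_d(x, y):
    """Standardized mean difference with a pooled SD (assumes equal group sizes)."""
    pooled_sd = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled_sd


def percent_overlap(d):
    """Overlap implied by Cohen's U1 (assumption: normal, equal-variance groups)."""
    phi = 0.5 * (1 + math.erf(abs(d) / 2 / math.sqrt(2)))  # normal CDF at |d|/2
    u1 = (2 * phi - 1) / phi                               # proportion of nonoverlap
    return 100.0 * (1 - u1)


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical actual and converted scores for six patients (not study data).
actual = [40, 55, 62, 70, 81, 90]
converted = [47, 52, 60, 78, 80, 84]
print(percent_within(actual, converted))  # share of pairs within five points
print(cohens_d(actual, converted))
print(pearson_r(actual, converted))
print(percent_overlap(0.2))               # roughly 85 under the U1 assumption
```

A design point worth noting is that the within-five-points criterion is evaluated pair by pair, whereas the effect size and overlap compare whole distributions, which is why the two kinds of result can diverge as they do above.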
INTRODUCTION
Over the past few decades, the total number of people receiving rehabilitation
services in the United States has grown while the provision of such services has extended
into settings outside of acute rehabilitation settings. This growth in demand has been
fueled in part by changes in the population demographics, as the number of individuals
over the age of 65 has increased, and this trend is expected to continue for many years
into the future (Cornman & Kingson, 1996; Gutheil, 1996). According to the U.S.
Department of Health and Human Services (2002), 35.6 million people, or 12.3% of the
U.S. population, were over the age of 65 in 2002. Since 1900, this percentage has more
than tripled (from 4.1% in 1900 to 12.3% in 2002). By the year 2030, the older population
will more than double again to about 70 million people or 20% of the total population.
Similarly, the proportion of U.S. veterans over age 65 is projected to increase from 26%
in 1990 to 46% in 2020 (Veterans Administration, 2001).
A greater number of elderly people in our society is associated with an increased
demand for rehabilitation services. Dillingham, Pezzin, and MacKenzie (2003) reported
that aging is accompanied by an increased risk of diminished health status and a greater
likelihood of requiring rehabilitation services. As noted in the Centers for Disease
Control (CDC) Health, United States, 2001 (2002), the prevalence of both chronic
conditions and activity limitations increases with age, with health-related limitations in
mobility or self-care increasing fourfold between ages 65 to 74 and age 85 or older. In
1997, more than half of the older population (54.5%) reported having at least one
disability, with more than a third (37.7%) reporting at least one severe disability (U.S.
Department of Health and Human Services, 2002). There has also been an increase in
the number of patients being admitted to postacute care (PAC) settings after discharge
from acute care hospitals (Iwanenko, Fiedler, & Granger, 1999; Johnson, Kramer, Lin,
Kowalsky, & Steiner, 2000). Iwanenko et al. note that between 1991 and 1997, the
number of patients admitted annually to PAC settings rose from 12,468 to 49,844.
Stineman (2001) noted that the most significant challenge to current medicine was likely
the care of people with chronic, incurable diseases and injuries.
The U.S. healthcare system provides a variety of routes to recovery from physical
injuries, ailments, or impediments. PAC settings, also called subacute care or transitional
care settings, are a type of short-term care program provided by many long-term care
facilities and hospitals. Treatment in such settings may include rehabilitation services,
specialized care for certain conditions (such as stroke and diabetes), and postsurgical care
and other services associated with the transition between the hospital and home.
Residents on these units often have been hospitalized recently and typically have
complicated medical needs. The goal of subacute care is to discharge residents to their
homes or to a lower level of care. Current PAC settings include comprehensive inpatient
rehabilitation units attached to acute care or freestanding hospitals, skilled nursing
facilities (SNFs), outpatient rehabilitation facilities, freestanding outpatient clinics, and
home health care services. Each person with a potentially disabling impairment has a
unique care trajectory that may include sequential admission to more than one PAC
setting. For example, one person diagnosed with stroke may be discharged from an
acute care hospital, admitted to an inpatient rehabilitation program, and then released
to home with outpatient services, whereas another individual with the same diagnosis
may be discharged from the acute hospital setting directly into a skilled nursing
facility.
SNFs were established under the 1965 Medicare legislation and are certified by
Medicare to provide 24-hour nursing care and rehabilitation services in addition to other
medical services. SNF-based rehabilitation units have become a rapidly growing
segment of the rehabilitation continuum over the past decade, as policy makers have
searched for less costly delivery systems for rehabilitation. While inpatient treatment
provides a full complement of professionals practicing in a hospital setting, it is one of
the most costly of the rehabilitation services (Keith, Wilson, & Gutierrez, 1995). SNFs,
on the other hand, have lower costs, mainly because construction, regulatory and staff
requirements are less stringent than they are in hospitals (Keith et al.). As a result, SNF-
based rehabilitation has been used increasingly as a substitute for traditional inpatient
care (Keith et al.). Additionally, many older patients do not meet the Medicare
requirements to receive inpatient rehabilitation services, which include being able to
tolerate three hours of therapy on a daily basis, fitting within one of the required
diagnostic mixes, as well as being able to make significant progress over a fairly short
length of time (Keith et al.). In such cases, SNF settings have become an appropriate
In rehabilitation programs, a patient's functional enhancement is the primary goal
(DeJong, 2001). The ability to evaluate a patient's status is central to rehabilitation
efforts, for example, to track a patient's recovery, to determine the effectiveness of
treatment, or to estimate resource use (Penta, 2004). It is well documented that the
ability to function physically is an important component of a patient's self-report of
health status (Haley, McHorney, & Ware, 1994; Hart, 2000; McHorney, Haley, & Ware,
1997; Raczek et al., 1998; Segal, Heinemann, Schall, & Wright, 1997). Since the 1950s,
functional status measures have served as a means to monitor outcomes within medical
centers. Yet to this day, there is no clear and commonly accepted definition of function
or a clear delineation between instruments that assess functional outcomes and those that
evaluate other health concepts. As a result, the ability to compare one instrument
measuring functional status to another can be fraught with complications.
Currently in the U.S., two distinct instruments are used to monitor functional
outcomes in inpatient rehabilitation settings and SNFs. Traditional rehabilitation
facilities have almost uniformly adopted the Functional Independence Measure (FIM™)¹
as a means of monitoring patients' functional ability. The FIM instrument provides a
measure of disability and was put into operation beginning in 1989 (Granger, 1998).
Today, it is one of the most widely used instruments that assess the quality of daily living
activities in persons with disabilities (Granger, Hamilton, & Sherwin, 1986).
The Veterans Health Administration (VHA) rehabilitation services include the
FIM in their Functional Status and Outcomes Database (FSOD), which has been
operational since 1997 (Veterans Health Administration [VHA], 2000). It is mandated
for use by the VHA Directive 2000-016, "Medical Rehabilitation Outcomes for Stroke,
Traumatic Brain Injury, and Lower Extremity Amputation Patients," which requires
every VHA medical center to assess functional status and enter this data into the FSOD
in order to measure and track rehabilitation outcomes on all new stroke, lower extremity
amputee, and traumatic brain injury (TBI) patients (VHA, 2000). Presently, the FSOD
has not been linked with other data sources that would allow patients to be monitored as
they progress across the continuum of care (e.g., from rehabilitation facilities to skilled
nursing facilities or from rehabilitation facilities to home health care).
¹FIM™ is a trademark of the Uniform Data System for Medical Rehabilitation, a
division of U B Foundation Activities, Inc.
While the FIM is the "gold standard" for measuring functional outcomes in
rehabilitation settings, the Minimum Data Set (MDS) of the nursing home Resident
Assessment Instrument (RAI, 1991) is used universally for monitoring rehabilitation
outcomes in SNFs. The MDS was developed in response to a 1986 Institute of Medicine
study of the quality of care in nursing homes that called for improvements in nursing
home quality and more patient-centered care (Morris et al., 1990). The federal Omnibus
Budget Reconciliation Act of 1987 (OBRA 87) required all U.S. nursing homes to
implement the Resident Assessment Instrument, whose core is the Minimum Data Set
(MDS) (Rantz et al., 1999). The MDS consists of 284 items designed to assess the
cognitive, behavioral, functional, and medical status of nursing home residents (Hawes et
al., 1995; Teresi & Holmes, 1992).
Nursing homes are a critical environment for tracking the health care status of
elderly veterans. In fiscal year 2001, there were a total of 89,056 veterans treated in
nursing homes with an average daily census of 33,670 (Catalogue of Federal Domestic
Assistance, 2002). By 2003, it is projected that 111,953 patients will be treated in
nursing homes with an average daily census of 35,132 (Catalogue of Federal Domestic
Assistance). In 1995, there were at least 1.5 million nursing home residents who resided in
facilities participating in the Medicare or Medicaid programs (Hawes et al., 1995).
Within the VHA, the reduction of acute rehabilitation beds from 1,150 five years ago to
617 in 2003 further increases the likelihood that veterans could receive their post-acute
rehabilitation care in nursing homes (C. Johnson, personal communication, September
A key to improving services for patients treated in PAC settings is to develop
effective and efficient methods for tracking and evaluating functional status changes
across rehabilitation and skilled nursing facilities. Through the use of a single instrument
in these settings, a patient may progress from one to the other, while maintaining a
functional assessment score that could easily be tracked and compared between settings.
Such a tool would benefit patients, as it would facilitate an increased continuity of care
between settings. It would also allow for the direct comparison of rehabilitation
outcomes between settings, along with resource utilization and costs. Latham and Haley
(2003) note that a clear need exists for an instrument that can accurately assess patients'
functional ability as they move through the health care system. As stated in Buchanan,
Andres, Haley, Paddock, and Zaslavsky (2003), "Providers, payers, and consumers
would all benefit from comparable measures of functional status and rehabilitation
outcomes across multiple care settings to facilitate equitable payment and to monitor the
quality and efficiency of care delivery" (p. 45). To date, there has been only one
published attempt by Williams, Lee, Fries, and Warren (1997) to link the FIM to the
MDS. Yet, other studies have linked other measures of global functioning (e.g., Fisher,
Eubanks, & Marier, 1997; Fisher, Harvey, & Kilgore, 1995; Fisher, Harvey, Taylor,
Kilgore, & Kelly, 1995; Segal, Heinemann, Schall & Wright, 1997; Smith & Taylor,
2004; Tennant & Young, 1997).
The dilemma of having multiple yet incompatible instruments measuring the
same construct has been confronted and successfully overcome in the physical sciences.
Take, for example, the manner in which we measure distance. Currently, in the United
States, we have two competing systems of measuring length, namely the metric system
and the standard system of measurement. Surprisingly, success has not come through
attempts to convert entirely from one system to the other despite the obvious benefits in
doing so. Instead, we continue to utilize simple strategies that allow us to convert a
measure on one scale to its corresponding measure on the other. Similarly, we routinely
convert readings between Celsius and Fahrenheit with a simple conversion table when
measuring temperature. Thus, one could say that a precedent has been set for the manner
in which we have successfully reconciled competing systems of quantifying what are
essentially abstract concepts.
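
The unit conversions used in this analogy are simple deterministic formulas, which a short Python sketch makes concrete. The function names are illustrative only. The contrast with the health care case is that these physical conversions are exact, whereas a conversion between two functional status instruments can only be approximate and must be tested empirically.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature reading from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature reading from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


def inches_to_centimeters(inches):
    """Convert a length from inches to centimeters (1 inch = 2.54 cm exactly)."""
    return inches * 2.54


print(celsius_to_fahrenheit(100))   # boiling point of water: 212.0
print(fahrenheit_to_celsius(32))    # freezing point of water: 0.0
```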
An analogous attempt in health care would be to develop a system of converting
scores between the physical functioning components of the FIM and the MDS so that a
score on one instrument could be translated into its equivalent score on the other. The
hypothesis is that the items included in these two instruments are subsets of items along
an ADL/motor construct. Table 1-1 presents a comparison of the ADL/motor items of
the FIM and the MDS.
Table 1-1. Comparison of FIM to MDS ADL/motor items

FIM Items                                   MDS Items
Grooming                                    Personal Hygiene
Dressing-Upper Body                         Dressing
Toileting                                   Toilet Use
Bladder Management                          Bladder Continence*
Bowel Management                            Bowel Continence*
Bed, Chair, Wheelchair (Transfer)           Transfer
Tub, Shower (Transfer)
Walk/Wheelchair                             Walk in Room
Stairs                                      Walk in Corridor
                                            Locomotion on Unit
                                            Locomotion off Unit

FIM Rating Scale                            MDS Rating Scale (exceptions noted below)
7 Complete Independence (Timely, Safely)    0 Independent
6 Modified Independence (Device)            1 Supervision
5 Supervision                               2 Limited Assistance
4 Minimal Assist (Subject = 75%+)           3 Extensive Assistance
3 Moderate Assist (Subject = 50%+)          4 Total Dependence
2 Maximal Assist (Subject = 25%+)           8 Activity did not occur during the entire 7-day period
1 Total Assist (Subject = 0%+)

* Bladder and Bowel Continence in the MDS also have a separate rating scale: 0-Usually
Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent
Similarities are immediately evident between the two instruments. Both instruments
include items for eating, dressing, toileting, bowel and bladder functioning, as well as the
ability to transfer and to walk. Differences include an item for climbing stairs on the
FIM that is not part of the MDS.
A mathematical framework is needed in order to convert a score on one instrument
to its corresponding score on the other. Item Response Theory (IRT) measurement
models have been rapidly gaining popularity over classical test theory (CTT) for
analyzing instruments used in healthcare and rehabilitation (Douglas, 1999; Hambleton,
2000; Hays, Morales, & Reise, 2000; Linacre, Heinemann, Wright, Granger, & Hamilton,
1994; McHorney, 1997; Prieto, Alonso, Lamarca, & Wright, 1998; Silverstein, Fisher,
Kilgore, Harley, & Harvey, 1992; Velozo, Magalhaes, Pan, & Leiter, 1995). IRT
comprises a set of generalized linear models and their associated statistical procedures
that connect a subject's response to test items to that subject's location on the latent trait
being tested (Mellenbergh, 1994). In order to create a link between scores on the FIM
and MDS, it is hypothesized that Rasch analysis can be used to place the items from both
instruments on the same linear continuum (Fisher, Harvey, Taylor, et al., 1995). A
precedent for linking instruments in such a manner has been established in the fields of
education and psychological measurement.
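
The idea of placing items from two instruments on one linear continuum can be illustrated with a small Python sketch based on the Rasch rating scale model. It is a hedged illustration under stated assumptions, not the procedure or the calibrations used to build the FIM-MDS conversion table: the item difficulties and category thresholds below are invented, the two instruments are labeled A and B rather than FIM and MDS, and all function names are hypothetical. Given a raw score on instrument A, the sketch finds the person measure (in logits) whose expected score on A equals that raw score, and then reports the expected score on instrument B at the same measure.

```python
import math


def category_probs(theta, difficulty, thresholds):
    """Rasch rating scale model: probabilities of categories 0..m for one item."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - difficulty - tau))
    peak = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_total(theta, difficulties, thresholds):
    """Expected raw score on an instrument for a person located at theta."""
    score = 0.0
    for b in difficulties:
        probs = category_probs(theta, b, thresholds)
        score += sum(k * p for k, p in enumerate(probs))
    return score


def measure_for_score(raw, difficulties, thresholds, lo=-10.0, hi=10.0):
    """Bisection search for the theta whose expected raw score equals raw."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_total(mid, difficulties, thresholds) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical calibrations in logits; NOT estimates from the FIM or the MDS.
ITEMS_A = [-1.0, 0.0, 1.0]            # instrument A: three items scored 0-2
TAUS_A = [-0.5, 0.5]
ITEMS_B = [-0.5, 0.5]                 # instrument B: two items scored 0-4
TAUS_B = [-1.5, -0.5, 0.5, 1.5]


def convert_a_to_b(raw_a):
    """Map a raw score on A to the nearest expected raw score on B."""
    theta = measure_for_score(raw_a, ITEMS_A, TAUS_A)
    return round(expected_total(theta, ITEMS_B, TAUS_B))


# A toy conversion table for every possible raw score on instrument A.
for raw in range(0, 7):
    print(raw, convert_a_to_b(raw))
```

Because expected scores rise monotonically with the person measure in Rasch models, a table built this way preserves rank order, but it cannot by itself show that converted scores agree with actual scores for individual patients. That agreement is an empirical question.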
It is the purpose of this dissertation to evaluate the accuracy of a FIM-MDS
conversion table that has been created through the use of Rasch analysis. The technical
procedures of test equating used in the educational applications of Rasch's probabilistic
models were transferred to the cocalibration of these two functional assessment
instruments. An accurate conversion table between the FIM and the MDS would allow
for studies to take place that are necessary to examine the outcomes for persons receiving
rehabilitation services in different care settings. This would, in effect, eliminate the need
to institute massive changes in measurement procedures across rehabilitation settings.
Functional status information could then be used to track changes and follow a patient's
progress across PAC settings and not only monitor but compare quality of care and
rehabilitation outcomes in different settings (National Committee on Vital and Health
REVIEW OF THE LITERATURE
Measuring Outcomes in Rehabilitation
The effectiveness of rehabilitation services is gauged by the restoration and
maximization of patient functioning. Functional status in this context has been defined
as reflecting, "an individual's ability to carry out activities of daily living (ADLs) and to
participate in various life situations and in society" (Jette, Haley, & Ni, 2003, p. 1).
Therefore, the assessment of functional status is a method for describing abilities and
activities in order to measure an individual's use of the variety of skills included in
performing the tasks necessary to daily living, vocational pursuits, social interactions,
leisure activities, and other required behaviors (Granger, 1998). ADL measures have
been used to determine a patient's level of disability, whether one qualifies for certain
types of healthcare services, and to document outcomes of rehabilitation services.
The focus of the earliest standardized assessments of function, developed over 50
years ago, was on the basic ADLs, which consist of self-care activities, such as bathing,
grooming, dressing, and walking. Two of the first functional status measures used in
rehabilitation were the Katz and Barthel indexes, whose items consisted solely of
basic ADL tasks (Cohen & Marino, 2000; Latham & Haley, 2003). Then, "with
changing societal expectations, the advent of brain injury rehabilitation, and the
independent living movements, medical outcome research began to explore means of
documenting social and cognitive-based behaviors as part of rehabilitation outcomes"
(Latham & Haley, 2003, p. 85). The result of this was the development of instruments,
such as the Functional Independence Measure (FIM) and the Minimum Data Set (MDS).
The FIM Instrument Used in Inpatient Rehabilitation
In U.S. inpatient rehabilitation settings, the Uniform Data System for Medical
Rehabilitation (UDSMR) is the most widely used clinical database for assessing
rehabilitation outcomes (Fiedler & Granger, 1997; Granger & Hamilton, 1993). The FIM
is the core functional status measure of the UDSMR and was developed to establish a
uniform standard for the assessment of functional status during medical rehabilitation
(Granger, Hamilton, Keith, Zielezny & Sherwin, 1986). The FIM incorporates concepts
and items from previous functional assessment instruments, such as the Katz Index of
ADL, the PULSES profile, the Kenny Self-Care Evaluation, and the Barthel Index (Hall
et al., 1993). The FIM system was developed by a national task force, the Task Force to
Develop a National Uniform Data System for Medical Rehabilitation (UDSMR), which was
cosponsored by the American Congress of Rehabilitation Medicine and the American
Academy of Physical Medicine and Rehabilitation, to rate the severity of patient disability
and the outcomes of medical rehabilitation (Hamilton et al., 1987). The original work of this task
force was expanded by the Department of Rehabilitation Medicine at the State University
of New York at Buffalo. Since 1987, it has been the mission of the UDSMR to measure
medical rehabilitation outcomes across the continuum of care, in terms of both time and settings
(Granger, 1999). The UDSMR maintains a national data repository for research purposes
of three million case records from 1,400 facilities around the world (Granger, 1999).
The FIM is administered in most inpatient rehabilitation facilities within three
days of admission and prior to discharge (Granger, Hamilton & Sherwin, 1986). The
scale accounts for a patient's level of independence, amount of assistance needed, use of
adaptive or assistive devices, and the percentage of a given task completed successfully.
This instrument comprises 18 items, each rated on a seven-level scale of
independent performance, covering self-care, sphincter control, mobility, locomotion,
communication and social cognition (Granger & Hamilton, 1993). As such, it contains
items representing three constructs: ADL, mobility, and continence (Granger, Hamilton,
& Sherwin). Thirteen of the 18 FIM items (related to functional ability) can be further
divided into three more specific subscores rating activities of daily living (ADLs),
sphincter management, and mobility (Stineman, Jette, Fiedler & Granger, 1997). A FIM
score on each item ranges from 1 (Total Assistance) to 7 (Complete Independence).
Thus, a total FIM score ranges from 18 to 126. Hamilton (1989) noted that, "Because
each item is scaled on the basis of functional independence, it is expected that the total
score (with each item appropriately weighted) will correlate with the burden of care for
the disabled person" (p. 862).
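
The score ranges quoted above follow directly from the item count and the 1 to 7 rating scale. A minimal Python sketch of that arithmetic is shown below; the function names are illustrative, and the range for the 13 functional ability (motor) items is derived here by the same arithmetic rather than quoted from the source.

```python
FIM_MIN_RATING = 1   # Total Assistance
FIM_MAX_RATING = 7   # Complete Independence


def score_range(n_items):
    """Lowest and highest possible summed score for n_items FIM items."""
    return n_items * FIM_MIN_RATING, n_items * FIM_MAX_RATING


def total_score(ratings):
    """Sum item ratings after checking that each lies on the 1-7 scale."""
    for rating in ratings:
        if not FIM_MIN_RATING <= rating <= FIM_MAX_RATING:
            raise ValueError(f"rating {rating} is outside the 1-7 FIM scale")
    return sum(ratings)


print(score_range(18))   # all 18 items: (18, 126)
print(score_range(13))   # the 13 motor items: (13, 91)
```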
Psychometric studies of the FIM instrument support its use for research purposes.
One of the strengths of the FIM is that it has undergone many methodological
evaluations, in which it has demonstrated good psychometric properties (Dodds, Martin, Stolov &
Deyo, 1993). Dodds et al. noted high internal consistency (Cronbach's alpha of
.93 at admission and .95 at discharge), indicating that the FIM is a reliable
instrument. Extensive investigations of the FIM's reliability and validity have provided
evidence of its interrater and test-retest reliability (Ottenbacher, Hsu, Granger & Fiedler,
1996), internal consistency (Stineman et al., 1996; Stineman et al., 1997), concurrent
validity (Granger, Cotter, Hamilton & Fiedler, 1993; Oczkowski & Barreco, 1993) and
predictive validity (Heinemann, Linacre, Wright, Hamilton & Granger, 1994; Oczkowski
& Barreco). Ottenbacher et al. (1996) performed a meta-analysis of 11 studies that
revealed the median inter-rater reliability for the total FIM was .95 and the test-retest and
equivalence reliability was .95 and .92, respectively. Stineman et al. (1996) showed in a
study of 93,829 rehabilitation inpatients that a factor analysis of the FIM instrument
supported the identification of ADL/motor and cognitive/communication dimensions
across 20 impairment categories. Additionally, the stability of the FIM motor scores was
demonstrated in several studies (Linacre et al., 1994; Wright, Linacre & Heinemann,
1993). Linacre (1998) confirmed the multidimensional structure of the FIM by means of
Rasch analysis followed by factor analysis of standardized residuals and demonstrated
the divergence of the five cognitively-oriented items from the 13 motor-oriented items.
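
The internal consistency coefficient cited in this section, Cronbach's alpha, is computed from the item variances and the variance of the summed score. The sketch below shows the standard formula in Python using a small invented data set. It is an illustration of the statistic only, not a reanalysis of FIM data, and the function name and ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]   # one total per respondent
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings: two items answered by four respondents.
items = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]
print(cronbach_alpha(items))
```

Values near 1 indicate that the items vary together across respondents, which is the sense in which coefficients of .93 and .95 are read as evidence of reliability.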
The MDS Instrument in Skilled Nursing Facilities
After receiving inpatient rehabilitation services, patients may be transferred into a
SNF, where the FIM instrument is no longer used as a measure of functional ability.
Instead, the Minimum Data Set (MDS) is utilized in this role. Since 1990, HCFA (the Health
Care Financing Administration, now known as the Centers for Medicare & Medicaid
Services [CMS]) has required that the MDS be administered to all SNF residents. The
MDS is a comprehensive assessment of over 85 health status elements organized into
measurement categories. An MDS score on each item ranges from 0 (Independent) to 4
(Total Dependence). The definitions of these ratings are provided on the rating form.
Full MDS assessments are required at least annually and when there is a significant
change in a patient's condition (such as a stroke, resulting in hemiparesis). A minimum
of approximately 1.2 million residents are assessed annually with the full MDS, with 3.6
million briefer updates and an unknown additional number of complete MDS
reassessments because of major changes in residents before an annual MDS is due
(Casten, Lawton, Parmelee & Cleban, 1998). While the MDS shares ADL content with
the FIM, it is targeted exclusively to SNFs, making comparisons across rehabilitation
settings difficult (Latham & Haley, 2003).
Although not as extensively studied as the FIM, research on the MDS suggests
that it too has adequate reliability and validity for use in research studies. Early research
studies by Hawes et al. (1995) showed that MDS items had intraclass correlations of .7 or
higher in key areas of functional status, such as ADL and cognition. Sixty-three percent
of the items achieved reliability coefficients of .6 or higher and 89% achieved .4 or
higher (Hawes et al., 1995). Morris et al. (1994) showed that the seven cognitive items
(short-term memory, long-term memory, decision making, and four categories of
memory recall) showed an internal reliability of .83-.88.
While it has been suggested that the above psychometric findings on the MDS
are "inflated" because the instrument was administered by research staff (Stineman &
Maislin, 2000), Gruber-Baldini, Zimmerman, Mortimore & Magaziner (2000) showed that
when the MDS is administered by clinical staff, cognitive items (MDS-COGS) and cognitive
performance scale items (MDS-CPS) correlate moderately well with the Mini-Mental
State Examination (MMSE) (R=-0.65 and -0.68, respectively) and the Psychogeriatric
Dependency Rating Scale (PGDRS) Orientation scale (R=0.63 and 0.66, respectively).
The internal reliability of the MDS-COGS was .85 and that of the MDS-CPS (without the
comatose items) was .80. Confirmatory factor analysis studies, derived from clinical
and administrative databases of the MDS, confirmed all MDS domain clusters except
social quality (Casten et al., 1998).
Administrative Solution: Adopting a Single Instrument for
Measuring Outcomes in Post Acute Care
The existence of "competing" instruments in PAC settings (i.e., the FIM in
rehabilitation facilities and the MDS in SNFs) has led to considerable debate over which
of these instruments should be the basis for a post-acute care Prospective Payment
System (PPS) in the private sector (Centers for Medicare and Medicaid Services [CMS],
2001; DeJong, 2001). The demands for a PPS for rehabilitation facilities prompted the
HCFA in 1998 to develop the Minimum Data Set for Post-Acute Care (MDS-PAC), an
instrument similar in design to the MDS, intended to address the needs of subacute
facilities, rehabilitation facilities, and long-term care hospital patients (CMS, 2002). One
of the intended purposes of the MDS-PAC was to provide CMS with a tool by which it
could monitor the quality of health care services across post-acute settings (Granger,
Hamilton, Keith, et al., 1986). Originally, items similar to those found in the MDS for
rehabilitation were intended to be used in this new instrument. Yet, upon its completion,
it consisted of more than 400 items, and it lacked one-to-one correspondence with the
FIM (DeJong, 2001, p. 567). Martin Grabois, MD, President of the American Congress
of Rehabilitation Medicine (ACRM) stipulated in a letter in 2001 that the ACRM
strongly opposed implementation of the MDS-PAC and felt it was premature to use it as
a quality-monitoring tool in rehabilitation (DeJong, 2001). This challenge resulted in a
change in CMS's decision from using the MDS-PAC to using the Inpatient Rehabilitation
Facility Patient Assessment Instrument (IRF-PAI) as the basis for the post-acute PPS.
The IRF-PAI includes the FIM ADL/motor and cognition/communication items. This
IRF-PAI was mandated for implementation in rehabilitation facilities on January 1, 2002.
While the final decision to use the IRF-PAI has tremendous economic benefits
(e.g., rehabilitation facilities not having to convert from the FIM to an MDS-based
instrument), it does not facilitate monitoring functional outcomes as patients cross from
rehabilitation to skilled nursing facilities. That is, a FIM-based outcome instrument will
continue to be used in rehabilitation facilities, while an MDS-based instrument will be
used in nursing home facilities. A drawback to having different functional outcome
measures across these two health-care settings is "test dependency." Data gathered on
one instrument cannot be compared to similar data gathered on an alternate instrument.
Wilkerson and Johnston (1997) noted that the absence of a single, standardized
instrument used in both rehabilitation facilities and SNFs was a fundamental barrier for
U.S. policymakers. Without such an instrument, these policymakers were hampered in
their ability to fulfill the emerging health policy mandate to monitor the quality and
outcomes of services for patients in PAC settings. As patients were transferred across
these settings, researchers, managers, and clinicians were unable to easily and accurately
track functional status changes.
The ability to measure a patient's functional status is not only important within a
particular rehabilitation setting, but also along the continuum of care. It is difficult to
accurately and precisely compare and contrast the magnitude of gains made through
rehabilitation efforts in various therapeutic PAC settings when such improvements are
evaluated on incongruous instruments based on divergent scales of measurement. As
Granger (1998) contends, there has been unfortunately too little effort in addressing
assessment and management of persons with disablement across the continuum of care.
As a result, the outcomes of therapeutic interventions cannot easily and accurately be
compared to determine their relative effectiveness.
This measurement of rehabilitation outcomes is an integral component of a major
controversy in rehabilitation over the best PAC setting in which to provide care.
Optimizing a patient's recovery while at the same time minimizing costs are the two
variables most often considered in this debate. Inpatient rehabilitation units typically
provide the highest level of services, although often at the greatest costs. On the other
hand, skilled nursing facilities generally do not provide the same extent and intensity of
therapeutic interventions, although at a reduced cost. The ability to directly compare
rehabilitation outcomes between, for example, an inpatient rehabilitation setting and a
skilled nursing facility would enhance our understanding of where patients benefit most
and from what interventions. In January of 2000, Sally Kaplan, Ph.D. of the Medicare
Payment Advisory Commission (MedPAC) told the Subcommittee on Populations,
We strongly believe that it would be extremely useful, to say the least, to have
standardization of functional status measures at least in post-acute care so that if
similar patients are treated in different post-acute settings, or if patients are
treated in successive post-acute care settings, that we would have a means of
measuring them. ... It would expand the utility of regularly collected
information. (National Committee on Vital and Health Statistics, 2003).
Potential Measurement Solution: Linking Instruments
There is more than one way to coordinate rehabilitation services along the
continuum of care. For example, one might institute the use of the same instrument to
measure functional ability in all rehabilitation settings so that the results would be
directly comparable. In this manner, the patient's progress can be easily tracked across
settings and services. Unfortunately, attempts to use the FIM across settings have met
with limited success as there are sizeable obstacles to implementing such a plan.
Already, there has been a huge monetary investment in our current attempts to measure
rehabilitation outcomes. To figuratively "throw out" these current instruments and
replace them with one new and improved universal measure would likely be
cost-prohibitive (Cohen & Marino, 2000). There would also likely be significant resistance to
implementing such a plan because, in our capitalistic economy, private fortunes are often
tied up in maintaining the status quo. Furthermore,
significant costs would be incurred in training staff in a new system, as well as in the
implementation of a new database.
The predicament now facing rehabilitation services, that of having multiple yet
incompatible instruments attempting to quantify a single construct, is not a new one.
It has been successfully overcome in other areas, for example, within the basic
sciences. Major scientific advances have been possible in part because the instruments
used to measure a construct were standardized and, in some cases, linked to other similar
instruments. An example of this is the historical attempts to measure the construct of
temperature. What began as a human sensation of "hot" and "cold" evolved into the field
of thermometrics, the measurement of temperature (Bond & Fox, 2001). In A.D. 180,
Galen mixed equal quantities of ice and boiling water to establish a "neutral" point for a
seven-point scale having three levels of warmth and three levels of coldness (Bond &
Fox). Then, in the 17th century, Santorio of Padua used a tube of air inverted in a
container of water so that the water level rose and fell with temperature changes. He
calibrated the scale by marking the water levels at the temperature of flame and ice
(Bond & Fox). Our current mode of measuring temperature uses mercury in a closed
tube. Even with a single mechanism by which we measure temperature, we use two
competing scales, namely, Celsius and Fahrenheit. Both of these scales merely set two
known temperature points (ice and boiling points) and simply divided the scale into equal
units. These two independently developed scales have been linked, so a "score" on one
scale can easily be converted to a score on the other scale.
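The temperature example can be sketched in a few lines. Because both scales are defined by the same two physical reference points, the link between them is the straight line through those two anchors. The function names below are illustrative only and are not drawn from any cited source.

```python
def link_from_anchors(x1, y1, x2, y2):
    """Return (slope, intercept) of the line mapping scale X onto scale Y,
    given two anchor points that were measured on both scales."""
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Anchors: freezing point of water (0 C = 32 F) and boiling point (100 C = 212 F).
SLOPE, INTERCEPT = link_from_anchors(0.0, 32.0, 100.0, 212.0)


def celsius_to_fahrenheit(celsius):
    """Convert a Celsius reading to the Fahrenheit scale."""
    return SLOPE * celsius + INTERCEPT


def fahrenheit_to_celsius(fahrenheit):
    """Convert a Fahrenheit reading back to the Celsius scale."""
    return (fahrenheit - INTERCEPT) / SLOPE
```

The same logic, shared reference points plus a transformation, underlies the linking of functional status instruments, although there the reference points must be estimated rather than physically fixed.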
A problem in developing measures in the human sciences is that, "we are clearly
dealing with abstractions (e.g., perceived social support, cognitive ability, and self-
esteem), so we need to construct measures of abstractions, using equal units, so that we
can make inferences about constructs rather than remain at the level of describing our
data" (Bond & Fox, 2001, p. 4). Yet, it appears hopeless to construct models of human
behavior since behavior seems to be so unpredictable. What we can do is estimate the
probability of a behavior taking place. We need to build a model, "more like models in
modern physics-models which are indeterministic, where chance plays a decisive role"
(Rasch, 1960, p. 11). What is then being described is the possibility of a behavior
occurring, that is the relative frequency of an event occurring. We can say that the
probability of something occurring is equal, for example, to 50 percent.
An alternative solution to this problem would be to use the same scale of
measurement as the basis of each instrument. Linking can be referred to as the
development, "of a common metric in IRT by transforming a set of item parameter
estimates from one metric onto another, base metric" (Hart & Wright, 2002, p. 2).
McHorney (1997) pointed out, "The development of a shared language that goes beyond
specific items to location on an ability scale would provide users tremendous flexibility
in building and maintaining an outcomes capacity within and across different databases"
(p. 749). The use of a shared language across rehabilitation settings would allow all
services along the continuum of care to be interrelated and coordinated. In this manner,
the outcomes, as well as the efficient utilization of resources could be maximized.
Dorans and Holland (2000) write, "The comparability of measurement made in
differing circumstances by different methods and investigators is a fundamental
precondition for all of science" (p. 281). The development of the same scale of
measurement may be achieved through the utilization of Rasch analysis, a one-parameter
Item Response Theory (IRT) model. Georg Rasch, a Danish mathematician who examined
psychological measurement problems in the 1950s and 1960s, surmised that the
relationship between a person's ability and an item's difficulty can be modeled as a
probabilistic function. As a person's ability level increases, the probability of passing an
item also increases (Fox & Jones, 1998). The Rasch model specifies exactly how to
convert observed counts into linear (and ratio) measures (Wright & Linacre, 1989). IRT,
then, is both a theoretical framework and a collection of quantitative techniques used to
construct tests, scale responses, and equate scores. It consists of models, each designed
to describe a functional relationship between an examinee's ability and the characteristics
(or parameters) of the items on a test (McHorney & Cohen, 2000).
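For reference, the dichotomous form of the Rasch model is commonly written as shown below. The notation is an assumption made here for illustration: B_n denotes the ability of person n, D_i the difficulty of item i, and P_{ni} is shorthand for the probability that person n passes item i.

```latex
P_{ni} = P(X_{ni} = 1 \mid B_n, D_i) = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}},
\qquad
\ln\!\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i
```

When ability equals difficulty (B_n = D_i), the probability of passing is exactly 50 percent, and the probability rises as ability exceeds difficulty. The second expression shows why the resulting measures are linear: the log-odds of success, expressed in logits, is simply the difference between ability and difficulty.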
Performing a Rasch analysis on the FIM would address a problem that exists with
the interpretation of the FIM scores. It has been noted that a change of 10 raw score
points at the extremes of the FIM range is equivalent to four times as much change on the
linear scale as a change of 10 raw score points at the center of the FIM range (Linacre et
al., 1994). Thus, the improvement made by a person with moderate deficits, which places
the patient in the center of the scale, will appear to be much greater than that of a person
who is closer to the end of the scale, even when the actual improvement might otherwise be
seen as equivalent. Linacre et al. (1994) cite this as one example of why a conversion
from raw scores to linear measures is so essential to quantifying changes in a patient's
status more appropriately. Performing a Rasch analysis on the FIM would be one way to
The Use of Linking Techniques
Educators in the United States have been involved in the process of equating and
linking instruments through a variety of statistical procedures for more than 40 years.
Kolen (2001) notes that the first pages of the first issue of the Journal of Educational
Measurement published in 1964 were on the subject of linking.
Linking is a scaling method used to achieve comparability of scores, to the extent
possible, between tests of different frameworks and test specifications (Muraki, Hombo
& Lee, 2000). Linking is distinguished from test equating, which involves making
statistical adjustments to scores from alternative forms of an instrument to account for
small differences in the difficulty of the test items on each form (Kolen, 2001). The
term, test equating, is traditionally used to refer to the case of linking when two or more
forms of a test have been constructed according to the same specifications, such as equal
difficulty, reliability, and validity and constructed for the same purpose (Muraki et al.,
2000). The most basic of the equating methods is linear equating, which assumes that the
two tests to be equated differ only in means and standard deviations.
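A minimal sketch of linear equating follows, assuming both forms were taken by the same group. A score on form X is mapped to the form Y score that lies the same number of standard deviations from its mean. The function name and the score lists are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(score_x, scores_x, scores_y):
    """Map a score on form X to the form Y scale by matching standardized
    deviations: (x - mean_x) / sd_x == (y - mean_y) / sd_y."""
    mean_x, sd_x = mean(scores_x), pstdev(scores_x)
    mean_y, sd_y = mean(scores_y), pstdev(scores_y)
    return mean_y + (sd_y / sd_x) * (score_x - mean_x)


# Hypothetical total scores obtained by the same group on two forms.
form_x = [10, 20, 30]
form_y = [45, 50, 55]
equated = linear_equate(25, form_x, form_y)
```

In this toy example, form Y has half the spread of form X, so a score 5 points above the form X mean maps to a score 2.5 points above the form Y mean.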
In IRT, linking is referred to as developing a common metric by transforming a
set of item parameter estimates from one metric to another, base metric (Kim & Cohen,
2002). The process of linking consists of an anchoring design and a transformation
method. The anchoring design ensures that there will be a basis for comparison between
the item calibrations on the two instruments (Vale, 1986). The linking transformation
refers to the equation used to put the item parameters on a common scale of
measurement. "These processes rest on two assumptions: (a) the different instruments
being linked measure the same underlying construct; and (b) the linking sample
represents the population for which the test is intended" (McHorney, 2002, p. 386).
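As an illustration of a linking transformation, the sketch below uses one common choice for Rasch calibrations: when a set of anchor items has been calibrated on both metrics, the transformation reduces to adding a single constant, estimated as the mean difference in anchor-item difficulty. This is an assumption chosen for clarity and is not necessarily the method used in the studies cited here. The item names and logit values are hypothetical.

```python
from statistics import mean


def linking_constant(anchor_base, anchor_new):
    """Mean difference in difficulty (logits) of the shared anchor items:
    base-metric calibration minus new-metric calibration."""
    shared = sorted(set(anchor_base) & set(anchor_new))
    if not shared:
        raise ValueError("no common anchor items")
    return mean(anchor_base[item] - anchor_new[item] for item in shared)


def to_base_metric(difficulties_new, shift):
    """Translate item difficulties from the new metric onto the base metric."""
    return {item: d + shift for item, d in difficulties_new.items()}


# Hypothetical calibrations in logits; item names are illustrative only.
base = {"eating": -1.0, "walking": 1.0}
new = {"eating": -0.5, "walking": 1.5, "stairs": 2.0}
shift = linking_constant(base, new)
linked = to_base_metric(new, shift)
```

After the shift, the anchor items carry the same difficulty on both metrics, and the non-anchor item ("stairs") is expressed on the base metric as well.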
The cocalibration of instruments purported to measure a common construct is
simply an extension of test equating, item banking, and partial credit principles that have
been in use in education for decades (Choppin, 1968; Fisher, Harvey, Taylor, et al., 1995;
Wright, 1984). Routine applications of Rasch measurement models in the development
and equating of instruments are performed by companies such as the Psychological
Corporation, school districts in Portland, Phoenix, Chicago, and New York, and medical
school admissions and certification boards, including the National Board of Medical
Examiners, the American Society of Clinical Pathologists, the American Dental
Association, the American Council of State Boards of Nursing (Fisher, Harvey, Taylor, et
Linking of Measures in Healthcare
"Linking studies are in their infancy in health status assessment and functional
health assessment in rehabilitation" (McHorney, 2002, p. 389). It has only been in the last
14 years that linking has been used in health status and rehabilitation assessment
(McHorney). In this setting, the use of linking techniques has been discussed (Fisher,
Harvey & Kilgore, 1995; Fisher, Harvey, Taylor et al., 1995; McHorney) and linking
studies have been conducted (Chang & Cella, 1997; Fisher, 1997; Fisher, Harvey, Taylor
et al.; McHorney & Cohen, 2000). One of the first published applications of IRT to
functional health assessments tested the unidimensionality and reproducibility of the 10-
item Physical Functioning Scale (Ware, 2003). Examples of using IRT, specifically
Rasch analysis, to link health care instruments have appeared in the literature over the
last six years. These include the linking of the FIM and SF-36 (Segal et al., 1997), the FIM
and the Barthel Index (Tennant & Young, 1997), the FIM and the Patient Evaluation
Conference System (PECS), and the SF-36 and the LSU HSI Physical Functioning scale
(Fisher, Eubanks & Marier, 1997). Additionally, Badia, Prieto, Roset, Diez-Perez, &
Herdman (2002) attempted to develop a short Osteoporosis-Specific Quality of Life
Questionnaire based on the assemblage (equating) of the items of two existing
questionnaires through the Rasch mathematical model.
McHorney (2002) states, "Measurement specialists who serve rehabilitation
medicine and other specialties are at the cusp of a paradigm shift away from sizable
reliance on classical test methods to broader use of IRT methods" (p. 390). The use of
Rasch measurement is becoming the preferred method in the development of functional
assessments among rehabilitation professionals for constructing tests (Ottenbacher et al.,
1996). Hays et al. (2000), McHorney (2002), Ware (2003), and Cella and Chang (2000),
along with many others in the social and medical sciences in the last 30 years, have
described IRT as a more promising framework for designing tests (Hambleton, 2000).
Features of IRT, such as sample independence and test independence, account for this
growing popularity over classical test theory (CTT) (Douglas, 1999; Linacre et al., 1994;
McHorney, 1997; Prieto et al., 1998; Hambleton, 2000; Silverstein et al., 1992; Velozo et
al., 1995). The measurement units of IRT have interval properties versus ordinal raw
scores used in CTT. Scores that have interval properties can be analyzed appropriately
using parametric statistics, while such analyses may be inappropriate on ordinal data.
Additionally, logit measures may remove measurement bias at the extreme ends of the
measured construct, while extreme raw scores are biased by nature and may
underestimate the magnitude of a difference or change score at the extremes (Cella &
Chang, 2000). Another drawback of CTT is that tests developed in this manner are
sample dependent. This means that items may look difficult when they are administered
to examinees at the low end of the score continuum; and alternately, the same items look
easy to those examinees at the high end of the score continuum. Thus, the item statistics
are dependent upon the ability level of the subject sample and have little value when
measuring subjects of a different ability level. Similarly, the problem of test dependency
can be defined as one where the person statistics are dependent upon the difficulty of the
test. If one changes the difficulty of the items in the tests, the two scores are no longer
There is some indication that IRT estimates of health outcomes are more
responsive to changes in health status over time. McHorney et al. (1997) found that the
sensitivity of the SF-36 physical functioning scale to differences in disease severity was
greater for a Rasch model-based scoring than it was for simple summated scoring. Fisher
as it becomes increasingly clear that the accountability of educators,
psychologists, health care providers, and other professionals cannot remain tied to
scale-dependent indicators of unknown or low statistical sufficiency, the
practicality, scientific rigor, and mathematical beauty of scale-free measurement
will become more widely appreciated. (p. 93)
Hays et al. (2000) predict IRT methods will be used in health outcome measurement on a
rapidly increasing basis in the 21st century.
Two mathematical models that are appropriate for linking functional outcome
measures are (a) the one-parameter IRT model (the Rasch model), which solves for
person ability through a single item parameter, item difficulty, and (b) the two-parameter
model, which solves for person ability through two item parameters, item difficulty and item
discrimination. There is fervent debate over which model should be employed for
psychometric analysis and linking instruments. The debate ranges from whether a
scientific model should be made to fit the data (two-parameter model) or the data to fit
the model (one-parameter model) (Wright, 1997). There is also the issue of whether item
discrimination should be held constant across items (Rasch model) or allowed to vary
between items (two-parameter model) (McHorney, 2002).
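The difference between the two models can be seen in a short sketch of the logistic item response function. The function and parameter names are illustrative. Holding discrimination at 1.0 for every item gives the Rasch form, while letting it vary by item gives the two-parameter form.

```python
import math


def item_response(theta, difficulty, discrimination=1.0):
    """Probability of success under a logistic IRT model.

    With discrimination fixed at 1.0 for every item this is the Rasch
    (one-parameter) form; allowing it to differ by item gives the
    two-parameter form.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Same item difficulty, two hypothetical discrimination values.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
rasch_curve = [item_response(t, 0.0) for t in abilities]
steeper_curve = [item_response(t, 0.0, discrimination=2.0) for t in abilities]
```

Both curves pass through 0.5 where ability equals difficulty. The higher discrimination value only makes the curve steeper around that point, which is why curves for items with unequal discrimination can cross, whereas Rasch item curves never do.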
While several studies indicate item discrimination is not constant across
functional status items (McHorney, 2002; McHorney & Cohen, 2000; Spector &
Fleishman, 1998), for pragmatic reasons, namely the availability of relatively small
sample sizes of patients with linked FIM and MDS data (n = 450), we are choosing the
Rasch model because of its simplicity and robustness under conditions of heterogeneous
item discrimination and small samples (De Gruijter, 1986; Kolen & Brennan, 1995). The
Rasch model has been shown to produce stable linking with sample sizes of 300-400
(Kolen & Brennan, 1995; Skaggs & Lissitz, 1986).
Rasch analysis can be used to link healthcare inventories that measure the same
construct. By linking inventories in this manner, one can improve the usefulness of both
* Refining the rating scale.
* Identifying the items that form a unidimensional construct.
* Verifying the expected difficulty hierarchy of the items.
* Providing for a means of converting scores between the two measures.
* Matching the ADL measures to specific descriptions provided by the scale.
The Rasch theory stipulates that a respondent's probability of answering an item
correctly is dependent only on two factors: the respondent's ability and the
characteristics of the item (Hambleton, Swaminathan & Rogers, 1991). Rasch analysis
has the ability to uniquely link a person's ability to an item's difficulty level (Velozo &
Peterson, 2001). Thus, a score on an instrument can be directly linked to the descriptive
content of the instrument (Velozo & Peterson, 2001). The examiner is able to describe
precisely a person's level of ability based on the score they receive. In many other cases,
a score on a test is uninterpretable in terms of a meaningful description of the level of
ability it represents. You may still be able to say with a reasonable level of confidence
that someone has more or less ability than someone else, but you still do not have a clear
description of the precise level of ability that person possesses. Furthermore, Rasch
analysis allows for the ranking of items so that all items on a scale can be put on a
continuum from least challenging to most challenging.
There has been only one published study attempting to link the FIM to the MDS.
Williams et al. (1997) compared scores on FIM and rescaled MDS ADL and cognitive
items [referred to as the "Pseudo-FIM(E)"] on 173 rehabilitation patients admitted to six
nursing homes. The matching and rescaling of the MDS was accomplished through an
expert panel, with the panel judging that 8 out of 13 FIM items had a corresponding
MDS item. Intraclass correlation between the FIM and rescaled MDS was .81, although
the mean calibration of 6 of the 8 FIM items differed statistically from the rescaled MDS
items. While this initial attempt at linking the FIM and the MDS was encouraging, the
methodology and statistical approach of the study had considerable limitations. For
example, expert-panel rescaling of the MDS can be challenged due to the lack of
adequate empirical support (i.e., a different panel of experts could develop a different
FIM-MDS matching and rescaling; Velozo, Kielhofner & Lai, 1999).
Fisher, Harvey, Taylor, et al. (1995) were the first to use common-sample
equating to link two global measures of functional ability, the FIM and PECS (Patient
Evaluation and Conference System). Using the methodology described above, they
showed that the 13 FIM and 22 PECS ADL/motor items could be scaled together in a 35-
item instrument. The authors found that separate FIM and PECS measures for 54
rehabilitation patients correlated .91 with each other and correlated .94 with the
cocalibrated values produced by Rasch analysis. Furthermore, these authors
demonstrated that either instrument's ratings were easily and quickly converted into the
other via a table that used a common unit of measurement, which they referred to as the
"rehabit." This common unit of measurement allows for the translation of scores from
one instrument to another. Since the results of Rasch analysis are sample-free, these
tables/algorithms can be used for all future and past instrument-to-instrument score
More recently, Fisher et al. (1997) replicated their previous study using common-
sample equating to link two self-report instruments: the 10 physical function items of the
Medical Outcomes Study (MOS) SF-36 (the PF-10) and the Louisiana State University
Health Status Instrument. Difficulty estimates for a subset of similar items from the two
instruments correlated at .95, again indicating that the items from the two scales were
working together to measure the same construct. McHorney and Cohen (2000) applied a
two-parameter IRT model to 206 physical functioning items (through 71 common items
across samples) and, in a similar study, linked 39 physical functioning items
(through 16 common items) from three modules of the Asset and Health Dynamics
Among the Oldest Old (AHEAD) study. Both studies demonstrated successful linking of
item banks through sets of common items, allowing placement of all items on a common
metric. Then in 2003, Jette et al. conducted a one-parameter Rasch partial credit analysis
for the entire item pool of the FIM, MDS, OASIS (Outcome and Assessment Information
Set for Home Health Care), and the PF-10 (the physical functioning scale of the SF-36)
items to develop an overall functional ability scale. These authors noted that the MDS
instrument covered content from the mid portion of the functional ability continuum with
less content coverage on the low and high ends while the FIM instrument covered a
relatively small portion along the middle to upper end of this continuum.
The above studies represent encouraging evidence that while physical function is
presently measured with many different instruments, it need not be tied to any particular
instrument. These studies support the idea that there can be a universal use of common
units for the measurement of functional ability. As a result, the creation of a translation
table between the FIM and the MDS should be possible for the measurement of
functional ability in PAC settings. Such a table would help avoid unnecessary
duplication of efforts when patients transfer from one PAC setting to another where
different instruments are used to measure functional ability. When patients arrive in a new
setting, their scores on the new instrument can be readily determined. The creation of a
universal measure of functional ability in rehabilitation would create the following
* It allows one to accurately and precisely evaluate the effectiveness of treatment.
* It allows one to accurately and precisely evaluate the effectiveness of the
program, thus increasing the program's efficiency.
* Measurements of progress are used to justify reimbursement for services.
(Merbitz, Morris & Grip, 1989)
In an unpublished study, Velozo (2004) performed a Rasch analysis to develop a
conversion table that linked scores on the FIM to analogous scores on the MDS. The
Rasch partial credit model Winsteps program (Linacre & Wright, 2001) was used to
calibrate item difficulties based on the linked FIM and MDS scores. The development of
this conversion table was based on a sample of 254 VHA patients who had completed
both the FIM and the MDS within 7 days of one another. The decision to restrict the
number of days between the completion of the FIM and MDS was based on the need to
minimize the impact any possible change in the patient's condition would have on the
scores. The use of 7 days as the criterion was based on the research lab's clinical
judgment. The physical ability items from both instruments were placed on the same
linear continuum and from this, a FIM-MDS conversion table was produced. The
purpose of this current study is to test the accuracy of the conversion table developed by
Velozo (2004). A new sample of records from 2,297 patients with linked FIM and MDS
scores collected between July 2003 and June 2004 was obtained from the VA's Austin
Automation Center databases. Using the conversion table developed by Velozo, the new
FIM scores are converted to MDS scores, and new MDS scores to FIM scores. The
converted-MDS (MDSc) scores are then statistically compared to the actual MDS scores
and the converted-FIM (FIMc) scores to the actual FIM scores. From these comparisons,
a determination of the accuracy of the FIM-MDS conversion table will be made.
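The comparison of converted and actual scores can be sketched as follows. The crosswalk values, the patient records, and the choice of summary statistics (mean difference, mean absolute difference, and Pearson correlation) are all assumptions made for illustration. They are not entries from the actual conversion table or the specific tests used in this study.

```python
from math import sqrt

# Hypothetical excerpt of a FIM-to-MDS crosswalk of total raw scores.
# These values are placeholders, not entries from the actual table.
FIM_TO_MDS = {30: 40, 45: 31, 60: 22, 75: 13, 90: 4}


def convert(score, table):
    """Look up the converted score; fail loudly if the table has no entry."""
    if score not in table:
        raise KeyError(f"no conversion entry for score {score}")
    return table[score]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def accuracy_summary(actual, converted):
    """Mean signed difference, mean absolute difference, and correlation."""
    diffs = [c - a for a, c in zip(actual, converted)]
    return {
        "mean_diff": sum(diffs) / len(diffs),
        "mean_abs_diff": sum(abs(d) for d in diffs) / len(diffs),
        "r": pearson_r(actual, converted),
    }


# Hypothetical linked records: (actual FIM total, actual MDS total) per patient.
records = [(30, 38), (45, 33), (60, 22), (75, 15), (90, 4)]
mds_converted = [convert(fim, FIM_TO_MDS) for fim, _ in records]
mds_actual = [mds for _, mds in records]
summary = accuracy_summary(mds_actual, mds_converted)
```

The same functions apply in the other direction by inverting the table and comparing converted FIM scores with actual FIM scores.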
The purpose of this study was to test the accuracy of a FIM-MDS conversion
table that was designed to transform a score on the physical component of the FIM to its
corresponding score on the MDS and vice versa. The methods utilized to investigate the
accuracy of the FIM-MDS conversion table are described in the following sections of this
* Source of the Data
* FIM and MDS Motoric Items
* Procedures Involved in the Creation of the FIM-MDS Conversion Table
* Statistically Testing the Accuracy of the FIM-MDS Conversion Table
This study was approved by the University of Florida's Institutional Review
Board for the protection of human subjects, as well as the Veterans Administration's
Subcommittee on Clinical Investigations. This study also obtained a HIPAA Waiver of
Source of the Data
The FSOD and MDS data reside in two separate databases at the VA's Austin
Automation Center (AAC) (Veterans Health Administration, 2004). Upon consultation
with Dr. Hojlo, Chief of the VA Nursing Home Care, the most accurate VA-MDS data
were available starting in June of 2002. Therefore the data extractions were based on
data collected from June 2002 to May 2003. Data from both databases were downloaded
and merged on the basis of social security numbers, using the statistical software,
A single-group design was used, as both inventories were completed by the same
group of subjects. Because the same population completed both inventories, population
invariance and symmetry exist. This eliminates concerns that might otherwise arise over
differences caused by variability in the composition of the sample population.
Inclusion Criteria of Subjects
Subjects included in this study were those who were part of the VA's FSOD and
MDS databases, who had FIM and MDS scores completed no more than seven days apart
between June 2002 and May 2003 and who had data on all items included in the
development of the FIM-MDS conversion table. The decision to restrict the amount of
time that elapsed between the administration of the FIM and MDS was based on the need
to minimize the impact that possible change(s) in a patient's condition might have on the
resulting FIM and MDS scores.
Inclusion of Women and Minorities
Inclusion criteria were for males and females and all ethnic groups, as they
occurred in the VA's FSOD and MDS databases.
The records of 57,237 patients who underwent a FIM evaluation and 69,954
subjects who had MDS scores in a VA post-acute care setting between June 2002 and May
2003 were obtained from two separate databases housed at the VA's AAC. The linking of
these records, based on patient social security number and no more than 7 days between
FIM and MDS test dates, resulted in 2,521 matches. These data were then cleaned to exclude
duplicate records of the same subject with more than one match of test dates, as well as
those records that included missing or invalid scores (i.e., ratings other than acceptable) for
items on either the FIM or the MDS. This last exclusion was made to ensure that total
scores were compared to total scores, which is the basis of the FIM-MDS conversion table.
The result was 2,297 unique subjects with linked FIM and MDS scores.
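The linking and cleaning steps described above can be sketched as follows. This is a minimal illustration, not the procedure actually run for the study: the record layout (`id`, `date`, `items`), the use of `None` for a missing score, and the rule of keeping the closest-dated pair when a subject has several matches are all assumptions made for the example.

```python
from datetime import date

MAX_GAP_DAYS = 7  # window between FIM and MDS test dates stated in the text


def link_records(fim_rows, mds_rows, max_gap_days=MAX_GAP_DAYS):
    """Pair FIM and MDS records that share a patient identifier and whose
    test dates are no more than max_gap_days apart, one pair per patient."""
    # Index MDS records by patient identifier for fast lookup.
    mds_by_id = {}
    for row in mds_rows:
        mds_by_id.setdefault(row["id"], []).append(row)

    best = {}  # patient id -> (gap in days, FIM record, MDS record)
    for fim in fim_rows:
        for mds in mds_by_id.get(fim["id"], []):
            # Exclude records with missing item scores (None marks missing here).
            if None in fim["items"] or None in mds["items"]:
                continue
            gap = abs((fim["date"] - mds["date"]).days)
            if gap > max_gap_days:
                continue
            # Assumption: for duplicate matches, keep the closest-dated pair.
            if fim["id"] not in best or gap < best[fim["id"]][0]:
                best[fim["id"]] = (gap, fim, mds)
    return best


# Hypothetical records, invented for illustration only.
fim_rows = [
    {"id": "A", "date": date(2002, 6, 10), "items": [5, 4, 6]},
    {"id": "A", "date": date(2002, 9, 1), "items": [6, 5, 6]},
    {"id": "B", "date": date(2002, 7, 1), "items": [3, None, 2]},
    {"id": "C", "date": date(2003, 1, 5), "items": [7, 7, 7]},
]
mds_rows = [
    {"id": "A", "date": date(2002, 6, 13), "items": [1, 2, 0]},
    {"id": "B", "date": date(2002, 7, 2), "items": [3, 3, 4]},
    {"id": "C", "date": date(2003, 2, 20), "items": [0, 0, 0]},
]
linked = link_records(fim_rows, mds_rows)
```

In this toy input only subject A survives: subject B has a missing FIM item and subject C's test dates are more than seven days apart.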
The age of subjects ranged from 19-89+ years. Of those subjects between the
ages of 19 and 89, 50.7% were under the age of 70, 31.2% were in their 70's and 15.2%
were in their 80's. Only 1.5% of the sample was over the age of 89. The majority of the
sample was Caucasian at 73% with 20% being African American and 5% Hispanic.
Ninety-six percent of the sample was male and 44% were married. The days between the
administration of the FIM and the MDS ranged from 0-7 days with a mean of 5 (1.9)
days. Thirty-five percent (1,531) of the subjects had a diagnosis of stroke, 23% (525)
had lower extremity orthopedic problems, and 12% (271) had lower extremity
amputations. The remainder of the sample consisted of subjects with a variety of
FIM and MDS Motor Items
Table 3-1 is a list and comparison of the FIM and MDS motor items included in
this analysis. There are nine pairs of items between the FIM and MDS that are
considered to represent the same or nearly the same activity in this study. These pairs
include eating, grooming/personal hygiene, bathing, dressing, toileting, bowel
management, bladder management, transferring, and walking (Table 3-1). Items
included in only one instrument and not the other are bed mobility and stair use. While
both the FIM and MDS instruments include an item for eating, the FIM requires a higher
skill level in order to achieve the highest rating. This is because the FIM does not permit
finger feeding, nor does it allow people eating through adaptive means to achieve the
highest rating. There are also similar grooming and hygiene items on the two
instruments. The term, "bathing" for both instruments connotes a full body bath, to
Table 3-1. Comparison of FIM™ to MDS ADL/motor items
Bed, chair, wheelchair (transfer)
Walk in room
Walk in corridor
Locomotion on unit
Locomotion off unit
FIM™ rating scale
7 complete independence (timely, safely)
6 modified independence (device)
4 minimal assist (subject = 75%+)
3 moderate assist (subject = 50%+)
2 maximal assist (subject = 25%+)
1 total assist (subject = 0%+)
MDS-Rating scale (exceptions noted
2 limited assistance
3 extensive assistance
4 total dependence
8 activity did not occur during the
entire 7-day period
* Bathing in the MDS has a separate rating scale: 0-Independent, 1-Supervision, 2-
Physical help limited to transfer only, 3-Physical help in part of bathing activity, 4-
Total dependence, 8-Activity itself did not occur during entire 7 days.
** Bladder and Bowel Continence in the MDS also has a separate rating scale: 0-Usually
Continent, 2-Occasionally Continent, 3-Frequently Incontinent, 4-Incontinent
exclude one's face and hands. Yet, the MDS also incorporates bathtub or shower
transfers as part of bathing, while the FIM has a separate item for tub and shower
transfers. The MDS has one item for dressing, while the FIM divides the task into
dressing, upper body and dressing, lower body. In this study, the FIM item for dressing,
lower body was matched with the MDS item for dressing since, based on the lab's
clinical judgment, dressing the lower body would be considered a more difficult task than
dressing the upper body. This more difficult aspect of the ADL is incorporated in the one
MDS item for dressing. The FIM item for toileting is matched with the MDS item for
toilet use, as they have similar definitions, although the MDS includes toilet transfer in
the task while the FIM has a separate item for transfer. The MDS item for transfers is
then matched with the FIM item for transfers: bed, chair, and wheelchair. The bowel and
bladder control items on the FIM are matched with the bowel and bladder continence
items on the MDS. The FIM item for walk/wheelchair addresses one's ability to walk or
use a wheelchair safely on a level surface, while the MDS has four items for walking to
include walk in room, walk in corridor, locomotion on unit and locomotion off unit.
Although not included in the definition of the FIM item for walk/wheelchair, "150 feet is
specified as the performance criterion in the clarification of the rating scale" (Rogers,
Gwinn & Holm, 2001, p. 6). Therefore, the FIM item for walk/wheelchair was matched
to the MDS item for locomotion off unit. Furthermore, the FIM incorporates safety into
the definition of many of its items, such as grooming, bathing, dressing, transfers,
toileting skills, walking and wheelchair mobility, while the MDS does not (Rogers,
Gwinn & Holm, 2001).
The FIM and MDS have different response scales on which the physical
functioning items are scored. A clear distinction in the administration of these two
assessments is that the items of the FIM are scored at the time of the assessment, while
ratings on the MDS are based on observed performance over a 7-day period.
Furthermore, the FIM items have seven response levels, while the MDS has a range of
five. The FIM scoring criteria are shown in Table 3-2. The MDS scoring criteria are
shown in Table 3-3.
While the FIM motor items assess the percent of effort that is provided to the
patient to accomplish a task, the MDS measures the number of times during a 7-day time
period a patient required a certain level of assistance to perform a task.
Table 3-2. FIM scoring criteria
7  Complete independence: All of the tasks described as making up the activity are
   typically performed safely, without modification, assistive devices, or aid and
   within a reasonable amount of time.
6  Modified independence: One or more of the following may be true: the activity
   requires an assistive device; the activity takes more than reasonable time, or
   there are safety (risk) considerations.
5  Supervision or setup: Subject requires no more help than standby, cuing or
   coaxing, without physical contact, or, helper sets up needed items or applies
   orthoses.
4  Minimal assist (minimal prompting): Subject requires no more help than touching,
   and expends 75% or more of the effort.
3  Moderate assistance (moderate prompting): Subject requires more help than
   touching, or expends half (50%) or more (up to 75%) of the effort.
2  Maximal assistance (maximal prompting): Subject expends less than 50% of the
   effort, but at least 25%.
1  Total assistance: Subject expends less than 25% of the effort.
Table 3-3. MDS scoring criteria
0  Independence: No help or staff oversight -OR- staff help/oversight provided only
   one or two times during the last seven days.
1  Supervision: Oversight, encouragement, or cueing provided three or more times
   during last 7 days -OR- supervision (3 or more times) plus physical assistance
   provided, but only one or two times during the last 7 days.
2  Limited assistance: Resident highly involved in activity, received physical help
   in guided maneuvering of limbs or other nonweight-bearing assistance on three or
   more occasions -OR- limited assistance (3 or more times), plus one weight-bearing
   support provided, but for only one or two times during the last 7 days.
3  Extensive assistance: While the resident performed part of activity over last
   seven days, help of following type(s) was performed three or more times:
   --Weight-bearing support provided three or more times;
   --Full staff performance of activity (3 or more times) during part (but not all)
   of last 7 days.
4  Total dependence: Full staff performance of the activity during the entire 7-day
   period. There is complete nonparticipation by the resident in all aspects of the
   ADL definition task. If staff performed the activity for the resident during the
   entire observation period, but the resident performed part of the activity
   himself/herself, it would not be coded as a "4" (Total Dependence).
Procedures Involved in the Creation of the FIM/MDS Conversion Table
For the purposes of creating a FIM-MDS conversion table, Velozo (2004)
obtained linked FIM and MDS scores from the records of 254 subjects. The linking of
instruments using IRT methodologies is generally dependent on item calibrations, which
are the "difficulty" measures of the items. In essence, item calibrations serve as the
markings on the conversion ruler. Rasch analysis of the FIM and MDS converts a
patient's responses on the instrument items to a measure of ADL/motoric function.
Prior to performing the Rasch analysis, several steps were taken so that the FIM
and MDS rating scales were conceptually consistent. One inconsistency between the
FIM and MDS is that the MDS includes a rating for "activity did not occur." Using a
procedure adapted by Jette, Haley, and Ni (2003), this MDS rating was recoded as part of
the "total dependence" rating. The rationale underlying this decision was that a likely
explanation for an activity not occurring was that the activity could not be performed
(Buchanan, Andres, Haley, Paddock & Zaslavsky, 2002; Jette et al., 2003). Other
inconsistencies between the FIM and MDS are that the rating scales progress in different
directions and have different ranges (i.e., from 1 to 7 for the FIM and from 4 to 0 for the
MDS). In order to adjust for these differences, the MDS scale was rescored and rescaled
to match the rating scale used in the FIM. For example, a "4" on the MDS, which
represents total dependence, was recoded as a "1" to match Total Assistance on the FIM,
and a "0" on the MDS, which represents "Independence," was recoded to a "7" to match
the rating for "Complete Independence" on the FIM (Table 3-4).
Table 3-4. FIM-MDS score conversion
MDS Score Conversion FIM
Independent 0 to 7 Complete Independence
Supervision 1 to 5 Supervision
Limited Assistance 2 to 4 Minimal Assistance
Extensive Assistance 3 to 2 Maximal Assistance
Total Dependence 4 to 1 Total Assistance
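The two recoding steps (folding "activity did not occur" into total dependence, then mapping MDS ratings onto the FIM scale per Table 3-4) can be sketched as below. The function and constant names are illustrative, and the sketch covers only the standard MDS ADL scale, not the separate bathing and continence scales noted under Table 3-1.

```python
DID_NOT_OCCUR = 8      # MDS code for "activity did not occur"
TOTAL_DEPENDENCE = 4   # MDS code for total dependence

# MDS rating -> FIM-aligned rating, as listed in Table 3-4.
MDS_TO_FIM_SCALE = {0: 7, 1: 5, 2: 4, 3: 2, 4: 1}


def rescore_mds(rating):
    """Recode one MDS ADL rating onto the FIM rating scale."""
    # Step 1: treat "activity did not occur" as total dependence.
    if rating == DID_NOT_OCCUR:
        rating = TOTAL_DEPENDENCE
    # Step 2: reverse and rescale the rating to the FIM direction and range.
    if rating not in MDS_TO_FIM_SCALE:
        raise ValueError(f"invalid MDS rating: {rating!r}")
    return MDS_TO_FIM_SCALE[rating]


rescored = [rescore_mds(r) for r in [0, 1, 2, 3, 4, 8]]
```

Ratings outside the listed codes raise an error rather than being silently mapped, which mirrors the exclusion of invalid scores during data cleaning.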
Following the rescoring of the MDS rating scale, the next step in creating the
FIM-MDS conversion table was to run a Rasch partial credit model analysis on the linked
FIM and MDS scores, using Winsteps (Linacre & Wright, 2000). This combined
analysis placed the FIM and MDS items and rating-scale calibrations on the same linear
scale with the same local origin. That is, FIM item and rating-scale calibrations became
"linked" to MDS item and rating-scale calibrations. This also provided cocalibrated item
and rating-scale values, which were then used as anchors in separate FIM and MDS
analyses. Both the anchored FIM and MDS analyses generated output tables that
associated total FIM and MDS raw scores with a common logit scale. These analyses
resulted in a conversion table whereby total FIM raw scores could be translated into total
MDS raw scores and vice versa (Table 3-4).
Converting a score from the FIM to the MDS and vice versa is as simple as
locating a score under either the FIM or MDS column (Table 3-5) and reading across the
adjacent logit column to find the equivalent score on the alternate instrument. It is
hypothesized that a score of 16 on the MDS represents the same amount of functional
independence as a score of 39 on the FIM. Similarly, a score of 58 on the FIM represents
the same amount of functional independence as does an MDS score of 29 (Table 3-4).
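One plausible way to express this lookup in code is to pass through the shared logit column: find the logit for the source score, then take the score on the other instrument whose logit is closest. The sketch below assumes that nearest-logit rule, and the logit values are invented for illustration; only the two score pairs (FIM 39 with MDS 16, FIM 58 with MDS 29) come from the text.

```python
# Hypothetical excerpt of a conversion table: raw score -> logit measure.
# The logit values are placeholders, not values from the actual table.
FIM_TO_LOGIT = {39: -0.50, 58: 0.40}
MDS_TO_LOGIT = {16: -0.50, 29: 0.40}


def convert(score, source_to_logit, target_to_logit):
    """Return the target raw score whose logit is closest to the logit
    associated with the given source raw score."""
    logit = source_to_logit[score]  # raises KeyError if the score is not tabled
    return min(target_to_logit, key=lambda s: abs(target_to_logit[s] - logit))


mds_c = convert(39, FIM_TO_LOGIT, MDS_TO_LOGIT)  # actual FIM -> converted MDS
fim_c = convert(29, MDS_TO_LOGIT, FIM_TO_LOGIT)  # actual MDS -> converted FIM
```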
Nine of the FIM items have corresponding items on the MDS that address similar
areas of physical functioning. These items include eating, grooming/personal hygiene,
bathing, dressing, toileting, bowel management, bladder management, transferring, and
walking. After performing the Rasch analysis on the FIM and MDS total scores to link
the two measures, the resulting correlation between the similar FIM and MDS items was
.822 at p<.01. Similar person measures between scores on the FIM and the MDS
correlated at .703.
Statistically Testing the Accuracy of the FIM-MDS Conversion Table
The purpose of this research study was to test the accuracy of the FIM-MDS
conversion table. In order to accomplish this, a second, independent sample of VA
patients with scores on both the FIM and MDS, completed within 7 days of one another,
was obtained from the VA's Austin Automation Center. Before submitting this dataset to
the FIM-MDS conversion table, the same adjustments made to the MDS rating scale
when creating the conversion table were applied to the current dataset. Using the
Statistical Package for Social Sciences (SPSS), version 12.0 for Windows, the MDS
rating for "activity did not occur" was recoded as "total dependence" following the
scoring protocol of the FIM. Then, the MDS rating scale was recoded and rescaled
to match the rating scale of the FIM. The FIM-MDS conversion table was then used to
convert the second sample of actual FIM scores, designated as FIMa, to converted MDS
scores, designated MDSc. In the same manner, MDSa scores were converted to FIMc
scores. Then, the actual scores on the FIM and MDS were compared to their
corresponding converted scores to determine how similar the actual and converted scores
were.
Table 3-5. FIM-MDS conversion table (columns: FIM, logit, MDS)
The goal of equivalence testing is to demonstrate that two or more conditions are
statistically the same (Stegner, Bostrom & Greenfield, 1996). In this type of testing, one
reverses the role of the null and alternative hypotheses and then by testing a set of these
reversed hypotheses, demonstrates equivalence with a predetermined significance level
just as when demonstrating a difference between groups (Stegner et al.). The
equivalence methodology is a simple application of bioequivalence principles proposed
in Pharmacokinetics and Biopharmaceutics recently. The idea is to "prove" statistically
that two drugs or formulations are equivalent (Berger & Hsu, 1996; U.S. FDA, 1997,
1999). This methodology was used in the current study in order to compare the statistical
equivalence of FIMa and FIMc scores, as well as MDSa and MDSc scores. It was
hypothesized that a minimum of 75% of the actual and converted scores should be within
5 points of one another in order for the conversion to be considered accurate. If fewer than
75% met this standard, then the conversion table would not be considered accurate enough for use
in a clinical setting. A difference of 5 points was employed, as Forrest, Schwam, and
Cohen (2002) found that "each 5-point decrement in the FIM score correlated with the
need for about one hour per day of help in mobility, basic activities of daily living, and
instrumental activities of daily living" (p. 57). Yet, Granger et al. (1993) indicated that
while no recommendations existed for what constituted a clinically significant change on
the FIM, a 10-point improvement decreased by almost 50% the time required to care for
a group of stroke patients in the community. For this study, a clinically significant
difference in scores was set at the more rigorous 5-point increment.
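The individual-level accuracy standard (at least 75% of actual and converted scores within 5 points of one another) can be sketched as below. The function names and the example scores are hypothetical; only the 5-point tolerance and the 75% threshold come from the text.

```python
def share_within(actual, converted, tolerance=5):
    """Proportion of subjects whose actual and converted scores differ
    by no more than `tolerance` points."""
    if len(actual) != len(converted) or not actual:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for a, c in zip(actual, converted) if abs(a - c) <= tolerance)
    return hits / len(actual)


def meets_standard(share, required=0.75):
    """True when the share of close matches reaches the required level."""
    return share >= required


# Hypothetical scores for eight subjects, invented for illustration only.
actual = [52, 60, 33, 71, 45, 80, 28, 66]
converted = [55, 72, 30, 70, 58, 79, 41, 64]
share = share_within(actual, converted)
accurate_enough = meets_standard(share)
```

Passing `tolerance=10` gives the more lenient 10-point comparison.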
In order to apply a more precise analysis to the determination of the level of
accuracy of the FIM-MDS conversion table, techniques as described in Dorans and
Lawrence (1990) were utilized. In their study, Dorans and Lawrence tested the score
equivalence of nearly identical editions of the Scholastic Aptitude Test (SAT). These
two versions comprised the same test questions, but the order in which the test
was administered differed. In one test situation, the 40-item verbal section of the SAT
might precede the 45-item verbal section. In the other situation, the 45-item verbal
section might precede the 40-item verbal section. The test was administered to what was
presumed to be statistically equivalent groups of examinees. Linear equating techniques
were used to equate one version of the test to the other version, and the accuracy testing
of the resulting scores was accomplished by checking whether the identity transformation
fell within a reasonable confidence interval placed around that equating function (Dorans
& Lawrence). The difference between the equating function and the identity
transformation was calculated and then that difference was divided by the standard error
of the equating function. If the resulting ratio fell within a bandwidth of plus or minus
two, then the equating function was considered to be within sampling error of the identity
transformation.
Kolen and Brennan (1995) indicated that in order for equating of two measures to
be successful, the four moments of the distribution should be statistically equivalent.
Therefore, the four moments of the distribution, including the means, standard
deviations, skewness, and kurtosis were also calculated and compared.
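As a rough illustration of what the four moments are, the sketch below computes them with simple moment-based formulas. This is an assumption made for clarity: statistical packages such as SPSS typically report bias-adjusted skewness and kurtosis, so their output can differ slightly from these values, especially in small samples.

```python
from math import sqrt


def four_moments(scores):
    """Return (mean, sample standard deviation, skewness, excess kurtosis)."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are required")
    mean = sum(scores) / n
    # Central moments of order 2, 3, and 4 (each divided by n).
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    sd = sqrt(m2 * n / (n - 1))    # sample standard deviation (n - 1 divisor)
    skewness = m3 / m2 ** 1.5      # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2 - 3    # excess kurtosis; 0 for a normal distribution
    return mean, sd, skewness, kurtosis


mean, sd, skewness, kurtosis = four_moments([1, 2, 3, 4, 5])
```

A negative excess kurtosis, as in this small symmetric example, indicates a distribution flatter than the normal curve.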
Correlations between the actual and converted scores on both the FIM and MDS
were also determined, as were effect sizes. Effect sizes give a clear indication of the
amount of difference that exists between two score distributions. The standard of effect
sizes as noted in Cohen (1988) was used to determine the percent of overlap that existed
between the converted and actual scores (Table 3-6).
Table 3-6. Effect size
Cohen's Standard   Effect Size   Percentile Standing   Percent of Overlap
Large              0.8           79                     52.6%
                   0.7           76                     57.0%
                   0.6           73                     61.8%
Medium             0.5           69                     67.0%
                   0.4           66                     72.6%
                   0.3           62                     78.7%
Small              0.2           58                     85.3%
                   0.1           54                     92.3%
                   0.0           50                    100.0%
(Adapted from Cohen, 1988)
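The percent-of-overlap column can be approximated from the effect size using Cohen's U1 (percent of non-overlap), taking overlap as 100% minus U1. The sketch below assumes two normal distributions with equal variances and equal sizes, which is the setting in which Cohen defines U1; the function name is illustrative.

```python
from statistics import NormalDist


def percent_overlap(d):
    """Approximate percent overlap of two equal-variance normal
    distributions separated by effect size d (100% minus Cohen's U1)."""
    u2 = NormalDist().cdf(abs(d) / 2)   # Cohen's U2
    u1 = (2 * u2 - 1) / u2              # Cohen's U1: proportion of non-overlap
    return 100 * (1 - u1)


overlap_small = percent_overlap(0.2)
overlap_large = percent_overlap(0.8)
```

For d = 0.2 and d = 0.8 the function returns values close to the tabled 85.3% and 52.6%; small differences in the last digit reflect rounding in the published table.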
Additionally, an analysis on the data was performed to obtain an understanding of
how similar the raw scores on similar items between the FIM and MDS were. A factor
that might negatively impact the accuracy of the conversion table would be the presence
of large differences in the scores for individuals on similar FIM and MDS items. In order
to systematically test for the level of disparate scores present in the current dataset, the
ratings on the MDS were converted to their closest corresponding ratings on the FIM, as
shown in Table 3-7.
Table 3-7. Rating scale conversion
MDS Score Conversion FIM
Independent 0 to 7 Complete Independence
Supervision 1 to 5 Supervision
Limited Assistance 2 to 4 Minimal Assistance
Extensive Assistance 3 to 2 Maximal Assistance
Total Dependence 4 to 1 Total Assistance
For example, the score for independence on the MDS, "0," was converted to a "7"
to correspond with the FIM score for complete independence. Similarly, an MDS score
of "3," indicating extensive assistance, was converted to a score of "2" for maximal
assistance. This rating conversion was accomplished on the nine FIM and nine MDS
items that most closely matched (Table 3-8).
Table 3-8. Similar FIM and MDS items
FIM™ items                          MDS items
Grooming                            Personal hygiene
Dressing-lower body                 Dressing
Toileting                           Toilet use
Bladder management                  Bladder continence
Bowel management                    Bowel continence
Bed, chair, wheelchair (transfer)   Transfer
Walk/wheelchair                     Locomotion off unit
It was hypothesized that a difference of four points on the rating scale represented
an important difference in a person's functional ability. For instance, a score of "1" on the
FIM indicates 'Total Assistance,' where the patient exerts less than 25% of the effort
required in performing a task. A distance of four points away from the rating of "1"
would be a score of "5," which represents 'supervision,' indicating that the subject
requires no more help than standby assistance to complete a task. The criterion of looking
at only matched scores with a difference of four or more points is rigorous in that this
situation only occurs between the following three score categories: 1-5; 1-7; and 2-7.
Selecting only those subjects who had a difference of four or more points between
scores on similar items identifies ratings that are clinically different on similar
items.
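The discrepancy check can be sketched as a comparison of each matched item pair after the MDS ratings have been recoded to the FIM scale. The item keys and ratings below are hypothetical; only the four-point threshold comes from the text.

```python
def large_discrepancies(fim_items, mds_items_rescored, threshold=4):
    """Return the matched items whose FIM rating and recoded MDS rating
    differ by `threshold` or more rating points."""
    return [
        name
        for name, fim_rating in fim_items.items()
        if abs(fim_rating - mds_items_rescored[name]) >= threshold
    ]


# Hypothetical ratings for one subject (MDS already recoded to the FIM scale).
fim_ratings = {"eating": 7, "bathing": 1, "dressing": 4}
mds_ratings = {"eating": 2, "bathing": 5, "dressing": 4}
flagged = large_discrepancies(fim_ratings, mds_ratings)
```

Counting subjects with a non-empty result gives the share with at least one large discrepancy, and counting the length of each result supports thresholds such as "four or more of the nine similar items."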
The FIM-MDS conversion table was used to transform FIMa scores, obtained
from the records of 2,297 subjects, to MDSc scores and MDSa scores to FIMc scores.
The converted scores were then analyzed statistically, first at the individual level and
then at the group level, as a means of determining the level of accuracy of the FIM-MDS
conversion table. In order for the conversion table to be considered accurate for use in
clinical settings, it was hypothesized that 75% of the actual and converted scores on the
FIM and the MDS should be no more than five points apart. Next, techniques used by
Dorans and Lawrence (1990) to test the accuracy of an attempt to equate nearly identical
editions of the Scholastic Aptitude Test (SAT) were applied to this dataset. These
procedures were used to determine whether the converted scores on the FIM and MDS
fell within a reasonable confidence interval placed around the actual FIM and MDS
scores. The difference between the actual and converted scores was calculated and then
that difference was divided by the standard error of the actual scores. If the resulting
ratio fell within a bandwidth of plus or minus two, then the converted scores were
considered to be within sampling error of the actual scores.
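The ratio test described above can be sketched as follows. The standard error is passed in as a parameter because the text does not spell out how it was computed; the value 1.5 used in the example is a placeholder, not a figure from the study.

```python
def standardized_difference(actual, converted, standard_error):
    """Difference between converted and actual scores in standard-error units."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (converted - actual) / standard_error


def within_sampling_error(actual, converted, standard_error, band=2.0):
    """True when the ratio falls within plus or minus `band`."""
    return abs(standardized_difference(actual, converted, standard_error)) <= band


# Hypothetical scores and standard error, for illustration only.
close_pair = within_sampling_error(actual=50, converted=52, standard_error=1.5)
distant_pair = within_sampling_error(actual=50, converted=58, standard_error=1.5)
```

Because the band scales with the standard error, a small standard error makes this a demanding criterion even when scores differ by only a few points.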
When testing the accuracy of the FIM-MDS conversion table from the
perspective of group scores, the four moments of the distributions
(i.e., the means, standard deviations, skewness, and kurtosis) were compared for
equivalence. Next, the
amount of overlap that existed between the actual and converted scores on the FIM and
MDS was determined by calculating effect sizes. Correlations between the actual and
converted scores on the FIM and MDS were also calculated, and those results are
also displayed graphically in scatter plots.
Statistical Results at the Level of the Individual
Of the 2,297 subjects analyzed, 25% (574) of the sample had FIMa and FIMc
scores that fell within five points of one another with the difference in scores ranging
from 0 to 71 points (Figure 4-1). For the MDS, 37% (850) of the sample had actual and
converted scores within no more than five points of one another with the difference in
scores ranging from 0 to 48 (Figure 4-2). These percentages fell well short of the 75%
standard for both the FIM and the MDS.
For comparison purposes, the percentage of subjects with actual and converted
FIM and MDS scores within 10 points of one another was also calculated. On the FIM,
45.5% (1,045) of the subjects had actual and converted scores within 10 points and 64%
(1,470) of the subjects had MDS scores within 10 points. Even when this more lenient
criterion was used, the results continue to fall short of the 75% standard.
The equivalence of the actual and converted FIM scores and the actual and
converted MDS scores was also evaluated using the test of equivalence employed by
Dorans and Lawrence (1990). For the FIMa vs. FIMc scores, 8.4% of the converted scores met
the criterion for equivalence, and for the MDSa and MDSc scores, 6.4% met this criterion.
Statistical Results at the Group Level
Cooper (1989) concluded that in order for the equating of scores to be successful,
all four moments of the distribution should be similar. The four moments of the
distribution (the mean, standard deviation, skewness, and kurtosis) of the actual and
converted scores on the FIM and MDS are displayed in Table 4-1.
Figure 4-1. Score difference between the FIMa and FIMc (histogram; x-axis: number of points by which the FIMa and FIMc differ; Mean = 13.5, Std. Dev = 10.38, N = 2297)
Figure 4-2. Score difference between the MDSa and MDSc (histogram; x-axis: number of points by which the MDSa and MDSc differ; Std. Dev = 7.05, N = 2297)
Table 4-1. Four moments of the distributions
                          FIMa     FIMc     MDSa     MDSc
N (valid)                 2297     2297     2297     2297
N (missing)                  0        0        0        0
Mean                     52.37    60.35    31.86    25.86
Std. Error of Mean        .432     .409     .280     .295
Median                   54.00    62.00    33.00    26.00
Std. Deviation           20.69    19.60    13.43    14.12
Skewness                 -.143    -.388    -.406    -.041
Std. Error of Skewness    .051     .051     .051     .051
Kurtosis                 -.855    -.573    -.630    -.901
Std. Error of Kurtosis    .102     .102     .102     .102
The mean of the FIMc was eight points greater than the mean of the FIMa, while
the mean of the MDSa exceeded the mean of the MDSc by six points. The standard
deviations of the actual and converted FIM scores (20.69 and 19.60) differed by about
one point, and those of the two MDS scores differed by less than one point
(13.43 for the MDSa and 14.12 for the MDSc; Table 4-1). Since the means of the FIMa
and FIMc differed by only eight points, these scores fell well within one standard
deviation of each other. Similarly, the means of the actual and converted MDS differed
by six points and also fell within one standard deviation of each other. Therefore, the
first two moments of the distributions, the mean and standard deviation, were considered
equivalent.
Since the mean score can be highly influenced by outliers, the median scores for
the distributions were also reported. The difference in the medians of the FIMa and
FIMc score distributions was eight points, just as it had been with the difference in the
means. The medians of the MDSa and MDSc differed by seven points. The medians for
all four of the score distributions exceeded their respective means, indicating the
presence of a negative skew to the score distributions. Normal distributions produce a
skewness statistic of about zero. A skewness or kurtosis value of two standard errors or
greater, regardless of sign, likely deviates from a normal score distribution to a
significant degree (Brown, 1997). The standard error of skewness for all four
distributions was .051, so two standard errors equaled .102; the standard error of
kurtosis was .102 for all four score distributions, so two standard errors equaled .204.
Therefore, the distributions of the actual FIM and MDS
instruments had a significant negative skew, indicating that the subjects measured on
these two inventories were generally more able than the tests were difficult. The
distribution of the FIMc was also negatively skewed to a significant degree, yet the
MDSc did not have a significant skew value and the distribution would be considered
symmetrical in this regard. The distributions of both the actual and converted scores on
the FIM and MDS all had negative kurtosis values, indicating that these distributions
were flatter than what one would expect and differed from normal to a significant degree.
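The two-standard-error rule can be checked against the values in Table 4-1. The sketch below uses the commonly cited large-sample formulas for the standard errors of skewness and kurtosis; that these are the formulas behind the reported values is an assumption, although for N = 2,297 they reproduce the tabled .051 and .102.

```python
from math import sqrt


def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def departs_from_normal(statistic, standard_error):
    """True when the statistic is two standard errors or more from zero."""
    return abs(statistic) >= 2 * standard_error


N = 2297
skewness = {"FIMa": -0.143, "FIMc": -0.388, "MDSa": -0.406, "MDSc": -0.041}
kurtosis = {"FIMa": -0.855, "FIMc": -0.573, "MDSa": -0.630, "MDSc": -0.901}

skew_flags = {k: departs_from_normal(v, se_skewness(N)) for k, v in skewness.items()}
kurt_flags = {k: departs_from_normal(v, se_kurtosis(N)) for k, v in kurtosis.items()}
```

Under this rule the MDSc is the only distribution whose skewness stays inside the two-standard-error band, while all four kurtosis values fall outside it.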
The results of this conversion revealed a substantial overlap between the
distributions of the actual and converted FIM scores (Figures 4-3 and 4-4), as well as
between the actual and converted MDS scores (Figures 4-5 and 4-6). An effect size of .2
demonstrated an 85.3% overlap of the distributions for the actual and converted scores in
Correlations between the actual and converted scores were calculated and
revealed a .724 correlation at p<.01 between the FIM scores and a correlation of .745 at
p<.01 between the MDS scores. Scatter plots of the actual and converted scores for the
FIM (Figure 4-7) and for the MDS (Figure 4-8) graphically demonstrate the level of
correlation between the actual and converted scores on the two instruments. Figure 4-5
demonstrates a ceiling effect exists for the actual FIM scores, while Figure 4-6
demonstrates a ceiling effect for the converted MDS scores.
Figure 4-3. Distribution of FIM actual scores (histogram; x-axis: total FIM actual scores; Std. Dev = 20.69, N = 2297)
Figure 4-4. Distribution of FIM converted scores (histogram; x-axis: total FIM converted scores; Mean = 60.4, Std. Dev = 19.59, N = 2297)
Figure 4-5. Distribution of MDS actual scores (histogram; x-axis: total MDS actual scores; Std. Dev = 13.43, N = 2297)
Figure 4-6. Distribution of MDS converted scores (histogram; x-axis: total MDS converted scores; Std. Dev = 14.12)
Figure 4-7. Scatterplot of the FIMa and FIMc scores (x-axis: actual FIM scores)
Figure 4-8. Scatterplot of actual MDS and converted MDS scores (x-axis: actual MDS scores)
Discrepancies in the Dataset
Of the 2,297 subjects in the dataset used in this study, 51% (1,163) had at least
one of the similar items in which there was a difference of four or more rating points
between scores on the FIM and the MDS. Three percent (76) of the subjects had four or
more of the nine similar items with score differences of four or more rating points. Two
percent (2) of the 109 subjects with FIM and MDS scores recorded on the same day had
more than four similar items with score differences of four or more rating points.
Summary of Results
The results of this study testing the accuracy of the FIM-MDS conversion table
were mixed, as those statistics at the group level tended to support the accuracy of the
conversion, while those at the individual level did not. The findings that support a
conclusion of equivalency between the actual and converted scores on both the FIM and
MDS included an 85.3% overlap between the respective score distributions, as well as a
correlation of .724 for the FIM and .745 for the MDS. The means of the actual and
converted FIM scores were well within one standard deviation of each other, as were the
means of the actual and converted MDS scores. The results of the two one-sided test
procedures supported this conclusion of equivalency between the respective FIM and
MDS means, as well.
On the other hand, only 25% (574) of the sample had FIMa and FIMc scores
within 5 points of one another and 37% (850) had MDSa and MDSc scores within that
range. Those percentages fell well short of the hypothesized 75% of the sample having
actual and converted scores that were no more than 5 points apart. If the standard for an
accurate conversion system were lowered to allow for up to a 10-point difference
between actual and converted scores, then 45% (1,045) of the sample would have FIM
scores and 64% (1,470) would have MDS scores within that range. While these
percentages were closer to the 75% criterion, they still fell short. The presence of a
negative skew for all four of the score distributions was an indication that the subjects'
ability levels were higher than the difficulty levels of the inventories. All of the score
distributions except for the MDSc demonstrated a skewness value that resulted in a
significant departure from normal. The level of kurtosis for all four distributions also
deviated from normal to a significant degree. When using the very rigorous procedures
described by Dorans and Lawrence (1990) to determine equivalency, only 8.4% of the
scores on the FIMa and FIMc and 6.4% of the MDSa and MDSc scores met this criterion.
A ceiling effect in the distribution of the FIMc and the MDSa scores was present.
Submitting the FIM and MDS scores to the FIM-MDS conversion table resulted in an
inflation of FIM scores and a deflation of MDS scores. Taking all of the above
information into consideration, the FIM-MDS conversion table passed less stringent
standards for equivalency, generally at the group level, but failed when focusing on
statistics at the individual level. Since other attempts at creating conversion tables
between instruments used in rehabilitation have not gone further to test the accuracy of
those conversions, no direct comparisons to other research findings can be made.
An understanding of the psychometrics of the FIM-MDS conversion table is
important when interpreting the results of this study. The conversion table was
developed from a dataset of 253 subjects with FIM and MDS scores occurring within 7
days of one another. Similar FIM and MDS items used in the development of the
conversion table had a correlation of .82 at p<.01, and there was a correlation of .70 at
p<.01 between similar person measures. These correlations are not as strong as those
reported in the study by Fisher et al. (1997). Fisher et al. found a correlation of .95
between difficulty estimates for a subset of similar items on the PF10 and the PFS, both
of which are self-report measures. The lower correlations for the FIM-MDS conversion
table for similar item and similar person measures may be explained by differences in the
design between this and previous studies. Fisher et al. used a convenience sample of 285
patients who were waiting for appointments in a public hospital general medicine clinic.
These patients were asked to complete the PF10 and the PFS inventories while they
waited to see a doctor and, therefore, no opportunity existed for a change in physical
condition to take place between the completion of the two surveys. There was also no
possibility for different raters to score the two instruments on the same subject since the
"rater" in both cases was the patient. Fisher et al. also removed the least consistent cases
from the analysis, meaning cases with the highest outfit statistics were not included in
creating the conversion table.
The correlations found in the current study were also not as strong as those
reported by Fisher, Harvey, Taylor, et al. (1995), who obtained a correlation of similar
person measures of .91 between the instruments used in the development of the Rehabits'
translation scale. The stronger correlation obtained by Fisher, Harvey, Taylor, et al. may
be the result of differences in the design of the study, as compared to the current one.
Fisher and colleagues used the results of 54 consecutive patients admitted to a free-
standing rehabilitation hospital, who were rated on both the FIM and the PECS at
admission and discharge. Thus, "rater" variability was controlled, and there was no
possibility for physical changes to take place in the patient's condition between the
administrations of the two inventories upon admission and then again upon discharge
because the inventories in each case were completed at the same time.
Studies by both Fisher, Harvey, Taylor, et al. (1995) and Fisher et al. (1997) that
used cocalibration equating measures to link healthcare instruments did not test the
correlations between actual and converted scores. Therefore, it is difficult to clearly
define the significance of a correlation of .72 for the FIMa and FIMc and a correlation of
.75 for the MDSa and MDSc. If this were a classical test-retest reliability study, these
correlations would be considered low. The question left for the current study is whether
better results could be obtained by creating a conversion system based on more accurate
data. Yet, an attempt to use more controlled data collection methodologies would limit
the applicability of the study in clinical settings.
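To make the distinction between group-level and individual-level agreement concrete, the following sketch computes a Pearson correlation between actual and converted scores, the mean difference between them, and the share of individuals whose converted score falls within a chosen tolerance of the actual score. The score lists, the function names, and the tolerance of 5 points are hypothetical illustrations; they are not the study data or the equivalency criterion applied in this study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def mean_difference(actual, converted):
    """Group-level agreement: average of (converted - actual)."""
    return sum(c - a for a, c in zip(actual, converted)) / len(actual)

def share_within_tolerance(actual, converted, tolerance):
    """Individual-level agreement: fraction of pairs within +/- tolerance."""
    hits = sum(1 for a, c in zip(actual, converted) if abs(c - a) <= tolerance)
    return hits / len(actual)

# Hypothetical actual and converted scores for eight subjects.
actual_scores = [30, 45, 52, 60, 68, 75, 81, 88]
converted_scores = [42, 38, 61, 52, 77, 66, 90, 80]

r = pearson_r(actual_scores, converted_scores)
bias = mean_difference(actual_scores, converted_scores)
share = share_within_tolerance(actual_scores, converted_scores, tolerance=5)

print("correlation:", round(r, 3))
print("mean difference:", bias)
print("share within 5 points:", share)
```

In this made-up example the correlation is fairly high and the mean difference is under one point, yet no individual's converted score falls within 5 points of the actual score. This is the kind of pattern in which a conversion looks acceptable at the group level while remaining imprecise for individuals.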
The limited accuracy of the FIM-MDS conversion table may be a result of
problems with the sample upon which it was based. This conversion system was
developed from a dataset of 253 subjects with FIM and MDS scores occurring within
seven days of one another. It may be that a larger dataset is necessary to create a highly
accurate conversion system. Furthermore, a significant limitation to the dataset used in
this study is the presence of scores on similar FIM and MDS items for the same subject
that differ markedly. It can be argued that differences in the definitions of similar items
between the FIM and MDS led to variability in scores. For example, the definition of
"eating" on the MDS focuses "on the intake of nourishment by any means, regardless of
skill, and includes the use of alternate forms of obtaining nourishment, such as tube
feeding" (Rogers et al., 2001, pp. 7-8). Yet, eating on the FIM is limited to the use of
suitable utensils to bring the food to the mouth (Rogers et al.). The MDS item for
bathing includes bathtub or shower transfers while the FIM bathing item does not.
Similarly, the MDS toileting item includes the ability to transfer to and from the toilet while the
FIM item for toileting does not. Furthermore, the FIM incorporates safety into the
definition of many of its items, such as grooming, bathing, dressing, transfers, toileting
skills, walking and wheelchair mobility, while the MDS does not (Rogers et al.). Thus,
one person could conceivably obtain a very different score on two of the similar items
between the FIM and MDS. Alternatively, allowing up to 7 days between the FIM and
MDS assessments may also have contributed to the discrepancies that were seen
between similar items for the same subject.
While the decision to restrict the number of days between the completion of the FIM and
MDS was based on the need to minimize the impact any possible change in the patient's
condition would have on their scores, it is likely that even this restriction did not go far
enough to eliminate this source of error in the study. Thus, some of the discrepancies in
scores that were seen between similar items may be due to a change in the subject's
condition.
A related explanation for the problems encountered in developing a highly
accurate conversion table is the existence of discrepancies between similar categories in
the rating scales of the FIM and the MDS. A FIM score of "1" refers to "Total
Assistance," in which the patient puts forth less than 25% of the effort necessary to
perform a task. The corresponding score on the MDS is a "4" for "Total Dependence."
This score is defined as, "full staff performance of the activity during entire 7-day period.
There is complete nonparticipation by the resident in all aspects of the ADL definition
task." A noticeable difference exists between these two rating definitions. On the FIM,
the patient may exert up to a quarter of the effort needed to perform the test, while on the
MDS the patient does not participate at all in the activity. Instead, the staff is required to
perform the full activity. It may reasonably be argued that there is a clinically
significant difference between a patient who can exert up to 25% of the effort required to
complete a task and one who cannot exert any effort at all. Similarly, a score of "2" on
the FIM refers to "Maximal Assistance," in which the patient puts forth less than 50% of
the effort necessary to do a task, but at least 25% of the effort. The corresponding score
on the MDS of "1" or "Supervision" is defined as "Oversight encouraged or cuing
provided three or more times during the last 7 days OR supervision (three or more times)
plus physical assistance provided, but only one or two times during the last 7 days." Once
again, a significant difference in meaning is evident between these two categories.
A further limitation of this study is the possibility of selection bias, in that
patients who have scores on both the FIM and MDS may not be typical of the patient
population in general. Additionally, the characteristics of this veteran
population may not be representative of the population for whom the FIM and MDS are
intended. Most notably, 98% of the study population is male. Yet, the great majority
of people in nursing homes, where the MDS is used, are female. Similarly, the ethnic
makeup of this study population is also likely not an accurate reflection of the makeup
of the population with whom the FIM and MDS are most frequently used. The unique aspects of the
subjects in this study may limit the generalizability of the findings.
Implications for Future Research
The results of this study lead one to consider other possible situations in which
the linking of instruments may be effective. There is evidence to suggest that the
creation of a conversion table based on two self-report instruments would have a higher
degree of accuracy (Fisher et al., 1997). One possible reason for the higher degree of
accuracy is that rater bias is not an issue, as the same "rater" (i.e., the subject or their
proxy) would complete both instruments. Additionally, the research study could be set
up so that the subject completed both instruments at the same time. In this manner, it
would not be possible for a change in the subject's physical condition to take place, and it
is hypothesized that it would be unlikely for a subject to interpret similar items between
instruments differently. Thus, these two sources of error, rater bias and the possibility
that there has been a change in the subject's physical condition between the
administrations of the two instruments, would be eliminated and the conditions required
for the creation of a conversion table would be optimized. The use of self-reports in a
clinical setting and/or in a research study would also have an economic advantage, as it
would reduce the amount of time a trained therapist would need to be involved in the
Maximizing outcomes in rehabilitation, while streamlining the process of
providing highly effective and coordinated services, will continue to be a goal of
rehabilitation for years to come. Efforts to increase the continuity of care between PAC
settings and to improve the effectiveness of rehabilitation services will be pursued on all
levels. This research focused on determining the accuracy of one such effort, namely a
means of creating an easily implemented and highly effective tool for converting the
score from the physical component items on the FIM to those on the MDS and vice
versa. The results of this study suggest that scores derived from the FIM-MDS
conversion table should, at best, only be considered as rough estimates of similar scores
on the two instruments. At the conclusion of this study, the question still remains as to
whether the FIM and MDS instruments can measure physical functioning on a common
unit of measurement and whether a highly accurate conversion table can be developed so
that a patient's gains in physical functioning can be tracked from inpatient rehabilitation
settings to skilled nursing facilities. It may be that pursuing research in alternative
directions, such as using these linking techniques to create a conversion table between
self-report instruments of functional ability, will provide a solution.
LIST OF REFERENCES
Anderson, S., & Hauck, W. W. (1990). Consideration of individual equivalence. Journal
of Pharmacokinetics and Biopharmaceutics, 18, 259-273.
Andrich, D. (2004). Controversy and the Rasch model: A characteristic of incompatible
paradigms? Medical Care, 42 (Suppl. 1), 1-7.
Badia, X., Prieto, L., Roset, M., Diez-Perez, A., & Herdman, M. (2002). Development of
a short osteoporosis quality of life questionnaire by equating items from two
existing instruments. Journal of Clinical Epidemiology, 55(1), 32-40.
Berger, R., & Hsu, J. (1996). Bioequivalence trials, intersection-union tests, and
equivalence confidence sets. Statistical Science, 11, 283-319.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental
measurement in the human sciences. Mahwah, NJ: Erlbaum.
Bond, L., Moss, P., & Carr, P. (1996). Fairness in large-scale performance assessment. In
G. Phillips (Ed.), Technical issues in large-scale performance assessment
(pp. 117-140). Washington, DC: National Center for Education Statistics.
Brown, J. D. (1997). Skewness and kurtosis. Shiken: JALT Testing and Evaluation SIG
Newsletter, 1(1), 16-18.
Buchanan, J. L., Andres, P. L., Haley, S. M., Paddock, S. M., & Zaslavsky, A. M. (2003).
An assessment tool translation study. Health Care Financing Review, 24(3),
Casten, R. C., Lawton, M. P., Parmelee, P. A., & Cleban, M. H. (1998). Psychometric
characteristics of the Minimum Data Set I: Confirmatory factor analysis. Journal
of the American Geriatrics Society, 46, 726-736.
Catalogue of Federal Domestic Assistance. (2002). Veterans nursing home care.
Retrieved September 12, 2002, from
Cella, D., & Chang, C. (2000). A discussion of item response theory and its applications
in health status research. Medical Care, 38(9, Suppl. 2), 66-72.
Centers for Disease Control. (2002). Health, the United States, 2001. Retrieved March 9,
2002, from http://www.cdc.gov/nchs/hus.htm
Centers for Medicare and Medicaid Services. (2001, October-November). Presentation
for the inpatient rehabilitation facility-patient assessment instrument. Retrieved
September 12, 2002, from
Centers for Medicare and Medicaid Services. (2002). Long-term care minimum data set.
Retrieved September 23, 2002, from
Centers for Medicare and Medicaid Services. (2003). RAI version 2.0 manual. Retrieved
June 24, 2004, from http://www.cms.hhs.gov/quality/mds20/raich3.pdf
Chang, C., & Cella, D. (1997). Equating health-related quality of life instruments in
applied oncology settings. Physical Medicine and Rehabilitation: State of the Art
Reviews, 11(2), 397-406.
Choppin, B. (1968). An item bank using sample free calibration. Nature, 219, 870-872.
Cohen, M. E., & Marino, R. J. (2000). The tools of disability outcomes research
functional status measures. Archives of Physical Medicine and Rehabilitation,
81(Suppl. 2), S21-S29.
Cook, L., & Peterson, N. (1987). Problems related to the use of conventional and item
response theory equating methods in less than optimal circumstances. Applied
Psychological Measurement, 11, 225-244.
Cornman, J. M., & Kingson, E. R. (1996). Trends, issues, perspectives, and values for the
aging of the baby boom cohorts. The Gerontologist, 36(1), 15-26.
De Gruijter, D. N. M. (1986). Small N does not always justify Rasch model. Applied
Psychological Measurement, 10, 187-194.
DeJong, G. (2001). Open letter from ACRM to HCFA on proposed medicare PPS. A
letter prepared for the President of the American Congress of Rehabilitation
Medicine under the auspices of ACRM's Research Policy and Legislation
Committee and the Committee's PPS Workgroup. Archives of Physical Medicine
and Rehabilitation, 82, 567-569.
Dillingham, T. R., Pezzin, L. E., & MacKenzie, E. J. (2003). Discharge destination after
dysvascular lower-limb amputations. Archives of Physical Medicine and
Rehabilitation, 84(11), 1662-1668.
Dodds, T. A., Martin, D. P., Stolov, W. C., & Deyo, R. A. (1993). A validation of the
Functional Independence Measurement and its performance among rehabilitation
inpatients. Archives of Physical Medicine and Rehabilitation, 74, 531-536.
Douglas, J. (1999). Item response models for longitudinal quality of life data in clinical
trials. Statistics in Medicine, 18, 2917-2931.
Evans, C. T. (2002). Functional independence measure. Retrieved June 24, 2004, from
Fiedler, R. C., & Granger, C. V. (1997). Uniform data system for medical rehabilitation:
Report of first admissions for 1995. American Journal of Physical Medicine and
Rehabilitation, 76(1), 76-81.
Fisher, A. G. (1993). The assessment of IADL motor skills: An application of many-
faceted Rasch analysis. American Journal of Occupational Therapy, 47(4),
Fisher, W. P. (1997). Physical disability construct convergence across instruments:
Toward a universal metric. Journal of Outcome Measurement, 1(2), 87-113.
Fisher, W. P., Eubanks, R. L., & Marier, R. L. (1997). Equating the MOS SF36 and the
LSU H.S.I. physical functioning scales. Journal of Outcome Measurement, 1(4),
Fisher, W. P., Harvey, R. F., & Kilgore, K. M. (1995). New developments in functional
assessment: Probabilistic models for gold standards. NeuroRehabilitation, 5,
Fisher, W. P., Harvey, R. F., Taylor, P., Kilgore, K. M. & Kelly, C. K. (1995). Rehabits:
A common language of functional assessment. Archives of Physical Medicine and
Rehabilitation, 76, 113-122.
Forrest, G., Schwam, A., & Cohen, E. (2002). Time of care by patients discharged from a
rehabilitation unit. American Journal of Physical Medicine and Rehabilitation,
Fox, C. M., & Jones, J. A. (1998). Uses of Rasch modeling in counseling psychology
research. Journal of Counseling Psychology, 45, 30-45.
Functional independence measure. (2001). Retrieved June 16, 2004, from
Granger, C. V. (1998). The emerging science of functional assessment: Our tool for
outcomes analysis. Archives of Physical Medicine and Rehabilitation, 79,
Granger, C. V. (1999). Continuum of care: Measuring medical rehabilitation outcomes.
Journal of the Institute of Objective Measurement, 2, 60-62.
Granger, C. V., Cotter, A. C., & Hamilton, B. B. (1990). Functional assessment scales: A
study of persons with multiple sclerosis. Archives of Physical Medicine and
Rehabilitation, 71, 870-875.
Granger, C. V., Cotter, A. C., Hamilton, B. B., & Fiedler, R. C. (1993). Functional
assessment scales: A study of persons after stroke. Archives of Physical Medicine
and Rehabilitation, 74, 133-138.
Granger, C. V., & Hamilton, B. B. (1993). The Uniform Data System for medical
rehabilitation report of first admissions for 1991. American Journal of Physical
Medicine and Rehabilitation, 72(1), 33-38.
Granger, C. V., Hamilton, B. B., Keith, R. A., Zielezny, M., & Sherwin, F. S. (1986).
Advances in functional assessment for medical rehabilitation. In C. B. Lewis
(Ed.), Topics in geriatric rehabilitation (Vol. 1, pp. 59-74). Baltimore, MD:
Granger, C. V., Hamilton, B. B., & Sherwin, F. S. (1986). Guide for use of the Uniform
Data Set for medical rehabilitation. Buffalo, NY: Buffalo General Hospital.
Gruber-Baldini, A. L., Zimmerman, S. I., Mortimore, E., & Magaziner, J. (2000). The
validity of the Minimum Data Set in measuring the cognitive impairment of
persons admitted to nursing homes. Journal of the American Geriatrics Society,
Gutheil, I. A. (1996). Introduction. The many faces of aging: Challenges for the future.
Gerontologist, 36(1), 13-14.
Haertel, E. H., & Linn, R. L. (1996). Comparability. In G. W. Phillips (Ed.), Technical
issues in large-scale performance assessment (pp. 59-78). Washington, DC:
National Center for Educational Statistics.
Haley, S. M., McHorney, C. A., & Ware, J. E. (1994). Evaluation of the MOS SF-36
physical functioning scale (PF-10): I. Unidimensionality and reproducibility of
the Rasch item scale. Journal of Clinical Epidemiology, 47, 671-684.
Hall, K. M., Hamilton, B. B., Gordon, W. A., & Zasler, N. D. (1993). Characteristics and
comparisons of functional assessment indices: Disability rating scale, functional
independence measures, and functional assessment measure. Journal of Head
Trauma Rehabilitation, 8(2), 60-74.
Hambleton, R. K. (2000). Response to Hays et al. and McHorney and Cohen: Emergence
of item response modeling in instrument development and data analysis. Medical
Care, 38(9, Suppl. 2), 60-65.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item
response theory. Thousand Oaks, CA: Sage.
Hamilton, B. B. (1989). Totaled functional score can be valid [Letter to the editor].
Archives of Physical Medicine and Rehabilitation, 70, 861-862.
Hamilton, B. B., Granger, C. V., Sherwin, F. S., Zielezny, M., & Tashman, J. S. (1987).
A uniform national data system for medical rehabilitation. In M. J. Fuhrer (Ed.),
Rehabilitation outcomes: Analysis and measurement (pp. 137-147). Baltimore:
Hart, D. L. (2000). Assessment of unidimensionality of physical functioning in patients
receiving therapy in acute, orthopedic outpatient centers. Journal of Outcome
Measurement, 4, 413-430.
Hart, D. L., & Wright, B. D. (2002). Development of an index of physical functional
health status in rehabilitation. Archives of Physical Medicine and Rehabilitation,
Hawes, C., Morris, J. N., Phillips, C. D., Mor, V., Fries, B. E., & Nonemaker, S. (1995).
Reliability estimates for the Minimum Data Set for nursing home resident
assessment and care screening (MDS). The Gerontologist, 35(2), 172-178.
Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health
outcomes measurement in the 21st century. Medical Care, 38(9), 28-42.
Health Care Financing Administration. (1998). Minimum Data Set, 2.0. Washington, DC:
U.S. Government Printing Office.
Heinemann, A. W., Kirk, P., Hastie, B. A., Senik, P., Hamilton, B. B., Linacre, J. M.,
Wright, B. D., & Granger, C. V. (1997). Relationships between disability
measures and nursing effort during medical rehabilitation for patients with
traumatic brain and spinal cord injury. Archives of Physical Medicine and
Rehabilitation, 78, 143-149.
Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., & Granger, C. V.
(1993). Relationships between impairment and physical disability as measured by
the Functional Independence Measure. Archives of Physical Medicine and
Rehabilitation, 74, 566-573.
Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., & Granger, C. V.
(1994). Prediction of rehabilitation outcomes with disability measures. Archives
of Physical Medicine and Rehabilitation, 75, 133-143.
Hyslop, T., Hsuan, F., & Holder, D. J. (2000). A small sample confidence interval
approach to assess individual bioequivalence. Statistics in Medicine, 19,
Iwanenko, W., Fiedler, R. C., & Granger, C. V. (1999). Uniform data system for medical
rehabilitation-Report of first admissions to subacute rehabilitation for 1995,
1996, and 1997. American Journal of Physical Medicine and Rehabilitation,
Jette, A. M., Haley, S. M., & Ni, P. (2003). Comparison of functional status tools used in
post-acute care. Health Care Financing Review, 24(3), 1-12.
Johnson, M. F., Kramer, A. M., Lin, M. K., Kowalsky, J. C., & Steiner, J. F. (2000).
Outcomes of older persons receiving rehabilitation for medical and surgical
conditions compared with hip fracture and stroke. Journal of the American
Geriatrics Society, 48(11), 1389-1397.
Keith, R. A., Wilson, D. B., & Gutierrez, P. (1995). Acute and subacute rehabilitation for
stroke: A comparison. Archives of Physical Medicine and Rehabilitation, 76(6),
Kim, S., & Cohen, A. S. (2002). A comparison of linking and concurrent calibration
under the graded response model. Applied Psychological Measurement, 26(1),
Kolen, M. J. (2001). Linking assessments effectively: Purpose and design. Educational
Measurement: Issues and Practice, 20(1), 5-9.
Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York:
Latham, N. K., & Haley, S. M. (2003). Measuring functional outcomes across postacute
care: Current challenges and future directions. Critical Reviews in Physical and
Rehabilitation Medicine, 15(2), 83-98.
Li, Y. H., & Lissitz, R. W. (2000). An evaluation of the accuracy of multidimensional
IRT linking. Applied Psychological Measurement, 24(2), 115-138.
Linacre, J. M. (1998). Detecting multidimensionality: Which residual data-type works
best? Journal of Outcome Measurement, 2(3), 266-283.
Linacre, J. M., Heinemann, A. W., Wright, B. D., Granger, C. V., & Hamilton, B. B.
(1994). The structure and stability of the Functional Independence Measure.
Archives of Physical Medicine and Rehabilitation, 75, 127-132.
Linacre, J. M., & Wright, B. D. (2000). WINSTEPS (Version 3.17) [Rasch model
computer software]. Chicago: MESA.
McHorney, C. A. (1997). Generic health measurement: Past accomplishments and a
measurement paradigm for the 21st century. Annals of Internal Medicine, 127,
McHorney, C. A. (2002). Use of item response theory to link 3 modules of functional
status items from the Asset and Health Dynamics Among the Oldest Old Study.
Archives of Physical Medicine and Rehabilitation, 83, 383-394.
McHorney, C. A., & Cohen, A. S. (2000). Equating health status measures with item
response theory, illustrations with functional status items. Medical Care, 38(9,
Suppl. 2), 43-59.
McHorney, C. A., Haley, S. M., & Ware, J. E. (1997). Evaluation of the MOS SF-36
physical functional scale (PF-10). II: Comparison of relative precision using
Likert and Rasch scoring methods. Journal of Clinical Epidemiology, 50(4),
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological
Bulletin, 115, 300-307.
Merbitz, C., Morris, J., & Grip, J. C. (1989). Ordinal scales and foundation of
misinference. Archives of Physical Medicine and Rehabilitation, 70, 308-312.
Mislevy, R. J. (1992). Linking educational assessments: Concepts, issues, methods, and
prospects (Policy issue perspective). Princeton, NJ: Educational Testing Service.
Morris, J., Hawes, C., Fries, B. E., Phillips, C. D., Mor, V., Katz, S., Murphy, K.,
Drugovich, M. L. & Friedlob, A. S. (1990). Designing the national resident
assessment instrument for nursing homes. Gerontologist, 30, 293-302.
Morris, J. N., Fries, B. E., Mehr, D. R., Hawes, C., Phillips, C. D., Mor, V., & Lipsitz, L.
A. (1994). MDS cognitive performance scale. Journal of Gerontology: Medical
Sciences, 49(4), M174-M182.
Muraki, E., Hombo, C. M., & Lee, Y. W. (2000). Equating and linking of performance
assessments. Applied Psychological Measurement, 24(4), 325-337.
National Committee on Vital and Health Statistics. (2003). Classifying and reporting
functional status. Retrieved April 9, 2004, from
Oczkowski, W. J., & Barreca, S. (1993). The Functional Independence Measure: Its use
to identify rehabilitation needs in stroke survivors. Archives of Physical Medicine
and Rehabilitation, 74, 1291-1294.
Ottenbacher, K. J., Hsu, Y., Granger, C. V., & Fiedler, R. C. (1996). The reliability of the
Functional Independence Measure: A quantitative review. Archives of Physical
Medicine and Rehabilitation, 77, 1226-1232.
Penta, M. (2004). Evaluation in rehabilitation: Perspectives with the Rasch model.
Retrieved July 2, 2004, from
Prieto, L., Alonso, J., Lamarca, R., & Wright, B. R. (1998). Rasch measurement for
reducing the items of the Nottingham Health Profile. Journal of Outcome
Measurement, 2, 258-301.
Raczek, A. E., Ware, J. E., Bjorner, J. B., Gandek, B., Haley, S. M., Aaronson, N. K.,
Apolone, G., Beck, P., Brazier, J. E., Bullinger, M., & Sullivan, M. (1998).
Comparison of Rasch and summated rating scales constructed from SF-36
physical functioning items in seven countries: Results from the IQOLA project.
Journal of Clinical Epidemiology, 51(11), 1203-1214.
Rantz, M. J., Zwygart-Stauffacher, M., Popejoy, L. L., Mehr, D. R., Grando, V. T.,
Wipke-Tevis, D. D., Hicks, L. L. & Conn, V. S. (1999). The Minimum Data Set:
No longer just for clinical assessment. Annals of Long-Term Care, 7, 354-360.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.
Chicago: University of Chicago Press.
Resident assessment instrument training manual and resource guide. (1991). Natick,
MA: Eliot Press.
Rogers, J. C., Gwinn, S. M., & Holm, M. B. (2001). Comparing activities of daily living
assessment instruments: FIM, MDS, OASIS, MDS-PAC. Physical and
Occupational Therapy in Geriatrics, 18(3), 1-25.
Segal, M. E., Heinemann, A. W., Schall, R. R., & Wright, B. D. (1997). Rasch analysis
of a brief physical ability scale for long-term outcomes of stroke. Physical
Medicine and Rehabilitation State of the Art Reviews, 11, 385-396.
Silverstein, B., Fisher, W. P., Kilgore, K. M., Harley, J. P., & Harvey, R. F. (1992).
Applying psychometric criteria to functional assessment in medical rehabilitation:
II. Defining interval measures. Archives of Physical Medicine and Rehabilitation,
Skaggs, G., & Lissitz, R. (1986). IRT test equating: Relevant issues and a review of
recent research. Review of Education Research, 56, 495-529.
Spector, W. D., & Fleishman, J. A. (1998). Combining activities of daily living with
instrumental activities of daily living to measure functional disability. Journal of
Gerontology, 53, S46-S57.
Stegner, B. L., Bostrom, A. G., & Greenfield, T. K. (1996). Equivalence testing for use in
psychosocial and services research: An introduction with examples. Evaluation
and Program Planning, 19(3), 193-198.
Stineman, M. G. (2001). The story of functional-related groups-Please, first do no
harm. Archives of Physical Medicine and Rehabilitation, 82(4), 553-557.
Stineman, M. G., Jette, A., Fiedler, R. C., & Granger, C. V. (1997). Impairment-specific
dimensions within the Functional Independence Measure. Archives of Physical
Medicine and Rehabilitation, 78, 636-643.
Stineman, M. G., & Maislin, G. (2000). Clinical, epidemiological, and policy
implications of minimum data set validity (Editorial). Journal of the American
Geriatrics Society, 48(12), 1734-1736.
Stineman, M. G., Shea, J. A., Jette, A., Tassoni, C. J., Ottenbacher, K. J., Fiedler, R. C.,
& Granger, C. V. (1996). The Functional Independence Measure: Tests of scaling
assumptions, structure, and reliability across 20 diverse impairment categories.
Archives of Physical Medicine and Rehabilitation, 77, 1101-1108.
Tennant, A., & Young, C. (1997). Coma to community: Continuity in measurement. In
R. M. Smith (Ed.), Outcome measurement: Physical medicine and rehabilitation
state of the art reviews (pp. 376-384). Philadelphia: Hanley & Belfus.
Teresi, J. A., & Holmes, D. (1992). Should MDS data be used for research? [Editorial].
Gerontologist, 32, 148-149.
Uniform data systems for medical rehabilitation. (1997). Retrieved June 10, 2003, from
U.S. Department of Health and Human Services. (2002). A profile of older Americans.
Retrieved July 30, 2003, from
U.S. Food and Drug Administration. (1997). Guidance for industry: In vivo
bioequivalence studies based in population and individual bioequivalence
approaches. Rockville, MD: U.S. Department of Health and Human Services.
U.S. Food and Drug Administration. (1999). Draft guidance on average, population, and
individual approaches to establishing bioequivalence. Rockville, MD: U.S.
Department of Health and Human Services.
Vale, C. D. (1986). Linking item parameters onto a common scale. Applied
Psychological Measurement, 10(4), 333-344.
Velozo, C. A. (2004). Translating measures across the continuum of care: Creating a
crosswalk between the FIM and MDS. Manuscript in preparation.
Velozo, C. A., Kielhofner, G., & Lai, J. (1999). The use of Rasch analysis to produce
scale-free measurement of functional ability. American Journal of Occupational
Therapy, 53, 83-90.
Velozo, C. A., Magalhaes, L., Pan, A., & Leiter, P. (1995). Differences in functional
scale discrimination at admission and discharge: Rasch analysis of the Level of
Rehabilitation Scale-III (LORS-III). Archives of Physical Medicine and
Rehabilitation, 76(8), 705-712.
Velozo, C. A., & Peterson, E. W. (2001). Developing meaningful fear of falling measures
for community dwelling elderly. American Journal of Physical Medicine and
Veterans Administration. (2001). IMPACTS 2000. Retrieved December 12, 2002, from
Veterans Health Administration. (2000). Medical rehabilitation outcomes for stroke,
traumatic brain injury, and lower extremity amputation patients. Retrieved
September 12, 2002, from
Veterans Health Administration. (2002). Austin Automation Center. Retrieved September
12, 2002, from http://www.aac.va.gov/
Wang, W., & Hwang, J. T. (2001). A nearly unbiased test for individual bioequivalence
problems using probability criteria. Journal of Statistical Planning and Inference,
Ware, J. E. (2003). Conceptualization and measurement of health-related quality of life:
Comments on an evolving field. Archives of Physical Medicine and
Rehabilitation, 84(Suppl. 2), S43-S51.
Wilkerson, D., & Johnston, M. (1997). Clinical program monitoring systems: Current
capability and future directions. In M. Fuhrer (Ed.), Assessing medical
rehabilitation practices: The promise of outcomes research (pp. 275-305).
Baltimore, MD: Paul H. Brookes.
Williams, B. C., Lee, Y., Fries, B. E., & Warren, R. L. (1997). Predicting patient scores
between the Functional Independence Measure and the Minimum Data Set:
Development and performance of a FIM-MDS "crosswalk." Archives of Physical
Medicine and Rehabilitation, 78, 48-54.
Wright, B. D. (1984). Item banks: What, why, how. Journal of Educational
Measurement, 21, 331-345.
Wright, B. D. (1997). A history of social science measurement. Retrieved October 6,
2002, from http://www.rasch.org/memo62.htm
Wright, B. D., & Linacre, J. M. (1989). Observations are always ordinal: Measurements,
however, must be interval. Archives of Physical Medicine and Rehabilitation, 70,
Wright, B. D., Linacre, J. M., & Heinemann, A. W. (1993). Measuring functional status in
rehabilitation. Physical Medicine and Rehabilitation State of the Art Reviews, 4,
BIOGRAPHICAL SKETCH
Katherine L. Byers, MHS, CRC, CVE, is a doctoral candidate in the rehabilitation
science doctoral (RSD) program at the University of Florida, College of Public Health
and Health Professions. Ms. Byers received a Bachelor of Arts degree in behavioral
sciences from Rice University in Houston, Texas in 1989. She then completed a Master
of Health Science (MHS) degree in rehabilitation counseling at the University of Florida
in 1991 and subsequently obtained certifications as both a rehabilitation counselor and as
a vocational evaluator. Over a period of 9 years, Ms. Byers worked in positions of
increasing responsibility in the field of rehabilitation before entering the University of
Florida's rehabilitation science doctoral program in January of 2000. While completing
the requirements of the degree, Ms. Byers was employed as a research assistant and then
as a program coordinator for Dr. Craig Velozo, an assistant professor in the Department
of Occupational Therapy at the University of Florida. Accomplishments during Ms.
Byers' doctoral career include winning the 2002 John Muthard Research Award from the
University of Florida's College of Health Professions, Department of Rehabilitation
Counseling. She also was selected to make a poster presentation at the Third National
Rehabilitation Research and Development Meeting in Washington, DC, in 2002, and at
the 2004 ACRM-ASNR Joint Conference.