
Developing precise disability measures for back pain

Permanent Link: http://ufdc.ufl.edu/UFE0041914/00001

Material Information

Title: Developing precise disability measures for back pain
Physical Description: 1 online resource (148 p.)
Language: English
Creator: Choi, Bongsam
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: activity, cat, disability, function, irt, lbp, measurement, pain, precision, rasch
Rehabilitation Science -- Dissertations, Academic -- UF
Genre: Rehabilitation Science thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Measurement of disability is crucial to many aspects of the rehabilitation process, including capturing individual-level changes, evaluating treatment effectiveness, making policy decisions, and managing administration costs. Many condition-specific self-reported instruments were developed over the past three decades to meet the need for assessment of disability resulting from back pain. However, these existing back pain disability measures have considerable limitations in terms of measurement precision and comprehensiveness. To overcome these limitations, precise measures require a large number of items that either cover a wide range of the ability trait or closely match person ability. However, it is impossible to achieve these goals under the conventional classical test theory framework. Therefore, the aim of this study was to create disability measures with adequate measurement precision. The study consisted of the following three steps: 1) investigating the item-level psychometrics of the ICF Activity Measure (ICFAM) using Rasch analysis (a one-parameter Item Response Theory model), 2) creating three short forms of the ICFAM based on the item-level psychometrics, and 3) comparing the relative precision of three measures: the Computer Adaptive Testing (CAT) measure of the ICFAM, the three 10-item short form measures of the ICFAM, and the Oswestry Back Pain Disability Questionnaire (ODQ), one of the most popular conventional back pain disability instruments. The three constructs of the ICFAM and the three 10-item short forms of the ICFAM were found to be multidimensional. However, some findings still suggest the possibility of sub-constructs within an essentially unidimensional construct. The hierarchical order of item difficulty did not reflect the hypothetical hierarchy based on MET values, except for the walking/moving construct. The empirical hierarchy follows fairly well either a clinical feature of back pain or motor control theory. The three IRT-based short forms with adequate breadth were created based on item-level psychometric properties. These were administered, along with the CAT of the ICFAM and the ODQ, to back pain (n = 42) and non-back pain (n = 42) groups. The CAT outperformed the short forms and the ODQ, except for the walking/moving construct, while the short forms outperformed the ODQ in terms of precision. The results suggest that researchers and clinicians should be encouraged to use the CAT measure of the ICFAM, since it is a precise and efficient measure. The IRT-based short forms of the ICFAM may be an alternative when computer systems are not readily available.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Bongsam Choi.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Velozo, Craig A.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041914:00001







DEVELOPING PRECISE DISABILITY MEASURES FOR BACK PAIN


By

BONGSAM CHOI

















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010



































© 2010 Bongsam Choi


































To my dad who has been fighting with stomach cancer, my mom devoting her life to his care,
and my family Keonhwa, Jayeon, and Jasun









ACKNOWLEDGMENTS

I am heartily thankful to my advisor, Dr. Craig Velozo, whose encouragement, guidance, and
support from the initial to the final stage enabled me to develop an understanding of the
subject. I would also like to thank my committee members. First, I thank Dr. Mark Bishop for
believing in me from the very first meeting. I thank Dr. Steven George for all his
recommendations, including the arrangement of data collection sites. Finally, I thank
Dr. I-Chan Huang for leading me to the heart of the issues and introducing me to many
different aspects of research. I am honored to have had the opportunity to work with such
high-caliber people. I would also like to thank my colleague, Dr. Leigh Lehman, for all her
support during my dissertation writing.

I owe special thanks to my wife, Keonhwa, for her unlimited support and encouragement, and
to my two lovely daughters, Jayeon and Jasun, for checking spelling mistakes and errors on
every page of my draft. My last thanks go to my parents, Myungsik and Youngdong; my
parents-in-law, Gwyhee and Pilhee Lee; my brother, Bongno; and my sister, Yongsoon, for
supporting me in completing new learning; and, last but not least, to my Catholic Brother
Michael Chun for endless blessings.









TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER

1 THE IMPORTANCE OF PRECISELY MEASURING DISABILITY FOR BACK PAIN

Introduction
Item Response Theory (IRT) and Computer Adaptive Testing (CAT)
Physical Function CAT and ICFmeasure.com
Short Forms of Back Pain Disability
Existing Self-Report Back Pain Disability Measures
The Oswestry Back Pain Disability Questionnaire as a Gold Standard
CAT, Short Form, and Existing Back Pain Measure in Measurement Precision
Research Question 1
Research Question 2
Research Question 3

2 THE PSYCHOMETRICS OF THE ICF ACTIVITY MEASURE FOR BACK PAIN

Introduction
Methods
Research Participants
Instrumentation
Rasch Rating Scale Model
Data Analysis
Results
Positioning/transfer Construct
Lifting/carrying Construct
Walking/moving Construct
Discussion
Summary of Result
Unidimensionality
Rasch Model Fit
The Hierarchy of Item Difficulty Calibration and Physical Activity
Limitations and Future Implications

3 PRECISION OF THREE SHORT FORMS FOR BACK PAIN

Introduction
Method
Research Participants
Instrumentation
Rasch Rating Scale Model
Data Analysis
Results
Short Form for Positioning/Transfer
Short Form for Lifting/Carrying
Short Form for Walking/Moving
Discussion
Summary of Results
Item Level Psychometrics
Unidimensionality of the Short Forms
Person Separation and Person Reliability
Test Information Function
Limitations and Future Implications

4 COMPARISONS OF THE RELATIVE PRECISION OF THREE DIFFERENT TYPE
BACK PAIN MEASURES: THE ICF ACTIVITY MEASURE (ICFAM) COMPUTER
ADAPTIVE TEST, ICFAM SHORT FORMS, AND OSWESTRY BACK PAIN
DISABILITY QUESTIONNAIRE

Introduction
Method
Research Participants
Instrumentation
Analysis
Results
Discussion
Summary of Results
Correlations
Relative Precision
Limitations and Future Implications

5 CONCLUSION: INTEGRATING THE FINDINGS

APPENDIX: THE OSWESTRY BACK PAIN DISABILITY QUESTIONNAIRE (ODQ)

LIST OF REFERENCES

BIOGRAPHICAL SKETCH















LIST OF TABLES


Table

2-1 Examples of items for three constructs of the ICFAM

2-2 Demographic information of research participants

2-3 Demographic information of research participants

2-4 Number of retaining factors for the ICFAM

2-5 Factor structure of positioning/transfer construct following EFA

2-6 Factor structure of lifting/carrying construct following EFA

2-7 Factor structure of walking/moving construct following EFA

2-8 Fit statistics for positioning/transfer construct

2-9 Fit statistics for lifting/carrying construct

2-10 Fit statistics for walking/moving construct

3-1 Demographic information of research participants

3-2 Results of confirmatory factor analysis for short forms of the ICFAM

3-3 Factor structure of short form for positioning/transfer construct

3-4 Factor structure of short form for lifting/carrying construct

3-5 Factor structure of short form for walking/moving construct

3-6 Short form of the ICFAM

3-7 Fit statistics for positioning/carrying

3-8 Fit statistics for lifting/carrying construct

3-9 Fit statistics for walking/moving construct

4-1 Demographic characteristics of study participants

4-2 Correlations coefficients for CAT, short forms, and ODQ measure for back pain group

4-3 Correlations coefficients for CAT, short forms, and ODQ measure for non-back pain group

4-4 Mean difference between means for back pain and non-back pain groups









LIST OF FIGURES


Figure

2-1 Item-person map of positioning/transfer construct of the ICFAM

2-2 Item-person map of lifting/carrying construct of the ICFAM

2-3 Item-person map of walking/moving construct of the ICFAM

3-1 Item-person map of positioning/transfer construct of the ICFAM following 10 items
removal and prior to 10 item removal

3-2 Item-person map of lifting/carrying construct of the ICFAM following 10 items
removal and prior to 10 item removal

3-3 Item-person map of walking/moving construct of the ICFAM following 10 items
removal and prior to 10 item removal

3-4 Item-person map of three short forms (positioning/transfer, lifting/carrying, and
walking/moving) of the ICFAM following the item removal

3-5 Test information function of short form versus entire set of items for
positioning/transfer

3-6 Test information function of short form versus entire set of items for lifting/carrying

3-7 Test information function short form versus entire set of items for walking/moving

4-1 Scatter plot of ability measures from the CAT measure versus the short form
measure for positioning/transfer and lifting/carrying construct

4-2 Scatter plot of ability measures from the CAT measure versus the short form
measure

4-3 Scatter plot of ability measures from the CAT measure versus the ODQ measure









LIST OF ABBREVIATIONS

ADL Activity of Daily Living

AMPAC Activity Measure for Post-Acute Care

CAT Computer Adaptive Test/Testing

CATs Computer Adaptive Tests

CFA Confirmatory Factor Analysis

CFI Comparative Fit Index

CTT Classical Test Theory

EFA Exploratory Factor Analysis

HIT Headache Impact Test

ICF International Classification of Functioning, Disability, and Health

ICFAM ICF Activity Measure

IRT Item Response Theory

MET Metabolic Equivalent

MnSq Mean Square

MOS Medical Outcome Scale

NIDRR National Institute of Disability and Rehabilitation Research

ODQ Oswestry Back Pain Disability Questionnaire

PCS Physical Component Summary

PF Physical Function

QBDS Quebec Back Pain Disability Scale

RFS Lumbar Spine Functional Status

RMDQ Roland-Morris Disability Questionnaire

RP Relative Precision

SR Separation Ratio









TLI Tucker-Lewis Index

WHO World Health Organization

WRMR Weighted Root Mean Square Residual









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

DEVELOPING PRECISE DISABILITY MEASURES FOR BACK PAIN

By

Bongsam Choi

August 2010

Chair: Craig A. Velozo
Major: Rehabilitation Science

Measurement of disability is crucial to many aspects of the rehabilitation process, including
capturing individual-level changes, evaluating treatment effectiveness, making policy decisions,
and managing administration costs. Many condition-specific self-reported instruments were
developed over the past three decades to meet the need for assessment of disability resulting
from back pain. However, these existing back pain disability measures have considerable
limitations in terms of measurement precision and comprehensiveness. To overcome these
limitations, precise measures require a large number of items that either cover a wide range of
the ability trait or closely match person ability. However, it is impossible to achieve these
goals under the conventional classical test theory framework. Therefore, the aim of this study
was to create disability measures with adequate measurement precision.

The study consisted of the following three steps: 1) investigating the item-level psychometrics
of the ICF Activity Measure (ICFAM) using Rasch analysis (a one-parameter Item Response Theory
model), 2) creating three short forms of the ICFAM based on the item-level psychometrics, and
3) comparing the relative precision of three measures: the Computer Adaptive Testing (CAT)
measure of the ICFAM, the three 10-item short form measures of the ICFAM, and the Oswestry Back
Pain Disability Questionnaire (ODQ), one of the most popular conventional back pain disability
instruments.

The three constructs of the ICFAM and the three 10-item short forms of the ICFAM were found to
be multidimensional. However, some findings still suggest the possibility of sub-constructs
within an essentially unidimensional construct. The hierarchical order of item difficulty did
not reflect the hypothetical hierarchy based on MET values, except for the walking/moving
construct. The empirical hierarchy follows fairly well either a clinical feature of back pain or
motor control theory. The three IRT-based short forms with adequate breadth were created based
on item-level psychometric properties. These were administered, along with the CAT of the ICFAM
and the ODQ, to back pain (n = 42) and non-back pain (n = 42) groups. The CAT outperformed the
short forms and the ODQ, except for the walking/moving construct, while the short forms
outperformed the ODQ in terms of precision. The results suggest that researchers and clinicians
should be encouraged to use the CAT measure of the ICFAM, since it is a precise and efficient
measure. The IRT-based short forms of the ICFAM may be an alternative when computer systems are
not readily available.









CHAPTER 1
THE IMPORTANCE OF PRECISELY MEASURING DISABILITY FOR BACK PAIN

Introduction

Disability measurement is crucial for capturing clinical changes, supporting evidence-based
rehabilitation practice, administering disability management, and informing the policy-making
process. Over the past three decades, the need for assessment of disability resulting from back
pain has grown because back pain is the most common cause of activity limitation in our society
(1). This need has prompted extensive research on self-report outcome measures of function and
disability resulting from back pain (2-22). To date, nearly 82 back-related, condition-specific
disability instruments have been introduced (23). Most, if not all, of these instruments appear
in peer-reviewed journals and show adequate psychometrics (e.g., good reliability, validity, and
responsiveness). However, only a few outcome measures have been widely used and commonly
accepted as disability measures for back pain (10,23-27).

Despite the myriad back pain outcome measures available, selecting an optimal disability measure
remains a prevailing challenge. This stems from the need to carefully consider the preferences
of investigators or clinicians when choosing a measure (25). While published measures have
adequate psychometric properties, they may or may not be sensitive to all severity groups or to
the actual improvements that result from clinical interventions. Because these instruments are
developed to target the "average" person, they tend to be more sensitive at the center than at
the extremes (e.g., low and high levels of disability) of the ability range (28). For example,
while the Oswestry Back Pain Disability Questionnaire (ODQ) is often considered a gold standard,
it demonstrates ceiling effects (i.e., for persons with high ability) when it is administered to
persons with minimal impairments (29,30) and floor effects (i.e., for persons with low ability)
when it is administered to persons with severe impairments (31,32). Therefore, the ODQ often
fails to precisely measure the disability of back pain across the full range of ability.

Imprecise measurement, especially in the form of ceiling effects, results in type II errors.
That is, the number of false negatives is large among those scoring in the upper extreme of the
instrument (28). Furthermore, it is impossible to measure improvement in health status over time
for those at the ceiling (i.e., those able to complete all items without difficulty or scoring
high initially). These problems in measurement precision are partly due to the fixed number of
items included on instruments. Thus, these instruments do not have adequate breadth for the
underlying construct being measured (33). This leads to ceiling and/or floor effects and,
subsequently, failure to capture small but potentially significant increments of improvement
across the full range of the construct (11, 34).
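
The ceiling problem can be illustrated with a minimal sketch. It assumes a hypothetical five-item instrument scored by a deterministic (Guttman-style) rule, in which a person passes every item at or below his or her ability. The item thresholds, the ability values, and the function names are illustrative assumptions and are not taken from any instrument discussed here.

```python
# Hypothetical item difficulties on an arbitrary ability scale (assumed values).
ITEM_THRESHOLDS = [-2.0, -1.0, 0.0, 1.0, 2.0]


def raw_score(ability, thresholds=ITEM_THRESHOLDS):
    """Count the items a person passes under a deterministic (Guttman-style) rule."""
    return sum(1 for threshold in thresholds if ability >= threshold)


def observed_change(before, after, thresholds=ITEM_THRESHOLDS):
    """Change in raw score between two assessments of the same person."""
    return raw_score(after, thresholds) - raw_score(before, thresholds)


if __name__ == "__main__":
    # A mid-range improvement of two units is registered by the fixed item set.
    print(observed_change(-0.5, 1.5))
    # The same two-unit improvement above the hardest item goes undetected.
    print(observed_change(2.5, 4.5))
```

Under these assumptions, a real improvement that occurs entirely above the hardest item produces no change in the raw score, which is the false-negative situation described above.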

Imprecise measurement may also result from using items that do not closely match the ability of
the population of interest (35). Deficits in precision occur when easy items are administered to
high-ability populations (e.g., administering items measuring basic ADLs to elite athletes) and
difficult items are administered to low-ability populations (e.g., administering items measuring
the ability to lift heavy objects to individuals with severe back pain). Furthermore, since
individuals are asked to respond to all items on an instrument regardless of how they responded
to previous items (e.g., asked whether they can lift 10 pounds after responding that they cannot
lift 5 pounds) and regardless of the relevance of the items to the individual (e.g., asking an
individual with no movement in the legs whether he or she can walk a mile), respondent burden
and administration costs are increased. Unfortunately, test-level statistics provide little
insight into how to eliminate these limitations. These limitations stem from characteristics of
the Classical Test Theory (CTT) model.









Despite the popularity and widespread use of instruments developed using the CTT model, existing
disability measures have numerous shortcomings (36). In general, disability instruments created
under the CTT paradigm yield total scores obtained by adding individual item responses. These
scores provide only a general sense of a person's ability level (i.e., disability level) and
often fail to provide detailed item-level psychometrics (i.e., no detailed information is
provided about how an individual performs on each item). First, the total score is dependent on
the items chosen to represent the underlying construct (test-dependent) (37). That is,
respondents will have lower scores on difficult items and higher scores on easier items while
their ability remains the same. Second, the test scores obtained from a sample cannot be
compared across different samples (sample-dependent) (37,38). That is, test statistics such as
coefficient alpha for the estimate of reliability or correlations for estimates of validity vary
from sample to sample. Third, test scores are non-linear summed scores, which yield ordinal raw
scores (39). These ordinal scores may be insensitive to changes at the extremes of the scale
(39).
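
The test dependence of summed scores can be sketched numerically. The example below assumes a simple logistic response function for dichotomous items, which is not part of CTT itself and is used only as a device to generate expected responses. The two five-item forms and their difficulty values are hypothetical.

```python
import math


def p_success(ability, difficulty):
    """Assumed logistic response function; both arguments are in logit units."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_total(ability, difficulties):
    """Expected summed score on a set of dichotomous items."""
    return sum(p_success(ability, d) for d in difficulties)


EASY_FORM = [-3.0, -2.5, -2.0, -1.5, -1.0]  # hypothetical easy items
HARD_FORM = [1.0, 1.5, 2.0, 2.5, 3.0]       # hypothetical difficult items

if __name__ == "__main__":
    ability = 0.0  # the same person takes both forms
    print(round(expected_total(ability, EASY_FORM), 2))
    print(round(expected_total(ability, HARD_FORM), 2))
```

In this sketch the same person obtains a high expected total on the easy form and a low expected total on the hard form, although the ability value never changes. The summed score therefore describes the pairing of person and test rather than the person alone.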

Item Response Theory (IRT) and Computer Adaptive Testing (CAT)

In contrast to the CTT, Item Response Theory (IRT) focuses on the psychometric

properties of the items making up instrument instead of the instrument as a whole (40, 41). By

estimating the probability that a respondent will select a particular rating for an item, item

difficulty and person ability (or disability) can be placed on the same linear continuum. Thus,

IRT models allow "connecting" individuals' responses to items at their ability level (40,42).

Estimates of person ability (or disability) on an underlying construct obtained using IRT methods

are invariant regardless of the items used (i.e., test free measurement), whereas under the CTT

paradigm, person scores vary depending on the difficulty of the instrument (41). Furthermore,

item difficulty estimates derived from the IRT analyses are independent of the ability of the

sample (i.e., sample free measurement), while test statistics in CTT are dependent on the sample









taking the test. In addition, the Rasch model (one-parameter IRT model) can linearly transform

raw scores (typically used in analyses based on CTT) into equal interval measures (34). These

advantages of IRT allow for the creation of invariantly calibrated large item banks that can more

precisely discriminate individuals' ability levels and thus, capture smaller increments of change.

In order to achieve the goal of measurement precision, a disability measure should have

items covering the full range of the underlying construct and capturing the small increments of

changes (36). With optimal measurement precision, one can theoretically yield measures of equal

precision at all levels of the underlying construct, thus achieving what has been termed

equiprecise measurement (43). That is, the measure is capable of measuring a wide range of

disability from the least able (or most disabled) to the most able (or least disabled). Unlike the

existing "fixed" disability assessments that require all the items of an instrument, equiprecise

measurement fosters item selection determined by disability level. For example, when measuring

the physical function of a person with mild back pain, items would be chosen which closely

match the ability of this individual (i.e., more difficult items would be chosen). Similarly, when

measuring the physical function of a person with severe back pain, items will be chosen that

closely match the severely impaired person. These two persons will be measured on the same

physical-function scale with different sets of items (34).

While IRT methodologies provide the means for generating and linking person ability and

item difficulty calibrations, Computer Adaptive Testing (CAT) methods promise a means for

administering items in a way that is both efficient and precise (28,34,36,44-48). Studies have

shown that CAT improves test efficiency while maintaining adequate precision with fewer items than

the full test. Six to 7 items have been shown on average to achieve a standard error of ability

estimates of 0.3 (44,49-53). The CAT often requires a testing algorithm which defines iterative









processes with a set of rules specifying the test questions to be administered to respondents. This

includes procedures for item selection, ability estimation, and termination criteria. By selectively

administering items that are matched to the ability level of the individuals, measurement

efficiency can be accomplished without the loss of precision provided by the full item bank. For

example, when measuring the disability of a person with mild back pain, items would be chosen

that matched the mildly impaired ability. Similarly, when measuring the disability of a person

with more severe back pain, a different set of items would be chosen that match that individual's

severely impaired ability. With this technology, a small number of items can be selected from the

item bank which are most relevant and targeted to a person of a particular ability (34). IRT in

combination with CAT has recently become an alternative to conventional fixed-format

disability measurement (25,36).
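
The three parts of a CAT algorithm named above (item selection, ability estimation, and termination criteria) can be sketched as follows. This is a minimal illustration only, not the ICFAM algorithm: it assumes a dichotomous Rasch model rather than a rating scale model, a capped Newton-Raphson maximum-likelihood estimate, and a nearest-difficulty selection rule. The function names, the item bank layout, and the `max_items` limit are hypothetical.

```python
import math

def p_success(theta, b):
    """Dichotomous Rasch probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, theta=0.0, iterations=25):
    """Capped Newton-Raphson maximum-likelihood estimate.

    responses is a list of (item difficulty, 0 or 1) pairs.
    Returns (ability estimate, standard error), both in logits.
    """
    for _ in range(iterations):
        probs = [p_success(theta, b) for b, _ in responses]
        info = sum(p * (1.0 - p) for p in probs)
        residual = sum(x - p for (_, x), p in zip(responses, probs))
        # Cap the step and the range so all-0 or all-1 patterns stay finite.
        step = max(-1.0, min(1.0, residual / info))
        theta = max(-6.0, min(6.0, theta + step))
    info = sum(p_success(theta, b) * (1.0 - p_success(theta, b)) for b, _ in responses)
    return theta, 1.0 / math.sqrt(info)

def run_cat(bank, answer, start_theta=0.0, se_target=0.3, max_items=10):
    """bank: {item name: difficulty}; answer: callable(item name) -> 0 or 1."""
    theta, se = start_theta, float("inf")
    remaining = dict(bank)
    responses, administered = [], []
    # Termination: stop at the target standard error or the item limit.
    while remaining and len(administered) < max_items and se > se_target:
        # Item selection: the unused item closest to the current estimate.
        item = min(remaining, key=lambda name: abs(remaining[name] - theta))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        administered.append(item)
        # Ability estimation: re-estimate after every response.
        theta, se = estimate_ability(responses, theta)
    return theta, se, administered
```

A dichotomous item carries less information than a rating scale item, so this sketch would usually stop on `max_items` well before it reaches a standard error of 0.3.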

Physical Function CAT and ICFmeasure.com

In order to measure the impact of back pain on individuals, the use of World Health

Organization's (WHO) International Classification of Functioning, Disability and Health (ICF)

framework is useful. The ICF describes health and health status in terms of functioning and

disability (54). The conceptual model describes three domains of functioning and disability,

which are body function and structure; activities as a whole person; participation as a whole

person in a social context. Disability therefore involves dysfunction at one or more levels of

impairments, activity limitations and participation restrictions influenced by environmental and

personal factors. Despite attempts to clarify whether disability assessment should focus on how

much difficulty one has or how frequently one performs an activity, disability measures can be organized

along the single construct of daily functioning such as activities of daily living (ADLs) (55).

Disability may be assessed in terms of physical function or activity limitation because most

individuals with back pain are restricted in their daily functioning (56). Accordingly, the newly









created ICF Activity Measure (ICFAM-ICFmeasure.com) embraced the ICF framework as a

conceptual basis of measuring a person's physical ability.

The ICFAM development was funded by the National Institute on Disability and

Rehabilitation Research (NIDRR). The primary goal of the research was to develop an efficient

and precise measurement system based on the activity domain of the International Classification

of Functioning, Disability and Health (ICF). Equiprecise measurement, covering the

entire range of a construct, was applied to activities involving movement, moving around and

daily life tasks as defined by the activity domain of the ICF. Based on applying the Rasch model

(one-parameter IRT model) to the activity domain, the ICFAM with a 264-question item bank was

developed. With CAT methods, these questions are selectively administered to respondents from

a large item bank. Furthermore, the measure is now accessible worldwide through the web

(http://icfmeasure.phhp.ufl.edu/).

Short Forms of Back Pain Disability

While both the CAT and IRT frameworks have considerable advantages in terms of efficiency

and precision of measurement, fixed short forms have been primarily used to achieve

measurement efficiency, especially in the absence of computer technology. Accordingly, the

efficient measurement is achieved by reducing the number of items of a larger instrument to

relieve administration and respondent burden (28,35). Despite achieving measurement efficiency

with fewer items, loss of precision became an issue when developing a short form from its full

instrument (15,36,44-46). Not surprisingly, several studies have reported that CAT measures

outperform fixed short form versions of assessments in terms of measurement precision (57-59).

It should be noted that the CAT method often faces challenges, such as financial or technological

requirements, for many settings (59). This leads researchers and healthcare professionals to seek

practical measures to overcome those challenges. A goal of this study is to compare CAT and









fixed short form versions of the ICFAM, to determine which achieves optimal precision required

in clinical settings and research.

Several methods have been applied to develop short forms from their full tests. CTT

methods include the deletion of items with low item-total score correlation and with the least

impact on the overall internal consistency of the tests (60). Several studies have recently

developed short forms using the IRT framework (46,58,61). These studies involve selecting

the items that are most frequently administered in CAT, have high test

information, or show broad item difficulty coverage. Using the IRT framework, items that show

poor item fit statistics or similar calibrations can also be deleted. Deleting such items shortens

the instrument while maintaining its measurement quality.

Recently, investigators created short forms from IRT-based item pools and confirmed that

short form scores are nearly as precise as the CAT scores in different diagnostic groups (44,46).

Several short forms evolved from generic health status measures under the CTT framework such

as the Physical Function (PF)-10 (8) and PF-12 PCS (Physical Component Summary) from the

Short Form (SF)-36 of the Medical Outcome Scale (MOS). These particular short forms have

been applied to back pain populations to assess the impact of back pain on quality of life. Short

forms have also been created from condition-specific back pain instruments (62), such as the 24-item

Roland-Morris Disability Questionnaire (RMDQ) (6,7) from the 136-item Sickness Impact

Profile and the 18-item PF-18 (14) from multiple instruments, which include the Oswestry

Disability Index, the RMDQ, and the PF-10.

Existing Self-Report Back Pain Disability Measures

Self-reported outcome measures are generally classified as generic or condition-specific

measures (28,35). Generic measures often include global ratings of health status as well as ratings

of the multidimensional status of health-related quality of life. These instruments often measure a









broad spectrum of health concepts and are intended to provide scores that are sensitive to disease

severity. By contrast, condition-specific measures are designed to assess the aspects of health

status affected by certain disease pathology and view the attribution of symptom and functional

limitations to a specific condition (25). Thus, in contrast to generic measures, condition-specific

measures are likely to be sensitive to treatment and natural history of a specific disease or

condition.

Although generic measures were not primarily designed to assess the specific conditions,

two instruments, the Sickness Impact Profile (SIP) and the Physical Function scale (PF-10) have

been applied to chronic back pain. The SIP was originally developed and validated as a measure

of sickness-related behavioral dysfunction consisting of 189 items in 14 categories (63). With

few revisions, the final version of the SIP was developed as a behavioral-based measure of health

status for use in a variety of chronic diseases (62). The PF-10 is a subscale of the SF-36 that

measures physical functioning, which assesses limitations in a variety of physical activities.

Other versions of PF-10, such as a general population version PF-12 PCS (Physical Component

Summary) and specific low back version Physical Functioning (PF)-18 (14) have been developed.

Among patients with back pain, studies report adequate psychometric properties for these two

instruments (4, 8, 15, 64-66).

As disease specific measures for back pain, the Roland-Morris Disability Questionnaire

(RMDQ), the Quebec Back Pain Disability Scale (QBDS), and the Oswestry Back Pain

Disability Questionnaire (ODQ) are the most widely accepted instruments. The RMDQ consists

of 24 items of daily physical activity from the Sickness Impact Profile. In contrast to the SIP, the

RMDQ is short, simple to complete, and readily understood by patients (7). The QBDS consists

of 20 items of a comprehensive view of person's disability for back pain, which adopted the









World Health Organization's International Classification of Functioning, Disability and Health

(ICF) as a conceptual model to select test items relevant to ICF activity and participation

domains (10,67). One unique feature of the QBDS is that it measures only the physical

function domain, while most instruments appear to assess more than one domain within the

assessment (67). All of them appear to have good psychometric properties supported by many

studies (7, 23, 68-72).

The Oswestry Back Pain Disability Questionnaire as a Gold Standard

The Oswestry Back Pain Disability Questionnaire (ODQ) was first introduced by John

O'Brien in 1976 and further developed by Fairbank and colleagues in 1980 (29,73,74). The ODQ

consisted of 10 items assessing the level of pain and interference with personal care, physical

activities (i.e., lifting, walking, sitting, and standing), sleeping, sex life, social life, and traveling.

Several validated versions have also been published that omit a single item (i.e., sex life or

social life) (75) or replace the 'sex life' item with an employment/homemaking item (13). The ODQ

and its revised versions have been shown to be much more sensitive to patients with severe

symptoms, while they also appear to be occasionally responsive to those with minor symptoms

(29). The ODQ, whether in the original or revised versions, remains a salient measure of

condition-specific disability with good validity and reliability (3, 13, 23, 24, 29, 30, 73, 74). The

ODQ is one of the most widely accepted back pain-specific instruments (25, 30, 76, 77). It is

presently considered as the "gold standard" in the assessments of back pain (29) because of its

many advantages such as popularity, internally consistent scale, good reliability and

responsiveness to clinical change. In numerous studies, the ODQ and the revised versions of it

are recommended as a standardized measure of physical function in individuals with back pain

(3, 13, 14, 23, 25, 29, 30, 32, 70, 73, 74, 76-79).









Despite the popularity of the ODQ in health care, there have been a few concerns about several

of its measurement properties. The ODQ has been shown to be a multidimensional construct, with physical

function and pain items forming separate constructs (30,80), and it lacks the sensitivity to reliably

discriminate individuals in particular ranges of the scale due to "gaps" between test items (e.g.,

no items fall in the gap between "standing" and "lifting" in the item difficulty

hierarchical order) along the underlying continuum (30). The lack of breadth may lead to

inadequate sensitivity at the extremes of the scale. Not surprisingly, the developers of the ODQ and

researchers indicate that the instrument is better at detecting change only in a specific disability

level due to its substantial measurement imprecision (3, 29, 73, 74, 77). Despite these limitations,

the ODQ remains a leading back pain disability instrument in health care

(3, 13, 14, 23, 29, 30, 32, 70, 73, 74, 77-79).

CAT, Short Form, and Existing Back Pain Measure in Measurement Precision

Although psychometric properties in the CTT paradigm, such as reliability, validity, and

responsiveness, are well-known and rigorous criteria for selecting a proper outcome measure, these

properties may not be sufficient in terms of measurement precision and efficiency. Numerous

studies have found that Computer Adaptive Testing (CAT) improves both measurement

precision and efficiency relative to the full test (41, 43, 48, 50, 52, 53, 57-59). Several studies

report that CAT measures are highly correlated with other instruments measuring the same construct

and require fewer items, with an average of 6 items needed to reach an ability estimate (81-84).

The construction of fixed short forms is a conventional approach to achieving measurement

efficiency, which reduces respondent and administrative burden (44, 46). Despite the loss

of some precision, short forms have been shown to be valid and practical for use in outcome

measurement (34, 44-46). The purpose of this study is to 1) determine the item level

psychometrics of the ICFAM and of three short forms in a chronic back pain population, 2) determine how the

ICFAM items respond differently across other diagnostic groups versus chronic back pain, and 3)

compare the relative precision of the person measures of the computer adaptive ICFAM versus the short

forms and the Oswestry Back Pain Disability Questionnaire.

Research Question 1

What are the psychometric properties of the computer adaptive ICF activity measure

(ICFAM) in terms of positioning/transfer, lifting/pushing, and walking/moving constructs

associated with individuals with low back pain?

Hypothesis: The ICF activity measure for back pain (ICFAM) will demonstrate a

unidimensional construct for each construct.

Hypothesis: Item level psychometric properties of the ICFAM will demonstrate item

difficulty hierarchy empirically versus hypothetically derived.

Research Question 2

What are the psychometric properties of newly generated short forms of

positioning/transfer, lifting/carrying, and walking/moving?

Hypothesis 2.1: The ICFAM short forms will demonstrate a unidimensional construct

for each of the three constructs respectively (positioning/transfer, lifting/carrying, and

walking/moving).

Hypothesis 2.2: The ICFAM short forms will show acceptable item level psychometrics

(item fit, person separation, item-person match, logical item difficulty hierarchy relative

to metabolic equivalents).

Hypothesis 2.3: The ICFAM short forms will show a precision distribution (information

function) that is similar to that of the entire item bank but will show overall lower

precision than the entire item bank across the breadth of the measure.









Research Question 3

How precise are the ICF activity CAT measures and the IRT-based ICFAM short form

measures relative to the Oswestry Back Pain Disability Questionnaire (ODQ)?

Hypothesis 3.1: The relative precision of the ICFAM CAT measures is superior to that of the

ICFAM short forms.

Hypothesis 3.2: The relative precision of the short form measures is superior to that of

the Oswestry Disability Questionnaire.









CHAPTER 2
THE PSYCHOMETRICS OF THE ICF ACTIVITY MEASURE FOR BACK PAIN

Introduction

Back pain is one of the most common health problems causing activity limitation in the

age group younger than 45 years in the United States, the second most frequent reason for

physician visits, the third most common cause of surgical interventions, and the fifth-ranking

cause of hospital admission (85). Its estimated lifetime incidence and annual prevalence in the

general population are about 70-80% and 15-45%, respectively (85). The impact of chronic back

pain on the US work force is remarkably significant. According to the U.S. Bureau of Labor

Statistics, back-related injuries accounted for 63% of a total of 4.2 million nonfatal occupational

injuries reported in 2005 (86). The estimated annual cost incurred by back pain was $20 billion

to $50 billion in 2004 (87). Recently, not only is the population vulnerable to back pain, but

people age 65 and older are also the fastest-growing back pain population. It is a leading source

of health care expenditures and financial compensation for a temporary or permanent disability

(88, 89). That is, about three in four people experience back pain at some time in their lives and

almost half of the population suffers from back pain every year. Many individuals do not recover and

remain with limitations in activity and physical functioning, which may further lead to the

chronic condition of the limitation.

In order to monitor the health status of the back pain population, a precise measure of

disability resulting from back pain is essential. Traditionally, investigators and clinicians have

used disability instruments based on test-level psychometrics such as reliability (38,90).

However, the reliability values that are widely accepted as a criterion for good measurement

vary from sample to sample (90, 91). That is, reliability values obtained with one sample are

not necessarily reflective of reliability values from other samples. In addition, test scores









obtained from disability measures are always dependent on selection of assessment tasks from

the underlying construct being measured. Respondents will obtain lower scores on difficult tests

and higher scores on easy tests, while the respondents' ability remains the same. Moreover,

existing conventional instruments frequently exhibit inadequate breadth to cover the wide range of the

underlying construct because they are developed to target the average persons for whom the

instruments are designed (28). Along with the breadth of measurement, these instruments also

often show a lack of precision in that their items do not closely match the

ability of individuals. For example, this precision problem may appear when a "lifting 25 pounds

weight" item (i.e., a difficult item) is administered to individuals of low ability who cannot

perform a "lifting 1 pound weight" item (i.e., an easy item) or when an easy item is administered to

individuals of high ability. Similarly, it may happen when an easy test is administered to

individuals of high ability or a difficult test is administered to individuals of low ability.

The ICF Activity Measure (ICFAM) was developed to create an efficient and precise

measurement system based on the activity dimension of the International Classification of

Functioning, Disability and Health (ICF). The ICF of the World Health Organization (WHO)

provided the conceptual framework and classification system for developing items used in the

study. Equiprecise measurement (i.e., measurement across the entire range of a construct) was

applied to activities involving movement, moving around and daily life tasks as defined by the

activity dimension of the ICF. By applying Item Response Theory (IRT) and Computer Adaptive

Testing (CAT) methods, Velozo and colleagues (41) created the ICFAM, which is a web-based

computer adaptive survey system. The administrative core of the instrument allows setting a

wide range of functions, including initial theta value (i.e., directing the initial question that most

closely matches the ability level of the respondent) and standard error (i.e., for terminating the









test). The questions are targeted to individuals at their ability level, requiring only 5-10 questions

per construct to arrive at a final measure of person ability. In addition, immediate results are

provided to the respondents/clinician in the form of graphs and summary statistics.

The ICFAM consists of 6 constructs measuring activity limitations: positioning/transfers, lifting/carrying,

fine hand, walking/climbing, wheelchair/scooters, and self care activities.

While the ICFAM was designed for individuals with upper extremity deficits, lower extremity

deficits, spinal cord injury and back pain, the focus of this study is only on patients with back pain. To

comprehensively cover the extensive activity limitations of the chronic back pain population, we selected

three constructs that are particularly relevant to individuals with back pain. We identified the

ICFAM constructs for this study by two criteria: 1) the constructs most frequently cited as deficits

for back pain and 2) the relevance of the activities for individuals with back pain. The selected constructs are

positioning/transfers, lifting/carrying, and walking/moving.

The purpose of this study is to investigate the item-level measurement qualities of the

ICFAM with a sample of patients with back pain. Factor analysis and the Rasch model (one-parameter

IRT model) were used to investigate the following measurement qualities of three

constructs of the ICFAM: 1) unidimensionality, 2) item-level psychometrics, and 3) the

hierarchical order of item difficulty (hypothetical versus empirical). Unidimensionality refers to

measuring a single dominant construct even while multiple attributes are being measured

(39,92). This property is a basic assumption of measurement theory that allows combining the

items to obtain a total score for an assessment and the validity of interpretations based on a total

score (93). Rasch analysis was used to scrutinize the data at the item-level including item

difficulty and rating scale structure. These item parameters are invariant regardless of which subgroups of

the sample are used (sample-free). Rasch analysis also provides a person-item match map, which









places both person ability and item difficulties on the same linear continuum. This map can

reveal ceiling and floor effects and other "gaps" where item difficulty calibrations do not match

person ability estimates (40). In addition, the person-item match map also can provide insight on

construct validity (i.e., supporting that "staying in a kneeling position on both knees for 10-20

minutes" item is more challenging than "staying in a lying position on back for 1 hour" item). A

hierarchy of item difficulty continuum refers to a possible logical progression in which relevant

items of a unidimensional construct are arrayed from easy to difficult (93). The empirically

derived hierarchy of item difficulties based on Rasch analysis can be compared with the

hypothetically derived hierarchy of activities based on Metabolic Equivalent (MET).

Methods

Research Participants

The data used in this study were retrieved from a research project that developed the ICFAM,

funded by the National Institute on Disability and Rehabilitation Research (NIDRR). The

developmental research was approved by the Institutional Review Board of the University of

Florida (Approved by IRB # 568-2000). Through 1) focus group presentation with test items, 2)

professional panel consultations, 3) cognitive interviewing, and 4) a paper-pencil version field test

with 255 items for different diagnostic groups, the study was conducted to develop the ICFAM

with 264 items measuring activity limitation. Three hundred twelve individuals from 3 diagnostic

groups (i.e., low back pain, lower extremity, and upper extremity injury) who completed the

paper-pencil version test were selected for this study.

Instrumentation

In an effort to capture limitations in activities, the ICFAM was designed with 6 constructs

(positioning/transfers, gross upper extremity, fine hand, walking/moving, wheelchair/scooters,

and self care activities). Three constructs (103 total items) of the ICFAM (56 items for









positioning/transfer, 27 items for lifting/carrying and 20 items for walking/moving construct)

were chosen for this study. The three constructs with examples of items are presented in

Table 2-1. Item difficulties for the example items of each construct in Table 2-1 are listed

in descending order from the most difficult to the easiest. Response categories for these items

consist of four choices: 1) a lot of difficulty, 2) some difficulty, 3) no difficulty, and 4) have not

done. If participants have not performed the activity in the past 30 days, are unable to perform the

activity, require the help/assistance of another person, or were told by their doctor not to do the

activity, they are instructed to answer 'have not done'. This response category 'have not done' is

regarded as a missing value in the analysis.

Rasch Rating Scale Model

The Rasch rating scale model is a generalization of the dichotomous Rasch model and is

sometimes referred to as the polytomous Rasch model. It was derived by Andrich (1978) (94).

The Rasch rating scale model can be expressed by a probability equation:

$\ln(P_{nik}/P_{ni(k-1)}) = B_n - D_i - F_k$

The left side of the equation is a logarithmic function ($\ln$ is the natural logarithm,

which uses $e \approx 2.718$ as the base). $P_{nik}$ is the probability that person $n$, encountering item $i$, would

be observed in category $k$. Dividing the probability of being observed in rating category $k$ ($P_{nik}$) by

the probability of being observed in the next lower rating category $k-1$ ($P_{ni(k-1)}$) gives the odds of

passing from rating category $k-1$ to category $k$. The log transformation then turns ordinal

level data into interval level data, where the log odds of passing the rating scale at the next

higher level is a conjoint measurement of the person ability ($B_n$), the item difficulty ($D_i$), and

the step difficulty between the rating categories ($F_k$). The unit of measurement that results when the

Rasch model is used to transform raw scores into log odds on a common interval scale is

the "logit" (95).
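
The category probabilities implied by the rating scale equation can be computed directly. The sketch below assumes categories scored 0 to m, with one step threshold Fk for each step up the scale. The function name and the numeric values in the usage note are illustrative, not ICFAM calibrations.

```python
import math

def rsm_category_probabilities(b_n, d_i, thresholds):
    """Return P(category k), k = 0..m, under the Rasch rating scale model.

    b_n: person ability, d_i: item difficulty, thresholds: F_1..F_m (all in logits).
    """
    # Cumulative sums of (B_n - D_i - F_k); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for f_k in thresholds:
        cumulative.append(cumulative[-1] + (b_n - d_i - f_k))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]
```

For example, `rsm_category_probabilities(0.5, -0.2, [-1.0, 1.0])` returns three probabilities that sum to 1, one per scored response category (a lot of difficulty, some difficulty, no difficulty). The log of the ratio of any two adjacent probabilities reproduces Bn - Di - Fk for that step.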









Data Analysis

Several studies have shown that dimensionality cannot be determined solely by fit

statistics (96,97). Thus, prior to the application of Rasch analysis to the items of the ICFAM,

confirmatory factor analysis (CFA) using Mplus™ (Muthén & Muthén, Los Angeles, CA,

version 4.21) was conducted to determine the goodness of fit of the items to the 3-factor model

of the ICFAM (n=312). In addition, CFAs were conducted to determine the goodness of fit of the

items to a one-factor model for the ICFAM and a one-factor model for each construct of the ICFAM.

The following criteria were used to determine goodness of fit to the one- and multi-factor models:

1) a chi-square p-value > 0.05, indicating no significant misfit, 2) comparative fit index (CFI) and

Tucker-Lewis Index (TLI) values near 1.0 (the closer to 1.0, the better the fit), 3) root mean square

error of approximation (RMSEA) < 0.06, and 4) weighted root mean square residual (WRMR)

< 0.01 (98,99).

Traditionally, exploratory factor analysis (EFA) has been used to explore the possible

underlying structure of a set of interrelated variables without imposing any preconceived structure on the

outcome (100). In this study, if the CFA failed to confirm the unidimensionality of a construct of the ICFAM, we

conducted EFA on that construct to further investigate the potential

factor structure. EFA was performed using Mplus™ (Muthén & Muthén, Los Angeles, CA,

version 4.21). We used the unweighted least squares estimator, varimax rotation

following the initial factor extraction, and replaced missing data with mean values. Criteria

to determine the number of factors to retain were: 1) Kaiser's criterion of eigenvalues greater than 1, 2)

factors accounting for greater than 5% of the variance, and 3) scree test where the slope changes

substantially in the factor versus eigenvalue graph (101). A criterion of greater than 0.46 was

used as a significant factor loading (102).
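
The first two retention criteria can be applied mechanically once the eigenvalues of the item correlation matrix are available. The sketch below is illustrative and takes the eigenvalues as given; the function name and the returned dictionary are hypothetical. The scree test is a visual judgment and is not automated here, and how the three criteria were combined in the study is not specified, so each rule is reported separately.

```python
def factors_to_retain(eigenvalues, min_eigenvalue=1.0, min_variance=0.05):
    """Count factors passing Kaiser's rule and the variance-share rule.

    eigenvalues: eigenvalues of the item correlation matrix, in any order.
    Returns the number of factors meeting each criterion separately.
    """
    total = sum(eigenvalues)  # equals the number of items for a correlation matrix
    kaiser = sum(1 for value in eigenvalues if value > min_eigenvalue)
    variance = sum(1 for value in eigenvalues if value / total > min_variance)
    return {"kaiser": kaiser, "variance": variance}
```

When the two counts disagree, the scree plot and the interpretability of the rotated loadings would usually guide the final choice.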









Rasch analysis with the rating scale model, using the Winsteps computer program (103, 104), was

conducted to determine the model fit as well as the item level psychometrics of the ICFAM for

back pain patients. The Rasch model (i.e., one-parameter IRT model) is the most robust of the IRT

models, in that stable and accurate item parameters such as fit statistics can be obtained with

a relatively small sample size (105). The Winsteps program produces goodness of fit statistics for

each item and person, which were used to identify items that did not fit the unidimensional

Rasch model. Items with infit and outfit mean square (MnSq) values greater than 1.4 or

smaller than 0.6 indicate misfit, which means that responses to the items were erratic relative to

other items (95,106). The erratic pattern of response may indicate that the item might be

measuring a different construct or the item needs further clarification. Infit means inlier-sensitive

or information-weighted fit, which is more sensitive to the pattern of responses to items targeted

on the person, while outfit means outlier sensitive fit, which is more sensitive to the pattern of

responses to items with difficulty far from the person's ability (107). Rasch analysis also provides point

measure correlation coefficients as an immediate check that the item-level scoring accords with

the latent variable. A negative correlation coefficient may indicate a reversed survey item. The

point measure correlations should be 0.3 or greater (108).
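
The difference between infit and outfit can be seen in how each is computed from residuals. The sketch below is a simplified illustration for a single dichotomous item with known person abilities and item difficulty. It is not the Winsteps implementation, which also handles rating scale items and estimates the parameters itself. The function names are hypothetical; the 0.6 and 1.4 cut-offs are those stated above.

```python
import math

def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous Rasch item.

    responses: 0/1 answers, one per person; abilities: person measures in logits.
    """
    squared_residuals, variances = [], []
    for x, theta in zip(responses, abilities):
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardized residuals (outlier-sensitive).
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: squared residuals weighted by their information (inlier-sensitive).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

def is_misfit(mnsq, low=0.6, high=1.4):
    """Flag a mean square outside the acceptable range used in this study."""
    return mnsq > high or mnsq < low
```

Because outfit divides each squared residual by its own variance, an unexpected response from a person far from the item's difficulty inflates outfit more than infit.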

Rasch analysis also provides person separation (SR) values, which identify whether items

are effective in separating individuals into distinct ability levels. The SR provides an indication

of the number of statistically distinct strata, or meaningful categories (e.g., low, medium,

and high ability back pain groups), into which the sample can be divided. It is calculated as SR = (4Gp+1)/3, where

"Gp" represents the person separation. Person separation is an index of the sample standard

deviation in terms of standard error units and person separation reliability (analogous to









Cronbach's α) is the proportion of observed sample variance that is not attributable to

measurement error (104).
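
The relationships among person separation, strata, and separation reliability can be written out as a short sketch. The strata formula is the one given above. The other two functions assume the usual Rasch convention that separation is the error-adjusted ("true") standard deviation divided by the root mean square standard error, so that reliability equals Gp²/(1 + Gp²). The function names are hypothetical.

```python
import math

def person_separation(observed_sd, rmse):
    """Gp: error-adjusted sample SD expressed in standard error units (assumed convention)."""
    true_variance = max(observed_sd ** 2 - rmse ** 2, 0.0)
    return math.sqrt(true_variance) / rmse

def person_strata(separation):
    """Number of statistically distinct strata, SR = (4Gp + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0

def separation_reliability(separation):
    """Proportion of observed variance not attributable to measurement error."""
    return separation ** 2 / (1.0 + separation ** 2)
```

Under these definitions, a person separation of 2.0 corresponds to 3 strata and a separation reliability of 0.8.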

The item-person map detailing an empirically derived hierarchy produced by Rasch

analysis was compared with a hypothetically derived item difficulty hierarchy based on Metabolic

Equivalents (METs). The MET system provides the energy cost of physical activities as

multiples of resting metabolic rate (RMR) (109-112). Although there is evidence that the

MET may be inaccurate in estimating energy expenditure for people of different body weights

and fat percentages, it is a universally accepted concept to express energy expenditure for

various physical activities (112). In addition, the American College of Sports Medicine has

recently defined light, moderate, and vigorous physical activity based on specific MET levels

(113). The MET system is used by many researchers and clinicians to identify and prescribe

physical activities. For items with no corresponding MET values, estimates were determined by

inspecting values for similar activities. For example, since there was no exactly matching MET

value for the item "walking on carpeting", an estimate was determined by examining the

value for a similar activity, "household walking", which has a MET value of 2.0. This

hypothetical hierarchy based on MET values was compared to the empirical hierarchy of item

difficulty of the short forms, which was determined with the Rasch analysis. Rasch analysis provides

item difficulty estimates in logits. The order of difficulty of items based on these estimates was

compared to the hypothetical hierarchy based on MET values. Support for the hypothetical

hierarchy might be found if "staying in a kneeling position on both knees for 10-20 minutes", a

higher MET value activity, also has a higher logit value than "staying in a lying position on

back for 1 hour", a lower MET value activity. Additionally, the comparison of item difficulty

and person ability (i.e., item-person map) can be used to determine whether or not the items of

each construct cover the range of person ability (i.e., no gaps, ceiling or floor effects).

Results

The demographic and clinical features of two groups of the study participants are presented

in Table 2-2. The three diagnostic groups include low back pain (n=101), lower extremity

(n=108), and upper extremity impairment groups (n=103). The average age is 50 ± 17.6 and

48 ± 17.3 years for the combined and back pain group, respectively. Almost one third of the back

pain group reported having the problem (i.e., back pain) for more than a year suggesting a

chronic condition.

The results of the CFA failed to confirm the three constructs of the ICFAM. Table 2-3

presents the five indices for the three factor model of the ICFAM. None of the five indices for

the goodness of fit test reached the criteria of model fit, while only the Tucker-Lewis Index (TLI)

approached its criterion (0.907).

Positioning/Transfer Construct

CFA did not confirm a one factor model for the positioning/transfer construct (Table 2-3).

None of the indices for the goodness of fit test reached its criterion. To further explore the factor

structure of positioning/transfer, exploratory factor analysis (EFA) was conducted (Table 2-4).

We retained eleven factors based on a criterion of eigenvalue greater than 1, four factors based

on a criterion of variance greater than 5%, and 3 factors based on a criterion of the scree test.

These factors accounted for 76%, 60%, and 53% of total variance, respectively. We extracted 4

factors to further investigate the interpretability of the factor loadings.
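
The eigenvalue and variance retention criteria can be sketched as simple counts over the eigenvalues of the item correlation matrix. This is an illustrative sketch only: the eigenvalues below are hypothetical and are not the ICFAM results, the function names are invented for the example, and the scree test is omitted because it is a visual judgment.

```python
def count_kaiser(eigenvalues):
    """Number of factors with an eigenvalue greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def count_by_variance(eigenvalues, min_share=0.05):
    """Number of factors that each explain more than min_share of total variance."""
    total = sum(eigenvalues)
    return sum(1 for ev in eigenvalues if ev / total > min_share)


def cumulative_variance(eigenvalues, n_factors):
    """Share of total variance explained by the n_factors largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:n_factors]) / sum(ordered)


# Hypothetical eigenvalues for a 10-item scale (they sum to the number of items).
example_eigenvalues = [4.2, 1.8, 1.1, 0.9, 0.6, 0.45, 0.45, 0.3, 0.15, 0.05]

n_kaiser = count_kaiser(example_eigenvalues)
n_variance = count_by_variance(example_eigenvalues)
print(n_kaiser, round(cumulative_variance(example_eigenvalues, n_kaiser), 2))
print(n_variance, round(cumulative_variance(example_eigenvalues, n_variance), 2))
```

As in the analysis above, different criteria can retain different numbers of factors, and each choice implies a different share of total variance explained.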

The four factors contained items that appeared to represent activities involving staying

in upright position/shifting weight/changing position, staying in seated position/bending/shifting

weight/changing position, staying lying and standing position, and moving yourself in various

positions (factor loadings greater than 0.46 are bolded) (Table 2-5). Most items loaded onto

factor 1 (20 of 56 items) and factor 2 (19 of 56 items), while twelve items loaded onto factor 3 and

eleven items onto factor 4. Of these items, 7 items loaded onto more than one factor (factorially

complex) while 2 items did not load onto any factor. Items related to "staying in a standing for

longer than 1 hour, kneeling, and squatting position.....", "changing position from.....", and

"moving yourself....." had a tendency to load onto factor 1. Items related to "staying seated

.....", "shifting weight.....", and a few items of "changing position....." had a tendency to load onto

factor 2, while "staying in a lying position....." had a tendency to load onto factor 3. Items of

"moving yourself into/out of bathtub....." had a tendency to load onto factor 4, while "moving

yourself from mattress/sitting/floor to....." had a tendency to cross load onto factor 1 and factor 4.

Table 2-8 presents item measures, error, infit/outfit statistics, and point measure correlation

coefficients for 56 items. The results show that 54 of 56 items had adequate infit/outfit statistics;

two items slightly exceeded the fit statistics criterion of mean square of 1.4. These two items

were "lying down stomach 2-4 hours" and "moving into bathtub to take shower" (items presented

in bold) (1.44/1.41 and 1.49/1.52, respectively). In addition, all 56 items showed adequate point

measure correlations ranging from 0.32 to 0.73.

Items of positioning/transfer construct were effective in differentiating individuals with

chronic back pain into 6 statistically distinct levels of person ability. Person separation index

(person standard deviation in calibration error units) was 4.52, defining 6.36 statistically

meaningful levels of disability (person separation ratio). These items also showed good person

separation reliability (analogous to Cronbach's α) at 0.95.









Table 2-8 also presents the item difficulty hierarchy, which displays the most difficult

items at the top of table and the easiest item at the bottom. Items least likely to be endorsed with

a high rating (i.e., the most difficult items) were "kneeling 10-20 minutes" and "lying stomach 5-

8 hours", while items most likely to be endorsed with a low rating (i.e., the easiest items) were

"change position standing to sitting chair" and "shift lying in bed". That is, these individuals

with chronic back pain demonstrated greater difficulties in maintaining postures for a prolonged

time (e.g., the "lying on stomach 5-8 hours" item was at 1.49 logits) than shifting or changing

postures (e.g., the "changing position squatting to standing" and "shifting lying in bed" items were at

0.43 and -1.69 logits, respectively). Item difficulty calibrations match person ability measures

fairly well on the positioning/transfer construct (Figure 2-1). The items of each construct at their

average measure are listed to the right side of each map, with the easiest items at the bottom of

the map and the most difficult items at the top. "M" to the left/right of the vertical line represents

the average person measure and item measure, respectively, while "S" and "T" next to the vertical

line represent 1 and 2 standard deviations, respectively. The map showed a relatively normal

distribution of individual abilities ranging between -2.0 and 4.0 logits. The person ability

distribution also showed no apparent ceiling or floor effects. The average item difficulty and

average person ability were virtually identical, with item difficulty 0.06 ± 0.93 logits lower than

average person ability.

A hypothetically derived hierarchy of activity based on MET values (33,37,39) is

compared with an empirically derived hierarchy of item difficulty. Empirical evidence based on

Rasch analysis did not support the difficulty hierarchy based on MET levels. As an energy cost

measure of physical activities, the MET level for the "lying down on back 5-8 hours" item is 1.0 MET,

while that for the "changing standing to sitting in chair" item is 2.0 METs. In our empirical hierarchy

of item difficulty, the "lying down on back 5-8 hours" item was one of the most difficult items and the

"changing standing to sitting in chair" item was one of the easiest items for the individuals with chronic

back pain (see Figure 2-1).
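
The comparison between the two hierarchies amounts to checking, for a pair of items, whether the item with the higher MET value also has the higher logit value. The sketch below is one hypothetical way to express that check; the function name is invented for the example, the MET values are those quoted in the text, and the logit values are the item difficulties listed for these items in Table 2-1.

```python
def orders_agree(met_a, logit_a, met_b, logit_b):
    """Return True if the MET ordering and the logit ordering of two items agree.

    Returns None when either measure is tied, since a tie implies no ordering.
    """
    if met_a == met_b or logit_a == logit_b:
        return None
    return (met_a > met_b) == (logit_a > logit_b)


# "lying down on back 5-8 hours": 1.0 MET, 1.06 logits (Table 2-1)
# "changing standing to sitting in chair": 2.0 METs, -1.15 logits (Table 2-1)
agree = orders_agree(1.0, 1.06, 2.0, -1.15)
print(agree)
```

For this pair the check returns False: the activity with the lower energy cost is the more difficult item, which is the reversal reported for the positioning/transfer construct.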

Lifting/Carrying Construct

The result of the CFA did not confirm a one factor model for the lifting/carrying construct

(Table 2-3). None of the indices for the goodness of fit test reached the criteria, while only the

Tucker-Lewis Index (TLI) approached its criterion (0.932). To further explore the factor

structure of the lifting/carrying construct, EFA was conducted. We retained six factors based on a

criterion of eigenvalue greater than 1, four factors based on a criterion of variance greater than

5%, and three factors based on a criterion of scree test (Table 2-4). These factors based on each

criterion accounted for 78%, 69%, and 54% of total variance, respectively. We extracted four

factors to further investigate the interpretability of the factor loadings.

The four factors contained items that appeared to represent activities involving lifting

light objects, lifting heavy objects, carrying/pushing/pulling, and carrying infants/toddler (Table

2-6). Items loaded onto factor 1 (9 of 27 items), factor 2 (8 of 27 items), factor 3 (8 of 27 items),

and factor 4 (6 of 27 items). Four items loaded onto more than one factor (factorially complex).

Items of lifting objects 10 pounds or heavier had a tendency to load onto factor 1. Items of

carrying 5 to 10 pounds for 25 feet and pulling and pushing had a tendency to load onto factor 2.

Items of lifting objects 5 pounds or less had a tendency to load onto factor 3, while items of

carrying 10 pounds up/down stairs and infants/toddlers had a tendency to load onto factor 4.

Table 2-9 presents item measures, error, infit/outfit statistics, and point measure

correlations for 27 items. Twenty of twenty-seven items showed acceptable infit/outfit statistics, with

seven items showing slightly high infit/outfit statistics (carrying toddler on shoulders, on back, and on

hip, carrying infant in arm, carrying 10 pounds up/down one flight of stairs, pushing a shopping

cart, and carrying one pound 25 feet) and one item showing low infit/outfit statistics (lifting 10

pounds waist to shoulder). The "carrying infants in arm" item significantly misfit on both

infit and outfit criteria (presented in bold) (1.95 and 1.89). In addition, all 27 items showed adequate

point measure correlations ranging from 0.40 to 0.79.

Items of lifting/carrying construct are effective in differentiating individuals with back pain

into 5 statistically distinct levels of person ability. Person separation index was 3.67, defining

5.23 statistically meaningful levels of disability. These items also showed good person separation

reliability (analogous to Cronbach's α), which was 0.93.

Table 2-9 also presents the item difficulty hierarchy of the lifting/carrying construct. The items least

likely to be endorsed with a high rating (i.e., the most difficult items) were "carrying toddler

on the shoulder" and "carrying toddlers on back" with similar item difficulty calibrations (2.82

and 2.73 logits, respectively). In addition, items most likely to be endorsed with a low rating (i.e.,

the easiest items) were "pulling open refrigerator door" and "carrying 1 pound for 25 feet" (-2.99

and -2.15 logits, respectively). That is, individuals with back pain are more likely to have

difficulties with toddler-carrying activities than with pulling activities.

Item difficulty calibrations matched person ability measures fairly well on the lifting/carrying

construct (Figure 2-2). The item-person map shows a relatively normal distribution of individual

abilities ranging from -3.0 to 5.0 logits. The person ability distribution also showed no apparent

ceiling or floor effects. The average item difficulty was 0.31 ± 1.31 logits lower than average

person ability.

Empirical evidence based on Rasch analysis supported the difficulty hierarchy based on

MET levels. Since only a few items of the lifting/carrying construct correspond to activities with MET

levels, we made the comparisons with estimated values. The estimated MET level

for "lifting 25 pounds shoulder to above head", "lifting 25 pounds floor to waist", and "carrying

25 pounds for 25 feet" was about 3.0 METs. The "pulling wet laundry out from washing

machine" item was estimated at about 2.0 METs. In our empirical hierarchy of item difficulty,

the above 3 items of the lifting/carrying construct were among the difficult items and the "pulling wet

laundry out from washing machine" item was among the easy items for the individuals with back

pain.

Walking/Moving Construct

The result of the CFA only partially confirmed the one factor model for the walking/moving

construct (Table 2-3). Of the indices, the Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI)

were marginally adequate (0.960 and 0.978, respectively). To further explore the factor structure

of the walking/moving construct, EFA was conducted. We retained three factors based on a criterion

of eigenvalue greater than 1, three factors based on a criterion of variance greater than 5%, and

two factors based on a criterion of scree test (Table 2-4). These factors based on each criterion

accounted for 70%, 65%, and 54% of total variance, respectively. Based on these results, we

extracted three factors to further investigate the interpretability of the factor loadings.

Most items loaded onto factors that contained items which appeared to be activities with

walking/stepping, climbing/walking, and climbing/running/jogging (Table 2-7). Ten items

loaded onto factor 1, 6 items had a tendency to load onto factor 2, and 5 items loaded onto factor

3. Walking related items except the "walking one mile" item and stepping related items had a

tendency to load onto factor 1. Four climbing related items and two walking-related items had a

tendency to load onto factor 2, while two climbing-related items and the running and jogging items had a

tendency to load onto factor 3. Two items (walking 4-8 blocks without stopping and walking one

mile without stopping) loaded onto more than one factor (factorially complex).









Table 2-10 presents item measures, errors, infit/outfit statistics, and point measure

correlations for the 20 items. Fifteen items showed acceptable infit/outfit statistics, while 5 items

showed high infit/outfit statistics (jogging one mile, running one block, climbing up or down, stepping

onto or off a bus, and stepping into or out of an elevator). Of these items, the "jogging one mile" and

"running one block" items significantly misfit on both infit and outfit criteria (presented in bold,

1.56/2.34 and 1.80/2.16, respectively). In addition, all 20 items showed adequate point measure

correlations ranging from 0.44 to 0.79.

Items of walking/moving construct were effective in differentiating individuals with back

pain into almost 5 statistically distinct levels of person ability. Person separation index was 3.44,

defining 4.92 statistically meaningful levels of disability. These items also showed good person

separation reliability (analogous to Cronbach's α), which was 0.92.

Table 2-10 also presents the item difficulty hierarchy of the walking/moving construct. Items

least likely to be endorsed with a high rating (i.e., the most difficult items) were "jogging one mile"

and "running one block" with similar item difficulty calibrations (3.17 and 2.54 logits,

respectively). In addition, items most likely to be endorsed with a low rating (i.e., the easiest

items) were "walking on grass" and "walking on carpet" (-1.92 and -2.38 logits, respectively).

That is, individuals with back pain are more likely to have difficulties with jogging/running

related activities than with walking on grass/carpet.

Item difficulty calibrations matched person ability measures fairly well on the walking/moving

construct (Figure 2-2). The item-person map shows a relatively normal distribution of individual

abilities ranging from -3.0 to 5.0 logits. The person ability distribution showed 8

individuals at the ceiling but no floor effect. The average item difficulty was 0.43 ± 1.39 logits

lower than average person ability.









Empirical evidence generated by Rasch analysis supports the hypothetically derived

hierarchy based on MET levels. The hierarchy of walking/running activities for MET system is

primarily determined by speed (33,37,39), while our empirical hierarchy of item difficulty is

determined by its conceptual difficulties such as challenges of distance or environments. The

most challenging items in our empirical hierarchy, such as "jogging one mile" and "running one

block", match with the vigorous activity (> 6.0 METs) category in METs. In addition,

moderately challenging items in our empirical hierarchy, such as most climbing related items,

matched with moderate activity in METs ranging from 3.0 METs to 6.0 METs. Moreover, the

least challenging items in our empirical hierarchy clearly match the light activity (< 3.0 METs)

category in METs. That is, empirically derived item difficulty hierarchy of walking/moving

construct of the ICFAM is similar to the activity hierarchy of associated MET levels.
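
The MET categories used in this comparison can be written as a small classification rule. The sketch below uses only the cut-points quoted above (light < 3.0 METs, moderate 3.0 to 6.0 METs, vigorous > 6.0 METs); the function name is hypothetical, and the example value is the 2.0 MET estimate given earlier for "walking on carpeting".

```python
def met_category(met):
    """Classify an activity by MET level using the cut-points quoted in the text."""
    if met < 3.0:
        return "light"
    if met <= 6.0:
        return "moderate"
    return "vigorous"


# "walking on carpeting" was estimated at 2.0 METs from "household walking".
print(met_category(2.0))
```

An item falls on the expected side of the hierarchy when its MET category and its position in the empirical difficulty ordering point the same way, as described for the walking/moving construct.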

Discussion

Summary of Results

While CFA failed to confirm the unidimensionality of the ICFAM for the

positioning/transfer, lifting/carrying and walking/moving construct, overall, the item level

psychometrics of the ICFAM showed good measurement qualities as determined by the fit

statistics, item difficulty hierarchy, and person separation reliability. Since the hypothesized

single factor structure did not provide a good fit of data, exploratory factor analysis was

conducted to further investigate the factor structure of three constructs. EFA suggested multi-

factor solution for the three constructs. The majority of the items for each of the three constructs

fit the Rasch rating scale model. Items of three constructs of the ICFAM were effective in

separating individuals with back pain into statistically meaningful disability groups. One of the

three constructs, the walking/moving construct, presented an empirical hierarchy of item

difficulty that supported the hypothetical hierarchy of activity based on MET values. These

findings may imply that the association between the multifactor models of physical activity

domain and the empirical hierarchical order of item difficulty needs further investigation.

Unidimensionality

CFA failed to confirm the unidimensionality of each construct of the ICFAM. Therefore,

EFA was used to explore the factor structure of each of the three constructs. The EFA revealed

four factor solution for positioning/transfer, four factor solution for lifting/carrying, and three

factor solution for walking/moving construct. Several possible reasons might account for the

failure to support unidimensionality. First, although all constructs of the ICFAM were

theoretically generated, they might differ from their practical dimensionality (96,97). A few

latent traits for each construct were identified by EFA as follows. For positioning/transfer

construct, EFA showed that there is a tendency to separate the construct into four potential latent

traits, which could be labeled as 1) staying in upright position (standing, kneeling, and

squatting)/shifting weight (in kneeling and squatting)/changing position (sitting to kneeling to

squatting to standing), 2) staying in seated position/bending/shifting weight (in chair and

standing)/changing position (lying to sit to standing), 3) staying lying and standing position, and

4) moving yourself in various positions. For the lifting/carrying construct, EFA showed that there is a

tendency to separate the construct into four latent traits, which could be labeled as 1) lifting light

objects, 2) lifting heavy objects, 3) carrying/pushing/pulling, and 4) carrying infants/toddler. For the

walking/moving construct, EFA showed that there is a tendency to separate the construct into

three different latent traits, which could be labeled as 1) walking/stepping, 2) climbing/walking,

and 3) climbing/running/jogging. These findings may suggest that the theoretically generated

constructs of the ICFAM instrument have multidimensional structures. Further investigation

would be necessary to ascertain the dimensionality.









Second, although unidimensionality is a requisite assumption for IRT approaches, the

concept of unidimensionality remains obscure. Reckase (1985) indicates that no measures are

purely unidimensional (114) and McHorney (2004) states that there is no single test available to

check the unidimensionality (8). However, in many cases, studies can be justified by applying

"essential unidimensionality", which involves minimizing methodological or trivial dimensions

(115). Box and Draper (1987) state that all models are essentially wrong, but only some of them

are useful. That is, although a statistical model violates its assumptions, the model may be still

useful (116). Thus, unidimensionality may be a quantitative ideal that can only be approximated

(39). Future research should take into account the influence of multidimensionality on measuring

individuals, that is, how robust the IRT models are to multidimensionality.

Rasch Model Fit

With regard to the fit statistics obtained from Rasch analysis, none of the items misfit

significantly except "carrying infant in arms" in the lifting/carrying construct and "jogging one

mile" and "running one block" in the walking/moving construct. These fit statistics are a

measure of observed variance over expected variance. For the positioning/transfer construct, all

items showed adequate infit/outfit statistics and fit to the Rasch rating scale model. For the

lifting/carrying construct, the infit/outfit statistics of "carrying infant in arms" (1.95 and 1.89) showed that this item

had 95% and 89% more variance than expected, respectively. That is, individuals with low disability

(i.e., high ability) had a tendency to score low or high ratings on the item. Furthermore, 56% -

73% of the respondents scored the rating of "have not done", which is the lowest rating. While

we assumed that "have not done" responses were due to individuals not being able to do a task,

this may not have been the case for the "carrying infant in arms" item. The increased variance

on this item may have resulted from a lack of opportunity to do this task and not due to an ability

or inability to perform the task. For the walking/moving construct, infit/outfit statistics of

"jogging one mile" and "running one block" were 1.56/2.34 and 1.80/2.16, respectively. Since

these two items are among the most difficult items, individuals with low disability (i.e., high

ability) were likely to score either low or high ratings. The bimodal distribution of responses

might have resulted from a lack of observations in the middle categories and led to the large

observed variances for these items. Similar to the "carrying infant in arms" item, nearly 73% and

68% of individuals responded to these two items with the lowest rating ("have not done").
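
Because a mean square is the ratio of observed to expected variance, a value of 1.0 means the responses vary exactly as the model predicts, and the distance from 1.0 gives the percentage of excess (or missing) variance. The following minimal sketch, with a hypothetical function name, reproduces that reading for the infit/outfit values of 1.95 and 1.89 reported for "carrying infant in arms".

```python
def excess_variance_percent(mnsq):
    """Percent of variance above (positive) or below (negative) the model expectation."""
    return round((mnsq - 1.0) * 100)


for label, mnsq in [("infit", 1.95), ("outfit", 1.89)]:
    print(label, excess_variance_percent(mnsq))
```

The same reading applies to the lower criterion: a mean square of 0.6 corresponds to 40% less variance than expected.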

The Hierarchy of Item Difficulty Calibration and Physical Activity

For the constructs of positioning/transfer and lifting/carrying, empirical evidence of the

hierarchical order of item difficulty based on Rasch analysis did not support the hierarchical

order of physical activity based on MET values. For the positioning/transfer construct, for

example, the relevant MET value of "lying down on back 5-8 hours" was 1.0 MET, while

"changing standing to sitting in chair" was 2.0 METs. That is, "changing standing to sitting in

chair" would be a more challenging activity than "lying down on back 5-8 hours" in terms of the

MET value because the changing posture activity would require more energy to perform than the

lying down activity. However, the opposite order of the hierarchy was found in this study. That

is, in our item difficulty hierarchy, "lying down on back 5-8 hours" was a more challenging item

than "changing standing to sitting in chair". This finding reflects a clinical profile of back pain in

which, in general, maintaining a particular posture is more difficult than changing position. By

contrast, for the lifting/carrying construct in which individuals with back pain often report

limitations, there was empirical evidence to support the hypothetical order of activity based on

METs. The empirical order generated by Rasch analysis differentiated three items (lifting

25 pounds shoulder to above head, lifting 25 pounds floor to waist, and carrying 25 pounds for

25 feet), while the relevant MET values of the three items were the same (i.e., 3 METs). The

MET value of "pulling wet laundry out from washing machine" was estimated at 2.0 METs. This

item is less challenging than the three items not only in the empirical hierarchy but also in the MET

value.

As we hypothesized, the empirical hierarchy of item difficulty of the walking/moving construct

reflected the hypothesized hierarchy of activities based on MET values. In our empirical

hierarchy generated through Rasch analysis, an individual with low back pain who is having

difficulty on an item of average difficulty such as

"climbing down one flight of stairs" would be expected to have more difficulty on "climbing up

or down a 6-foot ladder" (more difficult than "climbing down one flight of stairs"). Similarly, an

individual with low back pain who is capable of "climbing down one flight of stairs" would be

expected to be more capable of "step up or down a standard curb" (easier than "climbing down

one flight of stairs"). This logical pattern is similar to that of hypothetical activity hierarchy

based on the MET values (33,37,39). That is, the empirical hierarchy of walking related activities

may correspond to the hypothesized activity hierarchy based on physiological energy

expenditure using the METs. The similarity of these two hierarchies may allude to areas of

unexplored research. Future research investigating associations between self report measures and

physiological measures could be indicated; however, previous studies of this type have

demonstrated weak correlations between physiological functioning and self report health status

(8,115).

Limitations and Future Implications

Several limitations in the present study include sample size, homogeneity of the sample

and dimensionality. The sample size was small for confirmatory and exploratory factor

analysis. In order to obtain useful results, studies suggest that the minimum number of subjects

should be at least five observations per item (117) or the larger of five times the number of

variables (118). Since the ICFAM instrument includes 56, 27, and 20 items for the three constructs,

respectively, a sample size larger than 280 would be recommended. Thus, in the present study,

we used the combined group consisting of individuals with three different diagnoses to meet the

criterion because the sample size of back pain group (n=101) was not sufficient for factor

analysis techniques. These findings may indicate the need for use of multidimensional models to

adequately describe the dimensionality of physical function. In addition, the present study

attempted simple comparisons between item difficulty and MET value from the compendium of

activity. Therefore, there is a need for future studies to investigate the constructs of the ICFAM

in more detail, particularly by measuring the METs of the relevant activities for the items of

the walking/moving construct.
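
The sample size reasoning above can be checked with a short calculation. The sketch assumes only the rule of thumb cited in the text (five observations per item, applied to the largest construct); the function name is hypothetical.

```python
def min_sample_size(items_per_construct, per_item=5):
    """Minimum sample size under a fixed observations-per-item rule of thumb."""
    return per_item * max(items_per_construct)


required = min_sample_size([56, 27, 20])   # items in the three ICFAM constructs
combined_n, back_pain_n = 312, 101         # group sizes reported in Table 2-2

print(required, combined_n >= required, back_pain_n >= required)
```

Under this rule the 56-item positioning/transfer construct sets the requirement at 280, which the combined group (n=312) meets and the back pain group alone (n=101) does not.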










Table 2-1. Examples of items for three constructs of the ICFAM
Constructs Examples of items Item difficulty

positioning/ 1.staying in a kneeling position on both knees for 10-20 1.47
transfers minutes (while making only minor adjustments)
(56 items) 2.staying in a lying position on your back for 5-8 hours (while 1.06
making only minor adjustments)
3.staying in a standing position for 1-2 hours (while making .64
only minor adjustments)
4. moving yourself out of a bathtub after taking a bath .25
5.changing position from standing to kneeling .12
6.bending at the waist while standing for 1-5 minutes (for -.25
example, reaching for something in the trunk of a car)
7.staying in a lying position on your back for 1 hour (while -.45
making only minor adjustments)
8.changing position from lying on your back to sitting (for -.06
example, lying in your bed to sitting on the edge of your bed)
9.changing position from standing to sitting in a chair -1.15
10.shifting your weight while lying in your bed -1.69
lifting/ 1.carrying a toddler on your shoulders 2.84
carrying 2.carrying a toddler on your back (for example, piggyback) 2.69
(27 items) 3.lifting 25 pounds (for example, large bag of dog food or cat 1.77
litter) from shoulder height to above your head with your
hand(s)and arm(s)
4.lifting 25 pounds (for example, large bag of dog food or cat 1.18
litter) from floor to waist height with your hand(s)and arm(s)
5.carrying 25 pounds (for example, large bag of dog food or cat .89
litter) in your hand(s) and arm(s) 25 feet (for example, from car
to front door)
6.Lifting 10 pounds (for example, bag of groceries or 12-pack .35
of soft drinks) from waist height to shoulder height with your
hand(s)and arm(s)
7.lifting 5 pounds (for example, bag of sugar or large telephone -.31
book) from shoulder height to above your head with your
hand(s)
8.pulling open a heavy door (for example, -.86
department/convenience store door)
9.lifting 1 pound (for example, a can of soup) from waist height -2.00
to shoulder height with your hand(s)
10.pulling open a full-size refrigerator door -2.43









Table 2-1. Continued
Constructs Examples of items Item difficulty

walking/ 1.Running one block 2.51
climbing 2.climbing up or down a 6-foot ladder 1.28
(20 items) 3.climbing up or down a 3-step stool .65
4.climbing down one flight of stairs .11
5.walking 4-8 blocks (about 1/2 mile) without stopping .17
6.walking 2-4 blocks (about 1/4 mile) without stopping -.43
7.walking in a crowded place (for example, outdoor -.77
marketplace, shopping mall)
8.walking within your home/living environment -1.31
9.stepping up or down a standard curb -1.55
10.walking on carpeting -2.00









Table 2-2. Demographic information of research participants
Characteristics                        3 Diagnostic Combined Group   Back Pain Group
                                       n=312                         n=101
Age
  < 20                                 14 (4.5)                      5 (5.0)
  21-30                                34 (10.9)                     12 (11.9)
  31-40                                42 (13.5)                     15 (14.9)
  41-50                                56 (17.9)                     24 (23.8)
  51-65                                84 (26.9)                     19 (18.8)
  >65                                  66 (21.2)                     20 (19.8)
  Missing                              16 (5.1)                      6 (5.9)
  Mean ± SD                            50.25 ± 17.6                  48.14 ± 17.3
Gender
  Female                               (51.0)                        65 (64.4)
  Male                                 (42.6)                        31 (30.7)
  Missing                              (6.4)                         5 (5.0)
Education
  Elementary                           (0.2)                         0 (0.0)
  Middle/Junior High                   (6.2)                         3 (3.0)
  High School                          (42.0)                        34 (33.7)
  Technical                            (8.0)                         8 (7.9)
  College                              (32.2)                        33 (32.7)
  Graduate                             (17.0)                        23 (22.8)
Race/Ethnic
  African American                     44 (14.1)                     19 (18.8)
  Hispanic American                    17 (5.4)                      7 (6.9)
  American Indian                      1 (0.3)                       1 (1.0)
  White, not Hispanic origin           232 (74.4)                    68 (67.3)
  Asian/Pacific Islander               5 (1.6)                       1 (1.0)
  Other                                9 (2.9)                       3 (3.0)
  Missing                              4 (1.2)                       2 (2.0)
Years that has had related problems
  Less than a year                     (16.0)                        7 (6.9)
  1 through < 4 years                  (16.4)                        20 (19.8)
  More than 4 years                    (11.3)                        59 (58.4)
  Missing                              (10.3)                        15 (14.9)









Table 2-3. Demographic information of research participants
INDICES            THREE CONSTRUCTS   POSITIONING/TRANSFER   LIFTING/CARRYING   WALKING/MOVING
(CRITERION)        3-FACTOR MODEL     1-FACTOR MODEL         1-FACTOR MODEL     1-FACTOR MODEL
CHI-SQUARE         1814.065           1554.932               936.447            359.304
DF                 128                60                     30                 24
P-VALUE (> 0.05)   .000               .000                   .000               .000
CFI (> 0.95)       .741               .667                   .880               .960
TLI (> 0.95)       .907               .872                   .932               .978
RMSEA (< 0.06)     .205               .310                   .327               .220
WRMR (< 0.1)       2.994              3.954                  4.387              2.842


Table 2-4. Number of retaining factors for the ICFAM
Criteria Positioning/Transfer
Eigenvalue (>1) 11
Variance (>5%) 4
Scree test 3


Lifting/Carrying
6


Walking/Moving
3










Table 2-5. Factor structure of positioning/transfer construct following EFA
POSITIONING/TRANSFER F1 F2 F3 F4
1) staying in a lying position on your favorite side for 1 hour .004 .505 .347 .032
2) staying in a lying position on your favorite side for 2-4 hours .018 .437 .538 .047
3) staying in a lying position on your favorite side for 5-8 hours .099 .375 .596 .133
4) staying in a lying position on your back for 1 hour .088 .414 .517 -.037
5) staying in a lying position on your back for 2-4 hours .131 .320 .676 -.013
6) staying in a lying position on your back for 5-8 hours .159 .188 .751 .046
7) staying in a lying position on your stomach for 1 hour .153 -.092 .628 .503
8) staying in a lying position on your stomach for 2-4 hours .163 -.085 .753 .421
9) staying in a lying position on your stomach for 5-8 hours .189 -.082 .783 .400
10) staying in a seated position for 10-20 minutes .049 .593 .297 -.140
11) staying in a seated position for 30-60 minutes .094 .570 .423 -.127
12) staying in a seated position for 1-2 hours .072 .444 .544 -.070
13) staying in a seated position for 3-4 hours .113 .246 .651 .053
14) staying in a standing position for 10-20 minutes .370 .560 .278 -.044
15) staying in a standing position for 30-60 minutes .446 .417 .362 -.017
16) staying in a standing position for 1-2 hours .503 .296 .469 .027
17) staying in a standing position for 3-4 hours .502 .208 .515 .080
18) staying in a kneeling position on both knees for 5-10 minutes .648 .222 .216 .066
19) staying in a kneeling position on both knees for 10-20 minutes .661 .119 .305 .088
20) staying in a squatting position for 1-2 minutes .672 .140 .171 .076
21) staying in a squatting position for 3-5 minutes .634 .077 .267 .051
22) bending at the waist while standing for 1-5 minutes .335 .596 .137 .045
23) bending at the waist while standing for 5-10 minutes .363 .429 .248 .137










Table 2-5. Continued
POSITIONING/TRANSFER F1 F2 F3 F4
1) shifting your weight while lying in bed .178 .675 .165 .247
2) shifting your weight while sitting in a chair with armrests .166 .712 .064 .192
3) shifting your weight while sitting in a chair without armrests .248 .593 .147 .218
4) shifting your weight while standing .302 .604 .061 .049
5) shifting your weight while kneeling on both knees .751 .203 .092 .176
6) shifting your weight while squatting .808 .179 .088 .195
7) rolling over from your back to your side .118 .677 .165 .211
8) rolling over from your stomach to your side .200 .230 .293 .534
9) changing position from sitting to lying down .096 .739 .119 .153
10) changing position from lying on your back to sitting .136 .731 .169 .174
11) changing position from lying on your side to sitting .135 .723 .146 .234
12) changing position from standing to sitting in a chair .251 .613 .113 .177
13) changing position from sitting in a chair to standing .371 .592 .141 .194
14) changing position from kneeling to sitting on the floor .800 .159 .050 .341
15) changing position from sitting on the floor to kneeling .811 .139 .058 .325
17) changing position from kneeling to standing .824 .217 .102 .147
16) changing position from standing to kneeling .815 .202 .065 .150
18) changing position from squatting to kneeling .836 .104 .012 .308
19) changing position from kneeling to squatting .844 .112 .027 .307
20) changing position from standing to squatting .813 .171 .041 .214
21) changing position from squatting to standing .812 .175 .052 .206
1) while scooting yourself up/back into a chair .129 .567 .006 .385
2) while scooting yourself along a couch .142 .556 .016 .422










Table 2-5. Continued
POSITIONING/TRANSFER F1 F2 F3 F4
9) moving yourself from sitting on a chair to sitting on the floor .524 .315 .103 .496
11) moving yourself from a low mattress/futon to the floor .490 .232 .087 .624
10) moving yourself from sitting on the floor to sitting on a chair .534 .328 .117 .518
12) moving from the floor to a low mattress/futon .498 .222 .110 .628
13) moving yourself while sitting on your bed, scooting along the edge of your bed .152 .513 .038 .359
14) moving yourself while lying on your bed, scooting up in your bed .131 .582 .105 .357
15) moving yourself into a bathtub to take a bath .224 .163 .132 .750
16) moving yourself out of a bathtub after taking a bath .241 .157 .153 .740
17) moving yourself into a bathtub to take a shower .248 .239 -.006 .555
18) moving yourself out of a bathtub after taking a shower .256 .248 -.006 .525









Table 2-6. Factor structure of lifting/carrying construct following EFA
LIFTING/CARRYING F1 F2 F3 F4
1) lifting 1 pound from floor to waist height with your hand(s) .083 .317 .671 .157
2) lifting 1 pound from waist height to shoulder height with your hand(s) .184 .208 .805 .070
3) lifting 1 pound from shoulder height to above your head with your hand(s) .286 .134 .720 .120
4) lifting 5 pounds from floor to waist height with your hand(s) .350 .362 .622 .143
5) lifting 5 pounds from waist height to shoulder height with your hand(s) .526 .291 .625 .050
6) lifting 5 pounds from shoulder height to above your head with your hand(s) .570 .161 .589 .131
7) lifting 10 pounds from floor to waist height with your hand(s) and arm(s) .566 .371 .429 .212
8) lifting 10 pounds from waist height to shoulder height with your hand(s) and arm(s) .733 .264 .386 .182
9) lifting 10 pounds from shoulder height to above your head with your hand(s) and arm(s) .734 .189 .379 .246
10) lifting 25 pounds from floor to waist height with your hand(s) and arm(s) .721 .315 .160 .267
11) lifting 25 pounds from waist height to shoulder height with your hand(s) and arm(s) .854 .158 .104 .275
12) lifting 25 pounds from shoulder height to above your head with your hand(s) and arm(s) .792 .062 .114 .326
15) carrying 1 pound in your hand(s) 25 feet (for example, from car to front door) -.077 .567 .484 .175
16) carrying 5 pounds in your hand(s) 25 feet .092 .593 .516 .196
17) carrying 10 pounds in your hand(s) and arm(s) 25 feet .338 .584 .338 .264
18) carrying 25 pounds in your hand(s) and arm(s) 25 feet .603 .351 .149 .377
19) carrying 10 pounds (for example, bag of groceries) up one flight of stairs .282 .293 .158 .494
20) carrying 10 pounds (for example, bag of groceries) down one flight of stairs .299 .278 .120 .480
21) carrying an infant cradled in your arms .127 .120 .118 .753
22) carrying a toddler on your hip .172 .082 .126 .785
23) carrying a toddler on your shoulders .200 -.010 .074 .864
24) carrying a toddler on your back (for example, piggyback) .247 -.031 .068 .823
1) pulling open a full-size refrigerator door .073 .550 .177 -.046










Table 2-6. Continued
LIFTING/CARRYING F1 F2 F3 F4
2) pushing open a heavy door (for example, department/convenience store door) .267 .732 .124 -.008
3) pulling open a heavy door (for example, department/convenience store door) .271 .710 .116 .026
4) pushing a shopping cart .107 .627 .155 .158









Table 2-7. Factor structure of walking/moving construct following EFA
WALKING/MOVING F1 F2 F3
1) walking within your home/living environment .802 .236 .124
2) walking 2-4 blocks (about 1/4 mile) without stopping .662 .416 .271
3) walking 4-8 blocks (about 1/2 mile) without stopping .570 .462 .360
4) walking one mile without stopping .411 .470 .474
5) walking on carpeting .836 .092 .062
6) walking on grass .832 .178 .130
7) walking on gravel .649 .262 .223
8) walking over small obstacles on the floor (for example, toys, shoes) .662 .282 .165
9) walking in a crowded place (for example, outdoor marketplace, shopping mall) .690 .382 .178
1) climbing down one flight of stairs .412 .802 .193
2) climbing up one flight of stairs .388 .787 .231
3) climbing down two flights of stairs .291 .789 .353
4) climbing up two flights of stairs .259 .776 .396
5) climbing up or down a 3-step stool .371 .442 .503
6) climbing up or down a 6-foot ladder .284 .387 .551
7) stepping up or down a standard curb .672 .370 .138
8) stepping onto or off a bus .320 .376 .463
9) stepping into or out of an elevator .499 .250 .165
10) running one block .088 .214 .840
11) jogging one mile .064 .139 .838









Table 2-8. Fit statistics for positioning/transfer construct
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
kneeling 10-20 minutes 1.59 .15 .97 -.1 .87 -.6 .61
lying stomach 5-8 hours 1.49 .14 1.32 2.0 1.36 1.8 .47
lying stomach 2-4 hours 1.22 .13 1.44 2.8 1.41 2.2 .40
standing 3-4 hours 1.16 .13 .98 -.1 1.02 .2 .59
lying back 5-8 hours 1.14 .13 1.02 .2 1.02 .2 .54
lying side 5-8 hours .88 .13 1.11 .8 1.14 .9 .47
seated 3-4 hours .85 .13 .98 -.1 1.10 .7 .47
change position kneeling to squatting .82 .13 1.11 .8 1.00 .1 .61
shift squatting .81 .13 .95 -.3 .88 -.8 .67
shift kneeling .78 .12 1.06 .5 1.00 .0 .61
kneeling 5-10 minutes .77 .13 1.21 1.6 1.13 .9 .58
squatting 3-5 minutes .74 .12 1.36 2.6 1.33 2.2 .55
change position squatting to kneeling .71 .12 1.22 1.7 1.14 1.0 .62
standing 1-2 hours .67 .12 .82 -1.5 .86 -1.0 .63
moving floor to low mattress/futon .66 .12 .99 -.1 .94 -.4 .67
moving low mattress/futon to floor .61 .12 1.17 1.4 1.06 .5 .69
bending waist 5-10 .60 .12 .94 -.5 .99 .0 .53
lying stomach 1 hour .59 .12 1.47 3.3 1.45 2.9 .47
change position sitting floor to kneeling .54 .12 .91 -.7 .87 -.9 .66
change position kneeling to standing .51 .12 .69 -2.7 .66 -2.8 .67
change position kneeling to sitting floor .45 .12 .94 -.4 .91 -.7 .71









Table 2-8. Continued
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
change position squatting to standing .45 .12 .83 -1.4 .79 -1.6 .65
lying back 2-4 hours .41 .12 1.03 .3 1.04 .4 .50
moving sitting floor to sitting chair .38 .12 .74 -2.2 .73 -2.2 .65
moving out bathtub taking a bath .37 .12 1.35 2.6 1.33 2.3 .54
seated 1-2 hours .26 .12 .96 -.3 1.06 .5 .42
lying side 2-4 hours .25 .12 1.12 1.0 1.27 1.9 .32
moving into bathtub take bath .25 .12 1.51 3.7 1.49 3.3 .55
change position standing to squatting .23 .12 1.01 .1 .94 -.4 .68
squatting 1-2 minutes .23 .12 1.40 2.9 1.33 2.3 .58
change position standing to kneeling .14 .12 .87 -1.0 .83 -1.3 .73
moving sitting chair to sitting floor .04 .12 .90 -.8 .87 -1.0 .66
standing 30-60 minutes -.05 .12 .86 -1.1 .82 -1.4 .58
bending waist 1-5 minutes -.15 .12 .83 -1.4 .86 -1.1 .58
rolling stomach to side -.22 .12 1.30 2.2 1.32 2.1 .49
lying back 1 hour -.37 .13 1.20 1.5 1.33 2.2 .43
moving out bathtub taking a shower -.40 .13 1.50 3.4 1.51 3.1 .48
seated 30-60 minutes -.45 .13 .88 -.8 1.03 .3 .40
moving into bathtub take shower -.51 .13 1.49 3.2 1.52 3.1 .53
lying side 1 hour -.56 .13 .82 -1.3 .80 -1.4 .43
change position lying back to sitting -.66 .13 .59 -3.5 .65 -2.6 .51
shift sitting chair without armrests -.80 .13 .94 -.4 .90 -.6 .55










Table 2-8. Continued
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
moving lying bed, scooting up in bed -.84 .14 .84 -1.1 .83 -1.0 .50
change position sitting chair to standing -.85 .14 .47 -4.5 .54 -3.5 .60
change position lying side to sitting -.85 .14 .57 -3.5 .58 -3.1 .61
rolling back to side -.94 .14 .67 -2.5 .75 -1.6 .53
standing 10-20 minutes -1.00 .14 .74 -1.9 .71 -1.9 .60
moving sitting bed scooting edge of bed -1.04 .14 1.09 .6 1.16 1.0 .49
shift lying in bed -1.12 .14 .59 -3.1 .58 -2.8 .60
scooting along a couch -1.16 .15 .88 -.7 .82 -1.0 .54
change position sitting to lying down -1.32 .15 .79 -1.4 .81 -1.1 .54
shift while standing -1.38 .16 1.11 .7 1.06 .4 .48
scooting up/back into chair -1.41 .16 .95 -.3 .83 -.9 .58
seated 10-20 minutes -1.46 .16 .88 -.7 .96 -.1 .42
shift sitting chair with armrests -1.46 .16 .60 -2.8 .65 -2.0 .57
change position standing to sitting chair -1.62 .17 .87 -.7 .89 -.5 .50










Table 2-9. Fit statistics for lifting/carrying construct
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
carrying toddler on shoulders 2.82 .19 1.45 2.0 1.68 1.7 .43
carrying toddler on back 2.73 .19 1.34 1.6 1.45 1.3 .50
carrying toddler on hip 1.79 .15 1.56 3.2 1.28 1.2 .56
lifting 25 pounds shoulder to above head 1.74 .15 .84 -1.0 .76 -1.1 .71
carrying infant in arms 1.62 .14 1.95 5.1 1.89 3.5 .46
lifting 25 pounds waist to shoulder 1.34 .14 .74 -2.0 .75 -1.4 .73
lifting 25 pounds floor to waist 1.19 .13 .70 -2.4 .76 -1.4 .73
carrying 10 pounds down one flight stairs 1.16 .14 1.43 2.7 1.57 2.8 .56
carrying 10 pounds up one flight stairs 1.01 .13 1.26 1.8 1.45 2.4 .59
carrying 25 pounds 25 feet .96 .13 .75 -1.9 .72 -1.8 .74
lifting 10 pounds shoulder to above head .77 .13 .64 -3.0 .60 -2.8 .79
lifting 10 pounds waist to shoulder .44 .13 .59 -3.6 .56 -3.4 .79
lifting 10 pounds floor to waist .21 .13 .89 -.8 .86 -.9 .69
lifting 5 pounds shoulder above head -.18 .13 .89 -.8 .80 -1.2 .70
carrying 10 pounds 25 feet -.41 .14 .77 -1.7 .68 -2.0 .72
lifting 5 pounds floor to waist -.58 .14 .80 -1.5 .85 -.8 .68
lifting 5 pounds waist to shoulder -.64 .14 .68 -2.4 .63 -2.3 .72
pulling wet laundry out washing machine -.72 .14 .86 -.9 .91 -.4 .62
lifting 1 pound shoulder to above head -1.01 .15 1.21 1.3 1.08 .4 .57
pulling open a heavy door -1.07 .15 .86 -.9 .85 -.6 .58
lifting 1 pound floor to waist -1.42 .16 1.13 .8 1.27 1.1 .55










Table 2-9. Continued
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
pushing open a heavy door -1.42 .16 .81 -1.1 .72 -1.2 .61
carrying 5 pounds 25 feet -1.47 .16 .76 -1.5 .77 -.9 .62
pushing a shopping cart -1.85 .18 1.54 2.5 1.31 1.0 .41
lifting 1 pound waist to shoulder -1.86 .18 .87 -.6 .69 -1.1 .57
carrying 1 pound 25 feet -2.15 .20 1.44 1.9 1.16 .6 .45
pulling open refrigerator door -2.99 .27 1.00 .1 .72 -.5 .40


Table 2-10. Fit statistics for walking/moving construct
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
jogging one mile 3.17 .21 1.56 2.1 2.34 2.2 .53
running one block 2.54 .18 1.80 3.4 2.16 2.5 .57
climbing up or down a 6-foot ladder 1.35 .14 1.50 2.9 1.25 1.1 .69
climbing up two flights of stairs .94 .14 .81 -1.4 .69 -1.6 .78
walking one mile .90 .13 .72 -2.1 .71 -1.6 .79
climbing down two flights of stairs .69 .13 .87 -.9 .76 -1.3 .76
climbing up or down a 3-step stool .65 .13 1.13 .9 1.03 .2 .72
stepping onto or off a bus .60 .13 1.38 2.4 1.49 2.3 .65
walking 4-8 blocks .24 .13 .74 -2.0 .70 -1.7 .76
climbing up one flight of stairs .13 .13 .64 -2.9 .65 -2.0 .76
climbing down one flight of stairs -.06 .13 .69 -2.4 .68 -1.7 .74
walking 2-4 blocks -.36 .14 .79 -1.5 .65 -1.7 .71










Table 2-10. Continued
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
walking on gravel -.69 .14 1.28 1.7 1.38 1.4 .57
walking crowded place -.69 .14 .75 -1.7 .62 -1.6 .69
walking small obstacles on floor -.95 .15 1.16 1.0 1.20 .8 .55
stepping up or down a standard curb -1.14 .16 .86 -.8 .68 -1.0 .61
stepping into or out of an elevator -1.49 .17 1.62 2.9 1.38 1.1 .49
walking within home environment -1.53 .17 .80 -1.1 .66 -.9 .57
walking on grass -1.92 .19 1.01 .1 .76 -.5 .49
walking on carpeting -2.38 .22 .90 -.3 .72 -.4 .44

























Figure 2-1. Item-person map of positioning/transfer construct of the ICFAM. Each "X" on the left side of the map represents 1 subject,
with Xs at the top of the map representing individuals with high ability and Xs at the bottom of the map representing individuals
with low ability.



















Figure 2-2. Item-person map of lifting/carrying construct of the ICFAM.























Figure 2-3. Item-person map of walking/moving construct of the ICFAM. Values shown in parentheses beside items on the map:
jog one mile (6.0 8.0), walk small obstacles on floor (2.5), step into or out of an elevator (2.0), walk within home/living
environment (2.0), step up or down a standard curb (2.0), walk on carpeting (2.0), walk on grass (2.0).









CHAPTER 3
PRECISION OF THREE SHORT FORMS FOR BACK PAIN

Introduction

Fixed short forms have been primarily used in health assessment for the last 30 years to
achieve "psychometric efficiency" (8,28,35,44-46,55). Shortened instruments with good
psychometric properties have been developed in response to growing demands to reduce test
administration time, respondent burden, and study costs. Several short forms have evolved from
generic health measures. For example, the Duke Health Profile-12 and Short Form-36 (SF-36)
were developed from the Medical Outcomes Study (MOS), while the Physical Function-10
(PF-10)/PF-12 Physical Component Summary (PCS) were generated from the SF-36. Although these
instruments were originally designed to measure either overall health status or physical
function in general populations, they have also been used with back pain populations.
Additionally, short forms have been developed from condition-specific measures for back pain
(62); for instance, the 24-item Roland-Morris Disability Questionnaire (RMDQ) (6,7) was developed
from items on the 136-item Sickness Impact Profile, and an 18-item measure derived its items from
multiple instruments, including the Oswestry Disability Index, the RMDQ, and the PF-10.

In creating the short form of an assessment, the goal is to select the fewest items
necessary while maintaining adequate precision in measuring the latent trait (119). That is, the
major challenge in developing fixed short forms is to achieve psychometric efficiency with fewer
items without sacrificing measurement precision (8,15,36,44-46). The creation of fixed short
forms has been largely driven by the comprehensiveness and breadth of prior assessments, which
were particularly burdensome for respondents and for test administration. However,
when the number of items is reduced substantially (as is often the case), a partial loss of
measurement precision is inevitable (8). Several studies indicate that the balance between
comprehensiveness and precision of measurement should be taken into account when developing
a short form (8,44,46,115). The loss of precision may appear regardless of which items
investigators eliminate, because fewer items leave more "gaps" in measurement across the
range of person ability. In general, deficits in precision occur when items do not closely
match ability level (i.e., disability level). Thus, items should be chosen to match ability in order
to enhance measurement precision (44,46). For example, when an easy test is administered to
individuals of high ability (i.e., low disability), or a difficult test is administered to individuals of
low ability (i.e., high disability), measurement precision is insufficient to differentiate the ability
levels of the individuals. The critical questions are to what extent, and by what methods, the
precision of short forms can be optimized.
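The link between targeting and precision can be illustrated with a minimal sketch. It assumes the dichotomous Rasch model (a simplification of the rating scale model used later in this chapter), in which an item contributes information p(1 - p) at a given ability and the standard error of a person measure is 1/sqrt(total information). The function names and item difficulties are hypothetical and are not taken from the study.

```python
import math

def rasch_prob(theta, b):
    """Probability of passing an item of difficulty b at ability theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def form_information(theta, difficulties):
    """Sum of item informations p * (1 - p) under the dichotomous Rasch model."""
    total = 0.0
    for b in difficulties:
        p = rasch_prob(theta, b)
        total += p * (1.0 - p)
    return total

def standard_error(theta, difficulties):
    """Standard error of the ability estimate at theta for a given form."""
    return 1.0 / math.sqrt(form_information(theta, difficulties))

# Hypothetical 5-item forms: one targeted near 0 logits, one far too easy.
targeted = [-1.0, -0.5, 0.0, 0.5, 1.0]
too_easy = [-4.0, -3.5, -3.0, -2.5, -2.0]

se_targeted = standard_error(0.0, targeted)
se_too_easy = standard_error(0.0, too_easy)
print(se_targeted, se_too_easy)
```

Under these assumptions the poorly targeted form yields a larger standard error for the same person and the same number of items, which is the "gap" problem described above.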

Traditionally, Classical Test Theory (CTT) methodologies have been used to select items
from lengthy assessments to create short forms. These methods often include the deletion of
items with low item-total correlations, the least impact on the overall internal consistency of the
test, and low factor loadings (60). Of these methods, Cronbach's α is one of the most commonly used
for selecting and eliminating items that have the least impact on the internal consistency of
the test. However, many studies indicate that the values obtained for Cronbach's α depend
on the particular sample used (i.e., they are sample-dependent) and thus do not reflect an inherent,
stable property of the test (117,118,120,121). Because the estimated α is a property of the observed
responses of a particular sample, it cannot be generalized to different samples. In addition, several
studies indicate that coefficient α can be influenced by many factors, such as 1) test length (i.e.,
longer tests are more reliable than shorter ones) (60), 2) test items not well matched to the
individuals (i.e., too easy or too difficult) (122), and 3) missing data (60). These methods do not
address the importance of maintaining items with difficulties that reflect the range of person
abilities in the population of interest (119).
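The dependence of coefficient α on the sample and on test length can be sketched as follows. The response matrix is hypothetical, and the Spearman-Brown prophecy formula is a standard CTT result brought in here for illustration; it is not a procedure described in this study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)

# Hypothetical responses: 5 respondents x 4 items scored 0-3.
sample = [
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 1, 0, 1],
    [3, 2, 3, 3],
    [0, 1, 1, 0],
]

alpha = cronbach_alpha(sample)
alpha_half_length = spearman_brown(alpha, 0.5)  # same items, half as many
print(round(alpha, 3), round(alpha_half_length, 3))
```

Because both the item variances and the total-score variance are computed from the observed sample, a different sample would generally give a different α, and halving the test lowers the predicted value even if item quality is unchanged.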

In addition to the traditional approach of using Cronbach's α to make item deletion decisions,
Mallinson and colleagues (2004) advocated use of the separation ratio (SR) in item reduction.
The SR indicates the impact that removing an item or items has on measurement precision.
Velozo and colleagues (2000) investigated the use of item reduction procedures based on IRT
methodologies. These researchers recommended deleting items with high or low mean square
residuals, similar item difficulty calibrations, and no substantial influence on person separation. In
other studies using IRT methods, items were selected based on 1) frequency of administration in
Computer Adaptive Testing (CAT), 2) high test information, and 3) broad item difficulty coverage.

The ICF Activity Measure (ICFAM) has recently been developed to create an efficient and
precise measurement system based on the activity dimension of the World Health Organization's
(WHO) International Classification of Functioning, Disability and Health (ICF). The ICF
provided the conceptual framework and classification system for generating the items on the
ICFAM. Activities involving movement, moving around, and daily life activities were the
subcategories of the ICF activity dimension consulted in the development of items. Items were
developed with the intent to represent the entire range of ability on each construct, thus creating
an equiprecise measure. Using IRT and Computer Adaptive Testing (CAT) methods, Velozo and
colleagues (41) created the ICFAM, a web-based computer adaptive survey system. The
administrative core of the instrument allows various settings to be adjusted, making it
possible to change the initial theta value (i.e., the difficulty of the first question given to the
respondent) and the stopping rule (i.e., the guidelines for terminating the test). Because questions
are targeted to individuals at their ability level, 5-10 questions per construct are required to arrive
at a final measure of person ability with acceptable error. In addition, immediate feedback is
provided to the respondents/clinicians in the form of a graph and summary statistics.
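The roles of the initial theta value and the stopping rule can be sketched with a toy adaptive test. This is not the ICFAM algorithm: it assumes a dichotomous Rasch model, a hypothetical 21-item bank, selection of the unused item closest to the current estimate, and a respondent who deterministically passes every item easier than 0.5 logits. Dichotomous items carry less information than rating scale items, so this sketch needs more items than the 5-10 reported for the ICFAM.

```python
import math

def prob(theta, b):
    """Dichotomous Rasch probability of passing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def information(theta, responses):
    """Test information at theta for the administered (difficulty, score) pairs."""
    return sum(prob(theta, b) * (1.0 - prob(theta, b)) for b, _ in responses)

def estimate_theta(responses, theta):
    """Damped Newton-Raphson ML estimate from (difficulty, 0/1) pairs."""
    observed = sum(x for _, x in responses)
    for _ in range(50):
        expected = sum(prob(theta, b) for b, _ in responses)
        step = (observed - expected) / information(theta, responses)
        theta += max(-1.0, min(1.0, step))   # limit each step to 1 logit
        theta = max(-6.0, min(6.0, theta))   # keep all-pass/all-fail finite
    return theta, 1.0 / math.sqrt(information(theta, responses))

def run_cat(bank, answer, start_theta=0.0, se_stop=0.7, max_items=15):
    """Administer items until the standard error or item limit is reached."""
    theta, se = start_theta, float("inf")
    remaining = dict(bank)
    responses = []
    while remaining and len(responses) < max_items and se > se_stop:
        # Pick the unused item whose difficulty is closest to the estimate.
        name = min(remaining, key=lambda n: abs(remaining[n] - theta))
        b = remaining.pop(name)
        responses.append((b, answer(b)))
        theta, se = estimate_theta(responses, theta)
    return theta, se, len(responses)

# Hypothetical bank: difficulties from -3.0 to 3.0 logits in 0.3 steps.
bank = {f"item_{i:02d}": round(-3.0 + 0.3 * i, 1) for i in range(21)}
answer = lambda b: 1 if b < 0.5 else 0   # simulated respondent
theta, se, n_used = run_cat(bank, answer)
print(round(theta, 2), round(se, 2), n_used)
```

Changing `start_theta` alters only the first item presented, whereas `se_stop` and `max_items` jointly play the part of the stopping rule.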

In the current study, we attempted to develop short forms using IRT methodologies for the
constructs on the ICFAM that were most relevant to individuals with chronic back pain. The goal
was to create three efficient short forms while maintaining adequate precision. In contrast to
several methods of shortening instruments based on CTT, the IRT approach places more focus on
item-level psychometrics than on the test as a whole. In addition, IRT methods do not concentrate
on estimates of reliability (i.e., Cronbach's α) as indicators of reliable measurement, since these
statistics are sample-dependent, varying from sample to sample.

The purpose of the present study is twofold. First, we removed items to create three
10-item short forms, one for each of the applicable constructs (i.e., positioning/transfer,
lifting/carrying, and walking/moving), that are psychometrically comparable to the entire set
of items in each construct. Second, we investigated the item-level psychometrics and precision of
these three newly generated short forms using the Rasch rating scale model.

Method

Research Participants

The data used in this study were collected during the development phase of
ICFmeasure.com. Funding for the development of the ICFAM was obtained from the National
Institute on Disability and Rehabilitation Research (NIDRR). The study was approved by the
Institutional Review Board of the University of Florida (IRB # 568-2000). Stages
in the development of ICFmeasure.com included: 1) presenting potential items to focus groups,
2) consulting with a professional panel, 3) cognitive interviewing with individuals with
disabilities, and 4) a paper-pencil field test. These stages resulted in the 264 items that make up
the ICFAM item bank. Data from the 101 individuals with back pain who completed the
paper-pencil version were analyzed in the current study (Table 3-1).

Instrumentation

The ICFAM consists of six constructs based on the activity dimension of the ICF:
positioning/transfers, lifting/carrying, fine hand, walking/climbing, wheelchair/scooters, and
self care activities. Three of these constructs are particularly relevant to individuals with back
pain: positioning/transfer (56 items), lifting/carrying (27 items), and walking/moving (20 items). We
chose these constructs based on two criteria: 1) most frequently cited problem activities for those
with back pain, and 2) relevance of activities in the construct to the population of individuals
with back pain. Our hypothesis was that the 103 items selected would represent three distinct
latent abilities as divided into subcategories of the ICFAM constructs.

In an effort to overcome limitations of the CTT-based short form construction procedure,

the Rasch rating scale model (one-parameter IRT model) was employed. An iterative approach

was used to identify items that could be eliminated based on four criteria: 1) high mean square, 2)

low mean square, 3) similar calibrations to other items, and 4) person separation value (i.e.,

item was retained if analysis with the item removed substantially decreased person separation)

(45). High or low mean square values indicate that the item may measure a different construct or

need further clarification to fit the Rasch model. Similar calibrations may indicate redundant

items. Removal of redundant items (i.e., items having similar calibrations) was considered to be

appropriate if the range of ability level (i.e., ranges between the most difficult and the easiest

item) and intervals between items were maintained on the item-person map.
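The first three screening criteria can be sketched as simple filters. The calibrations and infit values below are the walking/moving values reported in Table 2-10; the 0.6 and 1.4 mean square cutoffs are those given in the Data Analysis section, and the 0.05-logit tolerance for "similar calibrations" is an assumption made only for this illustration. The fourth criterion (change in person separation) requires re-running the Rasch analysis and is not shown.

```python
# (measure in logits, infit mean square) for a few walking/moving items
items = {
    "jogging one mile": (3.17, 1.56),
    "running one block": (2.54, 1.80),
    "walking on gravel": (-0.69, 1.28),
    "walking crowded place": (-0.69, 0.75),
    "climbing up one flight of stairs": (0.13, 0.64),
}

def misfitting(items, low=0.6, high=1.4):
    """Items whose infit mean square falls outside the [low, high] range."""
    return sorted(name for name, (_, infit) in items.items()
                  if infit < low or infit > high)

def similar_calibrations(items, tolerance=0.05):
    """Pairs of items whose calibrations differ by at most `tolerance` logits."""
    names = sorted(items)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(items[a][0] - items[b][0]) <= tolerance]

flagged = misfitting(items)
redundant_pairs = similar_calibrations(items)
print(flagged)
print(redundant_pairs)
```

A flagged item or redundant pair is only a candidate for removal; the decision still depends on whether the range and spacing of calibrations on the item-person map are preserved.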

In addition, after item removal the separation ratio (SR) and person reliability (analogous
to Cronbach's α) were examined. If these two values decreased only minimally after item removal,
this was considered as supporting the deletion of the item. Person separation indicates whether
items are effectively separating individuals into distinct levels (i.e., discriminating among people of
differing ability). The separation ratio (SR) provides an indication of the number of statistically
significant strata or categories of ability (e.g., low, medium, and high ability) into which the sample
is being divided. The formula used to calculate the separation ratio is $SR = (4G_p + 1)/3$, where
$G_p$ represents the person separation value provided by the Winsteps software output. Response
categories on the ICFAM include four choices, with a lower score representing a lower level of
ability: "3" (no difficulty), "2" (some difficulty), "1" (a lot of difficulty), and "0" (have not done).
This rating scale is used on all three constructs. If the activity did not occur within the last 30
days, the participant was instructed to select "have not done". In this study, this rating (i.e., "have
not done") was treated as the maximum difficulty rating. This was based on the rationale that the
most likely explanation for why an activity was not performed during the last 30 days was
inability to perform the task (123).
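As a worked example of the separation ratio formula above, the sketch below converts a person separation value into a number of strata. The person separation values are hypothetical, and the conversion from separation to reliability, R = G^2 / (1 + G^2), is the standard Rasch relationship rather than a formula stated in this chapter.

```python
def strata(person_separation):
    """Number of statistically distinct ability strata: (4 * Gp + 1) / 3."""
    return (4.0 * person_separation + 1.0) / 3.0

def person_reliability(person_separation):
    """Reliability implied by a separation value: G^2 / (1 + G^2)."""
    g_squared = person_separation ** 2
    return g_squared / (1.0 + g_squared)

# Hypothetical person separation values before and after removing an item.
for g_p in (2.0, 1.9):
    print(g_p, round(strata(g_p), 2), round(person_reliability(g_p), 3))
```

With a person separation of 2.0 the sample is divided into 3 strata (e.g., low, medium, and high ability); a small drop in separation after item removal leaves the number of strata close to 3.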

Rasch Rating Scale Model

The Rasch rating scale model can be explained by a probability equation:
$\ln(P_{nik}/P_{ni(k-1)}) = B_n - D_i - F_k$. The left side of the equation is a logarithmic function
($\ln$ is the natural logarithm, which uses $e \approx 2.718$ as the base). $P_{nik}$ is the probability
that person $n$, encountering item $i$, would be observed in category $k$. Dividing the probability of
passing rating category $k$ ($P_{nik}$) by the probability of passing the next lower rating category
$k-1$ ($P_{ni(k-1)}$) gives the odds of passing from rating category $k-1$ to rating category $k$. The
log transformation turns ordinal-level data into interval-level data, where the probability of passing
the rating scale at the next higher level can be a conjoint measurement of the person ability ($B_n$),
item difficulty ($D_i$), and the step measure between the rating categories ($F_k$). The unit of
measurement that results when the Rasch model is used to transform raw scores into log odds ratios
on a common interval scale is the "logit" (95).
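The rating scale equation can be turned into category probabilities by accumulating the adjacent-category log-odds and normalizing. The sketch below does this for a four-category item scored 0 to 3; the person ability, item difficulty, and step measures are hypothetical values chosen only for illustration.

```python
import math

def rsm_probabilities(b, d, thresholds):
    """Category probabilities under the Rasch rating scale model.

    b: person ability, d: item difficulty, thresholds: step measures F_1..F_m.
    Returns probabilities for categories 0..m.
    """
    cumulative = [0.0]                      # log-weight of category 0
    for f in thresholds:
        cumulative.append(cumulative[-1] + (b - d - f))
    weights = [math.exp(v) for v in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values: ability 0.5, difficulty 0.0, three steps (four categories).
steps = [-1.0, 0.0, 1.0]
probs = rsm_probabilities(0.5, 0.0, steps)
print([round(p, 3) for p in probs])

# Each adjacent-category log-odds reproduces B_n - D_i - F_k.
log_odds = [math.log(probs[k] / probs[k - 1]) for k in range(1, len(probs))]
print([round(v, 3) for v in log_odds])
```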









Data Analysis

Using the Winsteps software program (103, 104), the Rasch rating scale model was employed

to determine model fit as well as the item level psychometrics of the ICFAM. The Rasch model

(i.e., one-parameter IRT model) is the most robust of the IRT models. That is, stable and

accurate item parameters (e.g., fit statistics) can be obtained with a relatively small sample size

(105). The Winsteps program produces goodness of fit statistics for each item and person. These

fit statistics are used to identify items that did not fit the unidimensional Rasch model. Infit and

outfit mean square (MnSq) values greater than 1.4 or smaller than 0.6 indicate

that the item was responded to erratically relative to other items (i.e., the item "misfits")

(95, 106). This type of inconsistent pattern of responses may indicate that the item is measuring a

different construct or that the item was poorly understood and needs clarification. Infit is inlier-

sensitive or information-weighted fit. This type of fit is more sensitive to the pattern of responses

to items at a person's ability level (i.e., those items which an individual has 50% chance of

passing). Outfit is outlier-sensitive fit. In contrast to infit, outfit is more sensitive to the pattern of

responses to items with difficulty far from a person's ability level (107).
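The distinction between infit and outfit can be made concrete with a short sketch. It uses the usual mean-square definitions (outfit as the unweighted mean of squared standardized residuals, infit as the squared residuals weighted by model variance), which are assumed here rather than quoted from the text. The observed ratings, model-expected scores, and model variances are invented for illustration; in practice they would come from the fitted model.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item across persons.

    observed: observed ratings; expected: model-expected ratings;
    variance: model variance of each rating (all hypothetical inputs here).
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals, so responses
    # far from a person's ability level (small variance) weigh heavily.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: information-weighted, i.e. summed squared residuals divided by
    # summed model variances.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit

def misfits(mnsq, lower=0.6, upper=1.4):
    """Apply the 0.6-1.4 mean-square window used in this study."""
    return mnsq < lower or mnsq > upper

observed = [3, 0, 2, 1]          # hypothetical ratings on the 0-3 scale
expected = [2.0, 1.0, 2.0, 1.0]  # hypothetical model expectations
variance = [0.5, 1.0, 0.5, 0.5]  # hypothetical model variances
infit, outfit = fit_mean_squares(observed, expected, variance)
```

The `misfits` helper simply encodes the cutoffs stated above; an item whose infit or outfit falls outside the window would be flagged for review.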

Rasch analysis also provides point measure correlation coefficients as an immediate

evaluation of response-level scoring. If the item-level scoring accords with the latent variable,

these correlations will be positive. A negative correlation coefficient might indicate a reverse

scored item. The point measure correlations are acceptable if they are > 0.3 (108). Rasch analysis

also produces estimates of person ability and item difficulty. These estimates are on a log-odds

unit (i.e., logit) scale. The average item difficulty is arbitrarily set at "0" logits with positive

logits indicating higher than average probabilities and negative logits indicating lower than

average probabilities (95).
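A point measure correlation can be sketched as the Pearson correlation between the ratings on one item and the person measures. The data below are invented for illustration, and the function name is hypothetical; the sketch only shows why a reverse-scored item yields a negative coefficient.

```python
import math

def point_measure_correlation(item_responses, person_measures):
    """Pearson correlation between item ratings and person measures (logits)."""
    n = len(item_responses)
    mean_x = sum(item_responses) / n
    mean_y = sum(person_measures) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(item_responses, person_measures))
    var_x = sum((x - mean_x) ** 2 for x in item_responses)
    var_y = sum((y - mean_y) ** 2 for y in person_measures)
    return covariance / math.sqrt(var_x * var_y)

person_measures = [-1.2, -0.4, 0.3, 1.1, 2.0]  # hypothetical logits
item_responses = [0, 1, 2, 2, 3]               # hypothetical ratings (0-3)

r = point_measure_correlation(item_responses, person_measures)
# Reversing the scoring of the same item flips the sign of the correlation.
r_reversed = point_measure_correlation([3 - x for x in item_responses], person_measures)
```

Under the criterion cited above, the first coefficient would be acceptable (greater than 0.3), while the negative coefficient would prompt a check for reverse scoring.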









Rasch analysis also provides person separation, which is an index of the sample standard

deviation in terms of standard error units, and person reliability (analogous to Cronbach's alpha),

which is the proportion of observed sample variance that is not attributable to measurement error

(104). The separation ratio (SR) indicates whether the items are effective in

separating individuals into statistically distinct ability levels. The SR provides an indication of

the number of statistically distinct strata (e.g., low, medium, and

high ability back pain groups) into which the sample can be divided. The formula used to calculate the SR is

SR = (4Gp+1)/3, where "Gp" represents person separation (124).
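The SR formula can be checked against the person separation values reported in the Results of this chapter. The sketch below also converts separation to person reliability using R = Gp^2 / (1 + Gp^2); this conversion is the standard relation between the two indices and is an assumption added here, not a formula stated in the text. The function names are hypothetical.

```python
def separation_ratio(gp):
    """Number of statistically distinct strata: SR = (4*Gp + 1) / 3."""
    return (4.0 * gp + 1.0) / 3.0

def person_reliability(gp):
    """Reliability implied by separation (assumed relation: Gp^2 / (1 + Gp^2))."""
    return gp ** 2 / (1.0 + gp ** 2)

# Person separation values reported in the Results (full item set, short form).
reported = {
    "positioning/transfer": (4.52, 1.86),
    "lifting/carrying": (3.67, 2.49),
    "walking/moving": (3.44, 2.42),
}

strata = {
    name: (round(separation_ratio(full), 2), round(separation_ratio(short), 2))
    for name, (full, short) in reported.items()
}
reliability = {
    name: (round(person_reliability(full), 2), round(person_reliability(short), 2))
    for name, (full, short) in reported.items()
}
```

For example, a person separation of 4.52 gives SR = 6.36 and a reliability of about 0.95, and a separation of 1.86 gives SR = 2.81 and a reliability of about 0.78, which agree with the values reported for the positioning/transfer construct.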

Prior to conducting the Rasch analysis to obtain the item level psychometrics, confirmatory

factor analysis (CFA) was used to test the unidimensionality of the three short forms. Mplus

(Muthen & Muthen, Los Angeles, CA, version 4.21) was used to determine the goodness of fit of

items to one-factor model of each short form. The following criteria were used to determine the

goodness of fit to the one-factor models: 1) p-value of chi square > 0.05, 2) Comparative Fit

Index (CFI) and Tucker-Lewis Index (TLI) close to 1.0, 3) root mean square error of

approximations (RMSEA) < 0.06, and 4) weighted root mean square residual (WRMR) < 0.01

(98,99).

Because a one-factor model was not sufficient, exploratory factor analysis (EFA) was used

to further investigate the potential factor structure. Mplus (Muthen & Muthen, Los Angeles,

CA, version 4.21) was used to conduct EFA. We applied the unweighted least squares method

for estimators, varimax rotation following the initial factor extraction, and replaced missing

values with mean values. The criteria used to determine the number of factors to retain were 1) Kaiser's

criterion of eigenvalues greater than 1, 2) factors accounting for greater than 5% of total variance, and 3)

the scree test, where the slope changes substantially in the factor versus eigenvalue graph (101). A

criterion of greater than 0.46 was used to indicate a significant loading on a factor (102).
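The first two retention criteria and the loading cutoff lend themselves to a brief sketch. The eigenvalues below are invented for a ten-item correlation matrix (whose eigenvalues sum to the number of items), and the function names are hypothetical. The scree test is a visual judgment and is not implemented here.

```python
def retained_factors(eigenvalues, min_eigenvalue=1.0, min_variance_share=0.05):
    """Return (eigenvalue, variance share) pairs meeting both numeric criteria."""
    total = sum(eigenvalues)
    retained = []
    for value in sorted(eigenvalues, reverse=True):
        share = value / total
        # Kaiser's criterion and the 5%-of-variance criterion must both hold.
        if value > min_eigenvalue and share > min_variance_share:
            retained.append((value, share))
    return retained

def is_salient(loading, cutoff=0.46):
    """Apply the loading cutoff used in this study."""
    return abs(loading) > cutoff

# Hypothetical eigenvalues for ten items (they sum to 10).
eigenvalues = [3.6, 1.3, 1.1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2]
factors = retained_factors(eigenvalues)
explained = sum(share for _, share in factors)
```

With these illustrative eigenvalues, three factors are retained and together account for about 60% of the total variance.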

Test information function reports the "statistical information" in the data corresponding to

the complete test. In general, the precision with which a parameter is estimated is measured by

the variability of the estimates around the value of the parameter. Thus, the variance, symbolized

σ², provides a measure of precision of the estimators. The amount of information, denoted by I, is

the reciprocal of variance.

Statistically, when the standard deviation of person ability estimates about the examinee's

ability is squared, the term represents the variance and is a measure of the precision with which a

given ability level can be estimated. From the above explanation, the amount of information at a

given level is the reciprocal of this variance. If the amount of information is large, it means that

the person ability may be estimated with high precision at a given ability level and the estimates

will be close to the true value of ability. If the amount of information is small, it means that the

person ability may be estimated with low precision and the estimates will be widely scattered

around the true value of ability. In order to determine how precisely the items on each of the

short forms estimate person ability across the full range of the construct, the test information

function was examined.
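Because information is the reciprocal of the variance, the standard error of an ability estimate at a given level is 1 divided by the square root of the information at that level. The sketch below applies this to the two peak information values reported for the positioning/transfer construct in the Results (37.77 for the full item set and 6.27 for the short form). The function names are hypothetical.

```python
import math

def standard_error_from_information(information):
    """SE of the ability estimate: square root of the variance, i.e. 1 / sqrt(I)."""
    return 1.0 / math.sqrt(information)

def information_from_standard_error(standard_error):
    """Information as the reciprocal of the variance: I = 1 / SE^2."""
    return 1.0 / standard_error ** 2

se_full = standard_error_from_information(37.77)   # full 56-item set, peak information
se_short = standard_error_from_information(6.27)   # 10-item short form, peak information
```

On these figures, the standard error at the peak is about 0.16 logits for the full item set and about 0.40 logits for the short form.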

Results

Sample demographic and clinical information is presented in Table 3-1. The average age

of the sample was 48 years and nearly 80% of participants reported having back pain more than a

year, indicating it was a chronic condition. A series of Rasch analyses was conducted to develop

three short forms consisting of ten items from the three constructs of the ICFAM (Table 3-6).

Figures 3-1, 3-2, and 3-3 present the item-person maps of each construct of the ICFAM prior to and

following item reduction. Each "m" on the right side of a map represents the location of a deleted

item from the original full set of items, while each "X" on the left side of a map represents one study

participant.

Short Form for Positioning/Transfer

The items "lying down on stomach 2-4 hours" and "moving into bathtub to take shower"

were removed after the first Rasch analysis due to high infit/outfit statistics (1.44/1.41 and

1.49/1.52, respectively) (Table 2-8). After a few iterations of Rasch analysis attempting to

maintain adequate person separation, ten items were selected from the entire set of 56 items. The

ten items selected for the positioning/transfer construct generally fit the Rasch model. All items

showed acceptable infit/outfit except one item (moving out from bathtub after taking a bath),

which had slightly high infit/outfit (1.46/1.41) (Table 3-7). These items also retained moderate

point measure correlations, with values ranging from 0.52 to 0.67 (prior to item reduction, the range

was from 0.32 to 0.73). The ten items had a slightly better spread (-2.33 to 3.30 logits) in relation to

person ability than the entire set of 56 items (-1.96 to 2.87 logits) (Figure 3-1). Item calibrations

for the ten remaining items were similar to the calibrations before the deletion of 46 items.

However, person separation decreased considerably from 4.52 to 1.86 (SR decreased from 6.36

to 2.81). That is, the newly developed short form for the positioning/transfer construct separated

individuals with back pain into nearly three groups, while the entire set of items separated

individuals into six groups. Person reliability (analogous to Cronbach's alpha) of the short form

was acceptable (0.78 compared to 0.95 with all items).

A confirmatory factor analysis (CFA) was conducted on the positioning/transfer 10-item

short form (Table 3-2). An exploratory factor analysis (EFA) was conducted to further

investigate the factor structure of the construct. This analysis suggested that a three factor

solution was more appropriate (Table 3-3). We retained three factors based on Kaiser's

criterion of eigenvalues greater than 1. These factors accounted for 60% of total variance (the first

factor accounting for 36%, the second factor accounting for 13%, and the third factor accounting

for 11%). Table 3-3 presents the factor loadings of the ten short form items (factor loadings

greater than 0.46 are in bold). Four items loaded onto factor 1, four items onto factor 2, and two

items onto factor 3. One item (staying in a lying position on back for 1 hour) did

not load onto any factor, and one item (bending at the waist while standing for 1-5 minutes)

loaded onto more than one factor (factorially complex). Items appeared to load onto factors based

on item difficulty (i.e., easy, moderate, and difficult items).

The empirical hierarchy of item difficulty was scrutinized with estimated item difficulty

calibrations, which are expressed in logits with higher positive values indicating a more

challenging task. Item difficulty calibrations of ten newly developed items for

positioning/transfer construct followed a logical progression in terms of motor control theory.

The most challenging item was "kneeling 10-20 minutes" (1.84 ± 0.16 logits), while a moderately

challenging item was "standing 1-2 hours" (0.79 ± 0.13 logits) and the least challenging item was

"changing position standing to sitting in chair" (-1.88 ± 0.18 logits) (Table 3-7). With respect to

motor control theory, kneeling tasks would appear to be more difficult than static standing:

because kneeling is an unnatural position, balance control is likely to be poorer than

that found during static standing (125). Similarly, one would expect that

static standing for 1-2 hours would be more difficult than changing position from standing to sitting in a

chair. However, this logical progression of item difficulty did not reflect our hypothetical

hierarchy of item difficulty based on MET values.

Figure 3-5 presents the test information function of the positioning/transfer construct with the

entire set of 56 items versus the 10-item short form. Following item reduction, the test

information function of the short form moved slightly to the left in comparison to that of the entire set of

items. The figure shows that the amount of information in the entire item set reached its

maximum (37.77) at a person ability of "0" logits, then decreased rapidly as the ability

estimate either increased or decreased. With the ten-item short form, the amount of information

reached its maximum (6.27) at a person ability of -0.15 logits. That is, the entire set of items

provided the most precise measure of person ability near the center of the ability range, while

short form provided the most precise measure of person ability at a slightly lower level than the

center of the ability range. The figure also shows the extent to which item reduction contributes

to lost test information (i.e., precision) at a particular ability estimate. Precision peaked at -0.49

and -0.42 logits and decreased rapidly as the ability estimate either increased or decreased.

Short Form for Lifting/Carrying

On the first Rasch analysis run (with 27 items) "carrying toddler on shoulder", "carrying

infant in arms", and "carrying 10 pounds down one flight stairs" items had high infit/outfit

statistics (1.45/1.68, 1.95/1.89, and 1.43/1.57, respectively) and were thus removed (Table 2-9).

After several iterations of Rasch analysis, attempting to maintain adequate person separation, ten

items were selected from the entire set of 27 items. The ten items retained to create the

lifting/carrying short form all conformed to the Rasch model except one item,

"carrying toddler on back" (infit/outfit = 1.90/2.17) (Table 3-8). The items of the short form

exhibited moderate to high point measure correlations ranging from 0.42 to 0.83, compared to

the range of the entire set of items, which was 0.40 to 0.79. The ten items of the short form had a

slightly better spread of person ability (-3.10 to 4.80 logits) than the entire set of items (-2.72 to

4.30 logits). Item calibrations of the ten-item short form remained relatively stable after the 17

items were deleted. However, person separation decreased from 3.67 to 2.49 (SR decreased from

5.23 to 3.65). That is, the 10-item short form for the lifting/carrying construct separated

individuals with chronic back pain into nearly four groups, while the entire set of items separated

the individuals into six groups. Person reliability (analogous to Cronbach's alpha) for the short form

was 0.86, compared to 0.93 for all the items.

A confirmatory factor analysis was conducted on the 10-item short form to test for

unidimensionality. The one factor model proved to be inadequate (Table 3-2). An exploratory

factor analysis to further investigate the factor structure suggested that a two factor solution was

more appropriate (Table 3-4). We retained two factors based on the Kaiser criterion of eigenvalue

greater than one. These two factors accounted for 64% of total variance (the first factor

accounting for 48% and the second factor accounting for 16%). Table 3-4 presents factor

loadings of 10 short form items (factor loadings greater than 0.46 are in bold). Five items loaded

onto factor 1 and six items loaded onto factor 2, while one item (lifting 10 pounds from waist

height to above your head with your hand) loaded on more than one factor. Items tended to load

onto factors based on item difficulty with activities involving lifting heavy objects loading onto

one factor and those involving lifting light objects onto the other factor.

Figure 3-6 presents the test information function for the lifting/carrying construct with the

entire set of 27 items versus the 10-item short form. Following the item reduction, the test

information function moved slightly to the right in comparison to that of the entire set. With the

entire set of items, the amount of information peaked (12.09) at a person ability of near "0"

logits, and decreased rapidly as either the ability estimate increased or decreased. That is, the

entire set of items provided the most precise measure of person ability near the center of the

ability range, while the short form provided the most precise measure of person ability at a

slightly higher level than the center of the ability range. The figure also shows the extent to which

item reduction for the short form lost test information (i.e., precision) at a particular person ability.

The removal of 17 items from the entire set resulted in a considerable loss of measurement precision:

peak information decreased from 12.09 to 4.85. In addition, this loss of precision peaked

near "0" logits, decreased slightly as the ability estimate increased, and decreased rapidly as

the ability estimate either increased or decreased. Thus, the entire set of 27 items for the

lifting/carrying construct estimated person ability with greater precision than did the 10-item

short form, especially near the center of the ability range.

Short Form for Walking/Moving

On the first Rasch analysis run (with all 20 items), the items "jogging one mile", "running

one block" and "stepping into or out of elevator" were removed due to high infit/outfit statistics

(1.54/2.34, 1.80/2.16, and 1.62 for infit, respectively) (Table 2-10). However, after several

iterations of Rasch analysis, the decision was made to reinstate the item "running one block" to

the ten candidate item list for short form since there was no item available near the high extreme

of the ability continuum.

The ten items selected for the walking/moving construct generally fit the Rasch model. All

items showed acceptable infit/outfit statistics except one item (running one block) with high

infit/outfit values (2.20/3.97) (Table 3-9). These items exhibited moderate to high point measure

correlations ranging from 0.46 to 0.78 with the range extending from 0.44 to 0.79 prior to item

reduction. The ten items of the short form had slightly less spread in person ability (-2.88 to 4.51

logits) than the entire set of items (-2.59 to 4.86 logits). Item calibrations for the ten items

remained similar after the deletion of ten items. However, person separation decreased from

3.44 to 2.42 (SR decreased from 4.92 to 3.56). That is, the short form for the walking/moving

construct separated individuals with chronic back pain into three groups, while the entire set of

items separated the individuals into nearly five groups. Person reliability (analogous to

Cronbach's alpha) for the short form was 0.85, decreasing from 0.92 for the entire set of 20 items.









A confirmatory factor analysis was conducted on the 10-item short form to test for

unidimensionality. The one factor model was found to be inadequate (Table 3-2). An exploratory

factor analysis to further investigate factor structure suggested that a two factor solution was

more appropriate. We retained two factors based on the Kaiser criterion of eigenvalue greater than

one. These factors accounted for 67% of total variance (the first factor accounting for 53% and

the second factor accounting for 14%). Table 3-5 presents factor loadings for the ten short form

items (factor loadings greater than 0.46 are in bold). Items loaded onto factors that contained

items related to difficulty and also type of activity (e.g., walking/stepping and running/climbing).

Figure 3-7 presents the test information function for the walking/moving construct with the

entire set of 20 items versus the 10-item short form. Following item reduction, the peak of the

test information function moved slightly to the right. The figure shows that the amount of

information with the entire set peaked (12.05) at a person ability ranging from -0.49 to -0.42 logits

and decreased rapidly as either the ability estimate increased or decreased. With the 10-item

short form, the amount of information peaked (5.75) at a person ability near "0" logits. That is,

the entire set of items provided the most precise measure of person ability at a slightly lower

level than the center of the ability range, while the short form provided the most precise measure of

person ability near the center of the ability range.

In addition, the figure shows to what extent item reduction results in a loss of test

information (i.e., precision) at a particular person ability. The removal of ten items from the

entire set resulted in some loss of measurement precision, which decreased from 12.05 to 5.75.

In addition, this loss of precision peaked at -0.49 and -0.42 logits and decreased rapidly as either

the ability estimate increased or decreased. Thus, the entire set of 20 items for the walking/moving

construct estimated person ability with greater precision than did the 10-item short form near the

center of the ability range.

Discussion

Summary of Results

The purpose of this study was to create three 10-item short forms for three constructs on the

ICFAM and to investigate the item-level psychometric properties of the short forms, as well as,

unidimensionality and test information function of the three constructs. To create short forms of

the ICFAM, an item level psychometric investigation was conducted focusing on infit/outfit mean

square (MnSq), person separation, item-person map, and hierarchical order of item difficulty.

While item level psychometric findings support the soundness of the short forms and advocate

their future use, factor analyses failed to support the proposed unidimensional constructs or the

original 3-factor structure of the ICFAM. Test information functions showed that the entire set of

items on the ICFAM constructs estimated person ability with greater precision than did the short

forms near the center of the ability range, while the precision of both the entire set of items and

short form items rapidly decreased as the ability estimate increased or decreased.

Item Level Psychometrics

This study demonstrated how IRT methodologies could be used to achieve measurement

efficiency, reducing items while maintaining adequate precision. Attempts to create short

forms for use with individuals with back pain have previously focused on CTT methodologies

such as internal consistency and test-retest reliability (6, 10, 23, 26, 27, 67, 68, 70-72, 79). In this

study, we used an IRT approach using Rasch analysis to provide item level information about

three constructs on the ICFAM. The newly developed short forms showed adequate

psychometric properties as determined by the infit/outfit statistics, item difficulty calibrations,

item-person map, and person separation. All items of each short form fit the Rasch model

except one item (i.e., carrying toddler on back) for lifting/carrying and one item (i.e., running

one block) for walking/moving construct. Since these items were the most challenging items in

those constructs, these two items were included in order to fill the potential gaps on the high

extreme of person ability.

Problematic items were identified with high/low fit statistics, which indicated that the

items were measuring a different construct or the item needed further clarification. That is,

individuals with low disability (i.e., high ability) may have a tendency to provide low ratings or

individuals with high disability (i.e., low ability) may have a tendency to provide unexpectedly high

ratings on these items. These response patterns might have been the result of a lack of

observations for these items. Rasch analysis also aided item selection by identifying items that

best capture the range of persons to be estimated and identified gaps where item difficulty

calibrations did not match person-ability measures. These gaps provide direction in selecting

items along with item statistics. Thus, in determining whether or not items are equally distributed

across the full ranges of ability, items are selected based on the person location on the map (i.e.,

in order to assure that items match person abilities). That is, we placed items at or near the

middle of the scale where average individuals aggregate even though candidate items distributed

toward both extremes. For example, in the initial modification phase, four items (lying down on

stomach 2-4 hours, carrying toddler on back, jogging one mile, and running one block) from the

three constructs were identified due to high fit statistics. Inspecting the item-person maps

(Figures 3-2 and 3-3) revealed that these items were needed to reduce possible ceiling effects as

no other items remained on the short forms that were as difficult as these items. Of these four

items, two items (carrying toddler on back and running one block) were later reinstated to the

short form because of a lack of difficult items to match individuals at the extremes of the scale.









It should be noted that we treated a response category 'have not done' as the lowest rating

based on the rationale that the most likely explanation for an activity not occurring was that the

item could not be performed (123). Thus, we determined that treating the category 'have not

done' as the lowest rating would be more appropriate. In fact, about half (51%) of

individuals with above average person ability (i.e., high ability) scored the lowest rating on the

item "carrying toddler on back", while more than half (60%) of individuals with above average

person ability scored the lowest rating on the item "running one block". One plausible

explanation for this observation is that these respondents might have responded to the absence of

opportunity on these items (i.e., you can do the activity but have not done so for any reason in

the last 30 days). In addition, other respondents might have responded to other instructions

indicating the lowest score (i.e., if you are unable to do the activity or require the help or

assistance of another person).

Unidimensionality of the Short Forms

The dimensionality of the three 10-item short forms was investigated by confirmatory

factor analysis (CFA) and exploratory factor analysis (EFA). The results of the CFA and the

EFA were conclusive as to whether or not a one-factor model for each short form was plausible.

Although one factor accounted for a small percentage of the variance for the

positioning/transfer short form (> 36%), one factor for the other two short forms accounted for a

moderate percentage of the variance (> 48% for lifting/carrying and > 53% for walking/moving).

Based on the Kaiser rule (eigenvalues greater than one considered to be factors), we retained the

three factors accounting for most of the variance (> 60%) for the positioning/transfer short form,

two factors for the lifting/carrying short form (> 64%), and two factors for the walking/moving

short form (> 67%). These findings may imply that each theoretically generated construct of

the ICFAM instrument may have more than one dimension. For the positioning/transfer short

form, items failed to show any logical relationship to factors, while an interesting finding was noted

in the EFA of the lifting/carrying and walking/moving short forms. That is, the factors appear to group items

by the hierarchical order of item difficulty. For the lifting/carrying short form, items with high

item calibrations (i.e., difficult items) had a tendency to load on factor 1, while items with

moderate/low item calibrations (i.e., moderate/easy items) had a tendency to load on factor 2

(Table 3-4). Similarly, for the walking/moving short form, items with moderate/low item

calibrations had a tendency to load on factor 1, while items with moderate/low item calibrations

had a tendency to load on factor 2 (Table 3-5). The hierarchical factor structure of the

walking/moving short form replicates similar findings of a factor analysis study of the motor

scale in the Functional Independence Measure (FIM) (126, 127). That is, the study grouped

motor scale items by relative energy requirement including 'low energy' subscale such as

grooming or dressing and 'high energy' subscale such as locomotion or stair climbing. The

findings of our study may indicate that dividing each construct into more than one subscale

would be preferred.

Person Separation and Person Reliability

The separation ratio (SR) for the short forms was good, separating the sample into nearly three to four

statistically distinct strata. Relative to the full item banks, the SR values of all three short

forms decreased considerably, with the most dramatic decrease for the positioning/transfer

short form. These decreases were unavoidable because such a large number of items was removed from the

entire set of items. Nearly 82% of items for the positioning/transfer, 63% of items for the

lifting/carrying, and 50% of items for the walking/moving were removed. In addition, person

reliability (analogous to Cronbach's alpha) decreased considerably for the positioning/transfer short

form, while it decreased only slightly for the lifting/carrying and walking/moving short forms.

Perhaps the reason for this is that many of the items removed from the lifting/carrying and

walking/moving constructs were redundant, so their removal caused little loss of internal consistency. Despite the reduction

of person reliability, the values were still in acceptable ranges (128).

Constructing fixed short forms is a conventional approach to achieving measurement

efficiency with fewer items. This reduces the respondent and test administrator burden. Despite

the loss of precision, fixed short forms have been shown to be valid and practical for use in

outcome measurement (34, 44-46). It is inevitable to sacrifice some precision in the creation of a

short form. In terms of measurement precision and breadth, several studies have indicated that

there is a tradeoff or a compromise between measurement precision and breadth in short form

creation (35, 44-46, 129). In this study, by using Rasch analysis (one-parameter IRT model), we

were successful in developing short forms that provided an optimal range (i.e., measurement

breadth) despite loss of precision. That is, we reduced many items to create three 10-item short

forms, yet captured person ability across the full range of the sample.

As a measure of precision, the SR is a valuable indicator of whether reducing the number

of items substantially lowers or maintains the precision with which a short form is measuring the

ability of the sample (119). Separation is defined as the ratio of the standard deviation of the sample to

the standard error of measurement (i.e., the root mean square error), while Cronbach's alpha is the

estimated average correlation of a test with all possible tests of the same length obtained by

domain sampling. Although separation resembles the ratio represented by Cronbach's alpha, there is a slight

difference. For separation, the numerator reflects a property of the sample only and the

denominator reflects a property of the test only. Thus, the ratio describes the relationship

between the amount of variability in the sample and the precision of the test. While the SR depends

on the particular sample being measured, the relationship between the sample and the test is

apparent. In our study, the back pain sample is nearly three to four times more variable than our

short form's ability to detect the sample's variability. This indicates that when measuring

individuals with little variation on the trait of interest, the test will need little error to

discriminate the differences that exist among these individuals (119).
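The contrast between separation and a reliability coefficient can be illustrated numerically. The sketch below assumes that the sample standard deviation in the numerator of separation is the error-adjusted ("true") standard deviation, that is, the observed variance minus the error variance; this convention is an assumption made for the sketch, and the observed standard deviation and root mean square error used are invented values.

```python
import math

def person_separation(observed_sd, rmse):
    """Separation: error-adjusted sample SD divided by the RMSE (assumed convention)."""
    true_variance = max(observed_sd ** 2 - rmse ** 2, 0.0)
    return math.sqrt(true_variance) / rmse

def reliability_from_variances(observed_sd, rmse):
    """Proportion of observed variance not attributable to measurement error."""
    true_variance = max(observed_sd ** 2 - rmse ** 2, 0.0)
    return true_variance / observed_sd ** 2

observed_sd, rmse = 1.5, 0.6  # hypothetical values in logits
g = person_separation(observed_sd, rmse)
rel = reliability_from_variances(observed_sd, rmse)
```

Under this convention the two indices carry the same information on different scales: reliability equals g squared divided by (1 + g squared), so reliability is bounded by 1 while separation is unbounded and expressed in standard error units.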

Test Information Function

The statistical meaning of information is defined as the reciprocal of the precision with

which a parameter could be estimated (130). Thus, when we estimate person ability with

precision, we would know more about the values of the person ability than if we estimated it

with less precision. The precision with which person ability is estimated is measured by the

variability of the estimates around the value of person ability. Therefore, a measure of precision

is the variance of the estimators (i.e., σ²) and the amount of information at a given ability level is

the reciprocal of this variance. That is, if the amount of information is large, person ability at a

particular level can be estimated with precision. Similarly, if the amount of information is small,

person ability at a particular level cannot be estimated with precision.

In this study, the test information function (TIF) showed a considerable loss of information

as the number of items was reduced. As items were eliminated to create the short forms,

information decreased in the following manner: information decreased about 83% for the

positioning/transfers construct, about 60% for the lifting/carrying construct, and about 52% for

the walking/moving construct. These decreases in information reveal that the

positioning/transfers construct sacrificed more information (83%) than the lifting/carrying and

walking/moving constructs (60% and 52%). This makes intuitive sense since many more items

were removed from the original positioning/transfers item bank than from the lifting/carrying

and walking/moving item banks (46 of 56 items). The peak of the TIF for the positioning/transfer

short form slightly moved to the left side of the center, while the peak of the TIF for both the

lifting/carrying and walking/moving short forms slightly moved to the right side of the center.









This may suggest that we should have selected items with lower item calibrations (i.e., easier

items), when we deleted items with similar item calibrations in creating the positioning/transfers

short form. In fact, the total number of individuals at the ceiling increased from six to eight for

the positioning/transfer construct following item reduction. By contrast, for the other two constructs

we should have selected items with higher item calibrations (i.e., more difficult items). However,

the total number of individuals at the ceiling did not differ before and after item reduction for

these two constructs.

Limitations and Future Implications

There were several limitations in this study. Problematic items with high infit/outfit

statistics (i.e., "carrying toddler on back" in the lifting/carrying construct and "running one block" in the

walking/moving construct) were reinstated in the short forms to avoid ceiling effects. This may be a limitation

of our short forms despite their adequate breadth. The item level psychometrics indicate that the

newly created short forms could be improved in future research by 1) replacing

problematic items and 2) developing items that more adequately fill the gaps along the person ability

continuum to cover a wider range of ability. In addition, the results of the present study suggest that the short

forms were multidimensional. These findings may prompt the use of multidimensional models

with adequate sample sizes to better explain physical activity domains.

In order to achieve psychometric efficiency, this study showed how Rasch analysis could

be used to reduce the number of items in an instrument while maintaining adequate psychometric

properties. The item level psychometrics (e.g., fit statistics, item difficulty calibrations) as well

as other qualifiers (e.g., Cronbach's α, person separation) were used to reduce items. Despite the

use of an item response theory methodology, it is apparent that relative to the entire item banks,

the short forms showed decrements in measurement precision (28, 35). One way to avoid this

decrement in measurement precision would be to combine IRT and computer adaptive testing









methodology. By selectively presenting items that are matched to the ability levels of

respondents, these methodologies may accomplish both measurement efficiency and precision

(28, 34, 47).









Table 3-1. Demographic information of research participants
Characteristics Individuals with back pain n=101
Age
<20 5 (5.0)
21-30 12 (11.9)
31-40 15 (14.9)
41-50 24 (23.8)
51-65 19 (18.8)
> 65 20 (19.8)
Missing 6 (5.9)
Mean ± SD 48.14 ± 17.3
Gender
Female 65 (64.4)
Male 31 (30.7)
Missing 5 (5.0)
Education
Elementary 0 (0.0)
Middle/Junior High 3 (3.0)
High School 34 (33.7)
Technical 8 (7.9)
College 33 (32.7)
Graduate 23 (22.8)
Race/Ethnic
African American 19 (18.8)
Hispanic American 7 (6.9)
American Indian 1 (1.0)
White, not of Hispanic origin 68 (67.3)
Asian/Pacific Islander 1 (1.0)
Other 3 (3.0)
Missing 2 (2.0)
Years that has had back pain
Less than a year 7 (6.9)
1 through < 4 years 20 (19.8)
More than 4 years 59 (58.4)
Missing 15 (14.9)









Table 3-2. Results of confirmatory factor analysis for short forms of the ICFAM
Indices Positioning/transfer Lifting/carrying Walking/moving
Criterion 1-Factor model 1-Factor model 1-Factor model
Chi-square 1511.670 1380.940 1380.940
df 31 39 39
P-Value (> 0.05) 0.000 0.000 0.000
CFI (1.0) 0.000 0.016 0.026
TLI (1.0) 0.003 0.016 0.026
RMSEA (< 0.06) 0.689 0.579 0.576
WRMR (< 0.1) 6.594 5.728 5.700









Table 3-3. Factor structure of short form for positioning/transfer construct
Items (difficulty order) F1 F2 F3
staying in a kneeling position on both knees for 10-20 minutes (while making only minor adjustments)? 0.090 0.677 0.342
staying in a lying position on your favorite side for 5-8 hours (while making only minor adjustments)? 0.175 0.018 0.858
staying in a standing position for 1-2 hours (while making only minor adjustments)? 0.147 0.378 0.693
moving yourself out of a bathtub after taking a bath? 0.211 0.560 0.134
changing position from standing to kneeling? 0.144 0.761 0.123
bending at the waist while standing for 1-5 minutes (for example, reaching for something in the trunk of a car)? 0.488 0.563 0.030
staying in a lying position on your back for 1 hour (while making only minor adjustments)? 0.319 0.442 0.177
changing position from lying on your back to sitting (for example, lying in your bed to sitting on the edge of your bed)? 0.811 0.056 0.289
shifting your weight while lying in bed? 0.769 0.296 0.009
changing position from standing to sitting in a chair? 0.805 0.120 0.150
Percent of total variance accounted for by factors 36% 13% 11%









Table 3-4. Factor structure of short form for lifting/carrying construct
Items (difficulty order) F1 F2
carrying a toddler on your back (for example, piggyback)? 0.636 -0.151
lifting 25 pounds (for example, large bag of dog food or cat litter) from shoulder height to above your head with your hand(s) and arm(s)? 0.859 0.223
lifting 25 pounds (for example, large bag of dog food or cat litter) from floor to waist height with your hand(s) and arm(s)? 0.806 0.289
carrying 25 pounds (for example, a large bag of dog food or cat litter) in your hand(s) and arm(s) 25 feet? 0.814 0.320
lifting 10 pounds (for example, bag of groceries or 12-pack of soft drinks) from waist height to shoulder height with your hand(s) and arm(s)? 0.696 0.488
lifting 5 pounds (for example, bag of sugar or large telephone book) from shoulder height to above your head with your hand(s)? 0.429 0.631
pulling wet laundry out of a washing machine? 0.323 0.647
pulling open a heavy door (for example, department/convenience store door)? 0.225 0.727
lifting 1 pound (for example, a can of soup) from waist height to shoulder height with your hand(s)? 0.128 0.747
pulling open a full-size refrigerator door? -0.092 0.790
Percent of total variance accounted for by factors 48% 16%









Table 3-5. Factor structure of short form for walking/moving construct
Items (difficulty order) F1 F2
running one block? -0.102 0.806
climbing up or down a 6-foot ladder? 0.288 0.793
climbing up or down a 3-step stool? 0.446 0.719
walking 4-8 blocks (about 1/2 mile) without stopping? 0.736 0.429
climbing down one flight of stairs? 0.645 0.399
walking 2-4 blocks (about 1/4 mile) without stopping? 0.827 0.296
walking in a crowded place (for example, outdoor marketplace, shopping mall)? 0.797 0.281
stepping up or down a standard curb? 0.723 0.250
walking within your home/living environment? 0.866 0.000
walking on carpeting? 0.714 -0.068
Percent of total variance accounted for by factors 53% 14%
























[Item-person map not reproduced: two side-by-side panels on a vertical logit scale from -3 to +3, with persons (X) to the left of each axis and item labels to the right.]


Figure 3-1. Item-person map of positioning/transfer construct of the ICFAM following 10 items removal and prior to 10 item removal.



















[Item-person map not reproduced: two side-by-side panels on a vertical logit scale from -4 to +5, with persons (X) to the left of each axis and item labels to the right.]


Figure 3-2. Item-person map of lifting/carrying construct of the ICFAM following 10 items removal and prior to 10 item removal.
















[Item-person map not reproduced: two side-by-side panels on a vertical logit scale from -3 to +5, with persons (X) to the left of each axis and item labels to the right.]


Figure 3-3. Item-person map of walking/moving construct of the ICFAM following 10 items removal and prior to 10 item removal.









Table 3-6. Short form of the ICFAM
ICF ACTIVITY MEASURE SHORT FORM
This survey consists of 3 sections of 10 questions you might already been asked to answer or will be asked to answer in our
computer adaptive testing. Each question will ask you how difficult it has been for you to perform a given activity within the last 30
days. Please choose the answer that best fits your situation. If you have not performed the activity in question, then check the 'Have
Not Done' answer. Thank you very much for participating in this study.

Response options for every item: No Difficulty, Some Difficulty, A Lot of Difficulty, Have Not Done

MAINTAINING A BODY POSITION
In the last 30 days, how much difficulty have you had staying in the following positions (while making only minor adjustments):
staying in a lying position on your back for 1 hour?
staying in a lying position on your back for 5-8 hours?
standing position for 1-2 hours?
kneeling on both knees for 10-20 minutes?
bending at the waist while standing for 1-5 minutes (for example, reaching for something in the trunk of a car)?
shifting your weight while lying in your bed?
changing position from lying on your back to sitting (for example, lying in your bed to sitting on the edge of your bed)?
changing position from standing to sitting in a chair?
changing position from standing to kneeling?
moving yourself out of a bathtub after taking a bath?

LIFTING AND CARRYING OBJECTS
In the last 30 days, how much difficulty have you had:
lifting 1 pound (for example, a can of soup) from waist height to shoulder height?
lifting 5 pounds (for example, bag of sugar or large telephone book) from shoulder height to above your head?
lifting 10 pounds (for example, bag of groceries or 12-pack of soft drinks) from waist height to shoulder height?
lifting 25 pounds (for example, large bag of dog food or cat litter) from floor to waist height?
lifting 25 pounds (for example, large bag of dog food or cat litter) from shoulder height to above your head?
carrying 25 pounds (for example, large bag of dog food or cat litter) 25 feet (for example, from car to front door)?
carrying a toddler on your back (for example, piggyback)?
pulling open a full-size refrigerator door?
pulling open a heavy door (for example, department/convenience store door)?
pulling wet laundry out of a washing machine?

WALKING AND MOVING
In the last 30 days, how much difficulty have you had:
walking within your home/living environment?
walking 2-4 blocks (about 1/4 mile) without stopping?
walking 4-8 blocks (about 1/2 mile) without stopping?
walking on carpeting?
walking in a crowded place (for example, outdoor marketplace, shopping mall)?
climbing down one flight of stairs?
climbing up or down a 3-step stool?
climbing up or down a 6-foot ladder?
stepping up or down a standard curb?
running one block?










Table 3-7. Fit statistics for positioning/transfer construct
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
kneeling 10-20 minutes 1.84 0.16 1.22 1.4 1.11 0.6 0.58
lying back 5-8 hours 1.32 0.14 0.93 -0.5 0.94 -0.3 0.64
standing 1-2 hours 0.79 0.13 0.98 -0.1 1.10 0.7 0.61
moving out bathtub taking a bath 0.44 0.13 1.46 3.2 1.41 2.7 0.61
change position standing to kneeling 0.16 0.13 1.16 1.2 1.13 1.0 0.67
bending waist 1-5 minutes -0.17 0.13 0.80 -1.5 0.83 -1.2 0.67
lying back 1 hour -0.43 0.14 1.22 1.6 1.27 1.8 0.52
change position lying back to sitting -0.76 0.14 0.61 -3.2 0.65 -2.6 0.58
shift lying in bed -1.30 0.15 0.60 -3.0 0.63 -2.5 0.63
change position standing to sitting chair -1.88 0.18 0.85 -0.9 0.82 -0.9 0.57



Table 3-8. Fit statistics for lifting/carrying construct
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
carrying toddler on back 3.58 0.21 1.90 3.7 2.17 2.2 0.49
lifting 25 pounds shoulder to above head 2.25 0.17 0.80 -1.3 0.75 -1.1 0.79
lifting 25 pounds floor to waist 1.51 0.16 0.77 -1.7 0.77 -1.3 0.80
carrying 25 pounds 25 feet 1.20 0.15 0.78 -1.6 0.71 -1.8 0.82
lifting 10 pounds waist to shoulder 0.50 0.15 0.69 -2.5 0.65 -2.5 0.83
lifting 5 pounds shoulder above head -0.33 0.15 1.15 1.0 0.99 0 0.72
pulling wet laundry out washing mach -1.02 0.16 1.07 0.5 1.05 0.3 0.65
pulling open a heavy door -1.45 0.17 0.96 -0.2 1.13 0.5 0.61
lifting 1 pound waist to shoulder -2.45 0.20 1.06 0.4 0.74 -0.5 0.57
pulling open refrigerator door -3.78 0.29 1.05 0.3 0.62 -0.2 0.42









Table 3-9. Fit statistics for walking/moving construct
Items Measure (Logits) Error Infit MnSq ZSTD Outfit MnSq ZSTD Correlation
running one block 3.16 0.20 2.20 4.4 3.97 4.1 0.56
climbing up or down a 6-foot ladder 1.70 0.15 1.45 2.5 1.19 0.8 0.75
climbing up or down a 3-step stool 0.88 0.14 1.09 0.7 0.91 -0.4 0.77
walking 4-8 blocks 0.40 0.14 0.73 -2.0 0.73 -1.7 0.78
climbing down one flight of stairs 0.05 0.14 0.99 0 1.24 1.3 0.69
walking 2-4 blocks -0.28 0.15 0.64 -2.7 0.58 -2.6 0.76
walking crowded place -0.66 0.15 0.67 -2.3 0.58 -2.2 0.72
stepping up or down a standard curb -1.15 0.17 0.83 -1 0.67 -1.4 0.65
walking within home/living environment -1.59 0.18 0.70 -1.8 0.59 -1.4 0.62
walking on carpeting -2.52 0.23 0.90 -0.4 0.72 -0.5 0.46
















[Item-person map not reproduced: three side-by-side panels, one per short form, each on a vertical logit scale with persons (X) to the left of the axis and item labels to the right; several item labels carry a value in parentheses, for example "running one block (7.0)".]

Figure 3-4. Item-person map of three short forms (positioning/transfer, lifting/carrying, and walking/moving) of the ICFAM following the item removal.






































[Plot not reproduced: test information curves over a horizontal axis running from -8 to +8.]

Figure 3-5. Test information function of short form versus entire set of items for
positioning/transfer. A dotted line shows short form and a solid line shows entire set
of items.


[Plot not reproduced: test information curves over a horizontal axis running from -8 to +8.]

Figure 3-6. Test information function of short form versus entire set of items for lifting/carrying.
A dotted line shows short form and a solid line shows entire set of items.


[Plot not reproduced: test information curves over a horizontal axis running from -8 to +8, with vertical axis ticks from 2 to 14.]

Figure 3-7. Test information function of short form versus entire set of items for walking/moving.
A dotted line shows short form and a solid line shows entire set of items.









CHAPTER 4
COMPARISONS OF THE RELATIVE PRECISION OF THREE DIFFERENT TYPES OF BACK
PAIN MEASURES: THE ICF ACTIVITY MEASURE (ICFAM) COMPUTER ADAPTIVE
TEST, ICFAM SHORT FORMS, AND OSWESTRY BACK PAIN DISABILITY
QUESTIONNAIRE

Introduction

Many self-report measures have been developed specifically for the back pain population

due to their several advantages. These advantages include decreasing administration costs,

reducing respondent burden, and potentially accessing scattered samples (131). Many studies

suggest that self-report disability measures for back pain are as reliable as performance measures

(23-25, 32, 40, 70) and appear to be sensitive indicators of long-term outcome (7). In general,

these self-report disability measures are commonly classified into generic and condition-specific

measures (28, 35). Two generic measures, the Sickness Impact Profile (SIP) (62, 66) and the

Physical Function scale (PF-10) (62, 66) are the most commonly used assessments with

individuals reporting back pain. The most extensively utilized condition-specific measures for

back pain include the Oswestry Back Pain Disability Questionnaire (ODQ), the Roland-Morris

Disability Questionnaire (RMDQ), and the Quebec Back Pain Disability Scale (QBDS)

(23, 25, 29, 30, 74, 77, 79, 80, 132). To date, nearly 82 condition-specific disability measures for

back pain have been developed and have been shown to have adequate psychometrics. Of these

widely accepted disability instruments, the ODQ is regarded as one of the most reliable back

pain instruments (10, 23-27).

Apparent advantages of the ODQ over other disability instruments include: 1) strong

relevance between the condition of back pain and the isolated objective physical measurement

(e.g., range of motion of back), 2) high responsiveness to functional change due to its rating scale

with six response categories, 3) ease of administration, and 4) low impact on normal clinic

operations (3, 13, 23, 24, 29, 30, 73, 74). Many studies have shown that the ODQ and revised









versions of it have adequate psychometric properties, such as reliability, validity, and

responsiveness (3, 13, 23, 24, 29, 30, 73, 74). However, studies have shown that the ODQ may

lack sensitivity to discriminate between individuals at the high extreme of ability range (i.e.,

ceiling effects) (29, 30), only occasionally being responsive to individuals with severe back pain

(31, 32). Several studies also indicate that the ODQ is more sensitive for patients who have

improved but less sensitive for patients whose condition remained unchanged (23, 79). Thus,

despite its adequate psychometrics, the ODQ may not precisely measure the disability of back

pain across the full range of ability.

Deficits in precision may be the result of using items that do not closely match the ability

of the sample of interest (35). That is, when easy items are administered to individuals with high

ability (i.e., low disability) and/or difficult items are administered to individuals with low ability

(i.e., high disability) there is a lack of measurement precision with a resulting inability to

discriminate among individuals (29, 30). Problems with measurement precision often occur in

conventional instruments with a fixed number of items, because it is unrealistic for one instrument

to include enough items to precisely measure individuals across a wide range of ability. Even

instruments with excellent breadth may still have inadequate depth of measurement (33).

Additionally problematic is the fact that long assessments (i.e., those covering a wider range of

ability level) contain items that appear unnecessary and induce a concern over respondent burden

and administration costs (36).
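
The effect of item-person mismatch on precision can be illustrated numerically. The sketch below assumes a dichotomous Rasch model, in which an item's information is p(1 - p); the function name and the ability and difficulty values are hypothetical and are not drawn from the ODQ or the ICFAM.

```python
import math

def item_information(ability, difficulty):
    """Dichotomous Rasch item information p * (1 - p), in logit units."""
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return p * (1.0 - p)

# A person with high ability (low disability), assumed to be at +3 logits.
matched = item_information(ability=3.0, difficulty=3.0)      # item targeted to the person
mismatched = item_information(ability=3.0, difficulty=-2.0)  # easy item, poorly targeted

print(f"matched item information:    {matched:.4f}")
print(f"mismatched item information: {mismatched:.4f}")
```

Under these assumptions the well-targeted item contributes the maximum possible information for a dichotomous item (0.25), while the easy item contributes almost none, so many more poorly targeted items would be needed to reach the same precision.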

These legitimate concerns prompted the creation of static short forms from full length

instruments (28, 35). By reducing the number of items on the full instrument, short forms could

achieve measurement efficiency while addressing concerns related to burden and cost (28, 35).

Developers of static short forms have attempted to select items that spread across the ability









ranges; however, with a large reduction in the number of items, loss of precision remains an issue

(15, 36, 44-46). Creating the "ideal" measure consisting of enough items to cover the full range

of the trait with adequate precision is challenging when using short forms. Despite the popularity

and widespread use of short forms developed using Classical Test Theory (CTT), these

instruments have a number of limitations (37). Item Response Theory (IRT)-based short forms

can alleviate the limitations by focusing on item level psychometric properties.

In contrast to CTT, Item Response Theory (IRT) focuses on the psychometric properties of

the items making up the instrument instead of the instrument as a whole (40, 41). By estimating

the probability that a respondent will select a particular rating for an item, item difficulty and

person ability (or disability) can be placed on the same linear continuum. Thus, IRT models

allow "connecting" individuals' responses to items with their ability level (40, 42). Estimates of

person ability (i.e., disability) on an underlying construct obtained using IRT methods are

invariant regardless of the items used (i.e., test free measurement), whereas under the CTT

paradigm, person scores vary depending on the difficulty of the instrument (41). Furthermore,

item difficulty estimates derived from the IRT analyses remain the same regardless of the ability

of the sample (i.e., sample free measurement), while test statistics in CTT are dependent on the

sample taking the test. In addition, the IRT models linearly transform raw scores (typically used

in analyses based on CTT) into equal interval measures (34). These advantages of IRT allow for

the creation of invariantly calibrated large item banks that can more precisely discriminate

individuals' ability levels and thus, capture smaller increments of change.
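
One common way of placing persons and rated items on the same logit continuum is the Rasch rating scale model, shown below as an illustration. The passage above does not state which polytomous Rasch model was fitted, so this form is an assumption. Here B_n is the ability of person n, D_i the difficulty of item i, F_k the threshold between rating categories k-1 and k, and P_{nik} the probability that person n selects category k on item i.

```latex
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k
```

When B_n equals D_i + F_k, the two adjacent categories are equally likely; a more able person or an easier item shifts the odds toward the higher category.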

While IRT methodologies provide the means for generating and linking person ability and

item difficulty calibrations, Computer Adaptive Testing (CAT) methods promise a means for

administering items in a way that is both efficient and precise (28, 34, 36, 44-48). Studies have









shown that CAT improves test efficiency while maintaining adequate precision with fewer items than

the full test (41, 43, 48, 50, 52, 53, 57-59). CAT measures are highly correlated with other

assessments intending to measure the same construct and require fewer items (i.e., an average of

six items needed to reach an ability estimate) (81-84).

The CAT is based on a testing algorithm which defines iterative processes with a set of

rules specifying the test questions to be administered to respondents. This includes procedures

for item selection, ability estimation, and termination criteria. By selectively administering items

that are matched to the ability level of the individuals, measurement efficiency can be

accomplished without the loss of precision provided by the full item bank. For example, when

measuring the ability of a person with mild back pain, more difficult items would be chosen (i.e.,

matching the ability of the individual). Similarly, when measuring the ability of a person with

more severe back pain, a different set of items would be chosen that match that individual's

severely impaired ability (i.e., easier items would be selected). With this technology, a small

number of items can be selected from the item bank which are most relevant for a person of a

particular ability level (34). IRT in combination with CAT has recently become an alternative to

conventional fixed-format disability measurement (25, 36).

The ICF Activity Measure (ICFAM) has recently been developed to create an efficient and

precise measurement system based on the activity dimension of World Health Organization's

(WHO) International Classification of Functioning, Disability and Health (ICF). The ICF

provided the conceptual framework and classification system for generating the items on the

ICFAM. Activities involving movement, moving around and daily life activities were the

subcategories of the ICF activity dimension consulted in the development of items. Items were

developed with the intent to represent the entire range of ability on each construct, thus, creating









an equiprecise measurement (i.e., precise measurement across the entire range of the underlying

construct). Using Item Response Theory (IRT) and Computer Adaptive Testing (CAT) methods,

Velozo and colleagues (41) created ICFAM, a web based computer adaptive survey system. The

administrative core of the instrument allows adjustments to be made to various settings, making it

possible to change the initial theta value (i.e., difficulty of the question first given to the respondent)

and the stopping rule (i.e., guidelines for terminating the test). Because questions are targeted to

individuals at their ability level, only 5-10 questions per construct are required to arrive at a

final measure of person ability with acceptable error. In addition, immediate feedback is

provided to the respondents/clinicians in the form of a graph and summary statistics.

We hypothesized that the CAT measures would discriminate more precisely than the short

forms or the ODQ measures. The purpose of this study is to compare the precision of the person

measures generated from the ICF Activity Measure (ICFAM) computer adaptive test, the short forms

of the ICFAM, and the Oswestry Back Pain Disability Questionnaire (ODQ).

Method

Research Participants

Forty-two individuals with back pain were recruited from rehabilitation clinics in

Gainesville, Florida including the University of Florida and Shands Orthopaedics and Sports

Medicine Institute and Shands Rehab Hospital. Forty-two participants without back pain were

recruited from multiple public sites in Gainesville. Criteria for participants with back pain

included: 1) currently experiencing back pain, 2) having previously received treatment for back

pain, 3) ability to read and understand English, and 4) age between 18 and 100 years. The criteria

for non-back pain participants included: 1) currently experiencing no back pain, 2) able to read

and understand English, and 3) age between 18 and 100 years. All appropriate clients presenting

to the recruiting sites between November 3, 2009 and June 30, 2010 were recruited for the back









pain group. This study was approved by the Institutional Review Board at the University of

Florida (Approved by IRB #17-2009).

Instrumentation

The Oswestry Low Back Disability Questionnaire (ODQ), a conventional back pain

disability instrument developed under classical test theory, was one of the instruments used in

this study (Table 4-1). The ODQ is among the most popular self-report condition specific

instruments assessing how back pain affects patients' ability to manage daily life tasks (74). The

ODQ and its revised versions provide an index of the perceived disability experienced by

individuals with back pain. It consists of ten items including pain intensity, personal care, lifting,

walking, sitting, standing, sleeping, employment/home-making, and traveling. Participants

respond on a 6-point ordinal scale (5 = pain does not interfere with activities, 0 = pain so severe

that activities cannot be performed). The total score (i.e., sum of all item responses) is converted

to a percentage score ranging from 0 (no disability) to 100 (most severe disability). Thus, a

higher score is indicative of a higher level of disability.

The construction of fixed short forms is a conventional approach to achieving

measurement efficiency, reducing respondent burden and administration costs (44, 46). Despite

the loss of some precision, short forms have been shown to be valid and practical for use in order

to achieve measurement efficiency (34, 44-46). A second measure used consisted of the three

newly created short forms of the ICFAM (Appendix 2). These short forms were created using

item response theory methodologies, specifically the Rasch one-parameter IRT model. Each

short form consists of 10 items which were judged to have adequate psychometrics including fit

statistics, person separation ratio, and Cronbach's α. For each of the questions on the short forms,

respondents select one of four choices with a lower score representing a lower level of ability; "3"

(no difficulty), "2" (some difficulty), "1" (a lot of difficulty), and "0" (have not done). The









participant was instructed to select "have not done", if the activity did not occur within the last

30 days. In this study, a rating of "0" (i.e., "have not done") was treated as a missing value.

In an effort to achieve both psychometric efficiency and precision, the ICF Activity

Measure (ICFAM) was developed using Item Response Theory (IRT). The World Health

Organization's (WHO) International Classification of Functioning, Disability and Health (ICF)

provided the conceptual framework and classification system for developing items used in the

study. Specifically, the activity dimension of the ICF including activities involving movement,

moving around and daily life tasks was utilized as a guide in the item development stage (43).

The original ICFAM consists of 6 activity constructs: positioning/transfers, lifting/carrying, fine

hand, walking/climbing, wheelchair/scooters, and self care activities. Constructs for use in this

study were selected based on the following two criteria: 1) tasks represented by items within the

construct are frequently cited as problematic for individuals with back pain and 2) tasks within the

construct represent a potential activity limitation for individuals with back pain. Based on these

criteria, three relevant constructs were chosen for this study: 1) positioning/transfers, 2)

lifting/carrying, and 3) walking/moving. For each of the questions on the CAT, respondents are

asked to select one of four response categories with a lower score representing a lower level of

ability; "3" (no difficulty), "2" (some difficulty), "1" (a lot of difficulty), and "0" (have not done).

CAT technology was used to administer items of the ICFAM instrument for each construct.

Figure 4-1 presents the CAT algorithm used for the ICFAM instrument. First, the CAT begins

with an initial person ability estimate (Bn) for a particular construct (i.e., positioning/transfer).

The initial person ability measure is set at the mean person ability of the sample used in the

preliminary paper-and-pencil field test (during ICFAM development phase). The CAT presents

an item with a difficulty measure (Di) that is identical or closest to this initial person ability









measure. After the initial item is presented and responded to, a new person ability estimate and

standard error (SE) are generated. The stopping rule for the CAT is pre-set based on the standard

error associated with a person ability estimate (i.e., SE < 0.40) and the maximum number of

items administered (i.e., 10 items). That is, the test finishes when an individual's ability is

estimated with a standard error less than .40 or 10 items have been administered to the individual.

Since the stopping rule is unlikely to be reached with the presentation of a single item, a second

item is presented to the respondent. Based on the response, the person ability estimate is re-

calculated. This procedure continues until the SE associated with the person ability estimate is

less than the pre-set SE, which defines the stopping rule. Once the stopping rule is satisfied, the

respondent's final ability measure for that construct is formulated. After the

positioning/transfer construct is completed, the remaining constructs (i.e., lifting/carrying and

walking/moving) are presented in turn until a final ability measure is obtained for each.
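
The adaptive loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ICFAM software: it uses the dichotomous Rasch model (the ICFAM items use a rating scale), a damped Newton-Raphson ability estimate clamped to the range -6 to 6 logits, and hypothetical names (run_cat, estimate_ability, item_bank, answer). Because a dichotomous Rasch item contributes at most 0.25 units of information, ten such items cannot reach SE < 0.40, so in this simplified setting the item cap ends the test; rating scale items carry more information per item.

```python
import math

def endorse_probability(theta, difficulty):
    """Dichotomous Rasch probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def estimate_ability(responses, difficulties, theta, bounds=(-6.0, 6.0)):
    """Damped Newton-Raphson ability estimate; returns (theta, standard error)."""
    for _ in range(100):
        probs = [endorse_probability(theta, d) for d in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (sum(responses) - sum(probs)) / information
        step = max(-1.0, min(1.0, step))          # damp large steps
        new_theta = max(bounds[0], min(bounds[1], theta + step))
        converged = abs(new_theta - theta) < 1e-8
        theta = new_theta
        if converged:
            break
    probs = [endorse_probability(theta, d) for d in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(information)

def run_cat(item_bank, answer, start_theta=0.0, se_target=0.40, max_items=10):
    """Administer items until SE < se_target or max_items have been given.

    item_bank maps item name -> difficulty (logits); answer(name) returns 0 or 1.
    """
    remaining = dict(item_bank)
    theta, se = start_theta, float("inf")
    asked, difficulties, responses = [], [], []
    while remaining and len(asked) < max_items and se >= se_target:
        # Present the unused item whose difficulty is closest to the current estimate.
        name = min(remaining, key=lambda k: abs(remaining[k] - theta))
        difficulties.append(remaining.pop(name))
        asked.append(name)
        responses.append(answer(name))
        theta, se = estimate_ability(responses, difficulties, theta)
    return theta, se, asked
```

The function returns the final ability estimate, its standard error, and the list of administered items, so the number of items used per construct can be read from the length of that list.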

Analysis

A series of Rasch analyses was performed using the Winsteps software program to calculate

person measures for the back pain and non-back pain groups (103). The Rasch model transforms

total raw scores into estimates of person ability in logits. To maximize the comparability of

summative scores from the short forms and the ODQ instrument, Rasch scores were linearly

transformed from the original logit estimates to a 0-100 metric.
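
A linear rescaling of this kind can be sketched as follows. The anchor points are an assumption: the text does not state which logit values were mapped to 0 and 100, so this illustration uses the smallest and largest logit unless explicit anchors are supplied. The function name logits_to_0_100 is hypothetical.

```python
def logits_to_0_100(logits, low=None, high=None):
    """Linearly map logit measures onto a 0-100 metric.

    low and high are the logit values mapped to 0 and 100; by default
    (an assumption of this sketch) they are the observed minimum and maximum.
    """
    low = min(logits) if low is None else low
    high = max(logits) if high is None else high
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 100.0 / (high - low)
    return [(value - low) * scale for value in logits]
```

Because the transformation is linear, it changes the unit and origin of the measures but leaves correlations and F statistics unchanged.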

Pearson product moment correlations were obtained to compare the measurement

properties of CAT (i.e., 10-item stopping rule and standard error less than 0.40), short forms, and

ODQ. Scatter plots of ability estimates for the CAT versus the short forms and the ODQ measure

were used to further examine these relationships.
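
For reference, the Pearson product moment correlation used in these comparisons can be computed as in the sketch below. This is a plain-Python illustration with a hypothetical function name (pearson_r); the software used for the correlations in this study is not specified here.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```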

To examine potential differences in precision across the three measures (i.e., CAT, short

forms, and ODQ), the method of known-groups validity to test relative precision (RP) in









discriminating back pain and non-back pain groups was used. Methods included under the

general linear model were used to test for hypothesized differences in group mean estimates. The

magnitude of the F value from the ANOVA represents a measure of precision. F statistics

associated with chance probabilities p < 0.05 were considered significant. RP was computed as the

ratio of the F statistics of two measures. If the RP ratio is equal to 1, both methods of estimating

function are equally discriminatory. If RP > 1, the measurement method in the numerator is superior

in differentiating function to the method in the denominator. The greater the F value, the greater the

amount of systematic variance a measurement method accounts for and, therefore, the greater its

ability to discriminate groups of subjects.
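
The RP calculation can be sketched as follows: a one-way ANOVA F statistic for two groups, followed by the ratio of two F values. The function names (f_statistic, relative_precision) are hypothetical, and the sketch is an illustration rather than the analysis code used in this study.

```python
def f_statistic(group_a, group_b):
    """One-way ANOVA F statistic for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    grand_mean = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    # Between-group and within-group sums of squares.
    ss_between = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
    ss_within = (sum((x - mean_a) ** 2 for x in group_a)
                 + sum((x - mean_b) ** 2 for x in group_b))
    df_between = 1
    df_within = n_a + n_b - 2
    return (ss_between / df_between) / (ss_within / df_within)

def relative_precision(f_numerator, f_denominator):
    """RP > 1 favors the measure in the numerator."""
    return f_numerator / f_denominator
```

As a usage example, relative_precision(41.76, 20.58) applied to the F values reported in Table 4-4 for the CAT and the short form on the positioning/transfer construct gives approximately 2.03, in line with the RP of 2.02 reported there.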

Results

Sample demographic characteristics and clinical information are presented in Table 4-1.

The average age was 53 years for the back pain group and 48 years for the non-back pain group.

Nearly 60% of participants reported having back pain for more than a year, indicating it was a

chronic condition. Five percent of the non-back pain participants reported having another pain

related condition.

The stopping rule requiring <.40 SE was achieved for each of the respondents before the

maximum number of questions (10) was reached. Participants in the back pain group answered

slightly more questions than those in the non-back pain group. For the back pain group, the

average respondent answered 5.62 questions in the positioning/transfer construct, 6.37 questions

in the lifting/carrying construct, and 6.25 questions in the walking/moving construct. For the

non-back pain group, the average respondent answered 4.64 questions in the positioning/transfer

construct, 5.12 questions in the lifting/carrying construct, and 5.45 questions in the

walking/moving construct. Thus, the CAT administered more questions to the back pain group than

to the non-back pain group for every construct.









In order to inspect the linear association between the measures, Pearson product moment

correlations were calculated. Tables 4-2 and 4-3 provide Pearson correlation coefficients between

the CAT measures, short form measures, and the ODQ measures. Overall, the CAT measures

had moderate to high correlations with the short form measures and had moderate correlations

with the ODQ measures. The correlations between the CAT and three short form measures

among back pain/non-back pain groups were moderate to high (r = 0.805/r = 0.569 for

positioning/transfer, r = 0.808/r = 0.545 for lifting/carrying, and r = 0.620/r = 0.647 for

walking/moving). In addition, the correlations between the CAT measures and the ODQ

measures were lower than those between the CAT measures and short form measures. The

correlations for the back pain/non-back pain groups were moderate/negligible (r = 0.605/r = 0.037 for

positioning/transfer, r = 0.530/r = 0.058 for lifting/carrying, and r = 0.594/r = 0.029 for

walking/moving). All correlations between the CAT and the short forms were statistically

significant at the p < 0.01 level, while the correlations between the CAT and the ODQ measure

were statistically significant only for the back pain group.

In an auxiliary investigation of the linear relationships between CAT and the short form

measures, and CAT and the ODQ measures, each pair of measures was plotted against each

other (Figures 4-1, 4-2, and 4-3). Scatter plots of the CAT and short form measures clustered

slightly more around the center of the graph than those of the CAT and the ODQ. In addition, the

ODQ measures were more dispersed in the y-coordinate direction than other measures, while the

CAT measures clustered into the center of the graph. As noted in Table 4-4, the CAT had 24-32%

less variance than the short forms and 22-36% less variance than the ODQ while the short form

had similar levels of variance as the ODQ. Scatter plots of all relationships showed linear

relationships. Of these plots, the scatter plots of the CAT and short form measures for the

positioning/transfer and lifting/carrying constructs were the closest to a line (i.e., these measures

had the highest correlations, r = 0.805 and 0.808). The pattern of the scatter plots for the CAT

versus the short form measures and the ODQ measures was relatively consistent.

Comparisons of the relative precision (RP) of each pair of measures in discriminating groups

differing in back pain are presented in Table 4-4. As was hypothesized, the CAT measure

achieved approximately twice the RP of the short form for the positioning/transfer construct.

This indicates that the CAT's ability to discriminate between individuals in the back pain and

non-back pain groups was twice as effective as the short form's ability. In addition, the CAT for

the lifting/carrying construct had 16% greater RP in discriminating the groups than the short

form, while the CAT for the walking/moving construct had 38% less RP in discriminating the

groups. Comparison between the CAT and the ODQ measures had a similar pattern to that of the

CAT and the short form measures. That is, the CAT positioning/transfer construct achieved 116%

greater RP and the CAT for the lifting/carrying construct had 42% greater RP in discriminating

the groups than the ODQ measure. The RP ratio for discriminating the groups did not favor the

CAT measure for the walking/moving construct, showing 16% less precision than the ODQ

measure. As we hypothesized, in the comparison between the short form measures and the ODQ

measure, the short form measures for all constructs had greater RP (6% for positioning/transfer, 22%

for lifting/carrying, and 35% for walking/moving) than the ODQ.

Discussion

Summary of Results

The ODQ and its versions are widely used as outcome measures for disability resulting

from back pain. They have been extensively cited more than 200 times in the Science Citation

Index (73). Despite the popularity of the ODQ, numerous studies reveal substantial concerns

regarding its measurement precision as well as measurement breadth (30,77,132). That is, the









ODQ is recommended for the assessment of a particular severity group (i.e., high

disability) due to its floor effects (23). The ODQ also appears to have a "gap" where items do not

closely match the ability of the sample of interest (80), which leads to deficits in precision. The

creation of fixed short forms has been a popular method of achieving measurement efficiency

and reducing respondent burden and administration cost. However, increases in efficiency often

result in decreases in precision because item reduction leads to inadequate coverage of items

relevant for all ability levels. We hypothesized that the ICFAM computer adaptive assessment

would be superior to short form and conventional measures. The purpose of this study was to

compare the precision of person measures obtained from the CAT, short forms, and the ODQ, a

conventional back pain instrument.

Correlations

Correlations between person measures from the CAT and those from the short forms

indicate a moderate to high degree of correspondence, while person measures from the CAT and

the ODQ show a moderate degree of correspondence. The CAT measures showed an acceptable

range of correlations with short form measures across all three constructs. However, the

correlations of the CAT measures with the ODQ were lower (r = 0.530 to 0.605) than those with the

short forms (r = 0.620 to 0.808). The CAT and the short form for the lifting/carrying

construct had the highest correlation (r = 0.808) compared to the other two constructs (r = 0.805 for

the positioning/transfer and r = 0.620 for the walking/moving construct). This could be due to

the fact that the lifting/carrying construct contains items that are most relevant for individuals

with back pain. In comparison to the correlation between the CAT and the ODQ, the greater

correlation between the CAT and the short form is consistent with what we expected, probably

because the short forms originated from the ICFAM item bank.









Relative Precision

Relative precision (RP) was used to examine whether there are empirical advantages in

measurement precision using the CAT over the short form and the ODQ as a conventional

measure. RP is based on the ratio of pairwise F statistics (an index of between-group

variability relative to within-group variability) of two different measures. The magnitude of the F

statistic from the ANOVA (analysis of variance) represents a measure of precision. Thus, the

RP estimates indicate how much more or less precise a measure is relative to another measure

(11).

In this study, RP comparisons were conducted using known-groups validity, that is, the ability

of each measure to discriminate the back pain and non-back pain groups. Known-groups

validity addresses the extent to which a measure differs as predicted between groups who

should score low and high on an ability trait. Supportive evidence of known-groups validity

typically is provided by significant differences in mean scores across independent samples (133).

As was hypothesized, the results showed that the CAT measures achieved greater RP in

discriminating back pain and non-back pain groups than did the short form measures.

Furthermore, the CAT measures had greater RP in discriminating the groups than did the ODQ

measures except for the walking/moving construct. In addition, the short form measures

outperformed the ODQ in RP. This may indicate that CAT measures outperform short form

measures and short form measures outperform conventional measures such as the ODQ measure

in terms of measurement precision. On the other hand, for the walking/moving construct, the

CAT measure achieved less RP than did the short form measure in discriminating the groups.

Likewise, the CAT measure also achieved less RP than did the ODQ for this construct. This may

indicate that the CAT and the short form measure for the walking/moving construct were not

successful in discriminating individuals with back pain. That is, these individuals appear to be

reporting higher ratings (e.g., "no difficulty") rather than lower ratings (e.g., "have not done")

on the construct.

Our results supported the notion that the CAT generally outperformed the short forms

(44, 46, 58). Previous researchers have found similar results in terms of measurement precision.

In an effort to compare the CAT to conventional lumbar spine functional status (LFS) instruments,

Hart and colleagues (2006) found that the CAT produced measures as precise as those of the LFS

instrument for back pain disability (134). Likewise, Haley and colleagues (2004) compared CAT to the

10-item short forms assessing physical/mobility, personal care/instrumental, and applied cognition

with three activity item pools consisting of 101 items, 62 items, and 59 items, respectively. The

results showed that CAT measures were more precise than the 10-item fixed short forms across

the three constructs of the Activity Measure for Post-Acute Care (AM-PAC). Outside the physical

activity domain, a study of a six-item short form of the Headache Impact Test (the HIT™)

also showed that the short form was as responsive as the CAT in measuring headache impact. In general,

the results of the present study are consistent with previous studies in precision comparisons

between CAT and short form measures. In addition, we made a further comparison

between the CAT and the ODQ measure as a conventional instrument. Excluding the walking/moving

construct, the CAT measures appeared to be more effective than the 10-item short form

measures, while the 10-item short form measures appeared to be more effective than the

ODQ measure in terms of measurement precision.

Limitations and Future Implications

The present study has several limitations. Computer adaptive testing methods shorten test

length by 62.5%, or require only an estimated nine items (58). When we preset the algorithm of

the CAT, the stopping rules were: 1) ten items for the maximum number of items, 2)

four items for the minimum number of items, and 3) the standard error < 0.4. In the present study,









our CAT used far fewer items than the preset ten items, and the average respondent answered

6.08 items for each construct. Since the standard error of the CAT measures was not included in

the analysis, which criterion was met to reach each person measure was unknown. Future research is

needed to investigate the effects of adjusting the stopping rules to make them more rigorous, thus

allowing more information to be obtained about respondents.

































Figure 4-1. Computer adaptive testing algorithm. Adapted from Wainer, Dorans, Eignor,
Flaugher, Green, Mislevy, Steinberg, and Thissen (2000). [Flowchart not reproduced; its final
step is "9. Stop".]









Table 4-1. Demographic characteristics of study participants
Characteristics                 Back Pain Group    Non-Back Pain Group
                                n=42               n=42
Age
  < 20                          1 (2.4)            3 (7.0)
  21-30                         3 (7.1)            6 (14.4)
  31-40                         10 (23.8)          9 (21.4)
  41-50                         8 (19.0)           6 (14.4)
  51-65                         8 (19.0)           8 (19.0)
  > 65                          12 (28.6)          10 (23.8)
  Mean ± SD                     53.74 ± 20.13      48.76 ± 19.7
Gender
  Female                        29 (69.0)          27 (64.3)
  Male                          13 (31.0)          15 (35.7)
Education
  Middle/Junior High            0                  (4.7)
  High School                   (33.3)             (45.3)
  College                       (54.8)             (28.5)
  Graduate                      (11.9)             (21.5)
Race/Ethnic
  African American              5 (11.9)           7 (16.6)
  Hispanic American             2 (4.8)            1 (2.3)
  American Indian               0 (0.0)            1 (2.3)
  White, not Hispanic origin    25 (59.5)          32 (76.2)
  Asian/Pacific Islander        10 (23.8)          2 (4.6)
Years with related problems
  Less than a year              14 (33.3)          (0.0)
  1 through < 4 years           5 (12.0)           (0.0)
  More than 4 years             20 (47.6)          (4.7)
  Missing                       3 (7.1)            (95.3)




























Figure 4-1. Scatter plot of ability measures from the CAT measure versus the short form
measure for the positioning/transfer and lifting/carrying constructs. Figure A represents
the plot of ability measures for the CAT and the short form for positioning/transfer
(x-axis: CAT measure positioning/transfer); Figure B represents the plot of ability
measures for the CAT and the short form for lifting/carrying (x-axis: CAT measure
lifting/carrying). "*" indicates that Pearson's correlation is significant at the 0.01
level.



















Figure 4-2. Scatter plot of ability measures from the CAT measure versus the short form
measure. Figure A represents the plot of ability measures for the CAT and the short
form measure for walking/moving (x-axis: CAT measure walking/moving; r = 0.620*);
Figure B represents the plot of ability measures for the CAT and the ODQ measure
for positioning/transfer (x-axis: CAT measure positioning/transfer; r = 0.605*).
"*" indicates that Pearson's correlation is significant at the 0.01 level.











Figure 4-3. Scatter plot of ability measures from the CAT measure versus the ODQ measure.
Figure A represents the plot of ability measures for the CAT and the ODQ measure
for lifting/carrying (x-axis: CAT measure lifting/carrying); Figure B represents the
plot of ability measures for the CAT and the ODQ measure for walking/moving
(x-axis: CAT measure walking/moving; r = 0.594*). "*" indicates that Pearson's
correlation is significant at the 0.01 level.









Table 4-2. Correlation coefficients for CAT, short form, and ODQ measures for the back pain
group
           CAT P/T   CAT L/C   CAT W/M   SF P/T    SF L/C    SF W/M    ODQ
CAT P/T    1.000
CAT L/C    0.837*    1.000
CAT W/M    0.614*    0.647*    1.000
SF P/T     0.805*    0.632*    0.568*    1.000
SF L/C     0.671*    0.808*    0.536*    0.635*    1.000
SF W/M     0.524*    0.566*    0.620*    0.554*    0.548*    1.000
ODQ        0.605*    0.530*    0.594*    0.605*    0.576*    0.605*    1.000
Note: * correlation is significant at the 0.01 level (2-tailed). CAT P/T: CAT Positioning/Transfer
measure, CAT L/C: CAT Lifting/Carrying measure, CAT W/M: CAT Walking/Moving measure,
SF P/T: Short Form Positioning/Transfer measure, SF L/C: Short Form Lifting/Carrying measure,
SF W/M: Short Form Walking/Moving measure, and ODQ: Oswestry Back Pain Disability
Questionnaire measure.



Table 4-3. Correlation coefficients for CAT, short form, and ODQ measures for the non-back pain
group
           CAT P/T   CAT L/C   CAT W/M   SF P/T    SF L/C    SF W/M    ODQ
CAT P/T    1.000
CAT L/C    0.699*    1.000
CAT W/M    0.843*    0.623*    1.000
SF P/T     0.569*    0.331*    0.559*    1.000
SF L/C     0.402*    0.499*    0.512*    0.784*    1.000
SF W/M     0.574*    0.354*    0.606*    0.836*    0.788*    1.000
ODQ        0.037     0.058     0.029     0.132     0.098     0.064     1.000
Note: * correlation is significant at the 0.01 level (2-tailed). CAT P/T: CAT Positioning/Transfer
measure, CAT L/C: CAT Lifting/Carrying measure, CAT W/M: CAT Walking/Moving measure,
SF P/T: Short Form Positioning/Transfer measure, SF L/C: Short Form Lifting/Carrying measure,
SF W/M: Short Form Walking/Moving measure, and ODQ: Oswestry Back Pain Disability
Questionnaire measure.









Table 4-4. Differences between means for back pain and non-back pain groups
                  Means (SE)
Measure           Back pain       Non-back pain   F          Relative Precision
CAT P/T           49.83 (0.61)    55.55 (0.61)    41.76**    2.02
Short Form P/T    53.85 (2.31)    83.07 (2.30)    20.58**    1.00

CAT L/C           50.24 (0.77)    56.02 (0.77)    27.36**    1.16
Short Form L/C    50.68 (3.08)    78.09 (3.14)    23.56**    1.00

CAT W/M           53.14 (0.89)    58.33 (0.89)    16.34**    0.62
Short Form W/M    58.28 (2.74)    86.98 (2.73)    26.09**    1.00

CAT P/T           49.83 (0.61)    55.55 (0.61)    41.76**    2.16
ODQ               53.69 (2.69)    85.38 (2.69)    19.26**    1.00

CAT L/C           50.24 (0.77)    56.02 (0.77)    27.36**    1.42
ODQ               53.69 (2.69)    85.38 (2.69)    19.26**    1.00

CAT W/M           53.14 (0.89)    58.33 (0.89)    16.34**    0.84
ODQ               53.69 (2.69)    85.38 (2.69)    19.26**    1.00

Short Form P/T    53.85 (2.31)    83.07 (2.30)    20.58**    1.06
ODQ               53.69 (2.69)    85.38 (2.69)    19.26**    1.00

Short Form L/C    50.68 (3.08)    78.09 (3.14)    23.56**    1.22
ODQ               53.69 (2.69)    85.38 (2.69)    19.26**    1.00

Short Form W/M    58.28 (2.74)    86.98 (2.73)    26.09**    1.35
ODQ               53.69 (2.69)    85.38 (2.69)    19.26**    1.00
Note: ** F statistic is significant at the 0.001 level. CAT: Computer Adaptive Testing, P/T:
positioning/transfer measure, L/C: lifting/carrying measure, W/M: walking/moving measure, and
ODQ: Oswestry Back Pain Disability Questionnaire measure.









CHAPTER 5
CONCLUSION

Back pain is the most common cause of activity limitation in our society (1). The need for

assessment of disability resulting from back pain has led to a proliferation of health status

measures (35). Many of these measures are self-reports of functional status. Self-report

functional status measures have been shown to be as reliable as or more reliable than physical

measurements of function and more relevant to the patient and society (24). In addition, self-

reports of pain and disability appear to be sensitive indicators of long-term outcomes (6, 7).

Because of these superior characteristics, many self-report measures of back-related disability

have been developed (23), with most, if not all, having adequate psychometric properties

(13, 23, 24, 26, 27, 29, 30, 40, 65, 70, 72, 79, 80). The Oswestry Back Pain Disability

Questionnaire (ODQ) is one of the most widely used conventional self-report measures

(3, 13, 30, 73, 80). Due to the abundance of these measures, a prevailing challenge is selecting

the optimal measure. One characteristic that may make some measures less ideal than others is

the presence of ceiling and floor effects. This problem may be the result of instrument

development based solely on the Classical Test Theory (CTT) measurement model.

Measurement imprecision generally results from the use of items that do not closely match the

ability of the population of interest (35, 55). In order to overcome these limitations, the "ideal"

measure should have items that cover a wide range of the underlying construct with high

precision. However, most conventional measures fail to evaluate individuals precisely

throughout the whole range of disability.

Utilizing Item Response Theory (IRT) and Computer Adaptive Testing (CAT) methods,

the ICF Activity Measure (ICFAM) was developed, creating an efficient and precise

measurement system based on the activity dimension of the International Classification of









Functioning, Disability and Health (ICF). Items relating to activities involving movement,

moving around and daily life tasks as defined by the activity dimension of the ICF were

developed with the intent to create an equiprecise measurement (i.e., one with precise

measurement across the entire range of a construct). Creating short forms is a conventional

approach to achieving measurement efficiency by reducing the number of items (28,35).

However, the loss of precision is inevitable in short form creation (8, 36, 44-46, 115). The critical

questions are to what extent, and by what methods, the precision of short forms can be

optimized.

Three research questions were proposed as part of this dissertation project: 1) What are the

psychometric properties of the computer adaptive ICF activity measure constructs of

positioning/transfer, lifting/carrying, and walking/moving with a sample of individuals having

activity limitations resulting from back pain?, 2) What are the psychometric properties of three

newly generated short forms developed from items on the positioning/transfer, lifting/carrying,

and walking/moving constructs?, and 3) How does the precision of the ICFAM CAT measures,

the short form measures developed from the ICFAM, and the ODQ compare?

Unidimensionality

To address the first research question, confirmatory factor analyses (CFA) and exploratory

factor analyses (EFA) were conducted to investigate the dimensionality. The CFA did not

confirm the unidimensionality of the three ICFAM constructs. In order to identify the factor

structure, EFA was subsequently performed and revealed a multidimensional factor structure for

each ICFAM construct, including the full item bank for all constructs. We speculated that the

low subject/item ratio (approximately 5 subjects per item) may have contributed to the failure to

confirm unidimensionality. Therefore, we improved the subject/item ratio (approximately 10

subjects per item) by performing the same analysis using 10-item short forms. CFAs still failed









to reveal unidimensional structures for the three short forms. The subsequent EFAs for the three

10-item short forms revealed multidimensional constructs composed of three factors for the

positioning/transfer construct, two factors for the lifting/carrying construct, and two factors for

the walking/moving construct.

The factors retained for the short forms appear plausible from a clinical point of view. For

the positioning/transfer short form, complex activity items (e.g., kneeling and getting out of bath

tub) composed one factor, while simple activity items (e.g., changing and shifting position) made

up a separate factor. In terms of motor control theory, the activity of kneeling requires greater

metabolic demands than standing or stooping (135) and involves the complex neural activity

associated with balance (125). The logical progression of item difficulty is even more prominent

in lifting/carrying and walking/moving short forms. For the lifting/carrying short form, the two-

factor model grouped the items into lifting heavy and lifting light objects. Likewise, for the

walking/moving short form, the two factor model grouped items into simple walking activities

and more difficult climbing/running activities. In summary, the factors appear to be

subcomponents of each of the three ICFAM constructs, determined to a large degree by the

difficulty of the activities.

This multidimensional nature of the construct creates a serious challenge for this study,

since unidimensionality is a requirement of most IRT models. The reason is that a single

construct can better explain the relationship between person performance and the item continuum

in any data set (37). However, in practical terms, unidimensionality is an ideal that is never fully

achieved and in most "successful" cases is approximated. Applying multidimensional IRT

models to these existing constructs of the ICFAM may be worthwhile in future analyses,









although many multidimensional models are still in the early stages of development and

refinement.

Hypothetical versus Empirical Item Hierarchies

The hypothetical hierarchy of activity based on Metabolic Equivalent (MET) was partially

supported by the empirical hierarchy of item difficulty generated by Rasch analysis. Of the three

constructs studied, only the walking/moving construct showed an item hierarchy that can be

explained by MET. For instance, the most difficult item, jogging one mile, has a MET of 11.0,

while the easiest item, walking on carpeting, has a MET of 2.0. In contrast, for the

positioning/transfer construct, the item difficulty hierarchy can be explained better by the clinical

features of back pain than the logical progression of the MET. That is, individuals with back pain

demonstrated greater difficulty in maintaining postures for a prolonged time than shifting or

changing postures. This hierarchical order is not supported by the MET values. One of the most

challenging items in the positioning/transfer construct (e.g., lying down on back 5-8 hours) has a

MET rating of 1.0, while the least challenging item (e.g., changing standing to sitting in chair) is

rated as 2.0 METs. The item difficulty hierarchy of the lifting/carrying construct only partially

concurred with the MET categorization. Different weight activities paralleled the MET

categorization (e.g., lifting 25 pounds with 3.0 METs was more challenging than pulling wet

laundry out from a washing machine with 2.0 METs). However, within the lifting/carrying

construct, the three above-average items with different item difficulty calibrations had the same

MET value. That is, the empirical item difficulty order generated by Rasch analysis

differentiated the three items (lifting 25 pounds shoulder to above head, lifting 25 pounds floor to

waist, and carrying 25 pounds for 25 feet), while the MET values of these items are the same

(3.0 METs). The different difficulty levels of the lifting items may be more a function of

biomechanical challenge and pain experienced than energy expenditure. That is, lifting from









shoulder level to above the head is more biomechanically challenging than lifting from floor to

waist and may be more painful because lifting from shoulder level to above the head is a burden

on both arms and back.

There are limitations associated with determining the hypothetical item difficulty hierarchy

of activity relevant to the items of the ICFAM constructs. First, the standardized assignment of

MET intensities in physical activity questionnaires is based on a compendium of physical

activities that was developed for use in epidemiologic studies (136,137). The values do not

estimate the energy cost of physical activity in individuals in ways that account for differences in

body mass, age, gender, efficiency of movement, or geographic and environmental condition in

which the activities are performed. Therefore, individual differences in energy expenditure for

the same activity can be large. Second, there are no values generated for activities that consume

less than one MET, which is defined as 1 kcal/kg/hour and is roughly equivalent to the energy

cost of sitting quietly. Many ICFAM items are not comparable, since we have many bed mobility

items in the positioning/transfer construct that may be less than one MET. Third, although we

attempted to select the closest MET value when the relevant item was not available from the

compendium of physical activities, the accuracy of these estimates is uncertain. Furthermore, the

compendium of physical activities does not provide detailed descriptions of the physical

activities. Thus, comparing our item difficulty hierarchy to a hierarchy based on MET values

provides only a general sense of distinctions between the two hierarchies.

There was evidence that the item difficulty hierarchies appeared plausible from a pain-related

clinical point of view. Motor control theory holds that complex tasks involving the use

of multiple joints and challenging environmental factors are more difficult than functional tasks

requiring only a single joint or more optimal environmental factors (94). In this study, as









hypothesized, complex tasks were found to be more difficult (e.g., kneeling 10-20 minutes) than

simple tasks (e.g., standing 1-2 hours). However, a relatively simple task (e.g., lying back 5-8

hours) was found to be more difficult than a complex task (e.g., change position lying back to

sitting). These findings agreed with neither motor control theory nor the MET values.

Also, for the positioning/transfer construct, we speculate that individuals with back problems

may have been primarily affected by pain, not energy expenditure. That is, individuals with back

pain who are having difficulty with a transient task such as changing position from lying on their

back to sitting (i.e., one of the easiest items) would be expected to have more difficulty with a prolonged

activity such as lying on their back for 5-8 hours (i.e., an item of above-average difficulty).

Furthermore, lying on their back for a prolonged period of time would be a difficult task for

individuals with back pain even though the activity does not involve complex biomechanical

modifications or adjustments. Thus, the logical progression of item hierarchies for the

positioning/transfer and lifting/carrying constructs tends to reflect the clinical features of

back pain. Future research should investigate the relationship between pain during particular

activities and the Rasch generated item difficulty hierarchies to appraise this hypothesis.

Short Forms

Several methods have been used to develop short forms from original tests. These methods,

based on the Classical Test Theory (CTT) framework, often include the deletion of items with

low item-total correlations, items with the least impact on the overall internal consistency of the test,

and items with low factor loadings. In this study, using an IRT method, we focused on having

items distributed across the difficulty range for each construct. The item-level psychometrics

based on Rasch analysis (one-parameter IRT model) were effective in equally distributing ten

items across the full range of ability and selecting items that matched person ability location.









This method focuses on maintaining measurement precision across the full range of the construct

(i.e., equiprecise measurement) while reducing the number of items.
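
The idea of spreading items evenly across the difficulty range can be sketched in code. The Python example below is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes each item has a single Rasch difficulty in logits, and the function name select_short_form and the item calibrations are hypothetical. For each of k evenly spaced target locations it picks the unused item whose difficulty is closest. It does not model the second criterion mentioned above (matching items to person ability locations).

```python
def select_short_form(difficulties, k=10):
    """Pick k items whose difficulties lie closest to k evenly spaced targets.

    difficulties: dict mapping item name -> Rasch difficulty (logits).
    Returns the chosen item names, ordered from easiest target to hardest.
    """
    if k < 1 or k > len(difficulties):
        raise ValueError("k must be between 1 and the number of items")
    lowest = min(difficulties.values())
    highest = max(difficulties.values())
    step = (highest - lowest) / (k - 1) if k > 1 else 0.0
    targets = [lowest + i * step for i in range(k)]

    remaining = dict(difficulties)
    chosen = []
    for target in targets:
        # Closest unused item; ties are broken alphabetically by item name.
        name = min(remaining, key=lambda n: (abs(remaining[n] - target), n))
        chosen.append(name)
        del remaining[name]
    return chosen


# Hypothetical calibrations: 13 items spaced 0.5 logits apart from -3 to +3.
example_bank = {f"item{i:02d}": -3.0 + 0.5 * i for i in range(13)}
print(select_short_form(example_bank, k=4))
```

A greedy nearest-target rule is only one way to operationalize equal spacing; content coverage and item fit would normally be weighed alongside difficulty.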

Despite these attempts, there was a loss of precision with the three newly created 10-item

short forms in comparison to the full test, as well as decreased person reliability. Test

Information Function (TIF) graphs were used to visually inspect the loss of precision. Fisher

(1920) defined information as the reciprocal of the variance of a parameter estimate. Thus, if

one could estimate person ability with greater precision (i.e., smaller variance), one would have more

information about the person's ability (130). The TIF graph is obtained by plotting the amount of

information against ability. The TIF for the positioning/transfer short forms showed a

considerable loss of information as a large proportion of items was removed from the entire set

of items (46 of 56 items removed). In contrast, the lifting/carrying and walking/moving short

forms displayed less information loss, as a much smaller proportion of items were removed from

these constructs (17 of 27 items and 10 of 20 items, respectively). All of the TIF graphs showed

that different ability levels are estimated with differing degrees of precision. As one moves to the

extremes of the scale (both low and high), less information and less precision are obtained.
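
The relationship between item count, information, and precision can be illustrated with a short calculation. The Python sketch below assumes the dichotomous Rasch model, in which an item contributes information P(1 - P) at a given ability; the rating scale items analyzed in this study would need a polytomous formula, so this is a simplification. The item difficulties are hypothetical values, not the ICFAM calibrations, and the function names are illustrative.

```python
import math


def rasch_prob(theta, b):
    """Probability of success on an item of difficulty b at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def total_information(theta, difficulties):
    """Test information: the sum of item informations P * (1 - P)."""
    probs = [rasch_prob(theta, b) for b in difficulties]
    return sum(p * (1.0 - p) for p in probs)


def standard_error(theta, difficulties):
    """Standard error of the ability estimate: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(total_information(theta, difficulties))


# Hypothetical forms: 56 and 10 items spread evenly from -3 to +3 logits.
full_form = [-3.0 + 6.0 * i / 55 for i in range(56)]
short_form = [-3.0 + 6.0 * i / 9 for i in range(10)]

for theta in (-4.0, 0.0, 4.0):
    print(theta,
          round(total_information(theta, full_form), 2),
          round(total_information(theta, short_form), 2),
          round(standard_error(theta, short_form), 2))
```

Under these assumptions the shorter form carries less information at every ability level, and both forms carry less information at the extremes than in the middle of the scale, which is the pattern described for the TIF graphs.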

Constructing a fixed short form is a conventional approach to achieving measurement

efficiency with fewer items. Although some loss of precision is inevitable in short form

creation, short forms are attractive from the perspective of patient and administrative

practicality. Short forms reduce the burden on respondents and test administration. In addition,

short forms may be useful in a situation where computer access is not readily available to

researchers and clinicians. The short forms of the ICFAM have a few advantages over the ODQ.

First, the ICFAM short forms provide optimal precision across a wide range of ability. This

would substantially reduce deficits in measurement such as ceiling/floor effects. Second, the









ICFAM short forms offer three constructs (i.e., positioning/transfer, lifting/carrying, and

walking/moving), while the ODQ provides only three items relevant to positioning/transfer, one

item relevant to lifting/carrying, and one item relevant to walking/moving. Researchers and

clinicians may maximize their effectiveness in detecting group differences or clinical change by

selecting the ICFAM constructs that are most relevant to individuals with back pain. Since the

ICFAM positioning/transfer and lifting/carrying constructs are more precise than the

ODQ, these measures may be preferable to the ODQ.

Of note, the two items reinstated in order to fill substantial gaps at the high extreme of the

ability continuum, "carrying toddler on back" for the lifting/carrying short form and "running one

block" for the walking/moving short form, showed high fit statistics. These two items were

measuring the extremes of the construct and had a lack of observations on particular response

categories that might lead to large observed variances. This may be a limitation of the short

forms despite their adequate breadth of measurement. Therefore, our short forms could be

improved in future research by developing items that more adequately fill gaps and replace the

misfitting items.

Precision

As we hypothesized for relative precision, both the CAT and short form measures of the

ICFAM showed more precision than the ODQ for the positioning/transfer and lifting/carrying

constructs. That is, in discriminating clinically relevant groups (i.e., back pain versus non-back

pain), the CAT outperformed both the short forms and the ODQ, and the ICFAM short forms

outperformed the ODQ. This was not true for the walking/moving construct. For the

walking/moving construct, the CAT was less precise than both the short form and the ODQ in

discriminating individuals with back pain from those without back pain. For the

positioning/transfer construct, the relative precision (RP) of the CAT was about twice that of

the short form or the ODQ, while the RP of the short form was 42% greater than that of the ODQ.

For the lifting/carrying construct, the RP of the CAT was 16% greater than that of the short form

and 42% greater than that of the ODQ, while the RP of the short form was 22% greater than that of

the ODQ.

The failure of the walking/moving CAT to show more precision than the short form or the

ODQ appears to be related to the relative variances. The F statistic is the ratio of the between-group

variance estimate to the within-group variance estimate. The low CAT F statistic for the

walking/moving construct is a result of either low variance of person measures between the two groups

or high variance of the person measures within the groups. In practical terms, the walking/moving construct may have

less relevance than the other constructs for individuals with back pain. This might lead to either

lower between group variance or higher within group variance relative to the other constructs.
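
The dependence of relative precision on the two variance components can be shown with a small worked example. The Python sketch below assumes that RP is computed as the ratio of the F statistic of one measure to the F statistic of a reference measure, as is common in known-groups comparisons; the function names and the person measures are hypothetical and are not data from this study.

```python
from statistics import mean


def f_statistic(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def relative_precision(f_measure, f_reference):
    """RP of a measure relative to a reference measure (ratio of F statistics)."""
    return f_measure / f_reference


# Hypothetical person measures for a back pain and a non-back pain group.
# Both measures separate the group means equally, but measure B has more
# within-group spread, so its F statistic is lower.
f_a = f_statistic([1, 2, 3], [4, 5, 6])
f_b = f_statistic([0, 2, 4], [3, 5, 7])
print(f_a, f_b, relative_precision(f_a, f_b))
```

In this example the between-group difference is identical for the two measures, so the drop in F comes entirely from the larger within-group variance.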

In addition to precision, the CAT method provides a means for administering items in a

way that is efficient (28,34,36). In the present study, in terms of efficiency the CAT

outperformed both the 10-item short forms and the ODQ. That is, on the CAT, respondents

answered on average 5.62 items for the positioning/transfer, 6.37 items for the lifting/carrying,

and 6.25 items for the walking/moving constructs, while both the short form and the ODQ required

answering 10 questions.
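
The reason a CAT can stop before ten items is its stopping rule. The Python sketch below is a simplified illustration, not the CAT software used in this study: it assumes dichotomous Rasch items, selects the unused item whose difficulty is nearest the current ability estimate (the most informative item under that model), re-estimates ability by maximum likelihood with bounded Newton-Raphson steps, and stops once the standard error falls to a target or a maximum number of items is reached. The item bank, the answer function, and the thresholds are all hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of success on an item of difficulty b at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, theta=0.0, iterations=25):
    """Maximum likelihood ability estimate from (difficulty, score) pairs.

    Steps are limited to 1 logit and theta is kept within [-6, 6] so that
    all-0 or all-1 response strings do not diverge.
    """
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b, _ in responses]
        gradient = sum(score - p for (_, score), p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = max(-1.0, min(1.0, gradient / information))
        theta = max(-6.0, min(6.0, theta + step))
    information = sum(p * (1.0 - p)
                      for p in (rasch_prob(theta, b) for b, _ in responses))
    return theta, 1.0 / math.sqrt(information)


def run_cat(bank, answer, se_target, max_items):
    """Administer items adaptively; return (theta, standard error, items used).

    bank: dict mapping item name -> difficulty (logits).
    answer: callable (item name, difficulty) -> 0 or 1.
    """
    theta, se = 0.0, float("inf")
    remaining = dict(bank)
    responses, administered = [], []
    while remaining and len(administered) < max_items and se > se_target:
        # Nearest difficulty to the current estimate; ties broken by name.
        name = min(remaining, key=lambda n: (abs(remaining[n] - theta), n))
        difficulty = remaining.pop(name)
        responses.append((difficulty, answer(name, difficulty)))
        administered.append(name)
        theta, se = estimate_theta(responses, theta)
    return theta, se, administered


# Hypothetical 21-item bank and a respondent who succeeds on every item
# easier than 1.0 logit.
demo_bank = {f"item{i:02d}": round(-3.0 + 0.3 * i, 1) for i in range(21)}
theta, se, used = run_cat(demo_bank, lambda name, b: 1 if b < 1.0 else 0,
                          se_target=0.8, max_items=10)
print(round(theta, 2), round(se, 2), len(used))
```

A looser standard error target ends the test sooner and a stricter one administers more items, which is the trade-off between efficiency and precision discussed above.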

In summary, our data did not fit the models in CFA, and the subsequent EFA exploring the factor

structure of each construct did not show sufficient evidence to support the existence of

unidimensional constructs. These findings may indicate the need for use of multidimensional

models to adequately describe the dimensionality of physical function. In addition, there is a

need for future studies to further develop the constructs of the ICFAM, particularly the

walking/moving construct based on physiological measures such as METs. Another limitation of









this study is that we sacrificed considerable precision in short form creation. This may be partly

due to reinstating two problematic items to fill the substantial gaps in the short forms. This

implies that the short forms could be improved by future research addressing: 1) replacing

problematic items and 2) developing items that more adequately fill the gaps along the person

ability continuum to cover a wider range of the trait.

Despite the multidimensional constructs on the ICFAM and the short forms, the adequate

item level psychometrics suggest that the CAT method for measuring physical activity has

promise. The CAT and the short forms of the ICFAM showed more precision than the ODQ for

the positioning/transfer and lifting/carrying constructs, although the CAT of the ICFAM for the

walking/moving construct was less precise than the short form and the ODQ measure. Overall,

the CAT and the short forms of the ICFAM have several advantages over traditional self-report

measures such as the ODQ. For researchers, precise measures decrease the number of subjects

needed for a study and maximize the possibility of detecting differences between groups. For

clinicians, precise measures capture small but potentially significant increments of improvement

in response to clinical interventions. In the present study, we presented evidence of the

advantages of IRT-based short forms and CAT measures over a conventional back pain

questionnaire. With the increased use of computers and web-based devices for data collection in

research and clinical practice, CAT measures may become preferable due to their efficiency

without loss of precision. When these devices are not available, IRT-based short forms appear to

be a reasonable alternative. In general, the findings are supportive of implementing contemporary

IRT-based measures in both research and clinical settings.









APPENDIX
THE OSWESTRY BACK PAIN DISABILITY QUESTIONNAIRE (ODQ)

This questionnaire has been designed to give your therapist information as to how your back pain
has affected your ability to manage in everyday life. Please answer every question by placing a
mark in the one box that best describes your condition today. We realize you may feel that 2 of
the statements may describe your condition, but please mark only the box that most closely
describes your current condition.
Pain Intensity
[ ] I can tolerate the pain I have without having to use pain medication.
[ ] The pain is bad, but I can manage without having to take pain medication.
[ ] Pain medication provides me with complete relief from pain.
[ ] Pain medication provides me with moderate relief from pain.
[ ] Pain medication provides me with little relief from pain.
[ ] Pain medication has no effect on my pain.
Personal Care (e.g., Washing, Dressing)
[ ] I can take care of myself normally without causing increased pain.
[ ] I can take care of myself normally, but it increases my pain.
[ ] It is painful to take care of myself, and I am slow and careful.
[ ] I need help, but I am able to manage most of my personal care.
[ ] I need help every day in most aspects of my care.
[ ] I do not get dressed, I wash with difficulty, and I stay in bed.
Lifting
[ ] I can lift heavy weights without increased pain.
[ ] I can lift heavy weights, but it causes increased pain.
[ ] Pain prevents me from lifting heavy weights off the floor, but I can manage
    if the weights are conveniently positioned (e.g., on a table).
[ ] Pain prevents me from lifting heavy weights, but I can manage
    light to medium weights if they are conveniently positioned.
[ ] I can lift only very light weights.
[ ] I cannot lift or carry anything at all.
Walking
[ ] Pain does not prevent me from walking any distance.
[ ] Pain prevents me from walking more than 1 mile (1 mile = 1.6 km).
[ ] Pain prevents me from walking more than 1/2 mile.
[ ] Pain prevents me from walking more than 1/4 mile.
[ ] I can walk only with crutches or a cane.
[ ] I am in bed most of the time and have to crawl to the toilet.
Sitting
[ ] I can sit in any chair as long as I like.
[ ] I can only sit in my favorite chair as long as I like.
[ ] Pain prevents me from sitting for more than 1 hour.
[ ] Pain prevents me from sitting for more than 1/2 hour.
[ ] Pain prevents me from sitting for more than 10 minutes.
[ ] Pain prevents me from sitting at all.









Standing
[ ] I can stand as long as I want without increased pain.
[ ] I can stand as long as I want, but it increases my pain.
[ ] Pain prevents me from standing for more than 1 hour.
[ ] Pain prevents me from standing for more than 1/2 hour.
[ ] Pain prevents me from standing for more than 10 minutes.
[ ] Pain prevents me from standing at all.
Sleeping
[ ] Pain does not prevent me from sleeping well.
[ ] I can sleep well only by using pain medication.
[ ] Even when I take medication, I sleep less than 6 hours.
[ ] Even when I take medication, I sleep less than 4 hours.
[ ] Even when I take medication, I sleep less than 2 hours.
[ ] Pain prevents me from sleeping at all.
Social Life
[ ] My social life is normal and does not increase my pain.
[ ] My social life is normal, but it increases my level of pain.
[ ] Pain prevents me from participating in more energetic activities (e.g., sports, dancing).
[ ] Pain prevents me from going out very often.
[ ] Pain has restricted my social life to my home.
[ ] I have hardly any social life because of my pain.
Traveling
[ ] I can travel anywhere without increased pain.
[ ] I can travel anywhere, but it increases my pain.
[ ] My pain restricts my travel over 2 hours.
[ ] My pain restricts my travel over 1 hour.
[ ] My pain restricts my travel to short necessary journeys under 1/2 hour.
[ ] My pain prevents all travel except for visits to the physician/therapist or hospital.
Employment / Homemaking
[ ] My normal homemaking/job activities do not cause pain.
[ ] My normal homemaking/job activities increase my pain, but
    I can still perform all that is required of me.
[ ] I can perform most of my homemaking/job duties, but pain prevents me from
    performing more physically stressful activities (e.g., lifting, vacuuming).
[ ] Pain prevents me from doing anything but light duties.
[ ] Pain prevents me from doing even light duties.
[ ] Pain prevents me from performing any job or homemaking chores.

Source: Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability
Questionnaire and the Quebec Back Pain Disability Scale. Physical Therapy. 2001;81:776-788.









LIST OF REFERENCES


1. Andersson GB. Epidemiological features of chronic low-back pain. Lancet.
1999;354(9178):581-5.

2. Bergner M, Bobbitt RA, Pollard WE, Martin DP, Gilson BS. The sickness impact profile:
validation of a health status measure. Med Care. 1976;14(1):57-67.

3. Fairbank JC, Couper J, Davies JB, O'Brien JP. The Oswestry low back pain disability
questionnaire. Physiotherapy. 1980;66(8):271-3.

4. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile:
development and final revision of a health status measure. Med Care. 1981;19(8):787-
805.

5. Waddell G. An approach to backache. Br J Hosp Med. 1982;28(3):187, 90-1, 93-4,
passim.

6. Roland M, Morris R. A study of the natural history of low-back pain. Part II:
development of guidelines for trials of treatment in primary care. Spine (Phila Pa 1976).
1983;8(2):145-50.

7. Roland M, Morris R. A study of the natural history of back pain. Part I: development of a
reliable and sensitive measure of disability in low-back pain. Spine (Phila Pa 1976).
1983;8(2):141-4.

8. Ware JE, Jr., Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I.
Conceptual framework and item selection. Med Care. 1992;30(6):473-83.

9. Haley SM, McHorney CA, Ware JE, Jr. Evaluation of the MOS SF-36 physical
functioning scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item
scale. J Clin Epidemiol. 1994;47(6):671-84.

10. Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping
DL, et al. The Quebec Back Pain Disability Scale: conceptualization and development. J
Clin Epidemiol. 1996;49(2):151-61.

11. McHorney CA, Haley SM, Ware JE, Jr. Evaluation of the MOS SF-36 Physical
Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch
scoring methods. J Clin Epidemiol. 1997;50(4):451-61.

12. Fisher WP, Jr. Foundations for health status metrology: the stability of MOS SF-36 PF-
10 calibrations across samples. J La State Med Soc. 1999;151(11):566-78.

13. Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability
Questionnaire and the Quebec Back Pain Disability Scale. Phys Ther. 2001;81(2):776-88.









14. Davidson M, Keating JL. A comparison of five low back disability questionnaires:
reliability and responsiveness. Phys Ther. 2002;82(1):8-24.

15. Ware J, Jr., Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction
of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220-33.

16. Million R, Hall W, Nilsen KH, Baker RD, Jayson MI. Assessment of the progress of the
back-pain patient 1981 Volvo Award in Clinical Science. Spine (Phila Pa 1976).
1982;7(3):204-12.

17. Ruta DA, Garratt AM, Wardlaw D, Russell IT. Developing a valid and reliable measure
of health outcome for patients with low back pain. Spine (Phila Pa 1976).
1994;19(17):1887-96.

18. Greenough CG, Fraser RD. Assessment of outcome in patients with low-back pain. Spine
(Phila Pa 1976). 1992;17(1):36-41.

19. Manniche C, Asmussen K, Lauritsen B, Vinterberg H, Kreiner S, Jordan A. Low Back
Pain Rating scale: validation of a tool for assessment of low back pain. Pain.
1994;57(3):317-26.

20. Daltroy LH, Cats-Baril WL, Katz JN, Fossel AH, Liang MH. The North American spine
society lumbar spine outcome assessment Instrument: reliability and validity tests. Spine
(Phila Pa 1976). 1996;21(6):741-9.

21. Williams RM, Myers AM. Functional Abilities Confidence Scale: a clinical measure for
injured workers with acute low back pain. Phys Ther. 1998;78(6):624-34.

22. Williams RM, Myers AM. A new approach to measuring recovery in injured workers
with acute low back pain: Resumption of Activities of Daily Living Scale. Phys Ther.
1998;78(6):613-23.

23. Muller U, Roder C, Greenough CG. Back related outcome assessment instruments. Eur
Spine J. 2006;15 Suppl 1:S25-31.

24. Deyo RA. Measuring the functional status of patients with low back pain. Arch Phys Med
Rehabil. 1988;69(12):1044-53.

25. Kopec JA. Measuring functional outcomes in persons with back pain: a review of back-
specific questionnaires. Spine (Phila Pa 1976). 2000;25(24):3110-4.

26. Muller U, Roeder C, Dubs L, Duetz MS, Greenough CG. Condition-specific outcome
measures for low back pain. Part II: scale construction. Eur Spine J. 2004;13(4):314-24.

27. Muller U, Duetz MS, Roeder C, Greenough CG. Condition-specific outcome measures
for low back pain. Part I: validation. Eur Spine J. 2004;13(4):301-13.









28. McHorney CA. Generic health measurement: past accomplishments and a measurement
paradigm for the 21st century. Ann Intern Med. 1997;127(8 Pt 2):743-50.

29. Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine (Phila Pa 1976).
2000;25(22):2940-52; discussion 52.

30. White LJ, Velozo CA. The use of Rasch measurement to improve the Oswestry
classification scheme. Arch Phys Med Rehabil. 2002;83(6):822-31.

31. Baker C, Pynsent PB, Fairbank JCT. The Oswestry Disability Index revisited: Its
reliability, repeatability, and validity, and a comparison with the St. Thomas's Disability
Index. In: Roland MO, Jenner JR, eds. Back Pain: New Approaches to Education
and Rehabilitation. Manchester, UK: Manchester University Press; 1989:174-86.

32. Deyo RA, Battie M, Beurskens AJ, Bombardier C, Croft P, Koes B, et al. Outcome
measures for low back pain research. A proposal for standardized use. Spine (Phila Pa
1976). 1998;23(18):2003-13.

33. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important
changes with patient-oriented questionnaires. Med Care. 2002;40(4 Suppl):II45-51.

34. Velozo CA, Kielhofner G, Lai JS. The use of Rasch analysis to produce scale-free
measurement of functional ability. Am J Occup Ther. 1999;53(1):83-90.

35. McHorney CA. Health status assessment methods for adults: past accomplishments and
future challenges. Annu Rev Public Health. 1999;20:309-35.

36. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes
assessment. J Rehabil Med. 2005;37(6):339-45.

37. Hambleton RK. Emergence of item response modeling in instrument development and
data analysis. Med Care. 2000;38(9 Suppl):II60-5.

38. DeVellis RF. Classical test theory. Med Care. 2006;44(11 Suppl 3):S50-9.

39. Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must
be interval. Arch Phys Med Rehabil. 1989;70(12):857-60.

40. Velozo CA, Choi B, Zylstra SE, Santopoalo R. Measurement qualities of a self-report
and therapist-scored functional capacity instrument based on the Dictionary of
Occupational Titles. J Occup Rehabil. 2006;16(1):109-22.

41. Velozo CA, Wang Y, Lehman L, Wang JH. Utilizing Rasch measurement models to
develop a computer adaptive self-report of walking, climbing, and running. Disabil
Rehabil. 2008;30(6):458-67.

42. Velozo CA, Peterson EW. Developing meaningful Fear of Falling Measures for
community dwelling elderly. Am J Phys Med Rehabil. 2001;80(9):662-73.









43. Weiss D. Improving measurement quality and efficiency with adaptive testing. Applied
Psychological Measurement. 1982;6:473-92.

44. Haley SM, Coster WJ, Andres PL, Kosinski M, Ni P. Score comparability of short forms
and computerized adaptive testing: Simulation study with the activity measure for post-
acute care. Arch Phys Med Rehabil. 2004;85(4):661-6.

45. Velozo CA, Lai JS, Mallinson T, Hauselman E. Maintaining instrument quality while
reducing items: application of Rasch analysis to a self-report of visual function. J
Outcome Meas. 2000;4(3):667-80.

46. Haley SM, Andres PL, Coster WJ, Kosinski M, Ni P, Jette AM. Short-form activity
measure for post-acute care. Arch Phys Med Rehabil. 2004;85(4):649-60.

47. Bjorner J, Ware JE, Jr. Using modern psychometric methods to measure health outcomes.
Med Outcome Trust Monitor. 1998;3:12-6.

48. Elhan AH, Oztuna D, Kutlay S, Kucukdeveci AA, Tennant A. An initial application of
computerized adaptive testing (CAT) for measuring disability in patients with low back
pain. BMC Musculoskelet Disord. 2008;9:166. PMCID: 2651163.

49. Haley SM, Ni P, Ludlow LH, Fragala-Pinkham MA. Measurement precision and
efficiency of multidimensional computer adaptive testing of physical functioning using
the pediatric evaluation of disability inventory. Arch Phys Med Rehabil.
2006;87(9):1223-9.

50. Hart DL, Cook KF, Mioduski JE, Teal CR, Crane PK. Simulated computerized adaptive
test for patients with shoulder impairments was efficient and produced valid measures of
function. J Clin Epidemiol. 2006;59(3):290-8.

51. Haley SM, Siebens H, Coster WJ, Tao W, Black-Schaffer RM, Gandek B, et al.
Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation:
I. Activity outcomes. Arch Phys Med Rehabil. 2006;87(8):1033-42.

52. Jette AM, Haley SM, Ni P, Olarsch S, Moed R. Creating a computer adaptive test version
of the late-life function and disability instrument. J Gerontol A Biol Sci Med Sci.
2008;63(11):1246-56. PMCID: 2718692.

53. Haley SM, Gandek B, Siebens H, Black-Schaffer RM, Sinclair SJ, Tao W, et al.
Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation:
II. Participation outcomes. Arch Phys Med Rehabil. 2008;89(2):275-83. PMCID:
2666330.

54. World Health Organization. Towards a common language for Functioning, Disability and
Health. Geneva. 2002.

55. Jette AM. Assessing disability in studies on physical activity. Am J Prev Med. 2003;25(3
Suppl 2):122-8.









56. Picavet HS, Schouten JS. Musculoskeletal pain in the Netherlands: prevalences,
consequences and risk groups, the DMC(3)-study. Pain. 2003;102(1-2):167-78.

57. Ware JE, Jr., Kosinski M, Bjorner JB, Bayliss MS, Batenhorst A, Dahlof CG, et al.
Applications of computerized adaptive testing (CAT) to the assessment of headache
impact. Qual Life Res. 2003;12(8):935-52.

58. Hol A, Vorst HCM, Mellenbergh GJ. Computerized adaptive testing for polytomous
motivation items: Administration mode effects and a comparison with short forms.
Applied Psychological Measurement. 2007;31:412-29.

59. Flynn KE, Dombeck CB, DeWitt EM, Schulman KA, Weinfurt KP. Using item banks to
construct measures of patient reported outcomes in clinical trials: investigator perceptions.
Clin Trials. 2008;5(6):575-86. PMCID: 2662709.

60. Nunnally JC, Bernstein IH. Psychometric Theory. New York, NY: McGraw-Hill; 1994.

61. Kosinski M, Bayliss MS, Bjorner JB, Ware JE, Jr., Garber WH, Batenhorst A, et al. A
six-item short-form survey for measuring headache impact: the HIT-6. Qual Life Res.
2003;12(8):963-74.

62. Deyo RA. Comparative validity of the sickness impact profile and shorter scales for
functional assessment in low-back pain. Spine (Phila Pa 1976). 1986;11(9):951-4.

63. Carter WB, Bobbitt RA, Bergner M, Gilson BS. Validation of an interval scaling: the
sickness impact profile. Health Serv Res. 1976;11(4):516-28. PMCID: 1071949.

64. Deyo RA, Carter WB. Strategies for improving and expanding the application of health
status measures in clinical settings. A researcher-developer viewpoint. Med Care.
1992;30(5 Suppl):MS176-86; discussion MS96-209.

65. Deyo RA, Diehl AK. Measuring physical and psychosocial function in patients with low-
back pain. Spine (Phila Pa 1976). 1983;8(6):635-42.

66. Follick MJ, Smith TW, Ahern DK. The sickness impact profile: a global measure of
disability in chronic low back pain. Pain. 1985;21(1):67-76.

67. Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping
DL, et al. The Quebec Back Pain Disability Scale. Measurement properties. Spine (Phila
Pa 1976). 1995;20(3):341-52.

68. Stratford PW, Binkley JM, Riddle DL. Health status measures: strategies and analytic
methods for assessing change scores. Phys Ther. 1996;76(10):1109-23.

69. Stratford PW, Binkley J, Solomon P, Finch E, Gill C, Moreland J. Defining the minimum
level of detectable change for the Roland-Morris questionnaire. Phys Ther.
1996;76(4):359-65; discussion 66-8.









70. Kopec JA, Esdaile JM. Functional disability scales for back pain. Spine (Phila Pa 1976).
1995;20(17):1943-9.

71. Hsieh CY, Phillips RB, Adams AH, Pope MH. Functional outcomes of low back pain:
comparison of four treatment groups in a randomized controlled trial. J Manipulative
Physiol Ther. 1992;15(1):4-9.

72. Beurskens AJ, de Vet HC, Koke AJ. Responsiveness of functional status in low back pain:
a comparison of different instruments. Pain. 1996;65(1):71-6.

73. Fairbank JC. The use of revised Oswestry Disability Questionnaire. Spine (Phila Pa
1976). 2000;25(21):2846-7.

74. Fairbank J. Revised Oswestry Disability questionnaire. Spine (Phila Pa 1976).
2000;25(19):2552.

75. Bossons CR, Levy J, Sutterlin CE, 3rd. Reconstructive spinal surgery: assessment of
outcome. South Med J. 1996;89(11):1045-52.

76. Frost H, Lamb SE, Stewart-Brown S. Responsiveness of a patient specific outcome
measure compared with the Oswestry Disability Index v2.1 and Roland and Morris
Disability Questionnaire for patients with subacute and chronic low back pain. Spine
(Phila Pa 1976). 2008;33(22):2450-7; discussion 8.

77. Fairbank JC. Use and abuse of Oswestry Disability Index. Spine (Phila Pa 1976).
2007;32(25):2787-9.

78. Stewart AL. Conceptual challenges in linking physical activity and disability research.
Am J Prev Med. 2003;25(3 Suppl 2):137-40.

79. Taylor SJ, Taylor AE, Foy MA, Fogg AJ. Responsiveness of common outcome measures
for patients with low back pain. Spine (Phila Pa 1976). 1999;24(17):1805-12.

80. Page SJ, Shawaryn MA, Cernich AN, Linacre JM. Scaling of the revised Oswestry low
back pain questionnaire. Arch Phys Med Rehabil. 2002;83(11):1579-84.

81. Hart DL, Wang YC, Stratford PW, Mioduski JE. Computerized adaptive test for patients
with knee impairments produced valid and responsive measures of function. J Clin
Epidemiol. 2008;61(11):1113-24.

82. Fliege H, Becker J, Walter OB, Rose M, Bjorner JB, Klapp BF. Evaluation of a
computer-adaptive test for the assessment of depression (D-CAT) in clinical application.
Int J Methods Psychiatr Res. 2009;18(1):23-36.

83. Hart DL, Wang YC, Stratford PW, Mioduski JE. A computerized adaptive test for
patients with hip impairments produced valid and responsive measures of function. Arch
Phys Med Rehabil. 2008;89(11):2129-39.









84. Shone CC, Quinn CP, Wait R, Hallis B, Fooks SG, Hambleton P. Proteolytic cleavage of
synthetic fragments of vesicle-associated membrane protein, isoform-2 by botulinum type
B neurotoxin. Eur J Biochem. 1993;217(3):965-71.

85. Andersson GBJ. Epidemiological features of chronic low-back pain. Lancet.
1999;354(9178):581-5.

86. U.S. Department of Labor. Nonfatal occupational injuries and illness requiring days away
from work. Bureau of Labor Statistics; 2005 [updated 2005; cited]; Available
from: http://www.bls.gov/iif/oshwc/osh/os/oshO5_01.pdf

87. Pai S, Sundaram LJ. Low back pain: an economic assessment in the United States.
Orthop Clin North Am. 2004;35:1-5.

88. Frymoyer JW, Cats-Baril WL. An overview of the incidences and costs of low back pain.
Orthop Clin North Am. 1991;22(2):263-71.

89. Manchikanti L. Epidemiology of low back pain. Pain Physician. 2000;3(2):167-92.

90. Hambleton RK. Comparison of classical test theory and item response theory and their
applications to test development. Educ Meas Issues Pract. 1993:38-47.

91. Crocker L, Algina J. Introduction to classical and modern test theory. 1986.

92. Thurstone L. Measurement of social attitudes. J Abnorm Soc Psychol. 1931;26:249-69.

93. Merbitz C, Morris J, Grip JC. Ordinal scales and foundations of misinference. Arch Phys
Med Rehabil. 1989;70(4):308-12.

94. Shumway-Cook A, Woollacott M. Motor Control: Theory and Practical
Applications. 2nd ed. Philadelphia: Lippincott Williams & Wilkins; 2000.

95. Bond TG, Fox CM. Applying the Rasch model: fundamental measurement in the human
sciences. 2nd ed. 2001.

96. Linacre JM. Detecting multidimensionality: which residual data-type works best? J
Outcome Meas. 1998;2(3):266-83.

97. Smith E. Detecting and evaluating the impact of multidimensionality using item fit
statistics and principal component analysis of residuals. J Appl Meas. 2002;3:205-31.

98. Brown TA. Confirmatory factor analysis of the Penn State Worry Questionnaire:
Multiple factors or method effects? Behav Res Ther. 2003;41(12):1411-26.

99. Brown TA. Confirmatory Factor Analysis for Applied Research. 2008.

100. Child D. The Essentials of Factor Analysis. 3rd ed. Continuum; 2006.









101. Cattell RB. The scree test for the number of factors. Multivariate Behavioral Research.
1966;1:245-76.

102. Norman GR, Streiner DL. Biostatistics: The bare essentials. St. Louis: Mosby
Yearbook Inc.; 1994.

103. Linacre JM. WINSTEPS Rasch measurement computer program. 2005.

104. Wright BD, Masters GN. Rating scale analysis. 1982.

105. Wang WC, Chen CT. Item parameter recovery, standard error estimates, and fit
statistics of the WINSTEPS Program for the family of Rasch models. Educ Psychol
Meas. 2005;65:376-404.

106. Wright BD, Linacre JM. Reasonable mean-square fit values. Rasch Measurement
Transactions. 1994;8(3):370.

107. Linacre JM. What do Infit and Outfit, Mean-square and Standardized mean? Rasch
Measurement Transactions. 2002;16(2):878.

108. Correlations: point-biserial, point-measure, residual. Special
topics. http://www.winsteps.com/winman/index.htm

109. Balady GJ. Survival of the fittest--more evidence. N Engl J Med. 2002;346(11):852-4.

110. Blair SN, Haskell WL, Ho P, Paffenbarger RS, Jr., Vranizan KM, Farquhar JW, et al.
Assessment of habitual physical activity by a seven-day recall in a community survey and
controlled experiments. Am J Epidemiol. 1985;122(5):794-804.

111. Fletcher GF, Balady GJ, Amsterdam EA. Exercise standards for testing and training: a
statement for healthcare professionals from the American Heart Association. Circulation.
2001;104:1694-740.

112. Montoye HJ, Kemper HCG, Saris WHM, Washburn RA. Measuring Physical
Activity and Energy Expenditure. Champaign, IL: Human Kinetics; 1996:4-5.

113. Braith RW, Welsch MA, Mills RM, Jr., Keller JW, Pollock ML. Resistance exercise
prevents glucocorticoid-induced myopathy in heart transplant recipients. Med Sci Sports
Exerc. 1998;30(4):483-9.

114. Reckase MD. The difficulty of test items that measure more than one ability. Applied
Psychological Measurement. 1985;9(4):401-12.

115. Ware JE, Jr. A 12-Item Short-Form Health Survey: Construction of Scales and
Preliminary Tests of Reliability and Validity. Med Care. 1996;34(3):220-33.

116. Box G, Draper N. Empirical Model Building and Response Surfaces. New York: John
Wiley and Sons; 1987.









117. Wilkinson L, The Task Force on Statistical Inference. Statistical methods in psychology
journals. American Psychologist. 1999;54(8):594-604.

118. Feldt LS, Brennan RL. Reliability. Educational Measurement. 3rd ed. New York:
Macmillan; 1989. p. 105-46.

119. Mallinson T, Stelmack J, Velozo C. A comparison of the separation ratio and
coefficient alpha in the creation of minimum item sets. Med Care. 2004;42(1 suppl):I17-I24.

120. Raykov T. Reliability if deleted, not "alpha if deleted": evaluation of scale reliability
following component deletion. British Journal of Mathematical and Statistical
Psychology. 2007;60:201-16.

121. Raykov T. "alpha if item deleted": A note on loss of criterion validity in scale
development if maximizing coefficient alpha. British Journal of Mathematical and
Statistical Psychology. 2008;61:275-85.

122. Thorndike RL, Hagen EP. Measurement and evaluation in psychology and education.
4th ed. 1977.

123. Jette AM, Haley SM, Ni P. Comparison of functional status tools used in post-acute care.
Health Care Financ Rev. 2003;24(3): 13-24.

124. Wright BD, Masters GN. Number of Person or Item Strata. Rasch Measurement
Transactions. 2002;16(3):888.

125. Mezzarane RA, Kohn AF. Postural control during kneeling. Experimental Brain
Research. 2007;187(3):395-405.

126. Stineman MG, Goin JE, Granger CV, Fiedler R, Williams SV. Discharge motor FIM-
function related groups. Arch Phys Med Rehabil. 1997;78(9):980-5.

127. Stineman MG, Jette A, Fiedler R, Granger C. Impairment-specific dimensions within the
Functional Independence Measure. Arch Phys Med Rehabil. 1997;78(6):636-43.

128. George D, Mallery P. SPSS for Windows step by step: A simple guide and reference.
11.0 update. 4th ed. Boston: Allyn & Bacon; 2003.

129. McHorney CA, Ware JE, Jr., Lu JF, Sherbourne CD. The MOS 36-item Short-Form
Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability
across diverse patient groups. Med Care. 1994;32(1):40-66.

130. Fisher RA. Theory of statistical estimation. Proc Cambridge Philos Soc. 1925;22:700-25.

131. Erhard RE, Delitto A, Cibulka MT. Relative effectiveness of an extension program and a
combined program of manipulation and flexion and extension exercises in patients with
acute low back syndrome. Phys Ther. 1994;74(12):1093-100.









132. Davidson M. Rasch analysis of three versions of the Oswestry Disability Questionnaire.
Man Ther. 2008;13(3):222-31.

133. Netemeyer RG, Bearden WO, Sharma S. Scaling Procedures: Issues and Applications.
Thousand Oaks, California: Sage Publications, Inc.; 2003.

134. Hart DL, Mioduski JE, Werneke MW, Stratford PW. Simulated computerized adaptive
test for patients with lumbar spine impairments was efficient and produced valid
measures of function. J Clin Epidemiol. 2006;59(9):947-56.

135. Gallagher S. Trunk extension strength and muscle activity in standing and kneeling
postures. Spine (Phila Pa 1976). 1997;22(16):1864-72.

136. Ainsworth BE, Haskell WL, Leon AS, Jacobs DR, Jr., Montoye HJ, Sallis JF, et al.
Compendium of physical activities: classification of energy costs of human physical
activities. Med Sci Sports Exerc. 1993;25(1):71-80.

137. Ainsworth BE, Haskell WL, Whitt MC, Irwin ML, Swartz AM, Strath SJ, et al.
Compendium of physical activities: an update of activity codes and MET intensities. Med
Sci Sports Exerc. 2000;32(9 Suppl):S498-504.









BIOGRAPHICAL SKETCH

Bongsam Choi received his Bachelor of Health Science degree in physical therapy from
Yonsei University in February 1987. He completed his Master of Health Science degree in
public health at Yonsei University, Seoul, Korea, in February 1989. Since coming to the
United States in 1992, he has worked in a variety of inpatient, outpatient, rehabilitation hospital,
and home health care settings over 23 years of practice in physical therapy. He was a rehabilitation
supervisor at a company that provides specialized outpatient rehabilitation services at a CORF
(Certified Outpatient Rehabilitation Facility) in Tarpon Springs, FL. He is also an active member
of the American Physical Therapy Association.

Having graduated with a PhD in Rehabilitation Science from the University of Florida,
Gainesville, FL, he plans to remain active in both clinical practice and research to better
measure the physical function of pain-related populations.





PAGE 1

DEVELOPING PRECISE DISABILITY MEASURES FOR BACK PAIN

By

BONGSAM CHOI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010

PAGE 2

© 2010 Bongsam Choi

PAGE 3

To my dad, who has been fighting stomach cancer; my mom, who is devoting her life to his care; and my family, Keonhwa, Jayeon, and Jasun

PAGE 4

ACKNOWLEDGMENTS

I am heartily thankful to my advisor, Dr. Craig Velozo, whose encouragement, guidance, and support from the initial to the final level enabled me to develop an understanding of the subject. I would like to thank my committee members. First, I would like to thank Dr. Mark Bishop for believing in me from the very first meeting. I would like to thank Dr. Steven George for all his recommendations, including the arrangement of data collection sites. Finally, I would like to thank Dr. I-Chan Huang for leading me to the heart of the issues and introducing me to many different aspects of research. I am honored to have had the opportunity to work with such high-caliber people. I would also like to thank my colleague, Dr. Leigh Lehman, for all her support during my dissertation writing. I especially thank my wife, Keonhwa, for her unlimited support and encouragement, and my two lovely daughters, Jayeon and Jasun, for checking spelling mistakes and errors throughout the pages of my draft. My last thanks go to my parents, Myungsik and Youngdong; my parents-in-law, Gwyhee and Pilhee Lee; my brother, Bongno; my sister, Yongsoon, for supporting me in completing new learning; and, last but not least, my Catholic Brother Michael Chun for endless blessings.

PAGE 5

TABLE OF CONTENTS

(page numbers follow each entry)

ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 7
LIST OF FIGURES ... 9
LIST OF ABBREVIATIONS ... 10
ABSTRACT ... 12

CHAPTER

1 THE IMPORTANCE OF PRECISELY MEASURING DISABILITY FOR BACK PAIN ... 14
    Introduction ... 14
        Item Response Theory (IRT) and Computer Adaptive Testing (CAT) ... 16
        Physical Function CAT and ICFmeasure.com ... 18
        Short Forms of Back Pain Disability ... 19
        Existing Self-Report Back Pain Disability Measures ... 20
        The Oswestry Back Pain Disability Questionnaire as a Gold Standard ... 22
        CAT, Short Form, and Existing Back Pain Measure in Measurement Precision ... 23
    Research Question 1 ... 24
    Research Question 2 ... 24
    Research Question 3 ... 25

2 THE PSYCHOMETRICS OF THE ICF ACTIVITY MEASURE FOR BACK PAIN ... 26
    Introduction ... 26
    Methods ... 29
        Research Participants ... 29
        Instrumentation ... 29
        Rasch Rating Scale Model ... 30
        Data Analysis ... 31
    Results ... 34
        Positioning/transfer Construct ... 34
        Lifting/carrying Construct ... 37
        Walking/moving Construct ... 39
    Discussion ... 41
        Summary of Result ... 41
        Unidimensionality ... 42
        Rasch Model Fit ... 43
        The Hierarchy of Item Difficulty Calibration and Physical Activity ... 44
        Limitations and Future Implications ... 45

PAGE 6

3 PRECISION OF THREE SHORT FORMS FOR BACK PAIN ... 66
    Introduction ... 66
    Method ... 69
        Research Participants ... 69
        Instrumentation ... 70
        Rasch Rating Scale Model ... 71
        Data Analysis ... 72
    Results ... 74
        Short Form for Positioning/Transfer ... 75
        Short Form for Lifting/Carrying ... 77
        Short Form for Walking/Moving ... 79
    Discussion ... 81
        Summary of Results ... 81
        Item Level Psychometrics ... 81
        Unidimensionality of the Short Forms ... 83
        Person Separation and Person Reliability ... 84
        Test Information Function ... 86
        Limitations and Future Implications ... 87

4 COMPARISONS OF THE RELATIVE PRECISION OF THREE DIFFERENT TYPE BACK PAIN MEASURES: THE ICF ACTIVITY MEASURE (ICFAM) COMPUTER ADAPTIVE TEST, ICFAM SHORT FORMS, AND OSWESTRY BACK PAIN DISABILITY QUESTIONNAIRE ... 104
    Introduction ... 104
    Method ... 108
        Research Participants ... 108
        Instrumentation ... 109
        Analysis ... 111
    Results ... 112
    Discussion ... 114
        Summary of Results ... 114
        Correlations ... 115
        Relative Precision ... 116
        Limitations and Future Implications ... 117

5 CONCLUSION: INTEGRATING THE FINDINGS ... 126

APPENDIX: THE OSWESTRY BACK PAIN DISABILITY QUESTIONNAIRE (ODQ) ... 136
LIST OF REFERENCES ... 138
BIOGRAPHICAL SKETCH ... 148

PAGE 7

LIST OF TABLES

(table number, title, page)

2-1 Examples of items for three constructs of the ICFAM ... 47
2-2 Demographic information of research participants ... 49
2-3 Demographic information of research participants ... 50
2-4 Number of retaining factors for the ICFAM ... 50
2-5 Factor structure of positioning/transfer construct following EFA ... 51
2-6 Factor structure of lifting/carrying construct following EFA ... 54
2-7 Factor structure of walking/moving construct following EFA ... 56
2-8 Fit statistics for positioning/transfer construct ... 57
2-9 Fit statistics for lifting/carrying construct ... 60
2-10 Fit statistics for walking/moving construct ... 61
3-1 Demographic information of research participants ... 89
3-2 Results of confirmatory factor analysis for short forms of the ICFAM ... 90
3-3 Factor structure of short form for positioning/transfer construct ... 91
3-4 Factor structure of short form for lifting/carrying construct ... 92
3-5 Factor structure of short form for walking/moving construct ... 93
3-6 Short form of the ICFAM ... 97
3-7 Fit statistics for positioning/carrying ... 99
3-8 Fit statistics for lifting/carrying construct ... 99
3-9 Fit statistics for walking/moving construct ... 100
4-1 Demographic characteristics of study participants ... 120
4-2 Correlation coefficients for CAT, short forms, and ODQ measure for back pain group ... 124
4-3 Correlation coefficients for CAT, short forms, and ODQ measure for non-back pain group ... 124

PAGE 8

4-4 Mean difference between means for back pain and non-back pain groups ... 125

PAGE 9

LIST OF FIGURES

(figure number, title, page)

2-1 Item-person map of positioning/transfer construct of the ICFAM ... 59
2-2 Item-person map of lifting/carrying construct of the ICFAM ... 64
2-3 Item-person map of walking/moving construct of the ICFAM ... 65
3-1 Item-person map of positioning/transfer construct of the ICFAM following 10 items removal and prior to 10 item removal ... 94
3-2 Item-person map of lifting/carrying construct of the ICFAM following 10 items removal and prior to 10 item removal ... 95
3-3 Item-person map of walking/moving construct of the ICFAM following 10 items removal and prior to 10 item removal ... 96
3-4 Item-person map of three short forms (positioning/transfer, lifting/carrying, and walking/moving) of the ICFAM following the item removal ... 101
3-5 Test information function of short form versus entire set of items for positioning/transfer ... 99
3-6 Test information function of short form versus entire set of items for lifting/carrying ... 99
3-7 Test information function short form versus entire set of items for walking/moving ... 103
4-1 Scatter plot of ability measures from the CAT measure versus the short form measure for positioning/transfer and lifting/carrying construct ... 121
4-2 Scatter plot of ability measures from the CAT measure versus the short form measure ... 122
4-3 Scatter plot of ability measures from the CAT measure versus the ODQ measure ... 123

PAGE 10

LIST OF ABBREVIATIONS

ADL      Activity of Daily Living
AMPAC    Activity Measure for Post-Acute Care
CAT      Computer Adaptive Test/Testing
CATs     Computer Adaptive Tests
CFA      Confirmatory Factor Analysis
CFI      Comparative Fit Index
CTT      Classical Test Theory
EFA      Exploratory Factor Analysis
HIT      Headache Impact Test
ICF      International Classification of Functioning, Disability, and Health
ICFAM    ICF Activity Measure
IRT      Item Response Theory
MET      Metabolic Equivalent
MnSq     Mean Square
MOS      Medical Outcome Scale
NIDRR    National Institute of Disability and Rehabilitation Research
ODQ      Oswestry Back Pain Disability Questionnaire
PCS      Physical Component Summary
PF       Physical Function
QBDS     Quebec Back Pain Disability Scale
RFS      Lumbar Spine Functional Status
RMDQ     Roland-Morris Disability Questionnaire
RP       Relative Precision
SR       Separation Ratio

PAGE 11

TLI      Tucker-Lewis Index
WHO      World Health Organization
WRMR     Weighted Root Mean Square Residual

PAGE 12

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DEVELOPING PRECISE DISABILITY MEASURES FOR BACK PAIN

By

Bongsam Choi

August 2010

Chair: Craig A. Velozo
Major: Rehabilitation Science

Measurement of disability is crucial to many aspects of the rehabilitation process, including capturing individual-level changes, evaluating treatment effectiveness, making policy decisions, and managing administration costs. Many condition-specific self-reported instruments have been developed over the past three decades to meet the need for assessment of disability resulting from back pain. However, these existing back pain disability measures have considerable limitations in terms of measurement precision and comprehensiveness. To overcome these limitations, a precise measure needs a very large number of items that either cover a wide range of the ability trait or can be matched closely to each person's ability. This is impossible to achieve under the conventional classical test theory framework. Therefore, the aim of this study was to create disability measures with adequate measurement precision. The study consisted of three steps: 1) investigating the item-level psychometrics of the ICF Activity Measure (ICFAM) using Rasch analysis (a one-parameter Item Response Theory model); 2) creating three short forms of the ICFAM based on the item-level psychometrics; and 3) comparing three measures in terms of relative precision: the Computer Adaptive Testing (CAT) measure of the ICFAM, the three 10-item

PAGE 13

short form measures of the ICFAM, and the Oswestry Back Pain Disability Questionnaire (ODQ) measure, the most popular conventional back pain disability instrument. The three constructs of the ICFAM and the three 10-item short forms of the ICFAM were found to be multidimensional. However, some findings still suggest the possibility of sub-constructs within an essentially unidimensional construct. The item difficulty hierarchy did not reflect the hypothesized hierarchy based on MET values, except for the walking/moving construct. The empirical hierarchy follows fairly well either the clinical features of back pain or motor control theory. The three IRT-based short forms with adequate breadth were created based on item-level psychometric properties. These were administered, along with the CAT of the ICFAM and the ODQ, to a group of 42 participants with back pain and a group of 42 without back pain. The CAT outperformed the short forms and the ODQ, except for the walking/moving construct, while the short forms outperformed the ODQ in terms of precision. The results suggest that researchers and clinicians should be encouraged to use the CAT measure of the ICFAM, since it is a precise and efficient measure. The IRT-based short forms of the ICFAM may be an alternative when computer systems are not readily available.

PAGE 14

CHAPTER 1
THE IMPORTANCE OF PRECISELY MEASURING DISABILITY FOR BACK PAIN

Introduction

Disability measurement is crucial in capturing clinical changes, evidence-based rehabilitation practice, administration of disability management, and the policy-making process. Over the past three decades, the need for assessment of disability resulting from back pain has grown, because back pain is the most common cause of activity limitation in our society (1). This need has prompted extensive research on self-report outcome measures of function and disability resulting from back pain (2-22). To date, nearly 82 back-related, condition-specific disability instruments have been introduced (23). Most, if not all, of these instruments appear in peer-reviewed journals and show adequate psychometrics (e.g., good reliability, validity, and responsiveness). However, only a few outcome measures have been widely used and commonly accepted as disability measures for back pain (10,23-27).

Despite the myriad of back pain outcome measures available, selecting an optimal disability measure is a prevailing challenge. This stems from the need to carefully consider the preferences of investigators or clinicians when choosing a measure (25). While published measures have adequate psychometric properties, they may or may not be sensitive to all severity groups or to the actual improvements that result from clinical interventions. Because these instruments are developed to target the average person, they tend to be more sensitive at the center than at the extremes (e.g., low and high levels of disability) of the ability range (28). For example, while the Oswestry Back Pain Disability Questionnaire (ODQ) is often considered a gold standard, it demonstrates ceiling effects (i.e., persons with high ability) when it is administered to persons with minimal impairments (29,30) and floor effects (i.e., persons with

PAGE 15

low ability) when it is administered to persons with severe impairments (31,32). Therefore, the ODQ often fails to precisely measure the disability of back pain across the full range of ability.

Imprecise measurement, especially ceiling effects, results in type II errors. That is, the number of false negatives is large among those scoring in the upper extreme of the instrument (28). Furthermore, it is impossible to measure improvement in health status over time for those at the ceiling (i.e., those able to complete all items without difficulty or scoring high initially). These problems in measurement precision are partly due to the fixed number of items included on instruments. Thus, these instruments do not have adequate breadth for the underlying construct being measured (33). This leads to ceiling and/or floor effects and, subsequently, failure to capture small but potentially significant increments of improvement across the full range of the construct (11,34).

Imprecise measurements may also be the result of using items that do not closely match the ability of the population of interest (35). Deficits in precision occur when easy items are administered to high-ability populations (e.g., administering items measuring basic ADLs to elite athletes) and difficult items are administered to low-ability populations (e.g., administering items measuring ability to lift heavy objects to individuals with severe back pain). Furthermore, since individuals are asked to respond to all items on an instrument regardless of how they responded to previous items (e.g., asked if they can lift 10 pounds after responding that they cannot lift 5 pounds) and regardless of the relevance of the items to the individual (e.g., asking an individual with no movement in the legs if they can walk a mile), respondent burden and administration costs are increased. Unfortunately, test-level statistics provide little insight into how to eliminate these limitations. These limitations are a function of characteristics of the Classical Test Theory (CTT) model.
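
The loss of precision from poorly targeted items can be illustrated numerically. The following is a minimal sketch and not part of the original analysis: it assumes the dichotomous Rasch model (a simplification of the rating scale model used later in this study), and the item difficulties of the fixed form are hypothetical values in logits chosen only for illustration. Under that assumption, each item contributes information p(1 - p) at a given ability, so a form targeted at average ability yields a much larger standard error for a person at the extreme of the range.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_information(ability, difficulty):
    """Information one Rasch item provides at a given ability: p * (1 - p)."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)


def standard_error(ability, difficulties):
    """Standard error of the ability estimate for a fixed set of items."""
    total = sum(item_information(ability, b) for b in difficulties)
    return 1.0 / math.sqrt(total)


# Hypothetical 10-item fixed form targeted at average ability (logits).
fixed_form = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0]

se_center = standard_error(0.0, fixed_form)   # person near the middle of the form
se_extreme = standard_error(4.0, fixed_form)  # high-ability person (ceiling region)

print(f"SE at ability 0.0: {se_center:.2f}")
print(f"SE at ability 4.0: {se_extreme:.2f}")
```

In this sketch the same ten items measure the high-ability person far less precisely than the average person, which is the pattern described above for fixed-length instruments.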

PAGE 16

Despite the popularity and widespread use of instruments developed using the CTT model, existing disability measures have numerous shortcomings (36). In general, disability instruments created under the CTT paradigm yield total scores obtained by adding individual item responses. These scores provide only a general sense of a person's ability level (i.e., disability level) and often fail to provide detailed item-level psychometrics (i.e., no detailed information is provided about how an individual performs on each item). First, the total score is dependent on the items chosen to represent the underlying construct (test-dependent) (37). That is, respondents will have lower scores on difficult items and higher scores on easier items while their ability remains the same. Second, the test scores obtained from a sample cannot be compared across different samples (sample-dependent) (37,38). That is, test statistics such as coefficient alpha for the estimate of reliability or correlations for estimates of validity vary from sample to sample. Third, test scores are non-linear summed scores, which yield ordinal raw scores (39). These ordinal scores may be insensitive to changes at the extremes of the scale (39).

Item Response Theory (IRT) and Computer Adaptive Testing (CAT)

In contrast to the CTT, Item Response Theory (IRT) focuses on the psychometric properties of the items making up an instrument instead of the instrument as a whole (40,41). By estimating the probability that a respondent will select a particular rating for an item, item difficulty and person ability (or disability) can be placed on the same linear continuum. Thus, the IRT model allows connecting individuals' responses to items at their ability level (40,42). Estimates of person ability (or disability) on an underlying construct obtained using IRT methods are invariant regardless of the items used (i.e., test-free measurement), whereas under the CTT paradigm, person scores vary depending on the difficulty of the instrument (41). Furthermore, item difficulty estimates derived from IRT analyses are independent of the ability of the sample (i.e., sample-free measurement), while test statistics in CTT are dependent on the sample

PAGE 17

17 taking the test. In addition, the Rasch model ( one-parameter IRT model) can linearly transform raw scores (typically used in analyses based on CTT) into equal interval measures (34). These advantages of IRT allow for the creation of invari antly calibrated large item banks that can more precisely discriminate individuals ability levels and thus, capture smaller increments of change. In order to achieve the goal of measuremen t precision, a disability measure should have items covering the full range of the underlying c onstruct and capturing the small increments of changes ( 36). With optimal measurement precision, one can theoretically yield m easures of equal precision at all levels of th e underlying construct, thus achieving what has been termed equiprecise measurement ( 43). That is, the measure is capabl e of m easuring a wide range of disability from the least able (or most disabled) to the most able (or le ast disabled). Unlike the existing fixed disability assessments that require all the items of an instrument, equiprecise measurement fosters item selection determined by disabi lity level. For example, when measuring the physical function of a person with mild back pain, items w ould be chosen which closely match the ability of this individua ls (i.e., more difficult items would be chosen). Similarly, when measuring the physical function of a person with severe back pain, items will be chosen that closely match the severely impaired person. These two persons will be measured on the same physical-function scale with different sets of items (34). While IRT methodologies provide the means fo r generating and linking person ability and item difficulty calibrations, Computer Adaptive Testing (CAT) methods promise a means for administrating items in a way that is both efficient and precise ( 28,34,36,44-48). Studies have shown that CAT i mproves test efficiency maintaining adequate precisions with fewer items than the full test. 
Six to seven items have been shown on average to achieve a standard error of the ability estimate of 0.3 (44,49-53). CAT often requires a testing algorithm that defines an iterative process with a set of rules specifying the test questions to be administered to respondents. This includes procedures for item selection, ability estimation, and termination criteria. By selectively administering items that are matched to the ability level of the individual, measurement efficiency can be accomplished without the loss of precision provided by the full item bank. For example, when measuring the disability of a person with mild back pain, items would be chosen that match the mildly impaired ability. Similarly, when measuring the disability of a person with more severe back pain, a different set of items would be chosen that match that individual's severely impaired ability. With this technology, a small number of items that are most relevant and targeted to a person of a particular ability can be selected from the item bank (34). IRT in combination with CAT has recently become an alternative to conventional fixed-format disability measurement (25,36).

Physical Function CAT and ICFmeasure.com

In order to measure the impact of back pain on individuals, the World Health Organization's (WHO) International Classification of Functioning, Disability and Health (ICF) framework is useful. The ICF describes health and health status in terms of functioning and disability (54). The conceptual model describes three domains of functioning and disability: body function and structure; activities of the whole person; and participation of the whole person in a social context. Disability therefore involves dysfunction at one or more levels (impairments, activity limitations, and participation restrictions), influenced by environmental and personal factors. Despite attempts to clarify whether disability assessment should focus on how much difficulty one has with an activity or how frequently one performs it, disability measures can be organized along the single construct of daily functioning, such as activities of daily living (ADLs) (55).
Disability may be assessed in terms of physical function or activity limitation because most individuals with back pain are restricted in their daily functioning (56). Accordingly, the newly created ICF Activity Measure (ICFAM-ICFmeasure.com) embraced the ICF framework as a conceptual basis for measuring a person's physical ability.

The ICFAM development was funded by the National Institute on Disability and Rehabilitation Research (NIDRR). The primary goal of the research was to develop an efficient and precise measurement system based on the activity domain of the International Classification of Functioning, Disability and Health (ICF). Equiprecise measurement, covering the entire range of a construct, was applied to activities involving movement, moving around, and daily life tasks as defined by the activity domain of the ICF. By applying the Rasch model (one-parameter IRT model) to the activity domain, the ICFAM, with a 264-question item bank, was developed. With CAT methods, these questions are selectively administered to respondents from the large item bank. Furthermore, the measure is now accessible worldwide through the web (http://icfmeasure.phhp.ufl.edu/).

Short Forms of Back Pain Disability

While both the CAT and IRT frameworks have considerable advantages in terms of efficiency and precision of measurement, fixed short forms have been primarily used to achieve measurement efficiency, especially in the absence of computer technology. Efficient measurement is achieved by reducing the number of items of a larger instrument to relieve administration and respondent burden (28,35). Although short forms achieve measurement efficiency with fewer items, loss of precision becomes an issue when developing a short form from its full instrument (15,36,44-46). Not surprisingly, several studies have reported that CAT measures outperform fixed short form versions of assessments in terms of measurement precision (57-59). It should be noted that the CAT method often faces challenges, such as financial or technological requirements, in many settings (59).
This leads researchers and healthcare professionals to seek practical measures to overcome those challenges. A goal of this study is to compare CAT and fixed short form versions of the ICFAM to determine which achieves the optimal precision required in clinical settings and research.

Several methods have been applied to develop short forms from their full tests. CTT methods include the deletion of items with low item-total score correlation and with the least impact on the overall internal consistency of the tests (60). Several studies have recently developed short forms using the IRT framework (46,58,61). These studies involve selecting items that are most frequently administered in CAT administration, have high test information, or show broad item difficulty coverage. Using the IRT framework, items that show poor fit statistics or have calibrations similar to those of other items can also be deleted while maintaining the measurement quality of the instrument. Recently, investigators created short forms from IRT-based item pools and confirmed that short form scores are nearly as precise as CAT scores in different diagnostic groups (44,46).

Several short forms evolved from generic health status measures under the CTT framework, such as the Physical Function (PF)-10 (8) and the PF-12 PCS (Physical Component Summary) from the Short Form (SF)-36 of the Medical Outcomes Study (MOS). These particular short forms have been applied to back pain populations to assess the impact of back pain on quality of life. Short forms have also been created from condition-specific back pain instruments (62), such as the 24-item Roland-Morris Disability Questionnaire (RMDQ) (6,7), derived from the 136-item Sickness Impact Profile, and the 18-item PF-18 (14), derived from multiple instruments, including the Oswestry Disability Index, the RMDQ, and the PF-10.

Existing Self-Report Back Pain Disability Measures

Self-reported outcome measures are generally classified as generic or condition-specific measures (28,35).
Generic measures often include global ratings of health status as well as ratings of the multidimensional status of health-related quality of life. These instruments often measure a broad spectrum of health concepts and are intended to provide scores that are sensitive to disease severity. By contrast, condition-specific measures are designed to assess the aspects of health status affected by a certain disease pathology and to attribute symptoms and functional limitations to a specific condition (25). Thus, in contrast to generic measures, condition-specific measures are likely to be sensitive to the treatment and natural history of a specific disease or condition.

Although generic measures were not primarily designed to assess specific conditions, two instruments, the Sickness Impact Profile (SIP) and the Physical Function scale (PF-10), have been applied to chronic back pain. The SIP was originally developed and validated as a measure of sickness-related behavioral dysfunction consisting of 189 items in 14 categories (63). With a few revisions, the final version of the SIP was developed as a behavior-based measure of health status for use in a variety of chronic diseases (62). The PF-10 is a subscale of the SF-36 that measures physical functioning by assessing limitations in a variety of physical activities. Other versions of the PF-10, such as a general population version, the PF-12 PCS (Physical Component Summary), and a low back-specific version, the Physical Functioning (PF)-18 (14), have been developed. Among patients with back pain, studies report adequate psychometric properties for these two instruments (4, 8, 15, 64-66).

As disease-specific measures for back pain, the Roland-Morris Disability Questionnaire (RMDQ), the Quebec Back Pain Disability Scale (QBDS), and the Oswestry Back Pain Disability Questionnaire (ODQ) are the most widely accepted instruments. The RMDQ consists of 24 items on daily physical activity drawn from the Sickness Impact Profile. In contrast to the SIP, the RMDQ is short, simple to complete, and readily understood by patients (7).
The QBDS consists of 20 items providing a comprehensive view of a person's disability from back pain. It adopted the World Health Organization's International Classification of Functioning, Disability and Health (ICF) as a conceptual model to select test items relevant to the ICF activity and participation domains (10,67). One unique feature of the QBDS is that it measures only the physical function domain, while most instruments appear to assess more than one domain within the assessment (67). All of these instruments appear to have good psychometric properties supported by many studies (7, 23, 68-72).

The Oswestry Back Pain Disability Questionnaire as a Gold Standard

The Oswestry Back Pain Disability Questionnaire (ODQ) was first introduced by John O'Brien in 1976 and further developed by Fairbank and colleagues in 1980 (29,73,74). The ODQ consists of 10 items assessing the level of pain and interference with personal care, physical activities (i.e., lifting, walking, sitting, and standing), sleeping, sex life, social life, and traveling. Several validated versions have also been published, omitting a single item (i.e., sex life or social life) (75) or replacing the sex life item with an employment/homemaking item (13). The ODQ and its revised versions have been shown to be much more sensitive to patients with severe symptoms, while they also appear to be occasionally responsive to those with minor symptoms (29). The ODQ, whether in the original or revised versions, remains a salient measure of condition-specific disability with good validity and reliability (3, 13, 23, 24, 29, 30, 73, 74).

The ODQ is one of the most widely accepted back pain-specific instruments (25, 30, 76, 77). It is presently considered the gold standard in the assessment of back pain (29) because of its many advantages, such as popularity, an internally consistent scale, good reliability, and responsiveness to clinical change.
In numerous studies, the ODQ and its revised versions are recommended as a standardized measure of physical function in individuals with back pain (3, 13, 14, 23, 25, 29, 30, 32, 70, 73, 74, 76-79).

Despite the popularity of the ODQ in health care, there have been concerns about several of its measurement properties. The ODQ has been shown to be multidimensional, with physical function and pain items forming separate constructs (30,80), and it lacks the sensitivity to reliably discriminate individuals in particular ranges of the scale because of gaps between test items along the underlying continuum (e.g., no items fall in the gap between standing and lifting in the item difficulty hierarchy) (30). The lack of breadth may lead to inadequate sensitivity at the extremes of the scale. Not surprisingly, the developers of the ODQ and other researchers indicate that the instrument is better at detecting change only at a specific disability level due to its substantial measurement imprecision (3, 29, 73, 74, 77). Despite these limitations, the ODQ remains a leading back pain disability instrument in health care (3, 13, 14, 23, 29, 30, 32, 70, 73, 74, 77-79).

CAT, Short Form, and Existing Back Pain Measure in Measurement Precision

Although psychometric properties in the CTT paradigm, such as reliability, validity, and responsiveness, are well-known and rigorous criteria for selecting a proper outcome measure, these properties may not be sufficient in terms of measurement precision and efficiency. Numerous studies have found that Computer Adaptive Testing (CAT) improves both measurement precision and efficiency relative to the full test (41, 43, 48, 50, 52, 53, 57-59). Several studies report that CAT measures are highly correlated with other instruments measuring the same construct and require fewer items, with an average of 6 items needed to reach the ability estimate (81-84). The construction of fixed short forms is a conventional approach to achieving measurement efficiency, which reduces the burden on respondents and administrators (44, 46).
Despite the loss of some precision, short forms have been shown to be valid and practical for use in outcome measurement (34, 44-46).

The purpose of this study is to 1) determine the item-level psychometrics of the ICFAM and three short forms in a chronic back pain population, 2) determine how the ICFAM items respond differently across different diagnostic groups versus chronic back pain, and 3) compare the relative precision of the person measures of the computer adaptive ICFAM versus the short forms and the Oswestry Back Pain Disability Questionnaire.

Research Question 1

What are the psychometric properties of the computer adaptive ICF activity measure (ICFAM) in terms of the positioning/transfer, lifting/pushing, and walking/moving constructs associated with individuals with low back pain?

Hypothesis: The ICF activity measure for back pain (ICFAM) will demonstrate a unidimensional construct for each construct.

Hypothesis: The item-level psychometric properties of the ICFAM will demonstrate an empirically derived item difficulty hierarchy that matches the hypothetically derived hierarchy.

Research Question 2

What are the psychometric properties of the newly generated short forms of positioning/transfer, lifting/carrying, and walking/moving?

Hypothesis 2.1: The ICFAM short forms will demonstrate a unidimensional construct for each of the three constructs (positioning/transfer, lifting/carrying, and walking/moving).

Hypothesis 2.2: The ICFAM short forms will show acceptable item-level psychometrics (item fit, person separation, item-person match, logical item difficulty hierarchy relative to metabolic equivalents).

Hypothesis 2.3: The ICFAM short forms will show a precision distribution (information function) that is similar to that of the entire item bank but will show overall lower precision than the entire item bank across the breadth of the measure.

Research Question 3

How precise are the ICF activity CAT measures and the IRT-based ICFAM short form measures relative to the Oswestry Back Pain Disability Questionnaire (ODQ)?

Hypothesis 3.1: The relative precision of the ICFAM CAT measures is superior to that of the ICFAM short forms.

Hypothesis 3.2: The relative precision of the short form measures is superior to that of the Oswestry Disability Questionnaire.

CHAPTER 2
THE PSYCHOMETRICS OF THE ICF ACTIVITY MEASURE FOR BACK PAIN

Introduction

Back pain is one of the most common health problems limiting activity in the age group younger than 45 years in the United States, the second most frequent reason for physician visits, the third most common cause of surgical interventions, and the fifth-ranking cause of admission to hospital (85). Its estimated lifetime incidence and annual prevalence in the general population are about 70% and 15%, respectively (85). The impact of chronic back pain on the US workforce is significant. According to the U.S. Bureau of Labor Statistics, there were 63% back-related injuries among a total of 4.2 million nonfatal occupational injuries reported in 2005 (86). The estimated annual cost incurred by back pain was $20 billion to $50 billion in 2004 (87). Recently, not only is the population vulnerable to back pain, but people aged 65 and older are also the fastest-growing back pain population. Back pain is a leading source of health care expenditures and financial compensation for temporary or permanent disability (88, 89). That is, about three in four people experience back pain at some time in their life, and almost half of the population suffers from back pain every year. Many individuals do not recover and remain with limitations in activity and physical functioning, which may further lead to a chronic condition.

In order to monitor the health status of the back pain population, a precise measure of disability resulting from back pain is essential. Traditionally, investigators and clinicians have used disability instruments based on test-level psychometrics such as reliability (38,90). However, reliability values, which are widely accepted as a criterion for good measurement, vary from sample to sample (90, 91). That is, reliability values obtained with one sample are not necessarily reflective of reliability values from other samples.
In addition, test scores obtained from disability measures are always dependent on the selection of assessment tasks from the underlying construct being measured. Scores will be lower on difficult tests and higher on easy tests, while the respondent's ability remains the same. Moreover, existing conventional instruments frequently exhibit inadequate breadth for the wide range of the underlying construct because they are developed to target the average persons for whom the instruments are designed (28). Along with limited breadth of measurement, these instruments also often show a lack of precision, in that the items of the instruments do not closely match the ability of individuals. For example, this precision problem may appear when a "lifting 25 pounds" item (i.e., a difficult item) is administered to individuals of low ability who cannot perform a "lifting 1 pound" item (i.e., an easy item), or when an easy item is administered to individuals of high ability. Similarly, it may happen when an easy test is administered to individuals of high ability or a difficult test is administered to individuals of low ability.

The ICF Activity Measure (ICFAM) was developed to create an efficient and precise measurement system based on the activity dimension of the International Classification of Functioning, Disability and Health (ICF). The ICF, by the World Health Organization (WHO), provided the conceptual framework and classification system for developing the items used in the study. Equiprecise measurement (i.e., measurement across the entire range of a construct) was applied to activities involving movement, moving around, and daily life tasks as defined by the activity dimension of the ICF. By applying Item Response Theory (IRT) and Computer Adaptive Testing (CAT) methods, Velozo and colleagues (41) created the ICFAM, which is a web-based computer adaptive survey system.
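
A computer adaptive test cycles through item selection, ability estimation, and a stopping rule. The following is a minimal sketch of such a loop and is not the ICFAM implementation. It assumes dichotomous Rasch items rather than the rating scale model, estimates ability by expected a posteriori (EAP) with a standard normal prior, and stops on a standard error threshold or an item cap. The item bank, the function names (`p_endorse`, `eap`, `run_cat`), and all numeric settings are hypothetical.

```python
import math

def p_endorse(theta, b):
    """Dichotomous Rasch probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap(responses, grid):
    """EAP ability estimate and posterior SD (assumed N(0, 1) prior)."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, start_theta=0.0, se_stop=0.3, max_items=10):
    """bank: {item name: difficulty in logits}; answer(name, b) returns 0 or 1."""
    grid = [i / 10.0 for i in range(-60, 61)]
    theta, se = start_theta, float("inf")
    remaining = dict(bank)
    responses, administered = [], []
    while remaining and len(administered) < max_items and se > se_stop:
        # Rasch item information peaks where difficulty equals ability,
        # so pick the unused item whose difficulty is closest to theta.
        name = min(remaining, key=lambda k: abs(remaining[k] - theta))
        b = remaining.pop(name)
        responses.append((b, answer(name, b)))
        administered.append(name)
        theta, se = eap(responses, grid)
    return theta, se, administered

# Hypothetical 41-item bank spanning -4 to +4 logits, and a simulated
# respondent who succeeds on every item easier than 1.0 logits.
bank = {f"item{i}": round(0.2 * i, 1) for i in range(-20, 21)}
theta, se, used = run_cat(bank, lambda name, b: 1 if b < 1.0 else 0)
```

Because a dichotomous item contributes at most 0.25 units of information, this sketch stops on the 10-item cap rather than on the 0.3 standard error threshold. Rating scale items carry more information per item, which is one reason a polytomous bank can terminate sooner.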
The administrative core of the instrument allows setting a wide range of functions, including the initial theta value (i.e., directing the test to the initial question that most closely matches the ability level of the respondent) and the standard error (i.e., for terminating the test). The questions are targeted to individuals at their ability level, requiring only 5-10 questions per construct to arrive at a final measure of person ability. In addition, immediate results are provided to the respondents/clinician in the form of graphs and summary statistics.

The ICFAM consists of 6 constructs measuring activity limitations: positioning/transfers, lifting/carrying, fine hand, walking/climbing, wheelchair/scooters, and self care activities. While the ICFAM was designed for individuals with upper extremity deficits, lower extremity deficits, spinal cord injury, and back pain, the focus of this study is only on back pain patients. Three constructs are particularly relevant to individuals with back pain and comprehensively cover the extensive activity limitations of the chronic back pain population. We identified the ICFAM constructs for this study by two criteria: 1) being most frequently cited as deficit constructs for back pain and 2) relevance of the activities for individuals with back pain. The constructs identified were positioning/transfers, lifting/carrying, and walking/moving.

The purpose of this study is to investigate the item-level measurement qualities of the ICFAM with a sample of patients with back pain. Factor analysis and the Rasch model (one-parameter IRT model) were used to investigate the following measurement qualities of three constructs of the ICFAM: 1) unidimensionality, 2) item-level psychometrics, and 3) the hierarchical order of item difficulty (hypothetical versus empirical). Unidimensionality refers to measuring a single dominant construct even while multiple attributes are being measured (39,92). This property is a basic assumption of measurement theory that allows combining the items to obtain a total score for an assessment and supports the validity of interpretations based on a total score (93). Rasch analysis was used to scrutinize the data at the item level, including item difficulty and rating scale structure.
These item parameters are invariant whichever subgroups of the sample are used (sample-free). Rasch analysis also provides a person-item match map, which places both person ability and item difficulties on the same linear continuum. This map can reveal ceiling and floor effects and other gaps where item difficulty calibrations do not match person ability estimates (40). In addition, the person-item match map can provide insight into construct validity (e.g., supporting that the item "staying in a kneeling position on both knees for 10-20 minutes" is more challenging than the item "staying in a lying position on back for 1 hour"). A hierarchy of item difficulty refers to a possible logical progression in which relevant items of a unidimensional construct are arrayed from easy to difficult (93). The empirically derived hierarchy of item difficulties based on Rasch analysis can be compared to the hypothetically derived hierarchy of activities based on Metabolic Equivalents (METs).

Methods

Research Participants

The data used in this study were retrieved from the research that developed the ICFAM, funded by the National Institute on Disability and Rehabilitation Research (NIDRR). The developmental research was approved by the Institutional Review Board of the University of Florida (IRB # 568-2000). That study developed the ICFAM, with 264 items measuring activity limitation, through 1) focus group presentations with test items, 2) professional panel consultations, 3) cognitive interviewing, and 4) a paper-pencil version field test with 255 items for different diagnostic groups. Three hundred twelve individuals in 3 diagnostic groups (i.e., low back pain, lower extremity, and upper extremity injury) who completed the paper-pencil version test were selected for this study.

Instrumentation

In an effort to capture limitations in activities, the ICFAM was designed with 6 constructs (positioning/transfers, gross upper extremity, fine hand, walking/moving, wheelchair/scooters, and self care activities). Three constructs (103 total items) of the ICFAM (56 items for positioning/transfer, 27 items for lifting/carrying, and 20 items for walking/moving) were chosen for this study. The three constructs with examples of items are presented in Table 2-1. The example items for each construct in the table are listed in descending order of item difficulty, from the most difficult to the easiest. Response categories for these items consist of four choices: 1) a lot of difficulty, 2) some difficulty, 3) no difficulty, and 4) have not done. If participants have not performed the activity in the past 30 days, are unable to perform the activity, require the help/assistance of another person, or have been told by their doctor not to do the activity, they are instructed to answer "have not done." The "have not done" response category is regarded as a missing value in the analysis.

Rasch Rating Scale Model

The Rasch rating scale model is a generalization of the dichotomous Rasch model and is sometimes referred to as the polytomous Rasch model. It was derived by Andrich (1978) (94). The Rasch rating scale model can be expressed by the equation:

$$\ln\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k$$

The left side of the equation is a natural logarithm (ln uses e ≈ 2.718 as the base). $P_{nik}$ is the probability that person n, encountering item i, would be observed in category k. Dividing the probability of being observed in rating category k ($P_{nik}$) by the probability of being observed in the next lower category k-1 ($P_{ni(k-1)}$) gives the odds of passing from category k-1 to category k. The log transformation then turns ordinal-level data into interval-level data, where the log odds of passing the rating scale at the next higher level is a conjoint measurement of person ability ($B_n$), item difficulty ($D_i$), and the step threshold between rating categories ($F_k$).
The unit of measurement that results when the Rasch model is used to transform raw scores into log odds ratios on a common interval scale is the logit (95).
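
To make the rating scale model concrete, the sketch below computes category probabilities for one person-item encounter and can be used to confirm that adjacent-category log odds equal B_n - D_i - F_k. It is an illustration only: the function name `rsm_category_probs`, the numeric values, and the scoring of the three substantive response categories as 0, 1, 2 (giving two thresholds) are assumptions, not details taken from the study.

```python
import math

def rsm_category_probs(b, d, thresholds):
    """Category probabilities under the Rasch rating scale model.

    b: person ability B_n (logits)
    d: item difficulty D_i (logits)
    thresholds: step thresholds F_1..F_m; categories are scored 0..m
    """
    # Log-numerator of category k is the sum over j <= k of (B_n - D_i - F_j);
    # the lowest category has an empty sum, i.e. 0.
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + (b - d - f))
    top = max(log_num)  # subtract the maximum for numerical stability
    num = [math.exp(v - top) for v in log_num]
    total = sum(num)
    return [v / total for v in num]

# Hypothetical person, item, and thresholds (not values from the study)
probs = rsm_category_probs(b=1.0, d=0.5, thresholds=[-0.8, 0.8])
log_odds_1 = math.log(probs[1] / probs[0])  # equals 1.0 - 0.5 - (-0.8)
log_odds_2 = math.log(probs[2] / probs[1])  # equals 1.0 - 0.5 - 0.8
```

Each log odds value depends only on the difference between person ability and the item difficulty plus threshold, which is what places persons and items on a common logit scale.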

Data Analysis

Several studies have shown that dimensionality cannot be determined solely by fit statistics (96,97). Thus, prior to the application of Rasch analysis to the items of the ICFAM, confirmatory factor analysis (CFA) using Mplus™ (Muthén & Muthén, Los Angeles, CA, version 4.21) was conducted to determine the goodness of fit of the items to the 3-factor model of the ICFAM (n=312). In addition, CFAs were conducted to determine the goodness of fit of the items to a one-factor model for the ICFAM and a one-factor model for each construct of the ICFAM. The following criteria were used to determine goodness of fit to the one-factor and multi-factor models: 1) a chi-square p-value > 0.05, indicating acceptable fit; 2) comparative fit index (CFI) and Tucker-Lewis Index (TLI) values, where the closer to 1.0, the better the fit; 3) root mean square error of approximation (RMSEA) < 0.06; and 4) weighted root mean square residual (WRMR) < 0.01 (98,99).

Traditionally, exploratory factor analysis (EFA) has been used to explore the possible underlying structure of a set of interrelated variables without imposing any preconceived structure on the outcome (100). In this study, if the CFA failed to confirm the unidimensionality of a construct, we conducted EFA on that construct of the ICFAM to further investigate the potential factor structure. EFA was performed using Mplus™ (Muthén & Muthén, Los Angeles, CA, version 4.21). We used the unweighted least squares estimator and varimax rotation following the initial factor extraction, and replaced missing data with mean values. Criteria to determine the number of factors to retain were 1) Kaiser's criterion of eigenvalues greater than 1, 2) factors accounting for greater than 5% of the variance, and 3) the scree test, where the slope changes substantially in the factor versus eigenvalue graph (101). A factor loading greater than 0.46 was considered significant (102).
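
The first two retention criteria can be applied mechanically once eigenvalues are available. The following sketch counts factors under each rule. It assumes that each factor's share of variance is its eigenvalue divided by the sum of all eigenvalues, and the function name `factors_to_retain` and the example eigenvalues are hypothetical. The scree test is a visual judgment and is not coded here.

```python
def factors_to_retain(eigenvalues, min_eigen=1.0, min_share=0.05):
    """Count factors retained under two of the criteria described above.

    Returns (count with eigenvalue > min_eigen,
             count explaining more than min_share of total variance).
    """
    total = sum(eigenvalues)
    kaiser = sum(1 for ev in eigenvalues if ev > min_eigen)
    by_share = sum(1 for ev in eigenvalues if ev / total > min_share)
    return kaiser, by_share

# Hypothetical eigenvalues, not taken from the study
example = [4.0, 1.5, 1.2, 0.7, 0.35, 0.25]
kaiser, by_share = factors_to_retain(example)
```

The two rules need not agree, which is why the study reports a separate factor count for each criterion.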

Rasch analysis with the rating scale model, using the Winsteps computer program (103, 104), was conducted to determine model fit as well as the item-level psychometrics of the ICFAM for back pain patients. The Rasch model (i.e., one-parameter IRT model) is the most robust of the IRT models, in that stable and accurate item parameters and fit statistics can be obtained with a relatively small sample size (105). The Winsteps program produces goodness-of-fit statistics for each item and person, which were used to identify items that did not fit the unidimensional Rasch model. Items with infit or outfit mean square (MnSq) values greater than 1.4 or smaller than 0.6 indicate misfit, which means that the items were responded to erratically relative to other items (95,106). The erratic pattern of responses may indicate that the item is measuring a different construct or that the item needs further clarification. Infit means inlier-sensitive or information-weighted fit, which is more sensitive to the pattern of responses to items targeted on the person, while outfit means outlier-sensitive fit, which is more sensitive to the pattern of responses to items with difficulty far from a person (107). Rasch analysis also provides point-measure correlation coefficients as an immediate check that the item-level scoring accords with the latent variable. A negative correlation coefficient may indicate a reverse-scored survey item. The point-measure correlations should be greater than 0.3 (108). Rasch analysis also provides person separation values, which identify whether items are effective in separating individuals into distinct ability levels. The separation ratio (SR) indicates the number of statistically distinct strata into which the sample can be divided (e.g., low, medium, and high ability back pain groups). The formula used is SR = (4Gp + 1)/3, where Gp represents the person separation.
Person separation is an i ndex of the sample standard deviation in terms of standard error units and person separation reli ability (analogous to

PAGE 33

33 Cronbachs ) is the proportion of observed sample variance that is not attributable to measurement error ( 104). The item -person map detailing an empirica lly derived hierarchy produced by Rasch analysis was compared a hypothetically derived item difficulty hierarchy based on Metabolic Equivalents (METs). The MET system provides the energy cost of physical activities as multiples of resting metabolic rate (RMR) ( 109-112). Although there is an evidence that the MET m ay be inaccurate in estimating energy expe nditure for people of different body weights and fat percentages, it is a universally accepte d concept to express energy expenditure for various physical activities ( 112). In addition, the American College of Sp orts Medicine has recently defined light, moderate, and vigorous physical activity based on specific MET levels ( 113). The MET system is used by many researcher s and clinicians to identify and prescribe physical activities. For item s with no correspondi ng MET values, estimates were determined by inspecting values for similar activities. For example, since there was no exact matching MET value for the item walking on carpeting as estimate value was determined by examining the value for a similar activity household walk ing, which has a MET value of 2.0. This hypothetical hierarchy based on MET values was co mpared to the empirical hierarchy of item difficulty of the short forms was determined w ith the Rasch analysis. Rasch analysis provides item difficulty estimates in logits. The order of difficulty of items based on these estimates was compared to the hypothetical hierarchy base d on MET values. Support for the hypothetical hierarchy might be found if staying in a kn eeling position on both knees for 10-20 minutes, a higher MET value activity, also has a higher logi ts value than staying in a lying position on back for 1 hour, a lower MET value activity. Ad ditionally, the comparison of item difficulty

PAGE 34

34 and person ability (i.e., item-person map) can be used to determine whethe r or not the items of each construct cover the range of person abil ity (i.e., no gaps, ceili ng or floor effects). Results The dem ographic and clinical features of two groups of the study partic ipants are presented in Table 2-2. The three diagnostic groups incl ude low back pain (n=101), lower extremity (n=108), and upper extremity impairment groups (n=103). The average age is 50.6 and 48.3 years for the combined and back pain group, respectively. Almost one third of the back pain group reported having the problem (i.e., back pain) for more than a year suggesting a chronic condition. The results of the CFA failed to confirm the three constructs of the ICFAM. Table 2-3 represents the five indices for the three factor model of the ICFAM. None of five indices for goodness of fit test reached the criteria of mode l fit, while only Tucker-Lewis Index (TLI) was approximate to its criterion (0.907) Positioning/Transfer Construct CFA did not confirm one factor model for pos itioning/transfer constructs (Table 2-3). None of the indices for the goodness of fit test reached its criteri on. To further explore the factor structure of positioning/transfer, exploratory factor analysis (EFA) was conducted (Table 2-4). We retained eleven factors based on a criterion of eigenvalue greater than 1, four factors based on a criterion of variance greate r than 5%, and 3 factors based on a criterion of the scree test. These factors accounted for 76%, 60%, and 53% of total variance, respectively. We extracted 4 factors to further investigate the interpretability of the factor loadings. Items loaded onto factors that contained items which appeared to be activities with staying in upright position/shifting weight/changing position, staying in seated position/bending/shifting weight/changing position, staying lying and st anding position, and moving yourself in various


positions (factor loadings greater than 0.46 are bolded) (Table 2-5). Most items loaded onto factor 1 (20 of 56 items) and factor 2 (19 of 56 items), while twelve items loaded onto factor 3 and eleven items onto factor 4. Of these items, 7 loaded onto more than one factor (factorially complex), while 2 did not load onto any factor. Items related to staying in a standing position for longer than 1 hour, kneeling, and squatting position..., changing position from..., and moving yourself... tended to load onto factor 1. Items related to staying seated..., shifting weight..., and a few items of changing position... tended to load onto factor 2, while staying in a lying position... tended to load onto factor 3. Items of moving yourself into/out of a bathtub tended to load onto factor 4, while moving yourself from mattress/sitting/floor to... tended to cross-load onto factor 1 and factor 4.

Table 2-8 presents item measures, error, infit/outfit statistics, and point measure correlation coefficients for the 56 items. Of the 56 items, 54 showed adequate infit/outfit statistics; two items slightly exceeded the fit criterion of a mean square of 1.4, namely "lying down stomach 2-4 hours" and "moving into bathtub to take shower" (presented in bold; 1.44/1.41 and 1.49/1.52, respectively). In addition, all 56 items showed adequate point measure correlations, ranging from 0.32 to 0.73.

Items of the positioning/transfer construct were effective in differentiating individuals with chronic back pain into 6 statistically distinct levels of person ability. The person separation index (person standard deviation in calibration error units) was 4.52, defining 6.36 statistically meaningful levels of disability (person separation ratio). These items also showed good person separation reliability (analogous to Cronbach's α) at 0.95.
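The relation between the separation index, the number of distinct levels, and the reliability reported above can be sketched numerically. The sketch below assumes the conventional Rasch formulas, strata = (4G + 1) / 3 and reliability = G^2 / (1 + G^2); the text does not state which formulas were used, but these reproduce the reported 6.36 levels and 0.95 reliability from G = 4.52. The function name is illustrative only.

```python
def separation_summary(separation_index: float) -> tuple[float, float]:
    """Return (strata, reliability) implied by a person separation index G.

    Assumed formulas (not stated in the text):
      strata      = (4G + 1) / 3
      reliability = G^2 / (1 + G^2)
    """
    strata = (4.0 * separation_index + 1.0) / 3.0
    reliability = separation_index ** 2 / (1.0 + separation_index ** 2)
    return strata, reliability


# Person separation index reported for the positioning/transfer construct.
strata, reliability = separation_summary(4.52)
print(f"strata = {strata:.2f}, reliability = {reliability:.2f}")
```

The same function can be applied to the separation indices of the other two constructs.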


Table 2-8 also presents the item difficulty hierarchy, with the most difficult items at the top of the table and the easiest items at the bottom. Items least likely to be endorsed with a high rating (i.e., the most difficult items) were "kneeling 10-20 minutes" and "lying stomach 5-8 hours," while items most likely to be endorsed with a low rating (i.e., the easiest items) were "change position standing to sitting chair" and "shift lying in bed." That is, these individuals with chronic back pain demonstrated greater difficulty in maintaining postures for a prolonged time (e.g., the "lying on stomach 5-8 hours" item was at 1.49 logits) than in shifting or changing postures (e.g., the "changing position squatting to standing" and "shifting lying in bed" items were at 0.43 and -1.69 logits, respectively).

Item difficulty calibrations matched person ability measures fairly well on the positioning/transfer construct (Figure 2-1). The items of each construct are listed at their average measure on the right side of each map, with the easiest items at the bottom of the map and the most difficult items at the top. The M to the left and right of the vertical line represents the average person measure and item measure, respectively, while S and T mark 1 and 2 standard deviations, respectively. The map showed a relatively normal distribution of individual abilities ranging between -2.0 and 4.0 logits. The person ability distribution also showed no apparent ceiling or floor effects. The average item difficulty and average person ability were virtually identical, with item difficulty 0.06 ± .93 logits lower than average person ability.

A hypothetically derived hierarchy of activity based on MET values (33,37,39) was compared with the empirically derived hierarchy of item difficulty. Empirical evidence based on Rasch analysis did not support the difficulty hierarchy based on MET levels. As an energy cost measure of physical activities, the MET level for the "lying down on back 5-8 hours" item is 1.0 MET, while that for the "changing standing to sitting in chair" item is 2.0 METs. In our empirical hierarchy of item


difficulty, the "lying down on back 5-8 hours" item was one of the most difficult items and the "changing standing to sitting in chair" item was the easiest item for the individuals with chronic back pain (see Figure 2-1).

Lifting/Carrying Construct

The CFA did not confirm the one-factor model for the lifting/carrying construct (Table 2-3). None of the goodness-of-fit indices reached its criterion, although the Tucker-Lewis Index (TLI) came close to its criterion (0.932). To further explore the factor structure of the lifting/carrying construct, EFA was conducted. We retained six factors based on the criterion of an eigenvalue greater than 1, four factors based on the criterion of variance greater than 5%, and three factors based on the scree test (Table 2-4). The factors retained under each criterion accounted for 78%, 69%, and 54% of total variance, respectively. We extracted four factors to further investigate the interpretability of the factor loadings. Items loaded onto factors that appeared to represent lifting light objects, lifting heavy objects, carrying/pushing/pulling, and carrying infants/toddlers (Table 2-6). Items loaded onto factor 1 (9 of 27 items), factor 2 (8 of 27 items), factor 3 (8 of 27 items), and factor 4 (6 of 27 items). Four items loaded onto more than one factor (factorially complex). Items on lifting objects of 10 pounds or heavier tended to load onto factor 1. Items on carrying 5 to 10 pounds for 25 feet and on pulling and pushing tended to load onto factor 2. Items on lifting objects of 5 pounds or less tended to load onto factor 3, while items on carrying 10 pounds up/down stairs and carrying infants/toddlers tended to load onto factor 4.

Table 2-9 presents item measures, error, infit/outfit statistics, and point measure correlations for the 27 items. Twenty of the twenty-seven items showed acceptable infit/outfit statistics, with seven items showing slightly high infit/outfit statistics (carrying toddler on shoulders, on back, and on hip, carrying infant in arm, carrying 10 pounds up/down one flight of stairs, pushing a shopping


cart, and carrying one pound 25 feet) and one item showing low infit/outfit statistics (lifting 10 pounds waist to shoulder). The "carrying infants in arm" item misfit significantly on both the infit and outfit criteria (presented in bold; 1.95 and 1.89). In addition, all 27 items showed adequate point measure correlations, ranging from 0.40 to 0.79.

Items of the lifting/carrying construct were effective in differentiating individuals with back pain into 5 statistically distinct levels of person ability. The person separation index was 3.67, defining 5.23 statistically meaningful levels of disability. These items also showed good person separation reliability (analogous to Cronbach's α), which was 0.93.

Table 2-8 presents the item difficulty hierarchy of the lifting/carrying construct. The items least likely to be endorsed with a high rating (i.e., the most difficult items) were "carrying toddler on the shoulder" and "carrying toddlers on back," with similar item difficulty calibrations (2.82 and 2.73 logits, respectively). In addition, the items most likely to be endorsed with a low rating (i.e., the easiest items) were "pulling open refrigerator door" and "carrying 1 pound for 25 feet" (-2.99 and -2.15 logits, respectively). That is, individuals with back pain are more likely to have difficulty with toddler-carrying activities than with pulling activities.

Item difficulty calibrations matched person ability measures fairly well on the lifting/carrying construct (Figure 2-2). The item-person map shows a relatively normal distribution of individual abilities ranging from -3.0 to 5.0 logits. The person ability distribution also showed no apparent ceiling or floor effects. The average item difficulty was 0.31 ± 1.31 logits lower than average person ability.

Empirical evidence based on Rasch analysis supported the difficulty hierarchy based on MET levels. Since only a few items of the lifting/carrying construct correspond to activities with a MET level, we made the detailed comparisons using estimated values. The estimated MET level


for "lifting 25 pounds shoulder to above head," "lifting 25 pounds floor to waist," and "carrying 25 pounds for 25 feet" was about 3.0 METs. The "pulling wet laundry out from washing machine" item was estimated at about 2.0 METs. In our empirical hierarchy of item difficulty, the above 3 items of the lifting/carrying construct were among the difficult items, and the "pulling wet laundry out from washing machine" item was among the easy items for the individuals with back pain.

Walking/Moving Construct

The CFA only partially confirmed the one-factor model for the walking/moving construct (Table 2-3). Of the indices, the Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) were marginally adequate (0.960 and 0.978, respectively). To further explore the factor structure of the walking/moving construct, EFA was conducted. We retained three factors based on the criterion of an eigenvalue greater than 1, three factors based on the criterion of variance greater than 5%, and two factors based on the scree test (Table 2-4). The factors retained under each criterion accounted for 70%, 65%, and 54% of total variance, respectively. Based on these results, we extracted three factors to further investigate the interpretability of the factor loadings. Most items loaded onto factors that appeared to represent walking/stepping, climbing/walking, and climbing/running/jogging (Table 2-7). Ten items loaded onto factor 1, six items tended to load onto factor 2, and five items loaded onto factor 3. Walking-related items (except the "walking one mile" item) and stepping-related items tended to load onto factor 1. Four climbing-related items and two walking-related items tended to load onto factor 2, while two climbing-related items and the running and jogging items tended to load onto factor 3. Two items ("walking 4-8 blocks without stopping" and "walking one mile without stopping") loaded onto more than one factor (factorially complex).
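The two numeric retention criteria used for each construct (eigenvalue greater than 1, and more than 5% of variance explained) can be illustrated with a short sketch. The eigenvalues below are hypothetical and are not taken from the study; the scree test is a visual judgment and is not automated here. Function names are illustrative.

```python
def count_retained_factors(eigenvalues, min_eigenvalue=1.0, min_variance_share=0.05):
    """Count factors retained under the eigenvalue and variance-share criteria."""
    total = sum(eigenvalues)
    by_eigenvalue = sum(1 for ev in eigenvalues if ev > min_eigenvalue)
    by_variance = sum(1 for ev in eigenvalues if ev / total > min_variance_share)
    return by_eigenvalue, by_variance


def cumulative_variance(eigenvalues, n_factors):
    """Share of total variance accounted for by the n largest factors."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:n_factors]) / sum(ordered)


# Hypothetical eigenvalues of a 10-item correlation matrix (they sum to 10).
example_eigenvalues = [4.2, 1.6, 1.1, 0.8, 0.6, 0.45, 0.45, 0.35, 0.25, 0.2]

by_eigenvalue, by_variance = count_retained_factors(example_eigenvalues)
print("retained by eigenvalue > 1:", by_eigenvalue)
print("retained by variance > 5%:", by_variance)
print("variance explained by eigenvalue-retained factors:",
      round(cumulative_variance(example_eigenvalues, by_eigenvalue), 2))
```

As in the results above, the criteria can disagree on the number of factors, which is why the interpretability of the loadings was used to settle on a solution.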


Table 2-10 also presents item measures, errors, infit/outfit statistics, and point measure correlations for the 20 items. Fifteen items showed acceptable infit/outfit statistics, while 5 items showed high infit/outfit statistics (jogging one mile, running one block, climbing up or down, stepping onto or off a bus, and stepping into or out of an elevator). Of these items, "jogging one mile" and "running one block" misfit significantly on both the infit and outfit criteria (presented in bold; 1.56/2.34 and 1.80/2.16, respectively). In addition, all 20 items showed adequate point measure correlations, ranging from 0.44 to 0.79.

Items of the walking/moving construct were effective in differentiating individuals with back pain into almost 5 statistically distinct levels of person ability. The person separation index was 3.44, defining 4.92 statistically meaningful levels of disability. These items also showed good person separation reliability (analogous to Cronbach's α), which was 0.92.

Table 2-9 also presents the item difficulty hierarchy of the walking/moving construct. The items least likely to be endorsed with a high rating (i.e., the most difficult items) were "jogging one mile" and "running one block," with similar item difficulty calibrations (3.17 and 2.54 logits, respectively). In addition, the items most likely to be endorsed with a low rating (i.e., the easiest items) were "walking on grass" and "walking on carpet" (-1.92 and -2.38 logits, respectively). That is, individuals with back pain are more likely to have difficulty with jogging/running activities than with walking on grass or carpet.

Item difficulty calibrations matched person ability measures fairly well on the walking/moving construct (Figure 2-2). The item-person map shows a relatively normal distribution of individual abilities ranging from -3.0 to 5.0 logits. The person ability distribution showed 8 individuals at the ceiling but no floor effect. The average item difficulty was 0.43 ± .39 logits lower than average person ability.
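To make the infit/outfit mean squares reported for these items more concrete, the following is a minimal sketch of how such statistics are computed. It assumes the dichotomous Rasch model for simplicity, whereas the study fit a rating scale model with polytomous responses; the person abilities and responses are hypothetical, and the function names are illustrative. Outfit is the unweighted mean of squared standardized residuals, and infit is the information-weighted version.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        squared_residuals.append((x - p) ** 2)   # observed minus expected, squared
        variances.append(p * (1.0 - p))          # model variance of the response
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical person abilities (logits) and an item at 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [0, 0, 1, 1, 1]    # able persons succeed, less able do not
unexpected_pattern = [1, 1, 0, 0, 0]  # reversed pattern, strongly misfitting

print("expected pattern:  ", item_fit(expected_pattern, abilities, 0.0))
print("unexpected pattern:", item_fit(unexpected_pattern, abilities, 0.0))
```

Values near 1.0 indicate that observed variance matches what the model expects; the unexpected pattern yields mean squares well above the 1.4 criterion used in this chapter.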


Empirical evidence generated by Rasch analysis supports the hypothetically derived hierarchy based on MET levels. The hierarchy of walking/running activities in the MET system is primarily determined by speed (33,37,39), while our empirical hierarchy of item difficulty is determined by conceptual difficulties such as the challenges of distance or environment. The most challenging items in our empirical hierarchy, such as "jogging one mile" and "running one block," match the vigorous activity category (> 6.0 METs). In addition, moderately challenging items in our empirical hierarchy, such as most climbing-related items, match the moderate activity category, ranging from 3.0 METs to 6.0 METs. Moreover, the least challenging items in our empirical hierarchy clearly match the light activity category (< 3.0 METs). That is, the empirically derived item difficulty hierarchy of the walking/moving construct of the ICFAM is similar to the activity hierarchy of the associated MET levels.

Discussion

Summary of Results

While CFA failed to confirm the unidimensionality of the ICFAM for the positioning/transfer, lifting/carrying, and walking/moving constructs, overall the item-level psychometrics of the ICFAM showed good measurement qualities as determined by the fit statistics, item difficulty hierarchy, and person separation reliability. Since the hypothesized single-factor structure did not provide a good fit to the data, exploratory factor analysis was conducted to further investigate the factor structure of the three constructs. EFA suggested a multifactor solution for each of the three constructs. The majority of the items for each of the three constructs fit the Rasch rating scale model. Items of the three constructs of the ICFAM were effective in separating individuals with back pain into statistically meaningful disability groups. One of the three constructs, the walking/moving construct, presented an empirical hierarchy of item difficulty that supported the hypothetical hierarchy of activity based on MET values. These


findings may imply that the association between the multifactor models of the physical activity domain and the empirical hierarchical order of item difficulty needs further investigation.

Unidimensionality

CFA failed to confirm the unidimensionality of each construct of the ICFAM. Therefore, EFA was used to explore the factor structure of each of the three constructs. The EFA revealed a four-factor solution for positioning/transfer, a four-factor solution for lifting/carrying, and a three-factor solution for the walking/moving construct. Several possible reasons might account for the failure to support unidimensionality.

First, although all constructs of the ICFAM were theoretically generated, their practical dimensionality might differ from the theoretical one (96,97). A few latent traits for each construct were identified by EFA as follows. For the positioning/transfer construct, EFA showed a tendency to separate the construct into four potential latent traits, which could be labeled as 1) staying in an upright position (standing, kneeling, and squatting)/shifting weight (in kneeling and squatting)/changing position (sitting to kneeling to squatting to standing), 2) staying in a seated position/bending/shifting weight (in chair and standing)/changing position (lying to sitting to standing), 3) staying in lying and standing positions, and 4) moving yourself in various positions. For the lifting/carrying construct, EFA showed a tendency to separate the construct into four latent traits, which could be labeled as 1) lifting light objects, 2) lifting heavy objects, 3) carrying/pushing/pulling, and 4) carrying infants/toddlers. For the walking/moving construct, EFA showed a tendency to separate the construct into three different latent traits, which could be labeled as 1) walking/stepping, 2) climbing/walking, and 3) climbing/running/jogging. These findings may suggest that the theoretically generated constructs of the ICFAM instrument have multidimensional structures. Further investigation would be necessary to ascertain the dimensionality.


Second, although unidimensionality is a requisite assumption for IRT approaches, the concept of unidimensionality remains obscure. Reckase (1985) indicates that no measures are purely unidimensional (114), and McHorney (2004) states that there is no single test available to check unidimensionality (8). However, in many cases, studies can be justified by applying essential unidimensionality, which involves minimizing methodological or trivial dimensions (115). Box and Draper (1987) state that all models are essentially wrong, but only some of them are useful. That is, although a statistical model violates its assumptions, the model may still be useful (116). Thus, unidimensionality may be a quantitative ideal that can only be approximated (39). Future research should take into account the influence of multidimensionality on measuring individuals, that is, how robust the IRT models are to multidimensionality.

Rasch Model Fit

With regard to the fit statistics obtained from Rasch analysis, none of the items misfit significantly except "carrying infant in arms" in the lifting/carrying construct and "jogging one mile" and "running one block" in the walking/moving construct. These fit statistics are a measure of observed variance over expected variance. For the positioning/transfer construct, all items showed adequate infit/outfit statistics and fit the Rasch rating scale model. For the lifting/carrying construct, the infit/outfit statistics of "carrying infant in arms" showed that this item had up to 95% more variance than expected. That is, individuals with low disability (i.e., high ability) tended to score either low or high ratings on the item. Furthermore, 56%-73% of the respondents scored the rating of "have not done," which is the lowest rating. While we assumed that "have not done" responses were due to individuals not being able to do a task, this may not have been the case for the "carrying infant in arms" item. The increased variance on this item may have resulted from a lack of opportunity to do this task rather than from an ability or inability to perform the task. For the walking/moving construct, the infit/outfit statistics of


"jogging one mile" and "running one block" were 1.56/2.34 and 1.80/2.16, respectively. Since these two items are among the most difficult items, individuals with low disability (i.e., high ability) were likely to score either low or high ratings. The bimodal distribution of responses might have resulted from a lack of observations in the middle categories and led to the large observed variances for these items. Similar to the "carrying infant in arms" item, nearly 73% of individuals on running one block and 68% of individuals on running one block responded with the lowest rating ("have not done").

The Hierarchy of Item Difficulty Calibration and Physical Activity

For the constructs of positioning/transfer and lifting/carrying, empirical evidence of the hierarchical order of item difficulty based on Rasch analysis did not support the hierarchical order of physical activity based on MET values. For the positioning/transfer construct, for example, the relevant MET value of "lying down on back 5-8 hours" was 1.0 MET, while that of "changing standing to sitting in chair" was 2.0 METs. That is, "changing standing to sitting in chair" would be a more challenging activity than "lying down on back 5-8 hours" in terms of the MET value, because changing posture would require more energy than lying down. However, the opposite order was found in this study. That is, in our item difficulty hierarchy, "lying down on back 5-8 hours" was a more challenging item than "changing standing to sitting in chair." This finding reflects a clinical profile of back pain in which, in general, maintaining a particular posture is more difficult than changing position.

By contrast, for the lifting/carrying construct, in which individuals with back pain often report limitations, there was empirical evidence to support the hypothetical order of activity based on METs. The empirical order generated by Rasch analysis differentiated three items ("lifting 25 pounds shoulder to above head," "lifting 25 pounds floor to waist," and "carrying 25 pounds for 25 feet"), while the relevant MET values of the three items were the same (i.e., 3 METs). The


MET value of "pulling wet laundry out from washing machine" was found to be 2.0 METs. This item is less challenging than the three items not only in the empirical hierarchy but also in MET value.

As we hypothesized, the empirical hierarchy of item difficulty for the walking/moving construct reflected the hypothesized hierarchy of activities based on MET values. In our empirical hierarchy generated through Rasch analysis, an individual with low back pain who has difficulty with an item of average difficulty, such as "climbing down one flight of stairs," would be expected to have more difficulty with "climbing up or down a 6-foot ladder" (more difficult than climbing down one flight of stairs). Similarly, an individual with low back pain who is capable of climbing down one flight of stairs would be expected to be more capable of stepping up or down a standard curb (easier than climbing down one flight of stairs). This logical pattern is similar to that of the hypothetical activity hierarchy based on MET values (33,37,39). That is, the empirical hierarchy of walking-related activities may correspond to the hypothesized activity hierarchy based on physiological energy expenditure expressed in METs. The similarity of these two hierarchies may point to areas of unexplored research. Future research investigating associations between self-report measures and physiological measures could be indicated; however, previous studies of this type have demonstrated weak correlations between physiological functioning and self-reported health status (8,115).

Limitations and Future Implications

Limitations of the present study include sample size, homogeneity of the sample, and dimensionality. The sample size was small for performing confirmatory and exploratory factor analysis. To obtain useful results, studies suggest that the minimum number of subjects should be at least five observations per item (117) or the larger of five times the number of


variables (118). Since the ICFAM instrument includes 56, 27, and 20 items for the three constructs, respectively, a sample size larger than 280 would be recommended. Thus, in the present study, we used the combined group consisting of individuals with three different diagnoses to meet the criterion, because the sample size of the back pain group (n=101) was not sufficient for factor analysis techniques. These findings may indicate the need for multidimensional models to adequately describe the dimensionality of physical function. In addition, the present study attempted only simple comparisons between item difficulty and MET values from the compendium of activity. Therefore, future studies are needed to investigate the constructs of the ICFAM in more detail, particularly a study that measures the METs of the relevant activities for the items of the walking/moving construct.
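The sample-size reasoning above is simple arithmetic and can be sketched as follows, assuming the rule of at least five observations per item cited in the text. The function name is illustrative; the item counts and group sizes are those reported in this chapter.

```python
def minimum_sample_size(n_items, observations_per_item=5):
    """Minimum sample size under an 'observations per item' rule of thumb."""
    return n_items * observations_per_item


items_per_construct = {
    "positioning/transfer": 56,
    "lifting/carrying": 27,
    "walking/moving": 20,
}
group_sizes = {"back pain group": 101, "combined group": 312}

for construct, n_items in items_per_construct.items():
    required = minimum_sample_size(n_items)
    for group, n in group_sizes.items():
        status = "meets" if n >= required else "falls short of"
        print(f"{construct}: {group} (n={n}) {status} the minimum of {required}")
```

The largest construct (56 items) drives the requirement of 280, which only the combined group satisfies.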


Table 2-1. Examples of items for three constructs of the ICFAM

Construct | Example item | Item difficulty

positioning/transfers (56 items)
1. staying in a kneeling position on both knees for 10-20 minutes (while making only minor adjustments) | 1.47
2. staying in a lying position on your back for 5-8 hours (while making only minor adjustments) | 1.06
3. staying in a standing position for 1-2 hours (while making only minor adjustments) | .64
4. moving yourself out of a bathtub after taking a bath | .25
5. changing position from standing to kneeling | .12
6. bending at the waist while standing for 1-5 minutes (for example, reaching for something in the trunk of a car) | -.25
7. staying in a lying position on your back for 1 hour (while making only minor adjustments) | -.45
8. changing position from lying on your back to sitting (for example, lying in your bed to sitting on the edge of your bed) | -.06
9. changing position from standing to sitting in a chair | -1.15
10. shifting your weight while lying in your bed | -1.69

lifting/carrying (27 items)
1. carrying a toddler on your shoulders | 2.84
2. carrying a toddler on your back (for example, piggyback) | 2.69
3. lifting 25 pounds (for example, large bag of dog food or cat litter) from shoulder height to above your head with your hand(s) and arm(s) | 1.77
4. lifting 25 pounds (for example, large bag of dog food or cat litter) from floor to waist height with your hand(s) and arm(s) | 1.18
5. carrying 25 pounds (for example, large bag of dog food or cat litter) in your hand(s) and arm(s) 25 feet (for example, from car to front door) | .89
6. lifting 10 pounds (for example, bag of groceries or 12-pack of soft drinks) from waist height to shoulder height with your hand(s) and arm(s) | .35
7. lifting 5 pounds (for example, bag of sugar or large telephone book) from shoulder height to above your head with your hand(s) | -.31
8. pulling open a heavy door (for example, department/convenience store door) | -.86
9. lifting 1 pound (for example, a can of soup) from waist height to shoulder height with your hand(s) | -2.00
10. pulling open a full-size refrigerator door | -2.43


Table 2-1. Continued

Construct | Example item | Item difficulty

walking/climbing (20 items)
1. running one block | 2.51
2. climbing up or down a 6-foot ladder | 1.28
3. climbing up or down a 3-step stool | .65
4. climbing down one flight of stairs | .11
5. walking 4-8 blocks (about 1/2 mile) without stopping | .17
6. walking 2-4 blocks (about 1/4 mile) without stopping | -.43
7. walking in a crowded place (for example, outdoor marketplace, shopping mall) | -.77
8. walking within your home/living environment | -1.31
9. stepping up or down a standard curb | -1.55
10. walking on carpeting | -2.00
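The logit calibrations in Table 2-1 can be read as positions on the same scale as person ability. As a rough illustration, the sketch below converts the gap between a person measure and an item calibration into a probability. It assumes the dichotomous Rasch model, which is a simplification: the study used a rating scale model with several response categories. The person abilities are hypothetical; the three item calibrations are taken from Table 2-1.

```python
import math


def success_probability(person_ability, item_difficulty):
    """Dichotomous Rasch model: P = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(person_ability - item_difficulty)))


# Item calibrations (logits) from the walking/climbing rows of Table 2-1.
walking_items = {
    "running one block": 2.51,
    "climbing down one flight of stairs": 0.11,
    "walking on carpeting": -2.00,
}

# Hypothetical person abilities in logits.
for ability in (-1.0, 0.0, 1.0):
    for item, difficulty in walking_items.items():
        p = success_probability(ability, difficulty)
        print(f"ability {ability:+.1f}, {item}: {p:.2f}")
```

When ability equals item difficulty the probability is 0.5, and for any fixed ability the probability falls as the calibration rises, which is what the item difficulty hierarchy expresses.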


Table 2-2. Demographic information of research participants

Characteristics | 3 Diagnostic Combined Group (n=312) | Back Pain Group (n=101)

Age
< 20 | 14 (4.5) | 5 (5.0)
21-30 | 34 (10.9) | 12 (11.9)
31-40 | 42 (13.5) | 15 (14.9)
41-50 | 56 (17.9) | 24 (23.8)
51-65 | 84 (26.9) | 19 (18.8)
> 65 | 66 (21.2) | 20 (19.8)
Missing | 16 (5.1) | 6 (5.9)
Mean ± SD | 50.25 ± 17.6 | 48.14 ± 17.3

Gender
Female | 159 (51.0) | 65 (64.4)
Male | 133 (42.6) | 31 (30.7)
Missing | 20 (6.4) | 5 (5.0)

Education
Elementary | 8 (0.2) | 0 (0.0)
Middle/Junior High | 20 (6.2) | 3 (3.0)
High School | 131 (42.0) | 34 (33.7)
Technical | 25 (8.0) | 8 (7.9)
College | 101 (32.2) | 33 (32.7)
Graduate | 53 (17.0) | 23 (22.8)

Race/Ethnic
African American | 44 (14.1) | 19 (18.8)
Hispanic American | 17 (5.4) | 7 (6.9)
American Indian | 1 (0.3) | 1 (1.0)
White, not Hispanic origin | 232 (74.4) | 68 (67.3)
Asian/Pacific Islander | 5 (1.6) | 1 (1.0)
Other | 9 (2.9) | 3 (3.0)
Missing | 4 (1.2) | 2 (2.0)

Years that has had related problems
Less than a year | 50 (16.0) | 7 (6.9)
1 through < 4 years | 51 (16.4) | 20 (19.8)
More than 4 years | 35 (11.3) | 59 (58.4)
Missing | 32 (10.3) | 15 (14.9)


Table 2-3. Goodness-of-fit indices from the confirmatory factor analysis of the ICFAM

Indices (criterion) | Three constructs, 3-factor model | Positioning/Transfer, 1-factor model | Lifting/Carrying, 1-factor model | Walking/Moving, 1-factor model
Chi-square | 1814.065 | 1554.932 | 936.447 | 359.304
DF | 128 | 60 | 30 | 24
P-value (> 0.05) | .000 | .000 | .000 | .000
CFI (> 0.95) | .741 | .667 | .880 | .960
TLI (> 0.95) | .907 | .872 | .932 | .978
RMSEA (< 0.06) | .205 | .310 | .327 | .220
WRMR (< 0.1) | 2.994 | 3.954 | 4.387 | 2.842

Table 2-4. Number of factors retained for the ICFAM

Criteria | Positioning/Transfer | Lifting/Carrying | Walking/Moving
Eigenvalue (> 1) | 11 | 6 | 3
Variance (> 5%) | 4 | 4 | 3
Scree test | 3 | 3 | 2
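The comparison of each fit index with its criterion in Table 2-3 can be expressed as a short check. The sketch below uses the values and cutoffs exactly as printed in the table (including the WRMR cutoff as printed); the dictionary and function names are illustrative only.

```python
# Cutoffs as printed in the "Indices (criterion)" column of Table 2-3.
CRITERIA = {
    "p-value": lambda v: v > 0.05,
    "CFI": lambda v: v > 0.95,
    "TLI": lambda v: v > 0.95,
    "RMSEA": lambda v: v < 0.06,
    "WRMR": lambda v: v < 0.1,
}

# Index values as printed in Table 2-3.
FIT_RESULTS = {
    "three constructs (3-factor)": {"p-value": 0.000, "CFI": 0.741, "TLI": 0.907, "RMSEA": 0.205, "WRMR": 2.994},
    "positioning/transfer (1-factor)": {"p-value": 0.000, "CFI": 0.667, "TLI": 0.872, "RMSEA": 0.310, "WRMR": 3.954},
    "lifting/carrying (1-factor)": {"p-value": 0.000, "CFI": 0.880, "TLI": 0.932, "RMSEA": 0.327, "WRMR": 4.387},
    "walking/moving (1-factor)": {"p-value": 0.000, "CFI": 0.960, "TLI": 0.978, "RMSEA": 0.220, "WRMR": 2.842},
}


def indices_meeting_criteria(values):
    """Return the names of the indices whose value satisfies its cutoff."""
    return [name for name, passes in CRITERIA.items() if passes(values[name])]


for model, values in FIT_RESULTS.items():
    print(model, "->", indices_meeting_criteria(values) or "none")
```

Only the walking/moving model has any index beyond its cutoff (CFI and TLI), which matches the description of that model as only partially confirmed.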


Table 2-5. Factor structure of positioning/transfer construct following EFA

POSITIONING/TRANSFER | F1 | F2 | F3 | F4
1) staying in a lying position on your favorite side for 1 hour | .004 | .505 | .347 | .032
2) staying in a lying position on your favorite side for 2-4 hours | .018 | .437 | .538 | .047
3) staying in a lying position on your favorite side for 5-8 hours | .099 | .375 | .596 | .133
4) staying in a lying position on your back for 1 hour | .088 | .414 | .517 | -.037
5) staying in a lying position on your back for 2-4 hours | .131 | .320 | .676 | -.013
6) staying in a lying position on your back for 5-8 hours | .159 | .188 | .751 | .046
7) staying in a lying position on your stomach for 1 hour | .153 | -.092 | .628 | .503
8) staying in a lying position on your stomach for 2-4 hours | .163 | -.085 | .753 | .421
9) staying in a lying position on your stomach for 5-8 hours | .189 | -.082 | .783 | .400
10) staying in a seated position for 10-20 minutes | .049 | .593 | .297 | -.140
11) staying in a seated position for 30-60 minutes | .094 | .570 | .423 | -.127
12) staying in a seated position for 1-2 hours | .072 | .444 | .544 | -.070
13) staying in a seated position for 3-4 hours | .113 | .246 | .651 | .053
14) staying in a standing position for 10-20 minutes | .370 | .560 | .278 | -.044
15) staying in a standing position for 30-60 minutes | .446 | .417 | .362 | -.017
16) staying in a standing position for 1-2 hours | .503 | .296 | .469 | .027
17) staying in a standing position for 3-4 hours | .502 | .208 | .515 | .080
18) staying in a kneeling position on both knees for 5-10 minutes | .648 | .222 | .216 | .066
19) staying in a kneeling position on both knees for 10-20 minutes | .661 | .119 | .305 | .088
20) staying in a squatting position for 1-2 minutes | .672 | .140 | .171 | .076
21) staying in a squatting position for 3-5 minutes | .634 | .077 | .267 | .051
22) bending at the waist while standing for 1-5 minutes | .335 | .596 | .137 | .045
23) bending at the waist while standing for 5-10 minutes | .363 | .429 | .248 | .137


Table 2-5. Continued

POSITIONING/TRANSFER | F1 | F2 | F3 | F4
1) shifting your weight while lying in bed | .178 | .675 | .165 | .247
2) shifting your weight while sitting in a chair with armrests | .166 | .712 | .064 | .192
3) shifting your weight while sitting in a chair without armrests | .248 | .593 | .147 | .218
4) shifting your weight while standing | .302 | .604 | .061 | .049
5) shifting your weight while kneeling on both knees | .751 | .203 | .092 | .176
6) shifting your weight while squatting | .808 | .179 | .088 | .195
7) rolling over from your back to your side | .118 | .677 | .165 | .211
8) rolling over from your stomach to your side | .200 | .230 | .293 | .534
9) changing position from sitting to lying down | .096 | .739 | .119 | .153
10) changing position from lying on your back to sitting | .136 | .731 | .169 | .174
11) changing position from lying on your side to sitting | .135 | .723 | .146 | .234
12) changing position from standing to sitting in a chair | .251 | .613 | .113 | .177
13) changing position from sitting in a chair to standing | .371 | .592 | .141 | .194
14) changing position from kneeling to sitting on the floor | .800 | .159 | .050 | .341
15) changing position from sitting on the floor to kneeling | .811 | .139 | .058 | .325
17) changing position from kneeling to standing | .824 | .217 | .102 | .147
16) changing position from standing to kneeling | .815 | .202 | .065 | .150
18) changing position from squatting to kneeling | .836 | .104 | .012 | .308
19) changing position from kneeling to squatting | .844 | .112 | .027 | .307
20) changing position from standing to squatting | .813 | .171 | .041 | .214
21) changing position from squatting to standing | .812 | .175 | .052 | .206
1) while scooting yourself up/back into a chair | .129 | .567 | .006 | .385
2) while scooting yourself along a couch | .142 | .556 | .016 | .422


Table 2-5. Continued

POSITIONING/TRANSFER | F1 | F2 | F3 | F4
9) moving yourself from sitting on a chair to sitting on the floor | .524 | .315 | .103 | .496
11) moving yourself from a low mattress/futon to the floor | .490 | .232 | .087 | .624
10) moving yourself from sitting on the floor to sitting on a chair | .534 | .328 | .117 | .518
12) moving from the floor to a low mattress/futon | .498 | .222 | .110 | .628
13) moving yourself while sitting on your bed, scooting along the edge of your bed | .152 | .513 | .038 | .359
14) moving yourself while lying on your bed, scooting up in your bed | .131 | .582 | .105 | .357
15) moving yourself into a bathtub to take a bath | .224 | .163 | .132 | .750
16) moving yourself out of a bathtub after taking a bath | .241 | .157 | .153 | .740
17) moving yourself into a bathtub to take a shower | .248 | .239 | -.006 | .555
18) moving yourself out of a bathtub after taking a shower | .256 | .248 | -.006 | .525

Table 2-6. Factor structure of lifting/carrying construct following EFA

    F1    F2    F3    F4  LIFTING/CARRYING
  .083  .317  .671  .157  1) lifting 1 pound from floor to waist height with your hand(s)
  .184  .208  .805  .070  2) lifting 1 pound from waist height to shoulder height with your hand(s)
  .286  .134  .720  .120  3) lifting 1 pound from shoulder height to above your head with your hand(s)
  .350  .362  .622  .143  4) lifting 5 pounds from floor to waist height with your hand(s)
  .526  .291  .625  .050  5) lifting 5 pounds from waist height to shoulder height with your hand(s)
  .570  .161  .589  .131  6) lifting 5 pounds from shoulder height to above your head with your hand(s)
  .566  .371  .429  .212  7) lifting 10 pounds from floor to waist height with your hand(s) and arm(s)
  .733  .264  .386  .182  8) lifting 10 pounds from waist height to shoulder height with your hand(s) and arm(s)
  .734  .189  .379  .246  9) lifting 10 pounds from shoulder height to above your head with your hand(s) and arm(s)
  .721  .315  .160  .267  10) lifting 25 pounds from floor to waist height with your hand(s) and arm(s)
  .854  .158  .104  .275  11) lifting 25 pounds from waist height to shoulder height with your hand(s) and arm(s)
  .792  .062  .114  .326  12) lifting 25 pounds from shoulder height to above your head with your hand(s) and arm(s)
 -.077  .567  .484  .175  15) carrying 1 pound in your hand(s) 25 feet (for example, from car to front door)
  .092  .593  .516  .196  16) carrying 5 pounds in your hand(s) 25 feet
  .338  .584  .338  .264  17) carrying 10 pounds in your hand(s) and arm(s) 25 feet
  .603  .351  .149  .377  18) carrying 25 pounds in your hand(s) and arm(s) 25 feet
  .282  .293  .158  .494  19) carrying 10 pounds (for example, bag of groceries) up one flight of stairs
  .299  .278  .120  .480  20) carrying 10 pounds (for example, bag of groceries) down one flight of stairs
  .127  .120  .118  .753  21) carrying an infant cradled in your arms
  .172  .082  .126  .785  22) carrying a toddler on your hip
  .200 -.010  .074  .864  23) carrying a toddler on your shoulders
  .247 -.031  .068  .823  24) carrying a toddler on your back (for example, piggyback)

  .073  .550  .177 -.046  1) pulling open a full-size refrigerator door

Table 2-6. Continued

    F1    F2    F3    F4  LIFTING/CARRYING
  .267  .732  .124 -.008  2) pushing open a heavy door (for example, department/convenience store door)
  .271  .710  .116  .026  3) pulling open a heavy door (for example, department/convenience store door)
  .107  .627  .155  .158  4) pushing a shopping cart

Table 2-7. Factor structure of walking/moving construct following EFA

    F1    F2    F3  WALKING/MOVING
  .802  .236  .124  1) walking within your home/living environment
  .662  .416  .271  2) walking 2-4 blocks (about 1/4 mile) without stopping
  .570  .462  .360  3) walking 4-8 blocks (about 1/2 mile) without stopping
  .411  .470  .474  4) walking one mile without stopping
  .836  .092  .062  5) walking on carpeting
  .832  .178  .130  6) walking on grass
  .649  .262  .223  7) walking on gravel
  .662  .282  .165  8) walking over small obstacles on the floor (for example, toys, shoes)
  .690  .382  .178  9) walking in a crowded place (for example, outdoor marketplace, shopping mall)

  .412  .802  .193  1) climbing down one flight of stairs
  .388  .787  .231  2) climbing up one flight of stairs
  .291  .789  .353  3) climbing down two flights of stairs
  .259  .776  .396  4) climbing up two flights of stairs
  .371  .442  .503  5) climbing up or down a 3-step stool
  .284  .387  .551  6) climbing up or down a 6-foot ladder
  .672  .370  .138  7) stepping up or down a standard curb
  .320  .376  .463  8) stepping onto or off a bus
  .499  .250  .165  9) stepping into or out of an elevator
  .088  .214  .840  10) running one block
  .064  .139  .838  11) jogging one mile

Table 2-8. Fit statistics for positioning/transfer construct

 Measure         Infit  Infit Outfit Outfit
(logits)  Error   MnSq   ZSTD   MnSq   ZSTD  Corr.  Items
    1.59    .15    .97    -.1    .87    -.6    .61  kneeling 10-20 minutes
    1.49    .14   1.32    2.0   1.36    1.8    .47  lying stomach 5-8 hours
    1.22    .13   1.44    2.8   1.41    2.2    .40  lying stomach 2-4 hours
    1.16    .13    .98    -.1   1.02     .2    .59  standing 3-4 hours
    1.14    .13   1.02     .2   1.02     .2    .54  lying back 5-8 hours
     .88    .13   1.11     .8   1.14     .9    .47  lying side 5-8 hours
     .85    .13    .98    -.1   1.10     .7    .47  seated 3-4 hours
     .82    .13   1.11     .8   1.00     .1    .61  change position kneeling to squatting
     .81    .13    .95    -.3    .88    -.8    .67  shift squatting
     .78    .12   1.06     .5   1.00     .0    .61  shift kneeling
     .77    .13   1.21    1.6   1.13     .9    .58  kneeling 5-10 minutes
     .74    .12   1.36    2.6   1.33    2.2    .55  squatting 3-5 minutes
     .71    .12   1.22    1.7   1.14    1.0    .62  change position squatting to kneeling
     .67    .12    .82   -1.5    .86   -1.0    .63  standing 1-2 hours
     .66    .12    .99    -.1    .94    -.4    .67  moving floor to low mattress/futon
     .61    .12   1.17    1.4   1.06     .5    .69  moving low mattress/futon to floor
     .60    .12    .94    -.5    .99     .0    .53  bending waist 5-10
     .59    .12   1.47    3.3   1.45    2.9    .47  lying stomach 1 hour
     .54    .12    .91    -.7    .87    -.9    .66  change position sitting floor to kneeling
     .51    .12    .69   -2.7    .66   -2.8    .67  change position kneeling to standing
     .45    .12    .94    -.4    .91    -.7    .71  change position kneeling to sitting floor

Table 2-8. Continued

 Measure         Infit  Infit Outfit Outfit
(logits)  Error   MnSq   ZSTD   MnSq   ZSTD  Corr.  Items
     .45    .12    .83   -1.4    .79   -1.6    .65  change position squatting to standing
     .41    .12   1.03     .3   1.04     .4    .50  lying back 2-4 hours
     .38    .12    .74   -2.2    .73   -2.2    .65  moving sitting floor to sitting chair
     .37    .12   1.35    2.6   1.33    2.3    .54  moving out bathtub taking a bath
     .26    .12    .96    -.3   1.06     .5    .42  seated 1-2 hours
     .25    .12   1.12    1.0   1.27    1.9    .32  lying side 2-4 hours
     .25    .12   1.51    3.7   1.49    3.3    .55  moving into bathtub take bath
     .23    .12   1.01     .1    .94    -.4    .68  change position standing to squatting
     .23    .12   1.40    2.9   1.33    2.3    .58  squatting 1-2 minutes
     .14    .12    .87   -1.0    .83   -1.3    .73  change position standing to kneeling
     .04    .12    .90    -.8    .87   -1.0    .66  moving sitting chair to sitting floor
    -.05    .12    .86   -1.1    .82   -1.4    .58  standing 30-60 minutes
    -.15    .12    .83   -1.4    .86   -1.1    .58  bending waist 1-5 minutes
    -.22    .12   1.30    2.2   1.32    2.1    .49  rolling stomach to side
    -.37    .13   1.20    1.5   1.33    2.2    .43  lying back 1 hour
    -.40    .13   1.50    3.4   1.51    3.1    .48  moving out bathtub taking a shower
    -.45    .13    .88    -.8   1.03     .3    .40  seated 30-60 minutes
    -.51    .13   1.49    3.2   1.52    3.1    .53  moving into bathtub take shower
    -.56    .13    .82   -1.3    .80   -1.4    .43  lying side 1 hour
    -.66    .13    .59   -3.5    .65   -2.6    .51  change position lying back to sitting
    -.80    .13    .94    -.4    .90    -.6    .55  shift sitting chair without armrests

Table 2-8. Continued

 Measure         Infit  Infit Outfit Outfit
(logits)  Error   MnSq   ZSTD   MnSq   ZSTD  Corr.  Items
    -.84    .14    .84   -1.1    .83   -1.0    .50  moving lying bed, scooting up in bed
    -.85    .14    .47   -4.5    .54   -3.5    .60  change position sitting chair to standing
    -.85    .14    .57   -3.5    .58   -3.1    .61  change position lying side to sitting
    -.94    .14    .67   -2.5    .75   -1.6    .53  rolling back to side
   -1.00    .14    .74   -1.9    .71   -1.9    .60  standing 10-20 minutes
   -1.04    .14   1.09     .6   1.16    1.0    .49  moving sitting bed scooting edge of bed
   -1.12    .14    .59   -3.1    .58   -2.8    .60  shift lying in bed
   -1.16    .15    .88    -.7    .82   -1.0    .54  scooting along a couch
   -1.32    .15    .79   -1.4    .81   -1.1    .54  change position sitting to lying down
   -1.38    .16   1.11     .7   1.06     .4    .48  shift while standing
   -1.41    .16    .95    -.3    .83    -.9    .58  scooting up/back into chair
   -1.46    .16    .88    -.7    .96    -.1    .42  seated 10-20 minutes
   -1.46    .16    .60   -2.8    .65   -2.0    .57  shift sitting chair with armrests
   -1.62    .17    .87    -.7    .89    -.5    .50  change position standing to sitting chair

Table 2-9. Fit statistics for lifting/carrying construct

 Measure         Infit  Infit Outfit Outfit
(logits)  Error   MnSq   ZSTD   MnSq   ZSTD  Corr.  Items
    2.82    .19   1.45    2.0   1.68    1.7    .43  carrying toddler on shoulders
    2.73    .19   1.34    1.6   1.45    1.3    .50  carrying toddler on back
    1.79    .15   1.56    3.2   1.28    1.2    .56  carrying toddler on hip
    1.74    .15    .84   -1.0    .76   -1.1    .71  lifting 25 pounds shoulder to above head
    1.62    .14   1.95    5.1   1.89    3.5    .46  carrying infant in arms
    1.34    .14    .74   -2.0    .75   -1.4    .73  lifting 25 pounds waist to shoulder
    1.19    .13    .70   -2.4    .76   -1.4    .73  lifting 25 pounds floor to waist
    1.16    .14   1.43    2.7   1.57    2.8    .56  carrying 10 pounds down one flight stairs
    1.01    .13   1.26    1.8   1.45    2.4    .59  carrying 10 pounds up one flight stairs
     .96    .13    .75   -1.9    .72   -1.8    .74  carrying 25 pounds 25 feet
     .77    .13    .64   -3.0    .60   -2.8    .79  lifting 10 pounds shoulder to above head
     .44    .13    .59   -3.6    .56   -3.4    .79  lifting 10 pounds waist to shoulder
     .21    .13    .89    -.8    .86    -.9    .69  lifting 10 pounds floor to waist
    -.18    .13    .89    -.8    .80   -1.2    .70  lifting 5 pounds shoulder above head
    -.41    .14    .77   -1.7    .68   -2.0    .72  carrying 10 pounds 25 feet
    -.58    .14    .80   -1.5    .85    -.8    .68  lifting 5 pounds floor to waist
    -.64    .14    .68   -2.4    .63   -2.3    .72  lifting 5 pounds waist to shoulder
    -.72    .14    .86    -.9    .91    -.4    .62  pulling wet laundry out washing machine
   -1.01    .15   1.21    1.3   1.08     .4    .57  lifting 1 pound shoulder to above head
   -1.07    .15    .86    -.9    .85    -.6    .58  pulling open a heavy door
   -1.42    .16   1.13     .8   1.27    1.1    .55  lifting 1 pound floor to waist

Table 2-9. Continued

 Measure         Infit  Infit Outfit Outfit
(logits)  Error   MnSq   ZSTD   MnSq   ZSTD  Corr.  Items
   -1.42    .16    .81   -1.1    .72   -1.2    .61  pushing open a heavy door
   -1.47    .16    .76   -1.5    .77    -.9    .62  carrying 5 pounds 25 feet
   -1.85    .18   1.54    2.5   1.31    1.0    .41  pushing a shopping cart
   -1.86    .18    .87    -.6    .69   -1.1    .57  lifting 1 pound waist to shoulder
   -2.15    .20   1.44    1.9   1.16     .6    .45  carrying 1 pound 25 feet
   -2.99    .27   1.00     .1    .72    -.5    .40  pulling open refrigerator door

Table 2-10. Fit statistics for walking/moving construct

 Measure         Infit  Infit Outfit Outfit
(logits)  Error   MnSq   ZSTD   MnSq   ZSTD  Corr.  Items
    3.17    .21   1.56    2.1   2.34    2.2    .53  jogging one mile
    2.54    .18   1.80    3.4   2.16    2.5    .57  running one block
    1.35    .14   1.50    2.9   1.25    1.1    .69  climbing up or down a 6-foot ladder
     .94    .14    .81   -1.4    .69   -1.6    .78  climbing up two flights of stairs
     .90    .13    .72   -2.1    .71   -1.6    .79  walking one mile
     .69    .13    .87    -.9    .76   -1.3    .76  climbing down two flights of stairs
     .65    .13   1.13     .9   1.03     .2    .72  climbing up or down a 3-step stool
     .60    .13   1.38    2.4   1.49    2.3    .65  stepping onto or off a bus
     .24    .13    .74   -2.0    .70   -1.7    .76  walking 4-8 blocks
     .13    .13    .64   -2.9    .65   -2.0    .76  climbing up one flight of stairs
    -.06    .13    .69   -2.4    .68   -1.7    .74  climbing down one flight of stairs
    -.36    .14    .79   -1.5    .65   -1.7    .71  walking 2-4 blocks

Table 2-10. Continued

 Measure         Infit  Infit Outfit Outfit
(logits)  Error   MnSq   ZSTD   MnSq   ZSTD  Corr.  Items
    -.69    .14   1.28    1.7   1.38    1.4    .57  walking on gravel
    -.69    .14    .75   -1.7    .62   -1.6    .69  walking crowded place
    -.95    .15   1.16    1.0   1.20     .8    .55  walking small obstacles on floor
   -1.14    .16    .86    -.8    .68   -1.0    .61  stepping up or down a standard curb
   -1.49    .17   1.62    2.9   1.38    1.1    .49  stepping into or out of an elevator
   -1.53    .17    .80   -1.1    .66    -.9    .57  walking within home environment
   -1.92    .19   1.01     .1    .76    -.5    .49  walking on grass
   -2.38    .22    .90    -.3    .72    -.4    .44  walking on carpeting

[Figure: "PERSONS MAP OF ITEMS" for the positioning/transfer construct; persons on the left, items on the right, vertical scale in logits]

Figure 2-1. Item-person map of positioning/transfer construct of the ICFAM. Each 'X' on the left side of the map represents 1 subject, with Xs at the top of the map representing individuals with high ability and Xs at the bottom of the map representing individuals with low ability.

[Figure: "PERSONS MAP OF ITEMS" for the lifting/carrying construct; persons on the left, items on the right, vertical scale in logits]

Figure 2-2. Item-person map of lifting/carrying construct of the ICFAM.

[Figure: "PERSONS MAP OF ITEMS" for the walking/moving construct; persons on the left, items on the right, vertical scale in logits. Values printed in parentheses beside items on the map: jog one mile (6.0 8.0); walk small obstacles on floor (2.5); step into or out of an elevator (2.0); walk within home/living environment (2.0); step up or down a standard curb (2.0); walk on carpeting (2.0); walk on grass (2.0)]

Figure 2-3. Item-person map of walking/moving construct of the ICFAM.

CHAPTER 3
PRECISION OF THREE SHORT FORMS FOR BACK PAIN

Introduction

Fixed short forms have been primarily used in health assessment for the last 30 years to achieve psychometric efficiency (8,28,35,44-46,55). Shortened instruments with good psychometric properties have been developed in response to growing demands to reduce test administration time, respondent burden, and study costs. Several short forms have evolved from generic health measures. For example, the Duke Health Profile-12 and the Short Form-36 (SF-36) were developed from the Medical Outcome Scale (MOS), while the Physical Function-10 (PF-10)/PF12 Physical Component Summary (PCS) were generated from the SF-36. Although these instruments were originally designed to measure either overall health status or physical function in general populations, they have also been used with back pain populations. Additionally, short forms have been developed from condition-specific measures for back pain (62): for instance, the 24-item Roland-Morris Disability Questionnaire (RMDQ) (6,7) was developed from items on the 136-item Sickness Impact Profile, and an 18-item measure derived its items from multiple instruments including the Oswestry Disability Index, the RMDQ, and the PF-10.

In creating the short form of an assessment, the goal is to select the smallest number of items necessary while maintaining adequate precision in measuring the latent trait (119). That is, the major challenge in developing fixed short forms is to achieve psychometric efficiency with fewer items without sacrificing measurement precision (8,15,36,44-46). The creation of fixed short forms has been largely driven by the comprehensiveness and breadth of prior assessments, which were particularly burdensome for respondents and test administrators. However, when the number of items is reduced substantially (as is often the case), a partial loss of measurement precision is inevitable (8).
Several studies indicate that the balance between comprehensiveness and precision of measurement should be taken into account when developing a short form (8,44,46,115). A loss of precision may appear regardless of which items investigators eliminate, because fewer items leave more gaps in measurement across the range of person ability. In general, deficits in precision occur when items do not closely match ability level (i.e., disability level). Thus, items should be chosen to match ability in order to enhance measurement precision (44, 46). For example, when an easy test is administered to individuals of high ability (i.e., low disability), or a difficult test is administered to individuals of low ability (i.e., high disability), measurement precision is insufficient to differentiate the ability levels of the individuals. The critical questions are to what extent, and by what methods, the precision of short forms can be optimized.

Traditionally, Classical Test Theory (CTT) methodologies have been used to select items from lengthy assessments to create short forms. These methods often include the deletion of items with low item-total correlations, the least impact on the overall internal consistency of the test, and low factor loadings (60). Of these, Cronbach's α is one of the most commonly used statistics for selecting and eliminating items that have the least impact on the internal consistency of the test. However, many studies indicate that the values obtained for Cronbach's α depend on the particular sample used (i.e., they are sample-dependent) and thus do not reflect an inherent, stable property of the test (117, 118, 120, 121). The estimated α, being a property of the observed responses of one sample, cannot be generalized to different samples.
In addition, several studies indicate that coefficient α can be influenced by many factors, such as 1) test length (i.e., longer tests are more reliable than shorter ones) (60), 2) test items not well matched to the individuals (i.e., too easy or too difficult) (122), and 3) missing data (60). These methods do not address the importance of maintaining items with difficulties that reflect the range of person abilities in the population of interest (119).

In addition to the traditional approach of using Cronbach's α to make item deletion decisions, Mallinson and colleagues (2004) advocated use of the separation ratio (SR) in item reduction. The SR indicates the impact that removing an item or items has on measurement precision. Velozo and colleagues (2000) investigated item reduction procedures based on IRT methodologies. These researchers recommended basing item deletion on high or low mean square residuals, similar item difficulty calibrations, and the item's influence on person separation. In other studies using IRT methods, items were selected based on: 1) frequency of administration in Computer Adaptive Testing (CAT), 2) high test information, and 3) broad item difficulty coverage.

The ICF Activity Measure (ICFAM) was recently developed to provide an efficient and precise measurement system based on the activity dimension of the World Health Organization's (WHO) International Classification of Functioning, Disability and Health (ICF). The ICF provided the conceptual framework and classification system for generating the items on the ICFAM. Activities involving movement, moving around, and daily life activities were the subcategories of the ICF activity dimension consulted in the development of items. Items were developed with the intent to represent the entire range of ability on each construct, thus creating an equiprecise measure. Using IRT and CAT methods, Velozo and colleagues (41) created the ICFAM, a web-based computer adaptive survey system. The administrative core of the instrument allows various settings to be adjusted, making it possible to change the initial theta value (i.e., the difficulty of the question first given to the respondent) and the stopping rule (i.e., the guidelines for terminating the test).
Because questions are targeted to individuals at their ability level, only 5-10 questions per construct are required to reach a final measure of person ability with acceptable error. In addition, immediate feedback is provided to respondents and clinicians in the form of a graph and summary statistics.

In the current study, we attempted to develop short forms using IRT methodologies for the constructs on the ICFAM that were most relevant to individuals with chronic back pain. The goal was to create three efficient short forms while maintaining adequate precision. In contrast to the CTT-based methods of shortening instruments, the IRT approach focuses on item-level psychometrics rather than on the test as a whole. In addition, IRT methods do not rely on estimates of reliability (i.e., Cronbach's α) as indicators of reliable measurement, since these statistics are sample-dependent and vary from sample to sample.

The purpose of the present study is twofold. First, we removed items to create three 10-item short forms, one for each of the applicable constructs (i.e., positioning/transfer, lifting/carrying, and walking/moving), that are psychometrically comparable to the entire set of items in each construct. Second, we investigated the item-level psychometrics and precision of these three newly generated short forms using the Rasch rating scale model.

Method

Research Participants

The data used in this study were collected during the development phase of ICFmeasure.com. Funding for the development of the ICFAM was obtained from the National Institute of Disability and Rehabilitation Research (NIDRR). The study was approved by the Institutional Review Board of the University of Florida (IRB # 568-2000). Stages in the development of ICFmeasure.com included: 1) presenting potential items to focus groups, 2) consulting with a professional panel, 3) cognitive interviewing with individuals with disabilities, and 4) a paper-pencil field test. These stages resulted in the 264 items that make up the ICFAM item bank. Data from the 101 individuals with back pain who completed the paper-pencil version were analyzed in the current study (Table 3-1).

Instrumentation

The ICFAM consists of six constructs based on the activity dimension of the ICF: positioning/transfers, lifting/carrying, fine hand, walking/climbing, wheelchair/scooters, and self care activities. Three of these constructs are particularly relevant to individuals with back pain: positioning/transfer (56 items), lifting/carrying (27 items), and walking/moving (20 items). We chose these constructs based on two criteria: 1) they contain the most frequently cited problem activities for those with back pain, and 2) the activities in the construct are relevant to the population of individuals with back pain. Our hypothesis was that the 103 items selected would represent three distinct latent abilities, as divided into the subcategories of the ICFAM constructs.

In an effort to overcome the limitations of the CTT-based short form construction procedure, the Rasch rating scale model (a one-parameter IRT model) was employed. An iterative approach was used to identify items that could be eliminated based on four criteria: 1) high mean square, 2) low mean square, 3) similar calibrations to other items, and 4) person separation value (i.e., an item was retained if the analysis with the item removed substantially decreased person separation) (45). High or low mean square values indicate that the item may measure a different construct or need further clarification to fit the Rasch model. Similar calibrations may indicate redundant items. Removal of redundant items (i.e., items having similar calibrations) was considered appropriate if the range of ability level (i.e., the range between the most difficult and the easiest item) and the intervals between items were maintained on the item-person map. In addition, after item removal the separation ratio (SR) and person reliability (analogous to Cronbach's α) were examined.
If these two values decreased minimally after item removal, this was considered as supporting the deletion of the item. Person separation indicates whether items are effectively separating individuals into distinct levels (i.e., discriminating people of differing ability). The separation ratio (SR) provides an indication of the number of statistically significant strata or categories of ability (e.g., low, medium, and high ability) into which the sample is being divided. The formula used to calculate the separation ratio is SR = (4Gp + 1)/3, where Gp represents the person separation value provided by the Winsteps software output.

Response categories on the ICFAM include four choices, with a lower score representing a lower level of ability: "no difficulty," "some difficulty," "a lot of difficulty," and "have not done." This rating scale is used on all three constructs. If the activity did not occur within the last 30 days, the participant was instructed to select "have not done." In this study, this rating was treated as the maximum difficulty rating, on the rationale that the most likely explanation for an activity not being performed during the last 30 days was inability to perform the task (123).

Rasch Rating Scale Model

The Rasch rating scale model can be expressed as a probability equation:

ln(P_nik / P_ni(k-1)) = B_n - D_i - F_k

The left side of the equation is a natural logarithm (ln, which uses e = 2.718 as the base). P_nik is the probability that person n, encountering item i, would be observed in category k. Dividing the probability of passing rating category k (P_nik) by the probability of passing the next lower rating category k-1 (P_ni(k-1)) gives the odds of being rated at level k rather than at level k-1. The log transformation turns ordinal-level data into interval-level data, where the probability of passing the rating scale at the next higher level is a conjoint measurement of the person ability (B_n), the item difficulty (D_i), and the step measure between the rating categories (F_k).
The unit of measurement that results when the Rasch model is used to transform raw scores into log odds ratios on a common interval scale is the logit ( 95).
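
As a rough illustration of the rating scale model equation above, the following Python sketch converts the adjacent-category log-odds (B_n - D_i - F_k) into category probabilities. The function name and all numeric values are hypothetical and chosen only for illustration; this is not the Winsteps estimation procedure, only the probability model evaluated at given parameter values.

```python
import math

def rsm_category_probs(b_n, d_i, steps):
    """Category probabilities under the Rasch rating scale model.

    b_n   : person ability in logits
    d_i   : item difficulty in logits
    steps : step measures F_1..F_m in logits (one per category above the lowest)
    Returns [P_0, ..., P_m], which sum to 1.
    """
    # Category k has exponent sum_{j<=k} (B_n - D_i - F_j); category 0 has exponent 0.
    exponents = [0.0]
    for f_k in steps:
        exponents.append(exponents[-1] + (b_n - d_i - f_k))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical person, item, and step values (not taken from the study).
probs = rsm_category_probs(b_n=0.5, d_i=-0.2, steps=[-1.0, 0.0, 1.0])

# The log-odds of adjacent categories recover B_n - D_i - F_k.
log_odds = [math.log(probs[k] / probs[k - 1]) for k in range(1, len(probs))]
```

With these illustrative inputs, each entry of `log_odds` equals 0.7 minus the corresponding step measure, which is the relationship the equation states.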

Data Analysis

Using the Winsteps software program (103, 104), the Rasch rating scale model was employed to determine model fit as well as the item-level psychometrics of the ICFAM. The Rasch model (i.e., one-parameter IRT model) is the most robust of the IRT models. That is, stable and accurate item parameters (e.g., fit statistics) can be obtained with a relatively small sample size (105). The Winsteps program produces goodness-of-fit statistics for each item and person. These fit statistics are used to identify items that do not fit the unidimensional Rasch model. Infit and outfit mean square (MnSq) values greater than 1.4 or smaller than 0.6 indicate that the item was responded to erratically relative to other items (i.e., the item misfits) (95, 106). This type of inconsistent response pattern may indicate that the item is measuring a different construct or that the item was poorly understood and needs clarification. Infit is inlier-sensitive, or information-weighted, fit. It is more sensitive to the pattern of responses to items at a person's ability level (i.e., those items that an individual has a 50% chance of passing). Outfit is outlier-sensitive fit. In contrast to infit, outfit is more sensitive to the pattern of responses to items with difficulty far from a person's ability (107).

Rasch analysis also provides point measure correlation coefficients as an immediate evaluation of response-level scoring. If the item-level scoring accords with the latent variable, these correlations will be positive. A negative correlation coefficient might indicate a reverse-scored item. The point measure correlations are acceptable if they are > 0.3 (108). Rasch analysis also produces estimates of person ability and item difficulty. These estimates are on a log-odds unit (i.e., logit) scale.
The average item difficulty is arbitrarily set at "0" logits with positive logits indicating higher than average probabilities and negative logits indicating lower than average probabilities ( 95).
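
The mean square criterion described above can be sketched in a few lines of Python. The function name is hypothetical, and the rule that an item is flagged when either its infit or its outfit falls outside the 0.6 to 1.4 band is an assumption made for this sketch (the text gives the thresholds but not how the two statistics are combined). The example values are infit/outfit MnSq pairs copied from Table 2-8.

```python
def classify_fit(infit, outfit, low=0.6, high=1.4):
    """Label one item's fit from its infit and outfit mean squares.

    Assumption for this sketch: an item is flagged when EITHER statistic
    falls outside the [low, high] band.
    """
    if infit > high or outfit > high:
        return "high"   # more erratic than the model expects
    if infit < low or outfit < low:
        return "low"    # more predictable than the model expects
    return "ok"

# (item, infit MnSq, outfit MnSq) taken from Table 2-8
items = [
    ("kneeling 10-20 minutes", 0.97, 0.87),
    ("lying stomach 2-4 hours", 1.44, 1.41),
    ("moving into bathtub take shower", 1.49, 1.52),
    ("change position sitting chair to standing", 0.47, 0.54),
    ("standing 3-4 hours", 0.98, 1.02),
]

flags = {name: classify_fit(infit, outfit) for name, infit, outfit in items}
misfitting = [name for name, label in flags.items() if label != "ok"]
```

Under this rule, the two items with mean squares above 1.4 and the one item below 0.6 are flagged, while the other two are left as fitting.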

Rasch analysis also provides person separation, which is an index of the sample standard deviation in standard error units, and person reliability (analogous to Cronbach's α), which is the proportion of observed sample variance that is not attributable to measurement error (104). The separation ratio (SR) allows determining whether items are effective in separating individuals into statistically distinct ability levels. The SR provides an indication of the number of statistically significant strata, or meaningful categories (e.g., low, medium, and high ability back pain groups), into which the sample is divided. The formula used to calculate it is SR = (4Gp + 1)/3, where Gp represents person separation (124).

Prior to conducting the Rasch analysis to obtain the item-level psychometrics, confirmatory factor analysis (CFA) was used to test the unidimensionality of the three short forms. Mplus (Muthén & Muthén, Los Angeles, CA, version 4.21) was used to determine the goodness of fit of the items to a one-factor model of each short form. The following criteria were used to determine the goodness of fit to the one-factor models: 1) p-value of chi-square > 0.05, 2) Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) close to 1.0, 3) root mean square error of approximation (RMSEA) < 0.06, and 4) weighted root mean square residual (WRMR) < 0.01 (98, 99).

Because a one-factor model was not sufficient, exploratory factor analysis (EFA) was used to further investigate the potential factor structure. Mplus (Muthén & Muthén, Los Angeles, CA, version 4.21) was used to conduct the EFA. We applied the unweighted least squares estimator, used varimax rotation following the initial factor extraction, and replaced missing values with mean values. The criteria used to determine the number of factors to retain were 1) Kaiser's criterion of eigenvalues greater than 1, 2) factors accounting for greater than 5% of total variance, and 3) the scree test, where the slope changes substantially in the factor versus eigenvalue graph (101). A criterion of greater than 0.46 was used to indicate a significant loading on a factor (102).

The test information function reports the statistical information in the data corresponding to the complete test. In general, the precision with which a parameter is estimated is measured by the variability of the estimates around the value of the parameter. Thus, the variance, symbolized σ², provides a measure of the precision of the estimators. The amount of information, denoted by I, is the reciprocal of the variance. Statistically, when the standard deviation of the person ability estimates about the examinee's ability is squared, the result is the variance, which is a measure of the precision with which a given ability level can be estimated. The amount of information at a given ability level is therefore the reciprocal of this variance. If the amount of information is large, the person ability can be estimated with high precision at that ability level, and the estimates will be close to the true value of ability. If the amount of information is small, the person ability can only be estimated with low precision, and the estimates will be widely scattered around the true value of ability. To determine how precisely the items on each of the short forms estimate person ability across the full range of the construct, the test information function was examined.

Results

Sample demographic and clinical information is presented in Table 3-1. The average age of the sample was 48 years, and nearly 80% of participants reported having had back pain for more than a year, indicating a chronic condition. A series of Rasch analyses was conducted to develop three short forms, each consisting of ten items, from the three constructs of the ICFAM (Table 3-6).
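
The reciprocal relationship between information and variance described in the Method section can be made concrete with a small sketch. The function names and the standard error values below are hypothetical and serve only to show the arithmetic: information is 1 divided by the squared standard error, and the standard error is 1 divided by the square root of the information.

```python
import math

def information_from_se(se):
    """Information I = 1 / variance, where variance = se ** 2."""
    return 1.0 / (se ** 2)

def se_from_information(info):
    """Standard error of the ability estimate implied by information I."""
    return 1.0 / math.sqrt(info)

# Hypothetical standard errors (in logits) at two ability levels.
info_precise = information_from_se(0.25)    # small error, large information
info_imprecise = information_from_se(0.5)   # larger error, less information
```

Halving the standard error quadruples the information, which is why a test information curve drops off quickly at ability levels where few items are targeted.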
Figure 3-1, 3-2, and 3-3 present item-person map of each construct of the ICFAM following and prior to the item reduction. Each in the right side of map repr esents the locations of deleted


items from original full set of items, while each X on the left side of map represents one study participant.

Short Form for Positioning/Transfer

The items "lying down on stomach 2-4 hours" and "moving into bathtub to take shower" were removed after the first Rasch analysis due to high infit/outfit statistics (1.44/1.41 and 1.49/1.52, respectively) (Table 2-8). After a few iterations of Rasch analysis attempting to maintain adequate person separation, ten items were selected from the entire set of 56 items. The ten newly developed items for the positioning/transfer construct fit the Rasch model. All items showed exceptional infit/outfit except one item ("moving out from bathtub after taking a bath"), which had slightly high infit/outfit (1.46/1.41) (Table 3-7). These items also retained moderate point measure correlations, with values ranging from 0.52 to 0.67 (prior to item reduction the range was 0.32 to 0.73). The ten items had a slightly better spread (-2.33 to 3.30 logits) in relation to person ability than the entire set of 56 items (1.96 to 2.87 logits) (Figure 3-1). Item calibrations for the ten remaining items were similar to the calibrations before the deletion of 46 items. However, person separation decreased considerably from 4.52 to 1.86 (SR decreased from 6.36 to 2.81). That is, the newly developed short form for the positioning/transfer construct separated individuals with back pain into nearly three groups, while the entire set of items separated individuals into six groups. Person reliability (analogous to Cronbach's alpha) of the short form was acceptable (0.78 compared to 0.95 with all items).

A confirmatory factor analysis (CFA) was conducted on the positioning/transfer 10-item short form (Table 3-2). An exploratory factor analysis (EFA) was conducted to further investigate the factor structure of the construct. This analysis suggested that a three-factor solution was more appropriate (Table 3-3).
We retained three factors based on Kaiser's criterion of eigenvalue greater than 1. These factors accounted for 60% of total variance (the first


factor accounting for 36%, the second factor accounting for 13%, and the third factor accounting for 11%). Table 3-3 presents the factor loadings of the ten short form items (factor loadings greater than 0.46 are in bold). Four items loaded onto factor 1, four items onto factor 2, and two items onto factor 3 (2 of 10 items). One item ("staying in a lying position on back for 1 hour") did not load onto any factor and one item ("bending at the waist while standing for 1-5 minutes") loaded onto more than one factor (factorially complex). Items appeared to load onto factors based on item difficulty (i.e., easy, moderate, and difficult items).

The empirical hierarchy of item difficulty was scrutinized using the estimated item difficulty calibrations, which are expressed in logits, with higher positive values indicating a more challenging task. Item difficulty calibrations of the ten newly developed items for the positioning/transfer construct followed a logical progression in terms of motor control theory. The most challenging item was "kneeling 10-20 minutes" (1.84 ± 0.16 logits), a moderately challenging item was "standing 1-2 hours" (0.79 ± 0.13 logits), and the least challenging item was "changing position standing to sitting in chair" (-1.88 ± 0.18 logits) (Table 3-7). With respect to motor control theory, kneeling tasks would appear to be more difficult than static standing, since kneeling is an unnatural position in which balance control could be expected to be poorer than during static standing (125). Similarly, static standing for 1-2 hours would readily be expected to be more difficult than changing position from standing to sitting in a chair. However, this logical progression of item difficulty did not reflect our hypothetical hierarchy of item difficulty based on MET values.

Figure 3-5 presents the test information function of the positioning/transfer construct with the entire set of 56 items versus the 10-item short form.
Following item reduction, the test information function of the short form moved to the left in comparison to that of its entire set of


items. The figure shows that the amount of information in the entire item set reached its maximum (37.77) at a person ability of logits, then decreased rapidly as the ability estimate either increased or decreased. With the ten-item short form, the amount of information reached its maximum (6.27) at a person ability of -0.15 logits. That is, the entire set of items provided the most precise measure of person ability near the center of the ability range, while the short form provided the most precise measure of person ability at a slightly lower level than the center of the ability range. The figure also shows the extent to which item reduction contributes to lost test information (i.e., precision) at a particular ability estimate. Precision peaked at -0.49 and -0.42 logits and decreased rapidly as the ability estimate either increased or decreased.

Short Form for Lifting/Carrying

On the first Rasch analysis run (with 27 items), the items "carrying toddler on shoulder," "carrying infant in arms," and "carrying 10 pounds down one flight stairs" had high infit/outfit statistics (1.45/1.68, 1.95/1.89, and 1.43/1.57, respectively) and were thus removed (Table 2-9). After several iterations of Rasch analysis attempting to maintain adequate person separation, ten items were selected from the entire set of 27 items. The ten items retained to create the lifting/carrying short form all conformed to the Rasch model except one item, "carrying toddler on back" (infit/outfit = 1.90/2.17) (Table 3-8). The items of the short form exhibited moderate to high point measure correlations ranging from 0.42 to 0.83, compared to a range of 0.40 to 0.79 for the entire set of items. The ten items of the short form had a slightly better spread of person ability (-3.10 to 4.80 logits) than the entire set of items (-2.72 to 4.30 logits). Item calibrations of the ten-item short form remained relatively stable after the 17 items were deleted.
However, person separation decreased from 3.67 to 2.49 (SR decreased from 5.23 to 3.65). That is, the 10-item short form for the lifting/carrying construct separated individuals with chronic back pain into nearly 4 groups, while the entire set of items separated


the individuals into six groups. Person reliability (analogous to Cronbach's α) for the short form was 0.86, compared to 0.93 for all the items.

A confirmatory factor analysis was conducted on the 10-item short form to test for unidimensionality. The one-factor model proved to be inadequate (Table 3-2). An exploratory factor analysis to further investigate the factor structure suggested that a two-factor solution was more appropriate (Table 3-4). We retained two factors based on the Kaiser criterion of eigenvalue greater than one. These two factors accounted for 64% of total variance (the first factor accounting for 48% and the second factor accounting for 16%). Table 3-4 presents factor loadings of the 10 short form items (factor loadings greater than 0.46 are in bold). Five items loaded onto factor 1 and six items loaded onto factor 2, while one item ("lifting 10 pounds from waist height to above your head with your hand") loaded on more than one factor. Items tended to load onto factors based on item difficulty, with activities involving lifting heavy objects loading onto one factor and those involving lifting light objects onto the other factor.

Figure 3-6 presents the test information function for the lifting/carrying construct with the entire set of 27 items versus the 10-item short form. Following the item reduction, the test information function moved slightly to the right in comparison to the entire set. With the entire set of items, the amount of information peaked (12.09) at a person ability of near logits, and decreased rapidly as the ability estimate either increased or decreased. That is, the entire set of items provided the most precise measure of person ability near the center of the ability range, while the short form provided the most precise measure of person ability at a slightly higher level than the center of the ability range.
The figure also shows to what extent item reduction for the short form lost test information (i.e., precision) at a particular person ability. The removal of 17 items from the entire set resulted in a considerable loss of measurement precision,


which decreased from 12.09 to 4.85 in information. In addition, this loss of precision peaked near logits, decreased slightly as the ability estimate increases, and decreased rapidly as either the ability estimate increases or decreases. Thus, the entire set of 27 items for the lifting/carrying construct estimated person ability with greater precision than did the 10-item short form, especially near the center of the ability range.

Short Form for Walking/Moving

On the first Rasch analysis run (with all 20 items), the items "jogging one mile," "running one block," and "stepping into or out of elevator" were removed due to high infit/outfit statistics (1.54/2.34, 1.80/2.16, and 1.62 for infit, respectively) (Table 2-10). However, after several iterations of Rasch analysis, the decision was made to reinstate the item "running one block" to the ten-item candidate list for the short form, since no other item was available near the high extreme of the ability continuum. The ten newly developed items for the walking/moving construct fit the Rasch model. All items showed exceptional infit/outfit statistics except one item ("running one block") with high infit/outfit values (2.20/3.97) (Table 3-9). These items exhibited moderate to high point measure correlations ranging from 0.46 to 0.78, compared to a range of 0.44 to 0.79 prior to item reduction. The ten items of the short form had slightly less spread in person ability (-2.88 to 4.51 logits) than the entire set of items (-2.59 to 4.86 logits). Item calibrations for the ten items remained similar after the deletion of ten items. However, person separation decreased from 3.44 to 2.42 (SR decreased from 4.92 to 3.56). That is, the short form for the walking/moving construct separated individuals with chronic back pain into three groups, while the entire set of items separated the individuals into nearly five groups.
Person reliability (analogous to Cronbach's α) for the short form was 0.85, decreasing from 0.92 for the entire set of 20 items.


A confirmatory factor analysis was conducted on the 10-item short form to test for unidimensionality. The one-factor model was found to be inadequate (Table 3-2). An exploratory factor analysis to further investigate the factor structure suggested that a two-factor solution was more appropriate. We retained two factors based on the Kaiser criterion of eigenvalue greater than one. These factors accounted for 67% of total variance (the first factor accounting for 53% and the second factor accounting for 14%). Table 3-5 presents factor loadings for the ten short form items (factor loadings greater than 0.46 are in bold). Items loaded onto factors that contained items related by difficulty and also by type of activity (e.g., walking/stepping and running/climbing).

Figure 3-7 presents the test information function for the walking/moving construct with the entire set of 20 items versus the 10-item short form. Following item reduction, the peak of the test information function moved slightly to the right. The figure shows that the amount of information with the entire set peaked (12.05) at a person ability ranging from -0.49 to -0.42 logits and decreased rapidly as the ability estimate either increased or decreased. With the 10-item short form, the amount of information peaked (5.75) at a person ability near logits. That is, the entire set of items provided the most precise measure of person ability at a slightly lower level than the center of the ability range, while the short form provided the most precise measure of person ability near the center of the ability range. In addition, the figure shows to what extent item reduction results in a loss of test information (i.e., precision) at a particular person ability. The removal of ten items from the entire set resulted in some loss of measurement precision, which decreased from 12.05 to 5.75.
In addition, this loss of precision peaked at -0.49 and -0.42 logits and decreased rapidly as the ability estimate either increased or decreased. Thus the entire set of 20 items for the walking/moving construct estimated person ability with greater precision than the 10-item short form near the


center of the ability range. Similarly, the 10-item short form estimated person ability with less precision than did the entire set of 20 items near the center of the ability range.

Discussion

Summary of Results

The purpose of this study was to create three 10-item short forms for three constructs of the ICFAM and to investigate the item-level psychometric properties of the short forms, as well as the unidimensionality and test information function of the three constructs. To create short forms of the ICFAM, an item level psychometric investigation was conducted focusing on infit/outfit mean square (MnSq), person separation, the item-person map, and the hierarchical order of item difficulty. While the item level psychometric findings support the soundness of the short forms and advocate their future use, the factor analyses failed to support the proposed unidimensional constructs or the original 3-factor structure of the ICFAM. Test information functions showed that the entire set of items on the ICFAM constructs estimated person ability with greater precision than did the short forms near the center of the ability range, while the precision of both the entire set of items and the short form items decreased rapidly as the ability estimate increased or decreased.

Item Level Psychometrics

This study demonstrated how IRT methodologies could be used to achieve measurement efficiency, reducing items while maintaining adequate precision. Attempts to create short forms for use with individuals with back pain have previously focused on CTT methodologies such as internal consistency and test-retest reliability (6, 10, 23, 26, 27, 67, 68, 70-72, 79). In this study, we used an IRT approach based on Rasch analysis to provide item level information about three constructs of the ICFAM. The newly developed short forms showed adequate psychometric properties as determined by the infit/outfit statistics, item difficulty calibrations, item-person map, and person separation.
All items of each short form fit the Rasch model


except one item (i.e., "carrying toddler on back") for the lifting/carrying construct and one item (i.e., "running one block") for the walking/moving construct. Since these were the most challenging items in those constructs, the two items were included in order to fill the potential gaps at the high extreme of person ability.

Problematic items were identified by high/low fit statistics, which indicated that the items were measuring a different construct or that the item needed further clarification. That is, individuals with low disability (i.e., high ability) may have a tendency to provide unexpectedly low ratings, or individuals with high disability (i.e., low ability) may have a tendency to provide unexpectedly high ratings on these items. These response patterns might have been the result of a lack of observations for these items. Rasch analysis also aided item selection by identifying items that best capture the range of persons to be estimated and by identifying gaps where item difficulty calibrations did not match person-ability measures. These gaps, along with the item statistics, provide direction in selecting items. Thus, in determining whether or not items are equally distributed across the full range of ability, items are selected based on the person locations on the map (i.e., in order to assure that items match person abilities). That is, we placed items at or near the middle of the scale, where average individuals aggregate, even though candidate items were distributed toward both extremes. For example, in the initial modification phase, four items ("lying down on stomach 2-4 hours," "carrying toddler on back," "jogging one mile," and "running one block") from the three constructs were identified due to high fit statistics. Inspecting the item-person maps (Figures 3-2 and 3-3) revealed that these items were needed to reduce possible ceiling effects, as no other items remaining on the short forms were as difficult as these items.
Of these four items, two ("carrying toddler on back" and "running one block") were later reinstated to the short forms because of a lack of difficult items to match individuals at the extremes of the scale.
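
The fit statistics used above to flag problematic items can be illustrated with a minimal sketch. It computes infit and outfit mean squares for a single item under a dichotomous Rasch model, which is a simplifying assumption made only for illustration: the ICFAM items are rated on more than two categories, and the values reported in this chapter came from dedicated Rasch software. The function names (`rasch_prob`, `item_fit`) and the example responses are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of success for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    sum_sq_resid = 0.0  # sum of squared residuals (x - p)^2
    sum_var = 0.0       # sum of model variances p * (1 - p)
    sum_z2 = 0.0        # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        sq_resid = (x - p) ** 2
        sum_sq_resid += sq_resid
        sum_var += var
        sum_z2 += sq_resid / var
    infit = sum_sq_resid / sum_var      # information-weighted mean square
    outfit = sum_z2 / len(responses)    # unweighted mean square
    return infit, outfit


# Hypothetical data: four persons located at the item difficulty, plus one
# high-ability person (3 logits) who unexpectedly fails the item.
thetas = [0.0, 0.0, 0.0, 0.0, 3.0]
responses = [1, 0, 1, 0, 0]
infit, outfit = item_fit(responses, thetas, b=0.0)
print(f"infit = {infit:.2f}, outfit = {outfit:.2f}")
```

In this toy example a single unexpected response from a person far from the item raises outfit more than infit, which is one way to read item pairs such as 2.20/3.97 for "running one block."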


It should be noted that we treated the response category "have not done" as the lowest rating, based on the rationale that the most likely explanation for an activity not occurring was that the item could not be performed (123). Thus, we determined that treating the category "have not done" as the lowest rating would be more appropriate. In fact, about half (51%) of individuals with above average person ability (i.e., high ability) scored the lowest rating on the item "carrying toddler on back," while more than half (60%) of individuals with above average person ability scored the lowest rating on the item "running one block." One plausible explanation for this observation is that these respondents might have been responding to the absence of opportunity for these items (i.e., you can do the activity but have not done so for any reason in the last 30 days). In addition, other respondents might have been responding to other instructions indicating the lowest score (i.e., if you are unable to do the activity or require the help or assistance of another person).

Unidimensionality of the Short Forms

The dimensionality of the three 10-item short forms was investigated by confirmatory factor analysis (CFA) and exploratory factor analysis (EFA). The results of the CFA and the EFA were conclusive as to whether or not a one-factor model for each short form was plausible. Although one factor accounted for a small percentage of the variance for the positioning/transfer short form (> 36%), one factor for the other two short forms accounted for a moderate percentage of the variance (> 48% for lifting/carrying and > 53% for walking/moving). Based on the Kaiser rule (eigenvalues greater than one considered to be factors), we retained three factors accounting for most of the variance (> 60%) for the positioning/transfer short form, two factors for the lifting/carrying short form (> 64%), and two factors for the walking/moving short form (> 67%).
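
As a hedged illustration of the retention rule described above, the sketch below applies the eigenvalue-greater-than-one and greater-than-5%-of-variance criteria to a list of eigenvalues. The eigenvalues are hypothetical: the first three were chosen so that, for ten items, they correspond to the 36%, 13%, and 11% shares reported for the positioning/transfer short form, and the remaining seven are invented so that the total equals the number of items. Treating the share of variance as eigenvalue divided by the eigenvalue total assumes eigenvalues of the item correlation matrix, which may differ from the output of the estimator actually used. The scree test, the third criterion, is a visual judgment and is not coded here.

```python
def retain_factors(eigenvalues, min_eigenvalue=1.0, min_share=0.05):
    """Return (factor number, eigenvalue, share of variance) for retained factors."""
    total = sum(eigenvalues)
    retained = []
    for number, value in enumerate(eigenvalues, start=1):
        share = value / total
        # Keep a factor only if it passes both numeric criteria.
        if value > min_eigenvalue and share > min_share:
            retained.append((number, value, share))
    return retained


# Hypothetical eigenvalues for a 10-item short form (sum = 10).
eigenvalues = [3.6, 1.3, 1.1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2]
retained = retain_factors(eigenvalues)
cumulative_share = sum(share for _, _, share in retained)

for number, value, share in retained:
    print(f"factor {number}: eigenvalue {value:.1f}, {share:.0%} of variance")
print(f"cumulative: {cumulative_share:.0%}")
```

With ten items, any eigenvalue above one already exceeds 10% of the total, so under these assumptions the 5% criterion never excludes a factor that the Kaiser rule retains.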
These findings may imply that the theoretically generated constructs of the ICFAM instrument have more than one dimension. For the positioning/transfer short


form, items failed to show any logical relationship of factors, while an interesting finding was noted in the EFA of the lifting/carrying and walking/moving short forms. That is, items appear to group by the hierarchical order of item difficulty. For the lifting/carrying short form, items with high item calibrations (i.e., difficult items) had a tendency to load on factor 1, while items with moderate/low item calibrations (i.e., moderate/easy items) had a tendency to load on factor 2 (Table 3-4). Similarly, for the walking/moving short form, items with moderate/low item calibrations had a tendency to load on factor 1, while items with high item calibrations had a tendency to load on factor 2 (Table 3-5). The hierarchical factor structure of the walking/moving short form replicates similar findings of a factor analysis study of the motor scale in the Functional Independence Measure (FIM) (126, 127). That study grouped motor scale items by relative energy requirement, with a low energy subscale including items such as grooming or dressing and a high energy subscale including items such as locomotion or stair climbing. The findings of our study may indicate that dividing each construct into more than one subscale would be preferred.

Person Separation and Person Reliability

The separation ratio (SR) for the short forms was good, separating the sample into nearly 3 to 4 statistically meaningful strata. Relative to the full item banks, the SR values of all three short forms decreased considerably, with the most dramatic decrease for the positioning/transfer short form. These decreases were unavoidable because such a large number of items was removed from the entire set of items. Nearly 82% of the items for positioning/transfer, 63% of the items for lifting/carrying, and 50% of the items for walking/moving were removed.
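
The separation values reported in the Results can be checked against the strata formula given earlier, SR = (4Gp + 1)/3. The sketch below also converts person separation to person reliability using the standard Rasch relationship R = Gp²/(1 + Gp²). That relationship is not stated in this chapter and is included here as an assumption, although it reproduces the reported reliabilities to two decimals. The function names are illustrative.

```python
def strata(gp):
    """Separation ratio (number of distinct strata) from person separation Gp."""
    return (4.0 * gp + 1.0) / 3.0


def person_reliability(gp):
    """Person reliability implied by person separation Gp (assumed relationship)."""
    return gp ** 2 / (1.0 + gp ** 2)


# Person separation reported in the text: (entire item set, 10-item short form).
reported = {
    "positioning/transfer": (4.52, 1.86),
    "lifting/carrying": (3.67, 2.49),
    "walking/moving": (3.44, 2.42),
}

for construct, (full, short) in reported.items():
    print(
        f"{construct}: SR {strata(full):.2f} -> {strata(short):.2f}, "
        f"reliability {person_reliability(full):.2f} -> {person_reliability(short):.2f}"
    )
```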
In addition, person reliability (analogous to Cronbach's α) decreased considerably for the positioning/transfer short form, while it decreased only slightly for the lifting/carrying and walking/moving short forms. Perhaps the reason for this is that the removal of redundant items on the lifting/carrying and


walking/moving allowed item removal without loss of internal consistency. Despite the reduction in person reliability, the values were still in acceptable ranges (128).

Constructing fixed short forms is a conventional approach to achieving measurement efficiency with fewer items. This reduces the burden on respondents and test administrators. Despite the loss of precision, fixed short forms have been shown to be valid and practical for use in outcome measurement (34, 44-46). Some sacrifice of precision is inevitable in the creation of a short form. Several studies have indicated that there is a tradeoff or compromise between measurement precision and breadth in short form creation (35, 44-46, 129). In this study, by using Rasch analysis (a one-parameter IRT model), we were successful in developing short forms that provided an optimal range (i.e., measurement breadth) despite the loss of precision. That is, we removed many items to create three 10-item short forms, yet captured person ability across the full range of the sample.

As a measure of precision, the SR is a valuable indicator of whether reducing the number of items substantially lowers or maintains the precision with which a short form measures the ability of the sample (119). Separation is defined as the ratio of the standard deviation of the sample to the standard error of measurement (i.e., the root mean square error), while Cronbach's α is the estimated average correlation of a test with all possible tests of the same length obtained by domain sampling. Despite the similarity of separation to the ratio represented by Cronbach's α, there is a slight difference. For separation, the numerator reflects a property of the sample only and the denominator reflects a property of the test only. Thus, the ratio describes the relationship between the amount of variability captured in the sample and the precision of the test.
While the SR depends on the particular sample being measured, the relationship between the sample and the test is apparent. In our study, the back pain sample is nearly three to four times more variable than our


short forms' ability to detect the sample's variability. This indicates that when measuring individuals with little variation on the trait of interest, the test will need little error to discriminate the differences that exist among these individuals (119).

Test Information Function

The statistical meaning of information is defined as the reciprocal of the precision with which a parameter could be estimated (130). Thus, when we estimate person ability with precision, we know more about the value of the person ability than if we had estimated it with less precision. The precision with which person ability is estimated is measured by the variability of the estimates around the value of person ability. Therefore, a measure of precision is the variance of the estimators (i.e., σ²), and the amount of information at a given ability level is the reciprocal of this variance. That is, if the amount of information is large, person ability at a particular level can be estimated with precision. Similarly, if the amount of information is small, person ability at a particular level cannot be estimated with precision.

In this study, the test information function (TIF) showed a considerable loss of information as the number of items was reduced. As items were eliminated to create the short forms, information decreased in the following manner: about 83% for the positioning/transfer construct, about 60% for the lifting/carrying construct, and about 52% for the walking/moving construct. These decreases reveal that the positioning/transfer construct sacrificed more information (83%) than the lifting/carrying and walking/moving constructs (60% and 52%). This makes intuitive sense, since many more items were removed from the original positioning/transfer item bank than from the lifting/carrying and walking/moving item banks (46 of 56 items).
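
Because information is the reciprocal of the variance of the ability estimate, each peak information value reported in the Results implies a standard error of 1/√I at that ability level. The short sketch below converts the reported peaks to standard errors and recomputes the percentage of peak information lost. The function names are illustrative, and comparing peaks that occur at slightly different ability levels is a simplification rather than a point-by-point comparison of the two curves.

```python
import math


def standard_error(information):
    """Standard error of the ability estimate implied by test information."""
    return 1.0 / math.sqrt(information)


def percent_information_lost(full, short):
    """Percentage drop in peak information from the full item set to the short form."""
    return 100.0 * (full - short) / full


# Peak test information reported in the text: (entire item set, 10-item short form).
peaks = {
    "positioning/transfer": (37.77, 6.27),
    "lifting/carrying": (12.09, 4.85),
    "walking/moving": (12.05, 5.75),
}

for construct, (full, short) in peaks.items():
    print(
        f"{construct}: SE {standard_error(full):.2f} -> {standard_error(short):.2f} logits, "
        f"information lost {percent_information_lost(full, short):.0f}%"
    )
```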
The peak of the TIF for the positioning/transfer short form moved slightly to the left side of the center, while the peak of the TIF for both the lifting/carrying and walking/moving short forms moved slightly to the right side of the center.


This may suggest that we should have selected items with lower item calibrations (i.e., easier items) when we deleted items with similar item calibrations in creating the positioning/transfer short form. In fact, the total number of individuals at the ceiling increased from six to eight for the positioning/transfer construct following item reduction. By contrast, for the other two constructs we should have selected items with higher item calibrations (i.e., more difficult items). However, the total number of individuals at the ceiling did not differ before and after item reduction for these two constructs.

Limitations and Future Implications

There were several limitations in this study. Problematic items with high infit/outfit statistics (i.e., "carrying toddler on back" on the lifting/carrying short form and "running one block" on the walking/moving short form) were reinstated in the short forms to avoid ceiling effects. This may be a limitation of our short forms despite their adequate breadth. The item level psychometrics indicate that the newly created short forms could be improved in future research by 1) replacing problematic items, and 2) developing items that more adequately fill the gaps in person ability to cover a wider range of ability. In addition, the results of the present study suggest that the short forms were multidimensional. These findings may prompt the use of multidimensional models with adequate sample sizes to better explain physical activity domains.

In order to achieve psychometric efficiency, this study showed how Rasch analysis could be used to reduce the number of items in an instrument while maintaining adequate psychometric properties. The item level psychometrics (e.g., fit statistics, item difficulty calibrations) as well as other qualifiers (e.g., Cronbach's α, person separation) were used to reduce items.
Despite the use of an item response theory methodology, it is apparent that, relative to the entire item banks, the short forms showed decrements in measurement precision (28, 35). One way to avoid this decrement in measurement precision would be to combine the IRT and computer adaptive testing


methodology. By selectively presenting items that are matched to the ability levels of respondents, these methodologies may accomplish both measurement efficiency and precision (28, 34, 47).


Table 3-1. Demographic information of research participants

Characteristics                   Individuals with back pain (n=101)

Age
  < 20                            5 (5.0)
  21-30                           12 (11.9)
  31-40                           15 (14.9)
  41-50                           24 (23.8)
  51-65                           19 (18.8)
  > 65                            20 (19.8)
  Missing                         6 (5.9)
  Mean ± SD                       48.14 ± 17.3
Gender
  Female                          65 (64.4)
  Male                            31 (30.7)
  Missing                         5 (5.0)
Education
  Elementary                      0 (0.0)
  Middle/Junior High              3 (3.0)
  High School                     34 (33.7)
  Technical                       8 (7.9)
  College                         33 (32.7)
  Graduate                        23 (22.8)
Race/Ethnic
  African American                19 (18.8)
  Hispanic American               7 (6.9)
  American Indian                 1 (1.0)
  White, not Hispanic origin      68 (67.3)
  Asian/Pacific Islander          1 (1.0)
  Other                           3 (3.0)
  Missing                         2 (2.0)
Years that has had back pain
  Less than a year                7 (6.9)
  1 through < 4 years             20 (19.8)
  More than 4 years               59 (58.4)
  Missing                         15 (14.9)


Table 3-2. Results of confirmatory factor analysis for short forms of the ICFAM

Indices (Criterion)    Positioning/transfer    Lifting/carrying    Walking/moving
                       1-Factor model          1-Factor model      1-Factor model

Chi-square             1511.670                1380.940            1380.940
df                     31                      39                  39
P-Value (> 0.05)       0.000                   0.000               0.000
CFI (1.0)              0.000                   0.016               0.026
TLI (1.0)              0.003                   0.016               0.026
RMSEA (< 0.06)         0.689                   0.579               0.576
WRMR (< 0.1)           6.594                   5.728               5.700


Table 3-3. Factor structure of short form for positioning/transfer construct

Items (difficulty order)    F1    F2    F3

staying in a kneeling position on both knees for 10-20 minutes (while making only minor adjustments)?    0.090    0.677    0.342
staying in a lying position on your favorite side for 5-8 hours (while making only minor adjustments)?    0.175    0.018    0.858
staying in a standing position for 1-2 hours (while making only minor adjustments)?    0.147    0.378    0.693
moving yourself out of a bathtub after taking a bath?    0.211    0.560    0.134
changing position from standing to kneeling?    0.144    0.761    0.123
bending at the waist while standing for 1-5 minutes (for example, reaching for something in the trunk of a car)?    0.488    0.563    0.030
staying in a lying position on your back for 1 hour (while making only minor adjustments)?    0.319    0.442    0.177
changing position from lying on your back to sitting (for example, lying in your bed to sitting on the edge of your bed)?    0.811    0.056    0.289
shifting your weight while lying in bed?    0.769    0.296    0.009
changing position from standing to sitting in a chair?    0.805    0.120    0.150

Percent of total variance accounted for by factors    36%    13%    11%


Table 3-4. Factor structure of short form for lifting/carrying construct

Items (difficulty order)    F1    F2

carrying a toddler on your back (for example, piggyback)?    0.636    -0.151
lifting 25 pounds (for example, large bag of dog food or cat litter) from shoulder height to above your head with your hand(s) and arm(s)?    0.859    0.223
lifting 25 pounds (for example, large bag of dog food or cat litter) from floor to waist height with your hand(s) and arm(s)?    0.806    0.289
carrying 25 pounds (for example, a large bag of dog food or cat litter) in your hand(s) and arm(s) 25 feet?    0.814    0.320
lifting 10 pounds (for example, bag of groceries or 12-pack of soft drinks) from waist height to shoulder height with your hand(s) and arm(s)?    0.696    0.488
lifting 5 pounds (for example, bag of sugar or large telephone book) from shoulder height to above your head with your hand(s)?    0.429    0.631
pulling wet laundry out of a washing machine?    0.323    0.647
pulling open a heavy door (for example, department/convenience store door)?    0.225    0.727
lifting 1 pound (for example, a can of soup) from waist height to shoulder height with your hand(s)?    0.128    0.747
pulling open a full-size refrigerator door?    -0.092    0.790

Percent of total variance accounted for by factors    48%    16%


Table 3-5. Factor structure of short form for walking/moving construct

Items (difficulty order)    F1    F2

running one block?    -0.102    0.806
climbing up or down a 6-foot ladder?    0.288    0.793
climbing up or down a 3-step stool?    0.446    0.719
walking 4-8 blocks (about 1/2 mile) without stopping?    0.736    0.429
climbing down one flight of stairs?    0.645    0.399
walking 2-4 blocks (about 1/4 mile) without stopping?    0.827    0.296
walking in a crowded place (for example, outdoor marketplace, shopping mall)?    0.797    0.281
stepping up or down a standard curb?    0.723    0.250
walking within your home/living environment?    0.866    0.000
walking on carpeting?    0.714    -0.068

Percent of total variance accounted for by factors    53%    14%

[Item-person map: two PERSONS MAP OF ITEMS panels, with persons (X) and items located on a common logit scale.]
Figure 3-1. Item-person map of positioning/transfer construct of the ICFAM following 10 items removal and prior to 10 item removal.

[Item-person map: two PERSONS MAP OF ITEMS panels, with persons (X) and items located on a common logit scale.]
Figure 3-2. Item-person map of lifting/carrying construct of the ICFAM following 10 items removal and prior to 10 item removal.

[Item-person map: two PERSONS MAP OF ITEMS panels, with persons (X) and items located on a common logit scale.]
Figure 3-3. Item-person map of walking/moving construct of the ICFAM following 10 items removal and prior to 10 item removal.

Table 3-6. Short form of the ICFAM

ICF ACTIVITY MEASURE SHORT FORM

This survey consists of 3 sections of 10 questions you might have already been asked to answer or will be asked to answer in our computer adaptive testing. Each question will ask you how difficult it has been for you to perform a given activity within the last 30 days. Please choose the answer that best fits your situation. If you have not performed the activity in question, then check the 'Have Not Done' answer. Thank you very much for participating in this study.

MAINTAINING A BODY POSITION
In the last 30 days, how much difficulty have you had staying in the following positions (while making only minor adjustments):
Response options: No Difficulty / Some Difficulty / A Lot of Difficulty / Have Not Done

staying in a lying position on your back for 1 hour?
staying in a lying position on your back for 5-8 hours?
standing position for 1-2 hours?
kneeling on both knees for 10-20 minutes?
bending at the waist while standing for 1-5 minutes (for example, reaching for something in the trunk of a car)?
shifting your weight while lying in your bed?
changing position from lying on your back to sitting (for example, lying in your bed to sitting on the edge of your bed)?
changing position from standing to sitting in a chair?
changing position from standing to kneeling?
moving yourself out of a bathtub after taking a bath?

LIFTING AND CARRYING OBJECTS
In the last 30 days, how much difficulty have you had:
Response options: No Difficulty / Some Difficulty / A Lot of Difficulty / Have Not Done

lifting 1 pound (for example, a can of soup) from waist height to shoulder height?
lifting 5 pounds (for example, bag of sugar or large telephone book) from shoulder height to above your head?
lifting 10 pounds (for example, bag of groceries or 12-pack of soft drinks) from waist height to shoulder height?

Table 3-6. Continued

In the last 30 days, how much difficulty have you had:
Response options: No Difficulty / Some Difficulty / A Lot of Difficulty / Have Not Done

lifting 25 pounds (for example, large bag of dog food or cat litter) from floor to waist height?
lifting 25 pounds (for example, large bag of dog food or cat litter) from shoulder height to above your head?
carrying 25 pounds (for example, large bag of dog food or cat litter) 25 feet (for example, from car to front door)?
carrying a toddler on your back (for example, piggyback)?
pulling open a full-size refrigerator door?
pulling open a heavy door (for example, department/convenience store door)?
pulling wet laundry out of a washing machine?

WALKING AND MOVING
In the last 30 days, how much difficulty have you had:
Response options: No Difficulty / Some Difficulty / A Lot of Difficulty / Have Not Done

walking within your home/living environment?
walking 2-4 blocks (about 1/4 mile) without stopping?
walking 4-8 blocks (about 1/2 mile) without stopping?
walking on carpeting?
walking in a crowded place (for example, outdoor marketplace, shopping mall)?
climbing down one flight of stairs?
climbing up or down a 3-step stool?
climbing up or down a 6-foot ladder?
stepping up or down a standard curb?
running one block?

Table 3-7. Fit statistics for positioning/transfer construct

Items    Measure (Logits)    Error    Infit MnSq    ZSTD    Outfit MnSq    ZSTD    Correlation
kneeling 10-20 minutes    1.84    0.16    1.22    1.4    1.11    0.6    0.58
lying back 5-8 hours    1.32    0.14    0.93    -0.5    0.94    -0.3    0.64
standing 1-2 hours    0.79    0.13    0.98    -0.1    1.10    0.7    0.61
moving out bathtub taking a bath    0.44    0.13    1.46    3.2    1.41    2.7    0.61
change position standing to kneeling    0.16    0.13    1.16    1.2    1.13    1.0    0.67
bending waist 1-5 minutes    -0.17    0.13    0.80    -1.5    0.83    -1.2    0.67
lying back 1 hour    -0.43    0.14    1.22    1.6    1.27    1.8    0.52
change position lying back to sitting    -0.76    0.14    0.61    -3.2    0.65    -2.6    0.58
shift lying in bed    -1.30    0.15    0.60    -3.0    0.63    -2.5    0.63
change position standing to sitting chair    -1.88    0.18    0.85    -0.9    0.82    -0.9    0.57

Table 3-8. Fit statistics for lifting/carrying construct

Items    Measure (Logits)    Error    Infit MnSq    ZSTD    Outfit MnSq    ZSTD    Correlation
carrying toddler on back    3.58    0.21    1.90    3.7    2.17    2.2    0.49
lifting 25 pounds shoulder to above head    2.25    0.17    0.80    -1.3    0.75    -1.1    0.79
lifting 25 pounds floor to waist    1.51    0.16    0.77    -1.7    0.77    -1.3    0.80
carrying 25 pounds 25 feet    1.20    0.15    0.78    -1.6    0.71    -1.8    0.82
lifting 10 pounds waist to shoulder    0.50    0.15    0.69    -2.5    0.65    -2.5    0.83
lifting 5 pounds shoulder above head    -0.33    0.15    1.15    1.0    0.99    0    0.72
pulling wet laundry out washing mach    -1.02    0.16    1.07    0.5    1.05    0.3    0.65
pulling open a heavy door    -1.45    0.17    0.96    -0.2    1.13    0.5    0.61
lifting 1 pound waist to shoulder    -2.45    0.20    1.06    0.4    0.74    -0.5    0.57
pulling open refrigerator door    -3.78    0.29    1.05    0.3    0.62    -0.2    0.42

Table 3-9. Fit statistics for walking/moving construct

Items    Measure (Logits)    Error    Infit MnSq    ZSTD    Outfit MnSq    ZSTD    Correlation
running one block    3.16    0.20    2.20    4.4    3.97    4.1    0.56
climbing up or down a 6-foot ladder    1.70    0.15    1.45    2.5    1.19    0.8    0.75
climbing up or down a 3-step stool    0.88    0.14    1.09    0.7    0.91    -0.4    0.77
walking 4-8 blocks    0.40    0.14    0.73    -2.0    0.73    -1.7    0.78
climbing down one flight of stairs    0.05    0.14    0.99    0    1.24    1.3    0.69
walking 2-4 blocks    -0.28    0.15    0.64    -2.7    0.58    -2.6    0.76
walking crowded place    -0.66    0.15    0.67    -2.3    0.58    -2.2    0.72
stepping up or down a standard curb    -1.15    0.17    0.83    -1    0.67    -1.4    0.65
walking within home/living environment    -1.59    0.18    0.70    -1.8    0.59    -1.4    0.62
walking on carpeting    -2.52    0.23    0.90    -0.4    0.72    -0.5    0.46

[Item-person map: three PERSONS MAP OF ITEMS panels, (Positioning/Transfer), (Lifting/Carrying), and (Walking/Moving), with persons (X) and items located on a common logit scale; item labels carry values in parentheses.]
Figure 3-4. Item-person map of three short forms (positioning/transfer, lifting/carrying, and walking/moving) of the ICFAM following the item removal.

[Plot: test information function curves.]
Figure 3-5. Test information function of short form versus entire set of items for positioning/transfer. A dotted line shows the short form and a solid line shows the entire set of items.

[Plot: test information function curves.]
Figure 3-6. Test information function of short form versus entire set of items for lifting/carrying. A dotted line shows the short form and a solid line shows the entire set of items.

[Plot: test information function curves.]
Figure 3-7. Test information function of short form versus entire set of items for walking/moving. A dotted line shows the short form and a solid line shows the entire set of items.

CHAPTER 4
COMPARISONS OF THE RELATIVE PRECISION OF THREE DIFFERENT TYPES OF BACK PAIN MEASURES: THE ICF ACTIVITY MEASURE (ICFAM) COMPUTER ADAPTIVE TEST, ICFAM SHORT FORMS, AND OSWESTRY BACK PAIN DISABILITY QUESTIONNAIRE

Introduction

Many self-report measures have been developed specifically for the back pain population because of their several advantages. These advantages include decreasing administration costs, reducing respondent burden, and potentially accessing scattered samples (131). Many studies suggest that self-report disability measures for back pain are as reliable as performance measures (23-25, 32, 40, 70) and appear to be sensitive indicators of long-term outcome (7). In general, these self-report disability measures are commonly classified into generic and condition-specific measures (28, 35). Two generic measures, the Sickness Impact Profile (SIP) (62, 66) and the Physical Function scale (PF-10) (62, 66), are the most commonly used assessments with individuals reporting back pain. The most extensively utilized condition-specific measures for back pain include the Oswestry Back Pain Disability Questionnaire (ODQ), the Roland-Morris Disability Questionnaire (RMDQ), and the Quebec Back Pain Disability Scale (QBDS) (23, 25, 29, 30, 74, 77, 79, 80, 132).

To date, nearly 82 condition-specific disability measures for back pain have been developed and have been shown to have adequate psychometrics. Of these widely accepted disability instruments, the ODQ is regarded as one of the most reliable back pain instruments (10, 23-27). Apparent advantages of the ODQ over other disability instruments include: 1) strong relevance between the condition of back pain and the isolated objective physical measurement (e.g., range of motion of back), 2) high responsiveness to functional change due to its rating scale with six response categories, 3) ease of administration, and 4) low impact on normal clinic operations (3, 13, 23, 24, 29, 30, 73, 74).
Many studies have shown that the ODQ and revised versions of it have adequate psychometric properties, such as reliability, validity, and responsiveness (3, 13, 23, 24, 29, 30, 73, 74). However, studies have shown that the ODQ may lack sensitivity to discriminate between individuals at the high extreme of the ability range (i.e., ceiling effects) (29, 30), only occasionally being responsive to individuals with severe back pain (31, 32). Several studies also indicate that the ODQ is more sensitive for patients who have improved but less sensitive for patients whose condition remained unchanged (23, 79). Thus, despite its adequate psychometrics, the ODQ may not precisely measure the disability of back pain across the full range of ability.

Deficits in precision may be the result of using items that do not closely match the ability of the sample of interest (35). That is, when easy items are administered to individuals with high ability (i.e., low disability) and/or difficult items are administered to individuals with low ability (i.e., high disability), there is a lack of measurement precision with a resulting inability to discriminate among individuals (29, 30). Problems with measurement precision often occur in conventional instruments with a fixed number of items, because it is unrealistic for one instrument to include enough items to precisely measure individuals across a wide range of ability. Even instruments with excellent breadth may still have inadequate depth of measurement (33). An additional problem is that long assessments (i.e., those covering a wider range of ability levels) contain items that appear unnecessary and raise concerns over respondent burden and administration costs (36). These legitimate concerns prompted the creation of static short forms from full-length instruments (28, 35). By reducing the number of items on the full instrument, short forms could achieve measurement efficiency while addressing concerns related to burden and cost (28, 35).
Developers of static short forms have attempted to select items that spread across the ability ranges; however, with a large reduction in the number of items, loss of precision remains an issue (15, 36, 44-46). Creating the ideal measure consisting of enough items to cover the full range of the trait with adequate precision is challenging when using short forms.

Despite the popularity and widespread use of short forms developed using Classical Test Theory (CTT), these instruments have a number of limitations (37). Item Response Theory (IRT)-based short forms can alleviate these limitations by focusing on item-level psychometric properties. In contrast to CTT, IRT focuses on the psychometric properties of the items making up the instrument instead of the instrument as a whole (40, 41). By estimating the probability that a respondent will select a particular rating for an item, item difficulty and person ability (or disability) can be placed on the same linear continuum. Thus, the IRT model allows connecting individuals' responses to items with their ability level (40, 42). Estimates of person ability (i.e., disability) on an underlying construct obtained using IRT methods are invariant regardless of the items used (i.e., test-free measurement), whereas under the CTT paradigm, person scores vary depending on the difficulty of the instrument (41). Furthermore, item difficulty estimates derived from IRT analyses remain the same regardless of the ability of the sample (i.e., sample-free measurement), while test statistics in CTT are dependent on the sample taking the test. In addition, IRT models transform raw scores (typically used in analyses based on CTT) into equal-interval measures (34). These advantages of IRT allow for the creation of invariantly calibrated large item banks that can more precisely discriminate individuals' ability levels and thus capture smaller increments of change.
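
The idea of placing item difficulty and person ability on one continuum can be illustrated with a small sketch. It assumes the Andrich rating scale form of the Rasch model, which the text does not name explicitly; the function names, the 0/1/2 category coding, and the threshold values are hypothetical and chosen only for illustration.

```python
import math

def category_probabilities(theta, delta, taus):
    """Rating scale model: probability of each response category 0..K.

    theta: person ability in logits
    delta: item difficulty in logits
    taus:  step thresholds tau_1..tau_K in logits (hypothetical values here)
    """
    # Category k has exponent sum_{j<=k} (theta - delta - tau_j); category 0 has 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, delta, taus):
    """Expected category score for one item at ability theta."""
    probs = category_probabilities(theta, delta, taus)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical three-category item (scores 0, 1, 2) with thresholds at -1 and +1 logits.
for theta in (-2.0, 0.0, 2.0):
    probs = category_probabilities(theta, delta=0.0, taus=[-1.0, 1.0])
    print(theta, [round(p, 3) for p in probs])
```

Because only the difference between theta and delta enters the model, a person and an item can be compared directly on the same logit scale: the expected score rises as ability exceeds difficulty.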
While IRT methodologies provide the means for generating and linking person ability and item difficulty calibrations, Computer Adaptive Testing (CAT) methods promise a means for administering items in a way that is both efficient and precise (28, 34, 36, 44-48). Studies have shown that CAT improves test efficiency, maintaining adequate precision with fewer items than the full test (41, 43, 48, 50, 52, 53, 57-59). CAT measures are highly correlated with other assessments intended to measure the same construct and require fewer items (i.e., an average of six items needed to reach an ability estimate) (81-84).

The CAT is based on a testing algorithm that defines iterative processes with a set of rules specifying the test questions to be administered to respondents. This includes procedures for item selection, ability estimation, and termination criteria. By selectively administering items that are matched to the ability level of the individual, measurement efficiency can be accomplished without losing the precision provided by the full item bank. For example, when measuring the ability of a person with mild back pain, more difficult items would be chosen (i.e., matching the ability of the individual). Similarly, when measuring the ability of a person with more severe back pain, a different set of items would be chosen that matches that individual's severely impaired ability (i.e., easier items would be selected). With this technology, a small number of items that are most relevant for a person of a particular ability level can be selected from the item bank (34). IRT in combination with CAT has recently become an alternative to conventional fixed-format disability measurement (25, 36).

The ICF Activity Measure (ICFAM) has recently been developed to create an efficient and precise measurement system based on the activity dimension of the World Health Organization's (WHO) International Classification of Functioning, Disability and Health (ICF). The ICF provided the conceptual framework and classification system for generating the items on the ICFAM. Activities involving movement, moving around, and daily life activities were the subcategories of the ICF activity dimension consulted in the development of items.
Items were developed with the intent to represent the entire range of ability on each construct, thus creating an equiprecise measurement (i.e., precise measurement across the entire range of the underlying construct). Using Item Response Theory (IRT) and Computer Adaptive Testing (CAT) methods, Velozo and colleagues (41) created the ICFAM, a web-based computer adaptive survey system. The administrative core of the instrument allows adjustments to be made to various settings, making it possible to change the initial theta value (i.e., difficulty of the question first given to the respondent) and the stopping rule (i.e., guidelines for terminating the test). Because questions are targeted to individuals at their ability level, only 5-10 questions per construct are required to reach a final measure of person ability with acceptable error. In addition, immediate feedback is provided to the respondents/clinicians in the form of a graph and summary statistics.

We hypothesized that the CAT measures would discriminate more precisely than the short forms or the ODQ measures. The purpose of this study is to compare the precision of the person measures generated from the ICF Activity Measure (ICFAM) computer adaptive test, the short forms of the ICFAM, and the Oswestry Back Pain Disability Questionnaire (ODQ).

Method

Research Participants

Forty-two individuals with back pain were recruited from rehabilitation clinics in Gainesville, Florida, including the University of Florida and Shands Orthopaedics and Sports Medicine Institute and Shands Rehab Hospital. Forty-two participants without back pain were recruited from multiple public sites in Gainesville. Criteria for participants with back pain included: 1) currently experiencing back pain, 2) having previously received treatment for back pain, 3) ability to read and understand English, and 4) age between 18 and 100 years. The criteria for non-back pain participants included: 1) currently experiencing no back pain, 2) ability to read and understand English, and 3) age between 18 and 100 years.
All appropriate clients presenting to the recruiting sites between November 3, 2009 and June 30, 2010 were recruited for the back pain group. This study was approved by the Institutional Review Board at the University of Florida (IRB #17-2009).

Instrumentation

The Oswestry Low Back Disability Questionnaire (ODQ), a conventional back pain disability instrument developed under classical test theory, was one of the instruments used in this study (Table 4-1). The ODQ is among the most popular self-report condition-specific instruments assessing how back pain affects patients' ability to manage daily life tasks (74). The ODQ and its revised versions provide an index of the perceived disability experienced by individuals with back pain. It consists of ten items including pain intensity, personal care, lifting, walking, sitting, standing, sleeping, employment/home-making, and traveling. Participants respond on a 6-point ordinal scale scored 0 to 5 (5 = pain does not interfere with activities, 0 = pain so severe that activities cannot be performed). The total score (i.e., sum of all item responses) is converted to a percentage score ranging from 0 (no disability) to 100 (most severe disability). Thus, a higher score is indicative of a higher level of disability.

The construction of fixed short forms is a conventional approach to achieving measurement efficiency and reducing respondent burden and administration costs (44, 46). Despite the loss of some precision, short forms have been shown to be valid and practical for achieving measurement efficiency (34, 44-46). A second measure used consisted of the three newly created short forms of the ICFAM (Appendix 2). These short forms were created using item response theory methodologies, specifically the Rasch one-parameter IRT model.
Each short form consists of 10 items that were judged to have adequate psychometrics, including fit statistics, person separation ratio, and Cronbach's alpha. For each of the questions on the short forms, respondents select one of four choices, with a lower score representing a lower level of ability: no difficulty, some difficulty, a lot of difficulty, and have not done. The participant was instructed to select "have not done" if the activity did not occur within the last 30 days. In this study, a rating of "have not done" was treated as a missing value.

In an effort to achieve both psychometric efficiency and precision, the ICF Activity Measure (ICFAM) was developed using Item Response Theory (IRT). The World Health Organization's (WHO) International Classification of Functioning, Disability and Health (ICF) provided the conceptual framework and classification system for developing the items used in the study. Specifically, the activity dimension of the ICF, including activities involving movement, moving around, and daily life tasks, was utilized as a guide in the item development stage (43). The original ICFAM consists of 6 activity constructs: positioning/transfers, lifting/carrying, fine hand, walking/climbing, wheelchair/scooters, and self-care activities. Constructs for use in this study were selected based on the following two criteria: 1) tasks represented by items within the construct are frequently cited as problematic for individuals with back pain and 2) tasks within the construct represent a potential activity limitation for individuals with back pain. Based on these criteria, three relevant constructs were chosen for this study: 1) positioning/transfers, 2) lifting/carrying, and 3) walking/moving. For each of the questions on the CAT, respondents are asked to select one of four response categories, with a lower score representing a lower level of ability: no difficulty, some difficulty, a lot of difficulty, and have not done.

CAT technology was used to administer items of the ICFAM instrument for each construct. Figure 4-1 presents the CAT algorithm used for the ICFAM instrument. First, the CAT begins with an initial person ability estimate (Bn) for a particular construct (i.e., positioning/transfer).
The initial person ability measure is set at the mean person ability of the sample used in the preliminary paper-and-pencil field test (during the ICFAM development phase). The CAT presents an item with a difficulty measure (Di) that is identical or closest to this initial person ability measure. After the initial item is presented and responded to, a new person ability estimate and standard error (SE) are generated. The stopping rule for the CAT is pre-set based on the standard error associated with a person ability estimate (i.e., SE < 0.40) and the maximum number of items administered (i.e., 10 items). That is, the test finishes when an individual's ability is estimated with a standard error less than 0.40 or 10 items have been administered to the individual. Since the stopping rule is unlikely to be reached with the presentation of a single item, a second item is presented to the respondent. Based on the response, the person ability estimate is recalculated. This procedure continues until the SE associated with the person ability estimate is less than the pre-set SE or the maximum number of items is reached, which defines the stopping rule. Once the stopping rule is satisfied, the respondent's final ability measure for that construct is formulated. After the positioning/transfer construct is completed, the remaining constructs (i.e., lifting/carrying and walking/moving) are presented in turn until the CAT reaches a final ability measure for each.

Analysis

A series of Rasch analyses were performed using the Winsteps software program to calculate person measures for the back pain and non-back pain groups (103). The Rasch model transforms total raw scores into estimates of person ability in logits. To maximize the comparability of summative scores from the short forms and the ODQ instrument, Rasch scores were linearly transformed from the original logit estimates to a 0-100 metric.

Pearson product moment correlations were obtained to compare the measurement properties of the CAT (i.e., 10-item stopping rule and standard error less than 0.40), short forms, and ODQ. Scatter plots of ability estimates for the CAT versus the short forms and the ODQ measure were used to further examine these relationships.
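
The CAT procedure described under Instrumentation (start from an initial ability estimate, present the item whose difficulty is closest to it, re-estimate, and stop at SE < 0.40 or 10 items) can be sketched as follows. This is a simplified illustration, not the ICFAM implementation: it assumes dichotomous Rasch items and a grid-based posterior mean with a normal prior as the ability estimator, neither of which is specified in the text. The function names, the prior, and the simulated respondent are hypothetical; the item difficulties are the walking/moving short-form measures from Table 3-9, used here only as a stand-in item bank.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def estimate_ability(responses, prior_mean=0.0, prior_sd=2.0):
    """Posterior mean and SD of ability on a grid (assumed estimator).

    responses: list of (item difficulty, score) pairs with score 0 or 1.
    """
    grid = [-6.0 + 0.05 * i for i in range(241)]
    log_post = []
    for theta in grid:
        lp = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for difficulty, score in responses:
            p = rasch_probability(theta, difficulty)
            lp += math.log(p if score == 1 else 1.0 - p)
        log_post.append(lp)
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(item_bank, answer, start_theta=0.0, se_threshold=0.40, max_items=10):
    """item_bank: dict of item name -> difficulty; answer: callable(name) -> 0 or 1."""
    remaining = dict(item_bank)
    responses = []
    theta, se = start_theta, float("inf")
    # Stopping rule: SE below the threshold or the item limit reached.
    while remaining and len(responses) < max_items and se >= se_threshold:
        # Select the unused item whose difficulty is closest to the current estimate.
        name = min(remaining, key=lambda n: abs(remaining[n] - theta))
        difficulty = remaining.pop(name)
        responses.append((difficulty, answer(name)))
        theta, se = estimate_ability(responses, prior_mean=start_theta)
    return theta, se, len(responses)

# Stand-in item bank: walking/moving short-form difficulties in logits (Table 3-9).
bank = {
    "running one block": 3.16,
    "climbing up or down a 6-foot ladder": 1.70,
    "climbing up or down a 3-step stool": 0.88,
    "walking 4-8 blocks": 0.40,
    "climbing down one flight of stairs": 0.05,
    "walking 2-4 blocks": -0.28,
    "walking crowded place": -0.66,
    "stepping up or down a standard curb": -1.15,
    "walking within home/living environment": -1.59,
    "walking on carpeting": -2.52,
}

# Hypothetical respondent who succeeds on every item easier than 0.5 logits.
theta, se, n_items = run_cat(bank, lambda name: 1 if bank[name] < 0.5 else 0)
print(round(theta, 2), round(se, 2), n_items)
```

In this dichotomous simplification each item contributes at most 0.25 units of information, so ten items cannot bring the SE below 0.40 and the item limit ends the example run. Passing a looser `se_threshold` shows the standard-error criterion ending the test early.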
To examine potential differences in precision across the three measures (i.e., CAT, short forms, and ODQ), the method of known-groups validity was used to test relative precision (RP) in discriminating the back pain and non-back pain groups. Methods included under the general linear model were used to test for hypothesized differences in group mean estimates. The magnitude of the F value from the ANOVA represents a measure of precision. F-statistics associated with chance probabilities p < 0.05 were considered significant. If the RP ratio is equal to 1, both methods of estimating function are equally discriminatory. If the RP is greater than 1, the measurement method in the numerator is superior in differentiating function compared to the method in the denominator. The greater the F value, the greater the amount of systematic variance a measurement method accounts for and, therefore, the greater its ability to discriminate groups of subjects.

Results

Sample demographic characteristics and clinical information are presented in Table 4-1. The average age was 53 years for the back pain group and 48 years for the non-back pain group. Nearly 60% of participants reported having back pain for more than a year, indicating that it was a chronic condition. Five percent of the non-back pain participants reported having another pain-related condition.

The stopping rule requiring SE < 0.40 was achieved for each of the respondents before the maximum number of questions (10) was reached. Participants in the back pain group answered slightly more questions than those in the non-back pain group. For the back pain group, the average respondent answered 5.62 questions in the positioning/transfer construct, 6.37 questions in the lifting/carrying construct, and 6.25 questions in the walking/moving construct. For the non-back pain group, the average respondent answered 4.64 questions in the positioning/transfer construct, 5.12 questions in the lifting/carrying construct, and 5.45 questions in the walking/moving construct. The CAT thus administered more questions to the back pain group than to the non-back pain group.
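
The relative precision comparison described under Analysis can be sketched in a few lines. The sketch assumes, as the wording about numerator and denominator implies, that RP is the ratio of the F statistics obtained when each measure is used to separate the two known groups. The function names and all scores below are hypothetical and are not study data.

```python
def one_way_f(group_a, group_b):
    """F statistic from a one-way ANOVA comparing two groups of scores."""
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def relative_precision(f_numerator, f_denominator):
    """RP > 1 favors the measure whose F statistic is in the numerator."""
    return f_numerator / f_denominator

# Hypothetical scores on two measures for the same two known groups.
measure_1_back_pain, measure_1_no_pain = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
measure_2_back_pain, measure_2_no_pain = [0.0, 2.0, 4.0], [3.0, 5.0, 7.0]

f_1 = one_way_f(measure_1_back_pain, measure_1_no_pain)
f_2 = one_way_f(measure_2_back_pain, measure_2_no_pain)
print(f_1, f_2, relative_precision(f_1, f_2))
```

Both hypothetical measures show the same difference in group means, but the second has more within-group spread, so its F statistic is smaller and the RP of the first measure relative to the second exceeds 1.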

In order to inspect the linear association between the measures, Pearson product moment correlations were calculated. Tables 4-2 and 4-3 provide Pearson correlation coefficients between the CAT measures, short form measures, and the ODQ measures. Overall, the CAT measures had moderate to high correlations with the short form measures and moderate correlations with the ODQ measures. The correlations between the CAT and the three short form measures among the back pain/non-back pain groups were moderate to high (r = 0.805/r = 0.569 for positioning/transfer, r = 0.808/r = 0.545 for lifting/carrying, and r = 0.620/r = 0.647 for walking/moving). In addition, the correlations between the CAT measures and the ODQ measures were slightly lower than those between the CAT measures and the short form measures. The correlations were moderate for the back pain group and near zero for the non-back pain group (r = 0.605/r = 0.037 for positioning/transfer, r = 0.530/r = 0.058 for lifting/carrying, and r = 0.594/r = 0.029 for walking/moving). All correlations between the CAT and the short form were statistically significant at the p < 0.01 level, while all correlations between the CAT and the ODQ measure were not statistically significant.

In an auxiliary investigation of the linear relationships between the CAT and the short form measures, and the CAT and the ODQ measures, each pair of measures was plotted against each other (Figures 4-1, 4-2, and 4-3). Scatter plots of the CAT and short form measures clustered slightly more around the center of the graph than those of the CAT and the ODQ. In addition, the ODQ measures were more dispersed in the y-coordinate direction than other measures, while the CAT measures clustered into the center of the graph. As noted in Table 4-4, the CAT had 24-32% less variance than the short forms and 22-36% less variance than the ODQ, while the short forms had similar levels of variance to the ODQ. Scatter plots of all relationships showed linear relationships.
Of these plots, the scatter plot of the CAT and short form measures for the

PAGE 114

positioning/transfer and lifting/carrying constructs was the closest to a line (i.e., these measures had the highest correlations, r = 0.805 and 0.808). The pattern of the scatter plots between the CAT versus the short form measures and the ODQ measures was relatively consistent.

Comparisons of the relative precision (RP) of pairs of measures in discriminating groups differing in back pain are presented in Table 4-4. As was hypothesized, the CAT measure achieved about twice the RP of the short form for the positioning/transfer construct. This indicates that the CAT's ability to discriminate between individuals in the back pain and non-back pain groups was twice as effective as the short form's ability. In addition, the CAT for the lifting/carrying construct had 16% greater RP in discriminating the groups than the short form, while the CAT for the walking/moving construct had 38% less RP in discriminating the groups. The comparison between the CAT and the ODQ measures showed a pattern similar to that of the CAT and the short form measures. That is, the CAT positioning/transfer construct achieved 116% greater RP and the CAT for the lifting/carrying construct had 42% greater RP in discriminating the groups than the ODQ measure. The RP ratio for discriminating the groups did not favor the CAT measure for the walking/moving construct, showing 16% less precision than the ODQ measure. As we hypothesized, in the comparison between the short form measures and the ODQ measure, the short form measures for all constructs had greater RP (6% for positioning/transfer, 22% for lifting/carrying, and 35% for the walking/moving construct) than the ODQ.

Discussion

Summary of Results

The ODQ and its versions are widely used as outcome measures for disability resulting from back pain. They have been cited more than 200 times in the Science Citation Index (73). 
Despite the popularity of the ODQ, numerous studies reveal substantial concerns regarding its measurement precision as well as measurement breadth (30,77,132). That is, the

PAGE 115

ODQ is recommended for the assessment of a particular severity group (i.e., high disability) due to its floor effects (23). The ODQ also appears to have a gap where items do not closely match the ability of the sample of interest (80), leading to deficits in precision. The creation of fixed short forms has been a popular method of achieving measurement efficiency and reducing respondent burden and administration cost. However, increases in efficiency often result in decreases in precision because item reduction leads to inadequate coverage of items relevant for all ability levels. We hypothesized that the ICFAM computer adaptive assessment would be superior to short form and conventional measures. The purpose of this study was to compare the precision of person measures obtained from the CAT, short forms, and the ODQ, a conventional back pain instrument.

Correlations

Correlations between person measures from the CAT and those from the short forms indicate a moderate to high degree of correspondence, while person measures from the CAT and the ODQ show a moderate degree of correspondence. The CAT measures showed an acceptable range of correlations with short form measures across all three constructs. However, the correlations of the CAT measures with the ODQ were lower, dropping from a range of r = 0.620 to r = 0.808 to a range of r = 0.530 to r = 0.605. The CAT and the short form for the lifting/carrying construct had the highest correlation (r = 0.808) compared with the other two constructs (r = 0.805 for the positioning/transfer and r = 0.620 for the walking/moving construct). This could be because the lifting/carrying construct contains items that are most relevant for individuals with back pain. The greater correlation between the CAT and the short forms than between the CAT and the ODQ is consistent with what we expected, probably because the short forms originated from the ICFAM item bank.

PAGE 116

Relative Precision

Relative precision (RP) was used to examine whether there are empirical advantages in measurement precision in using the CAT over the short form and the ODQ as a conventional measure. RP is based on the ratio of pairwise F statistics (an index of between-group variability relative to within-group variability) of two different measures. The magnitude of the F statistic from the ANOVA (analysis of variance) represents a measure of precision. Thus, the RP estimates indicate how much more or less precise a measure is relative to another measure (11). In this study, RP comparisons were conducted using known-groups validity (i.e., discriminating back pain and non-back pain groups). Known-groups validity addresses the extent to which a measure differs as predicted between groups who should score low and high on an ability trait. Supportive evidence of known-groups validity typically is provided by significant differences in mean scores across independent samples (133).

As was hypothesized, the results showed that the CAT measures achieved greater RP in discriminating back pain and non-back pain groups than did the short form measures for the positioning/transfer and lifting/carrying constructs. Furthermore, the CAT measures had greater RP in discriminating the groups than did the ODQ measures, except for the walking/moving construct. In addition, the short form measures outperformed the ODQ in RP. This may indicate that CAT measures outperform short form measures and short form measures outperform conventional measures such as the ODQ in terms of measurement precision. On the other hand, for the walking/moving construct, the CAT measure achieved less RP than did the short form measure in discriminating the groups. Likewise, the CAT measure also achieved less RP than did the ODQ for this construct. This may indicate that the CAT and the short form measure for the walking/moving construct were not successful in discriminating individuals with back pain. 
That is, these individuals appear to be

PAGE 117

reporting higher ratings (e.g., no difficulty) rather than lower ratings (e.g., have not done) on the construct.

Our results supported the notion that the CAT generally outperformed the short forms (44, 46, 58). Previous researchers have found similar results in terms of measurement precision. In an effort to compare the CAT to conventional lumbar spine functional status (LFS) instruments, Hart and colleagues (2006) found that the CAT produced measures as precise as the LFS instrument for back pain disability (134). Likewise, Haley and colleagues (2004) compared CAT to the 10-item short forms assessing physical/mobility, personal care/instrumental, and applied cognition with three activity item pools consisting of 101 items, 62 items, and 59 items, respectively. The results showed that CAT measures were more precise than the 10-item fixed short forms across the three constructs of the Activity Measure for Post-Acute Care (AM-PAC). Outside the physical activity domain, a study of a six-item short form of the Headache Impact Test (the HIT-TM) also showed that the short form was as responsive as the CAT in measuring headache impact. In general, the results of the present study are consistent with previous studies in precision comparisons between CAT and short form measures. In addition, we attempted an additional comparison between the CAT and the ODQ measure as a conventional instrument. Excluding the walking/moving construct, the CAT measures appeared to be more effective than the 10-item short form measures, while the 10-item short form measures appeared to be more effective than the ODQ measure in terms of measurement precision.

Limitations and Future Implications

The present study has several limitations. Computer adaptive testing methods shorten test length by 62.5%, or require only an estimated nine items (58). 
When we preset the algorithm of the CAT, the stopping rules were: 1) ten items as the maximum number of items, 2) four items as the minimum number of items, and 3) a standard error < 0.4. In the present study,

PAGE 118

our CAT used far fewer items than the preset maximum of ten, and the average respondent answered 6.08 items for each construct. Since the standard error of the CAT measures was not included in the analysis, it is unknown which criterion was met when each person measure was reached. Future research is needed to investigate the effects of adjusting the stopping rules to make them more rigorous, thus allowing more information to be obtained about respondents.
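As a check on the figure quoted above, 6.08 items per construct equals the mean of the three per-construct averages reported in the Results for the back pain group. The corresponding value for the non-back pain group and the comparison with the ten-item maximum are our own arithmetic on the reported averages, not values stated in the study.

```latex
\bar{n}_{\text{back pain}} = \frac{5.62 + 6.37 + 6.25}{3} = \frac{18.24}{3} = 6.08,
\qquad
\bar{n}_{\text{non-back pain}} = \frac{4.64 + 5.12 + 5.45}{3} = \frac{15.21}{3} = 5.07,
\qquad
1 - \frac{6.08}{10} = 0.392
```

On this reading, the back pain group answered roughly 39% fewer items per construct than the ten-item maximum.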

PAGE 119

1. Begin with initial ability estimate (Bn).
2. Select and present optimal scale item (Di).
3. Re-calculate person measure (Bn).
4. Estimate confidence interval (SE).
5. Is stopping rule satisfied (SE < 0.40)? If no, return to step 2; if yes, continue to step 6.
6. End assessment and produce final person measure for construct (Bn).
7. End of battery? If no, continue to step 8; if yes, go to step 9.
8. Administer next construct (return to step 1).
9. Stop.

Figure 4-1. Computer adaptive testing algorithm. Adapted from Wainer, Dorans, Eignor, Flaugher, Green, Mislevy, Steinberg, and Thissen (2000).
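The loop in Figure 4-1 can be sketched in code under simplifying assumptions. This sketch uses a dichotomous Rasch model and an expected a posteriori (EAP) ability estimate with a standard normal prior on a fixed grid; the ICFAM itself uses rating-scale items, and its estimation routine is not described in this section. The names `run_cat`, `eap_estimate`, the item bank, and the simulated respondent are all hypothetical. Only the stopping rules (minimum four items, maximum ten items, SE < 0.40) come from the study.

```python
import math


def rasch_prob(theta, b):
    """Probability of endorsing a dichotomous Rasch item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses):
    """Posterior mean and SD of ability (logits) under a N(0, 1) prior.

    responses: list of (item_difficulty, score) pairs with score 0 or 1.
    """
    grid = [i / 10.0 for i in range(-60, 61)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised prior density
        for b, x in responses:
            p = rasch_prob(theta, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    est = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - est) ** 2 * w for t, w in zip(grid, weights)) / total
    return est, math.sqrt(var)


def run_cat(bank, answer, min_items=4, max_items=10, se_target=0.40):
    """bank: dict item_id -> difficulty; answer: callable(item_id) -> 0 or 1."""
    remaining = dict(bank)
    responses = []
    theta, se = 0.0, float("inf")  # step 1: initial ability estimate
    while remaining and len(responses) < max_items:
        # Step 2: for a Rasch item, information peaks where difficulty = ability
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        theta, se = eap_estimate(responses)  # steps 3 and 4
        if len(responses) >= min_items and se < se_target:  # step 5
            break
    return theta, se, len(responses)  # step 6


# Hypothetical 21-item bank and a respondent who endorses the easier items
bank = {f"item{k}": -2.0 + 0.2 * k for k in range(21)}
theta, se, n_items = run_cat(bank, lambda item: 1 if bank[item] < 0.5 else 0)
print(theta, se, n_items)
```

A dichotomous Rasch item contributes at most 0.25 units of information, so this simplified version will usually stop at the ten-item maximum rather than at SE < 0.40; rating-scale items carry more information per item, which is one reason a polytomous CAT can stop sooner.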

PAGE 120

Table 4-1. Demographic characteristics of study participants

Characteristics                       Back Pain Group     Non-Back Pain Group
                                      (n = 42), n (%)     (n = 42), n (%)
Age
  < 20                                1 (2.4)             3 (7.0)
  21-30                               3 (7.1)             6 (14.4)
  31-40                               10 (23.8)           9 (21.4)
  41-50                               8 (19.0)            6 (14.4)
  51-65                               8 (19.0)            8 (19.0)
  > 65                                12 (28.6)           10 (23.8)
  Mean ± SD                           53.74 ± 20.13       48.76 ± 19.7
Gender
  Female                              29 (69.0)           27 (64.3)
  Male                                13 (31.0)           15 (35.7)
Education
  Middle/Junior High                  2 (4.7)             0
  High School                         19 (45.3)           14 (33.3)
  College                             12 (28.5)           23 (54.8)
  Graduate                            9 (21.5)            5 (11.9)
Race/Ethnicity
  African American                    7 (16.6)            5 (11.9)
  Hispanic American                   1 (2.3)             2 (4.8)
  American Indian                     1 (2.3)             0 (0.0)
  White, not Hispanic origin          32 (76.2)           25 (59.5)
  Asian/Pacific Islander              2 (4.6)             10 (23.8)
Years with related problems
  Less than a year                    14 (33.3)           0 (0.0)
  1 through < 4 years                 5 (12.0)            0 (0.0)
  More than 4 years                   20 (47.6)           2 (4.7)
  Missing                             3 (7.1)             40 (95.3)

PAGE 121

Figure 4-1. Scatter plots of ability measures from the CAT measure versus the short form measure for the positioning/transfer and lifting/carrying constructs. Panel A plots the CAT measure against the short form measure for positioning/transfer (r = 0.805*); Panel B plots the CAT measure against the short form measure for lifting/carrying (r = 0.808*). Both axes run from 0 to 100. * indicates that Pearson's correlation is significant at the 0.01 level.

PAGE 122

Figure 4-2. Scatter plots of ability measures from the CAT measure versus the short form and ODQ measures. Panel A plots the CAT measure against the short form measure for walking/moving (r = 0.620*); Panel B plots the CAT measure against the ODQ measure for positioning/transfer (r = 0.605*). Both axes run from 0 to 100. * indicates that Pearson's correlation is significant at the 0.01 level.

PAGE 123

Figure 4-3. Scatter plots of ability measures from the CAT measure versus the ODQ measure. Panel A plots the CAT measure against the ODQ measure for lifting/carrying (r = 0.530*); Panel B plots the CAT measure against the ODQ measure for walking/moving (r = 0.594*). Both axes run from 0 to 100. * indicates that Pearson's correlation is significant at the 0.01 level.

PAGE 124

Table 4-2. Correlation coefficients for CAT, short form, and ODQ measures for the back pain group

           CAT P/T   CAT L/C   CAT W/M   SF P/T    SF L/C    SF W/M    ODQ
CAT P/T    1.000
CAT L/C    0.837*    1.000
CAT W/M    0.614*    0.647*    1.000
SF P/T     0.805*    0.632*    0.568*    1.000
SF L/C     0.671*    0.808*    0.536*    0.635*    1.000
SF W/M     0.524*    0.566*    0.620*    0.554*    0.548*    1.000
ODQ        0.605*    0.530*    0.594*    0.605*    0.576*    0.605*    1.000

Note: * correlation is significant at the 0.01 level (2-tailed). CAT P/T: CAT Positioning/Transfer measure, CAT L/C: CAT Lifting/Carrying measure, CAT W/M: CAT Walking/Moving measure, SF P/T: Short Form Positioning/Transfer measure, SF L/C: Short Form Lifting/Carrying measure, SF W/M: Short Form Walking/Moving measure, and ODQ: Oswestry Back Pain Disability Questionnaire measure.

Table 4-3. Correlation coefficients for CAT, short form, and ODQ measures for the non-back pain group

           CAT P/T   CAT L/C   CAT W/M   SF P/T    SF L/C    SF W/M    ODQ
CAT P/T    1.000
CAT L/C    0.699*    1.000
CAT W/M    0.843*    0.623*    1.000
SF P/T     0.569*    0.331*    0.559*    1.000
SF L/C     0.402*    0.499*    0.512*    0.784*    1.000
SF W/M     0.574*    0.354*    0.606*    0.836*    0.788*    1.000
ODQ        0.037     0.058     0.029     0.132     0.098     0.064     1.000

Note: * correlation is significant at the 0.01 level (2-tailed). Abbreviations as in Table 4-2.
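The significance flags on the CAT-ODQ correlations in Tables 4-2 and 4-3 can be checked with a short sketch. It assumes that all 42 participants in each group contributed to every coefficient (the tables do not report n per cell) and uses the standard t test for a correlation with df = n - 2. The function names are hypothetical; the critical value 2.704 is the two-tailed 0.01 point of the t distribution with 40 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing that a correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


T_CRIT_01_DF40 = 2.704  # two-tailed critical t, alpha = 0.01, df = 40
N_PER_GROUP = 42        # assumption: no missing person measures

# CAT P/T versus ODQ, as reported in Tables 4-2 and 4-3
for label, r in [("back pain", 0.605), ("non-back pain", 0.037)]:
    t = t_for_r(r, N_PER_GROUP)
    print(label, round(t, 2), abs(t) > T_CRIT_01_DF40)
```

Under these assumptions the back pain group coefficient clears the 0.01 threshold and the non-back pain group coefficient does not, which matches the asterisks in the two tables.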

PAGE 125

Table 4-4. Differences between means for back pain and non-back pain groups

Measure            Back pain       Non-back pain    F          Relative
                   mean (SE)       mean (SE)                   precision

CAT versus short form
CAT P/T            49.83 (0.61)    55.55 (0.61)     41.76**    2.02
Short Form P/T     53.85 (2.31)    83.07 (2.30)     20.58**    1.00
CAT L/C            50.24 (0.77)    56.02 (0.77)     27.36**    1.16
Short Form L/C     50.68 (3.08)    78.09 (3.14)     23.56**    1.00
CAT W/M            53.14 (0.89)    58.33 (0.89)     16.34**    0.62
Short Form W/M     58.28 (2.74)    86.98 (2.73)     26.09**    1.00

CAT versus ODQ
CAT P/T            49.83 (0.61)    55.55 (0.61)     41.76**    2.16
ODQ                53.69 (2.69)    85.38 (2.69)     19.26**    1.00
CAT L/C            50.24 (0.77)    56.02 (0.77)     27.36**    1.42
ODQ                53.69 (2.69)    85.38 (2.69)     19.26**    1.00
CAT W/M            53.14 (0.89)    58.33 (0.89)     16.34**    0.84
ODQ                53.69 (2.69)    85.38 (2.69)     19.26**    1.00

Short form versus ODQ
Short Form P/T     53.85 (2.31)    83.07 (2.30)     20.58**    1.06
ODQ                53.69 (2.69)    85.38 (2.69)     19.26**    1.00
Short Form L/C     50.68 (3.08)    78.09 (3.14)     23.56**    1.22
ODQ                53.69 (2.69)    85.38 (2.69)     19.26**    1.00
Short Form W/M     58.28 (2.74)    86.98 (2.73)     26.09**    1.35
ODQ                53.69 (2.69)    85.38 (2.69)     19.26**    1.00

Note: ** F statistic is significant at the 0.001 level. CAT: Computer Adaptive Testing, P/T: positioning/transfer measure, L/C: lifting/carrying measure, W/M: walking/moving measure, and ODQ: Oswestry Back Pain Disability Questionnaire measure.
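The relative precision column of Table 4-4 can be reproduced from the reported F statistics. The sketch below divides the F value of the measure in the numerator by the F value of the comparison measure; the dictionary keys and function name are our own labels, and the F and RP values are those printed in the table.

```python
# F statistics as reported in Table 4-4
F = {
    "CAT P/T": 41.76, "SF P/T": 20.58,
    "CAT L/C": 27.36, "SF L/C": 23.56,
    "CAT W/M": 16.34, "SF W/M": 26.09,
    "ODQ": 19.26,
}

# (numerator, denominator, RP as printed in Table 4-4)
PAIRS = [
    ("CAT P/T", "SF P/T", 2.02),
    ("CAT L/C", "SF L/C", 1.16),
    ("CAT W/M", "SF W/M", 0.62),
    ("CAT P/T", "ODQ", 2.16),
    ("CAT L/C", "ODQ", 1.42),
    ("CAT W/M", "ODQ", 0.84),
    ("SF P/T", "ODQ", 1.06),
    ("SF L/C", "ODQ", 1.22),
    ("SF W/M", "ODQ", 1.35),
]


def relative_precision(numerator, denominator):
    """Ratio of F statistics; values above 1 favour the numerator measure."""
    return F[numerator] / F[denominator]


for num, den, reported in PAIRS:
    rp = relative_precision(num, den)
    print(f"{num} vs {den}: RP = {rp:.3f} (reported {reported:.2f})")
```

Each recomputed ratio agrees with the printed RP to within 0.01; the small differences are consistent with the table values having been truncated rather than rounded.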

PAGE 126

CHAPTER 5
CONCLUSION

Back pain is the most common cause of activity limitation in our society (1). The need for assessment of disability resulting from back pain has led to a proliferation of health status measures (35). Many of these measures are self-reports of functional status. Self-report functional status measures have been shown to be as reliable as or more reliable than physical measurements of function and more relevant to the patient and society (24). In addition, self-reports of pain and disability appear to be sensitive indicators of long-term outcomes (6, 7). Because of these superior characteristics, self-report measures of back-related disability have been developed (23), with most, if not all, having adequate psychometric properties (13, 23, 24, 26, 27, 29, 30, 40, 65, 70, 72, 79, 80). The Oswestry Back Pain Disability Questionnaire (ODQ) is one of the most widely used conventional self-report measures (3, 13, 30, 73, 80).

Due to the abundance of these measures, a prevailing challenge is selecting the optimal measure. One characteristic that may make some measures less ideal than others is the presence of ceiling and floor effects. This problem may be the result of instrument development based solely on the Classical Test Theory (CTT) measurement model. Measurement imprecision generally results from the use of items that do not closely match the ability of the population of interest (35, 55). In order to overcome these limitations, the ideal measure should have items that cover a wide range of the underlying construct with high precision. However, most conventional measures fail to evaluate individuals precisely throughout the whole range of disability.

Utilizing Item Response Theory (IRT) and Computer Adaptive Testing (CAT) methods, the ICF Activity Measure (ICFAM) was developed, creating an efficient and precise measurement system based on the activity dimension of the International Classification of

PAGE 127

Functioning, Disability and Health (ICF). Items relating to activities involving movement, moving around, and daily life tasks as defined by the activity dimension of the ICF were developed with the intent to create an equiprecise measurement (i.e., one with precise measurement across the entire range of a construct).

Creating short forms is a conventional approach to achieving measurement efficiency by reducing the number of items (28,35). However, the loss of precision is inevitable in short form creation (8,36,44-46,115). The critical questions are to what extent, and using what methods, the precision of short forms can be optimized.

Three research questions were proposed as part of this dissertation project: 1) What are the psychometric properties of the computer adaptive ICF activity measure constructs of positioning/transfer, lifting/carrying, and walking/moving with a sample of individuals having activity limitations resulting from back pain? 2) What are the psychometric properties of three newly generated short forms developed from items on the positioning/transfer, lifting/carrying, and walking/moving constructs? 3) How does the precision of the ICFAM CAT measures, the short form measures developed from the ICFAM, and the ODQ compare?

Unidimensionality

To address the first research question, confirmatory factor analyses (CFA) and exploratory factor analyses (EFA) were conducted to investigate the dimensionality. The CFA did not confirm the unidimensionality of the three ICFAM constructs. In order to identify the factor structure, EFA was subsequently performed and revealed a multidimensional factor structure for each ICFAM construct, including the full item bank for all constructs. We speculated that the low subject/item ratio (approximately 5 subjects per item) may have contributed to the failure to confirm unidimensionality. 
Therefore, we improved the subject/item ratio (approximately 10 subjects per item) by performing the same analysis using 10-item short forms. CFAs still failed

PAGE 128

to reveal unidimensional structures for the three short forms. The subsequent EFAs for the three 10-item short forms revealed multidimensional constructs composed of three factors for the positioning/transfer construct, two factors for the lifting/carrying construct, and two factors for the walking/moving construct.

The factors retained for the short forms appear plausible from a clinical point of view. For the positioning/transfer short form, complex activity items (e.g., kneeling and getting out of bath tub) composed one factor, while simple activity items (e.g., changing and shifting position) made up a separate factor. In terms of motor control theory, the activity of kneeling requires greater metabolic demands than standing or stooping (135) and involves the complex neural activity associated with balance (125). The logical progression of item difficulty is even more prominent in the lifting/carrying and walking/moving short forms. For the lifting/carrying short form, the two-factor model grouped the items into lifting heavy and lifting light objects. Likewise, for the walking/moving short form, the two-factor model grouped items into simple walking activities and more difficult climbing/running activities. In summary, the factors appear to be subcomponents of each of the three ICFAM constructs, determined to a large degree by the difficulty of the activities.

This multidimensional nature of the constructs creates a serious challenge for this study, since unidimensionality is a requirement of most IRT models. The reason is that a single construct can better explain the relationship between person performance and the item continuum in any data set (37). However, in practical terms, unidimensionality is an ideal that is never fully achieved and in most successful cases is only approximated. Applying multidimensional IRT models to these existing constructs of the ICFAM may be worthwhile in future analyses,

PAGE 129

although many multidimensional models are still in the early stages of development and refinement.

Hypothetical versus Empirical Item Hierarchies

The hypothetical hierarchy of activity based on Metabolic Equivalent (MET) values was only partially supported by the empirical hierarchy of item difficulty generated by Rasch analysis. Of the three constructs studied, only the walking/moving construct showed an item hierarchy that can be explained by MET values. For instance, the most difficult item, jogging one mile, has a MET of 11.0, while the easiest item, walking on carpeting, has a MET of 2.0. In contrast, for the positioning/transfer construct, the item difficulty hierarchy can be explained better by the clinical features of back pain than by the logical progression of the MET values. That is, individuals with back pain demonstrated greater difficulty in maintaining postures for a prolonged time than in shifting or changing postures. This hierarchical order is not supported by the MET values. One of the most challenging items in the positioning/transfer construct (lying down on back 5-8 hours) has a MET rating of 1.0, while the least challenging item (changing standing to sitting in chair) is rated as 2.0 METs. The item difficulty hierarchy of the lifting/carrying construct only partially concurred with the MET categorization. Different weight activities paralleled the MET categorization (e.g., lifting 25 pounds, at 3.0 METs, was more challenging than pulling wet laundry out of a washing machine, at 2.0 METs). However, within the lifting/carrying construct, three above-average items with different item difficulty calibrations had the same MET value. That is, the empirical item difficulty order generated by Rasch analysis differentiated the three items (lifting 25 pounds shoulder to above head, lifting 25 pounds floor to waist, and carrying 25 pounds for 25 feet), while the MET values of these items are the same (3.0 METs). 
The different difficulty levels of the lifting items may be more a function of biomechanical challenge and pain experienced than energy expenditure. That is, lifting from

PAGE 130

shoulder level to above the head is more biomechanically challenging than lifting from floor to waist and may be more painful because lifting from shoulder level to above the head is a burden on both arms and back.

There are limitations associated with determining the hypothetical item difficulty hierarchy of activity relevant to the items of the ICFAM constructs. First, the standardized assignment of MET intensities in physical activity questionnaires is based on a compendium of physical activities that was developed for use in epidemiologic studies (136,137). The values do not estimate the energy cost of physical activity in individuals in ways that account for differences in body mass, age, gender, efficiency of movement, or the geographic and environmental conditions in which the activities are performed. Therefore, individual differences in energy expenditure for the same activity can be large. Second, there are no values generated for activities that consume less than one MET, which is defined as 1 kcal/kg/hour and is roughly equivalent to the energy cost of sitting quietly. Many ICFAM items are therefore not comparable, since the positioning/transfer construct has many bed mobility items that may require less than one MET. Third, although we attempted to select the closest MET value when the relevant item was not available from the compendium of physical activities, the accuracy of these estimates is uncertain. Furthermore, the compendium of physical activities does not provide detailed descriptions of the physical activities. Thus, comparing our item difficulty hierarchy to a hierarchy based on MET values provides only a general sense of the distinctions between the two hierarchies.

There was evidence that the item difficulty hierarchies were plausible from a pain-related clinical point of view. 
Motor control theory purports that complex tasks involving the use of multiple joints and challenging environmental factors are more difficult than functional tasks requiring only a single joint or more optimal environmental factors (94). In this study, as

PAGE 131

hypothesized, complex tasks (e.g., kneeling 10-20 minutes) were found to be more difficult than simple tasks (e.g., standing 1-2 hours). However, a relatively simple task (e.g., lying on back 5-8 hours) was found to be more difficult than a complex task (e.g., changing position from lying on back to sitting). These findings agreed with neither motor control theory nor the MET values. Also, for the positioning/transfer construct, we speculate that individuals with back problems may have been primarily affected by pain, not energy expenditure. That is, individuals with back pain who are having difficulty with a transient task such as changing position from lying on their back to sitting (i.e., one of the easiest items) would be expected to have more difficulty with a prolonged activity such as lying on their back for 5-8 hours (i.e., an item of above-average difficulty). Furthermore, lying on their back for a prolonged period of time would be a difficult task for individuals with back pain even though the activity does not involve complex biomechanical modifications or adjustments. Thus, the logical progression of item hierarchies for the positioning/transfer and lifting/carrying constructs tends to reflect the clinical features of back pain. Future research should investigate the relationship between pain during particular activities and the Rasch-generated item difficulty hierarchies to appraise this hypothesis.

Short Forms

Several methods have been used to develop short forms from original tests. These methods, based on the Classical Test Theory (CTT) framework, often include the deletion of items with low item-total correlations, items with the least impact on the overall internal consistency of the test, and items with low factor loadings. In this study, using an IRT method, we focused on having items distributed across the difficulty range for each construct. 
The item-level psychometrics based on Rasch analysis (one-parameter IRT model) were effective in evenly distributing ten items across the full range of ability and selecting items that matched person ability location.

PAGE 132

This method focuses on maintaining measurement precision across the full range of the construct (i.e., equiprecise measurement) while reducing the number of items.

Despite these attempts, there was a loss of precision with the three newly created 10-item short forms in comparison to the full test, as well as decreased person reliability. Test Information Function (TIF) graphs were used to visually inspect the loss of precision. Fisher (1920) defined information as the reciprocal of the variance with which a parameter could be estimated. Thus, if one could estimate person ability with precision, one would have more information about the person's ability (130). The TIF graph is obtained by plotting the amount of information against ability. The TIF for the positioning/transfer short form showed a considerable loss of information, as a large proportion of items was removed from the entire set of items (46 of 56 items removed). In contrast, the lifting/carrying and walking/moving short forms displayed less information loss, as a much smaller proportion of items was removed from these constructs (17 of 27 items and 10 of 20 items, respectively). All of the TIF graphs showed that different ability levels are estimated with differing degrees of precision. As one moves to the extremes of the scale (both low and high), less information and less precision are obtained.

Constructing a fixed short form is a conventional approach to achieving measurement efficiency with fewer items. Although sacrificing some precision is inevitable in short form creation, short forms are attractive from the perspective of patient and administrative practicality. Short forms reduce the burden on respondents and on test administration. In addition, short forms may be useful in situations where computer access is not readily available to researchers and clinicians.

The short forms of the ICFAM have a few advantages over the ODQ. 
First, the ICFAM short forms provide optimal precision across a wide range of ability. This would substantially reduce deficits in measurement such as ceiling/floor effects. Secondly, the

PAGE 133

ICFAM short forms offer three constructs (i.e., positioning/transfer, lifting/carrying, and walking/moving), while the ODQ provides only three items relevant to positioning/transfer, one item relevant to lifting/carrying, and one item relevant to walking/moving. Researchers and clinicians may maximize their effectiveness in detecting group differences or clinical change by selecting the ICFAM constructs that are most relevant to individuals with back pain. Since the ICFAM positioning/transfer and lifting/carrying constructs are more precise than the ODQ, these measures may be preferable to the ODQ.

Of note, the two items reinstated in order to fill substantial gaps at the high extreme of the ability continuum, carrying toddler on back for the lifting/carrying short form and running one block for the walking/moving short form, showed high fit statistics. These two items measured the extremes of the construct and lacked observations in particular response categories, which might lead to large observed variances. This may be a limitation of the short forms despite their adequate breadth of measurement. Therefore, our short forms could be improved in future research by developing items that more adequately fill gaps and replace the misfitting items.

Precision

As we hypothesized for relative precision, both the CAT and short form measures of the ICFAM showed more precision than the ODQ for the positioning/transfer and lifting/carrying constructs. That is, in discriminating clinically distinct groups (i.e., back pain versus non-back pain), the CAT outperformed both the short forms and the ODQ, and the ICFAM short forms outperformed the ODQ. This was not true for the walking/moving construct. For the walking/moving construct, the CAT was less precise than both the short form and the ODQ in discriminating individuals with back pain from those without back pain. 
For the positioning/transfer construct, the CAT performed about two times greater in terms of relative

PAGE 134

precision (RP) than did the short form or the ODQ, while the short form performed 6% greater in terms of RP than did the ODQ. For the lifting/carrying construct, the CAT performed 16% greater in terms of RP than did the short form and 42% greater than did the ODQ, while the short form performed 22% greater than did the ODQ.

The failure of the walking/moving CAT to show more precision than the short form or the ODQ appears to be related to the relative variances. The F statistic is a ratio of the between-group variance estimate to the within-group variance estimate. The low CAT F statistic for the walking/moving construct is therefore a result of either low variance of person measures between the two groups or high variance of the person measures within the groups. In practical terms, the walking/moving construct may have less relevance than the other constructs for individuals with back pain. This might lead to either lower between-group variance or higher within-group variance relative to the other constructs.

In addition to precision, the CAT method provides a means for administering items in a way that is efficient (28,34,36). In the present study, in terms of efficiency the CAT outperformed both the 10-item short forms and the ODQ. That is, on the CAT the average respondent in the back pain group answered 5.62 items for the positioning/transfer, 6.37 items for the lifting/carrying, and 6.25 items for the walking/moving constructs, while both the short form and the ODQ required answering 10 questions.

In summary, our data did not fit the models in CFA, and subsequent EFA exploring the factor structure of each construct did not show sufficient evidence to support the existence of unidimensional constructs. These findings may indicate the need for multidimensional models to adequately describe the dimensionality of physical function. 
In addition, there is a need for future studies to further develop the constructs of the ICFAM, particularly the walking/moving construct, based on physiological measures such as METs. Another limitation of this study is that we sacrificed considerable precision in short form creation. This may be partly due to reinstating two problematic items to fill the substantial gaps in the short forms. This may imply that the short forms could be improved by future research addressing: 1) replacing problematic items and 2) developing items that more adequately fill the gaps in person ability to cover a wider range of the trait.

Despite the multidimensional constructs of the ICFAM and the short forms, the adequate item-level psychometrics suggest that the CAT method for measuring physical activity has promise. The CAT and the short forms of the ICFAM showed more precision than the ODQ for the positioning/transfer and lifting/carrying constructs, although the CAT of the ICFAM for the walking/moving construct was less precise than the short form and the ODQ measure. Overall, the CAT and the short forms of the ICFAM have several advantages over traditional self-report measures such as the ODQ. For researchers, precise measures decrease the number of subjects needed for a study and maximize the possibility of detecting differences between groups. For clinicians, precise measures capture small but potentially significant increments of improvement in response to clinical interventions. In the present study, we presented evidence of the advantages of IRT-based short forms and CAT measures over a conventional back pain questionnaire. With the increased use of computers and web-based devices for data collection in research and clinical practice, CAT measures may become preferable due to their efficiency without loss of precision. When these devices are not available, IRT-based short forms appear to be a reasonable alternative. In general, the findings support implementing contemporary IRT-based measures in both research and clinical settings.
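
The relative precision comparisons discussed in this chapter rest on F statistics. The following is a minimal sketch, assuming that RP is computed as the ratio of one measure's F statistic to that of a reference measure (a common convention in the health measurement literature; the formula is not restated in this section). The function names and the person measures are hypothetical and are not data from this study.

```python
from statistics import mean


def f_statistic(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of persons around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def relative_precision(f_measure, f_reference):
    """RP of a measure relative to a reference measure (assumed ratio of F statistics)."""
    return f_measure / f_reference


# Hypothetical person measures (logits) from two instruments, A and B.
# The group means are identical; instrument B has more within-group spread.
back_pain_a, no_pain_a = [-1.0, -0.5, 0.0, 0.5], [1.0, 1.5, 2.0, 2.5]
back_pain_b, no_pain_b = [-2.0, -1.0, 0.5, 1.5], [0.0, 1.0, 2.5, 3.5]

f_a = f_statistic(back_pain_a, no_pain_a)
f_b = f_statistic(back_pain_b, no_pain_b)
print("F (A):", round(f_a, 2), "F (B):", round(f_b, 2))
print("RP of A relative to B:", round(relative_precision(f_a, f_b), 2))
```

In this sketch the higher within-group variance of instrument B lowers its F statistic and therefore its RP, which mirrors the explanation offered for the walking/moving CAT. Under the assumed ratio definition, RP values chain multiplicatively, which is consistent with the lifting/carrying figures reported above: 1.16 × 1.22 ≈ 1.42.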

APPENDIX
THE OSWESTRY BACK PAIN DISABILITY QUESTIONNAIRE (ODQ)

This questionnaire has been designed to give your therapist information as to how your back pain has affected your ability to manage in everyday life. Please answer every question by placing a mark in the one box that best describes your condition today. We realize you may feel that 2 of the statements may describe your condition, but please mark only the box that most closely describes your current condition.

Pain Intensity
I can tolerate the pain I have without having to use pain medication.
The pain is bad, but I can manage without having to take pain medication.
Pain medication provides me with complete relief from pain.
Pain medication provides me with moderate relief from pain.
Pain medication provides me with little relief from pain.
Pain medication has no effect on my pain.

Personal Care (e.g., Washing, Dressing)
I can take care of myself normally without causing increased pain.
I can take care of myself normally, but it increases my pain.
It is painful to take care of myself, and I am slow and careful.
I need help, but I am able to manage most of my personal care.
I need help every day in most aspects of my care.
I do not get dressed, I wash with difficulty, and I stay in bed.

Lifting
I can lift heavy weights without increased pain.
I can lift heavy weights, but it causes increased pain.
Pain prevents me from lifting heavy weights off the floor, but I can manage if the weights are conveniently positioned (e.g., on a table).
Pain prevents me from lifting heavy weights, but I can manage light to medium weights if they are conveniently positioned.
I can lift only very light weights.
I cannot lift or carry anything at all.

Walking
Pain does not prevent me from walking any distance.
Pain prevents me from walking more than 1 mile. (1 mile = 1.6 km).
Pain prevents me from walking more than 1/2 mile.
Pain prevents me from walking more than 1/4 mile.
I can walk only with crutches or a cane.
I am in bed most of the time and have to crawl to the toilet.

Sitting
I can sit in any chair as long as I like.
I can only sit in my favorite chair as long as I like.
Pain prevents me from sitting for more than 1 hour.
Pain prevents me from sitting for more than 1/2 hour.
Pain prevents me from sitting for more than 10 minutes.
Pain prevents me from sitting at all.

Standing
I can stand as long as I want without increased pain.
I can stand as long as I want, but it increases my pain.
Pain prevents me from standing for more than 1 hour.
Pain prevents me from standing for more than 1/2 hour.
Pain prevents me from standing for more than 10 minutes.
Pain prevents me from standing at all.

Sleeping
Pain does not prevent me from sleeping well.
I can sleep well only by using pain medication.
Even when I take medication, I sleep less than 6 hours.
Even when I take medication, I sleep less than 4 hours.
Even when I take medication, I sleep less than 2 hours.
Pain prevents me from sleeping at all.

Social Life
My social life is normal and does not increase my pain.
My social life is normal, but it increases my level of pain.
Pain prevents me from participating in more energetic activities (e.g., sports, dancing).
Pain prevents me from going out very often.
Pain has restricted my social life to my home.
I have hardly any social life because of my pain.

Traveling
I can travel anywhere without increased pain.
I can travel anywhere, but it increases my pain.
My pain restricts my travel over 2 hours.
My pain restricts my travel over 1 hour.
My pain restricts my travel to short necessary journeys under 1/2 hour.
My pain prevents all travel except for visits to the physician / therapist or hospital.

Employment / Homemaking
My normal homemaking / job activities do not cause pain.
My normal homemaking / job activities increase my pain, but I can still perform all that is required of me.
I can perform most of my homemaking / job duties, but pain prevents me from performing more physically stressful activities (e.g., lifting, vacuuming).
Pain prevents me from doing anything but light duties.
Pain prevents me from doing even light duties.
Pain prevents me from performing any job or homemaking chores.

Source: Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability Questionnaire and the Quebec Back Pain Disability Scale. Physical Therapy 2001;81:776-788.
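
The appendix lists the questionnaire statements but not how responses are turned into a score. The following is a minimal sketch, assuming the widely used Oswestry convention in which the six statements of each section are scored 0 (first statement) through 5 (last statement) and the total is expressed as a percentage of the maximum possible for the sections answered. The function name and the input format are hypothetical, and this is not necessarily the scoring procedure applied in this study.

```python
def odq_percent(responses):
    """Percentage disability score from per-section responses.

    responses: one entry per section, each an int from 0 to 5,
    or None if the section was left unanswered (assumed input format).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("at least one section must be answered")
    if any(r < 0 or r > 5 for r in answered):
        raise ValueError("each section score must be between 0 and 5")
    # Maximum possible score is 5 points per answered section.
    return 100.0 * sum(answered) / (5 * len(answered))


# Hypothetical respondent who marked the second statement in every section.
print(odq_percent([1] * 10))
```

Scaling by the number of answered sections keeps the percentage comparable when a respondent skips a section, which is one reason this convention is commonly adopted.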

LIST OF REFERENCES

1. Andersson GB. Epidemiological features of chronic low-back pain. Lancet. 1999;354(9178):581-5.
2. Bergner M, Bobbitt RA, Pollard WE, Martin DP, Gilson BS. The sickness impact profile: validation of a health status measure. Med Care. 1976;14(1):57-67.
3. Fairbank JC, Couper J, Davies JB, O'Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980;66(8):271-3.
4. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care. 1981;19(8):787-805.
5. Waddell G. An approach to backache. Br J Hosp Med. 1982;28(3):187, 90-1, 93-4, passim.
6. Roland M, Morris R. A study of the natural history of low-back pain. Part II: development of guidelines for trials of treatment in primary care. Spine (Phila Pa 1976). 1983;8(2):145-50.
7. Roland M, Morris R. A study of the natural history of back pain. Part I: development of a reliable and sensitive measure of disability in low-back pain. Spine (Phila Pa 1976). 1983;8(2):141-4.
8. Ware JE, Jr., Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473-83.
9. Haley SM, McHorney CA, Ware JE, Jr. Evaluation of the MOS SF-36 physical functioning scale (PF-10): I. Unidimensionality and reproducibility of the Rasch item scale. J Clin Epidemiol. 1994;47(6):671-84.
10. Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping DL, et al. The Quebec Back Pain Disability Scale: conceptualization and development. J Clin Epidemiol. 1996;49(2):151-61.
11. McHorney CA, Haley SM, Ware JE, Jr. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol. 1997;50(4):451-61.
12. Fisher WP, Jr. Foundations for health status metrology: the stability of MOS SF-36 PF-10 calibrations across samples. J La State Med Soc. 1999;151(11):566-78.
13. Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability Questionnaire and the Quebec Back Pain Disability Scale. Phys Ther. 2001;81(2):776-88.
14. Davidson M, Keating JL. A comparison of five low back disability questionnaires: reliability and responsiveness. Phys Ther. 2002;82(1):8-24.
15. Ware J, Jr., Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220-33.
16. Million R, Hall W, Nilsen KH, Baker RD, Jayson MI. Assessment of the progress of the back-pain patient 1981 Volvo Award in Clinical Science. Spine (Phila Pa 1976). 1982;7(3):204-12.
17. Ruta DA, Garratt AM, Wardlaw D, Russell IT. Developing a valid and reliable measure of health outcome for patients with low back pain. Spine (Phila Pa 1976). 1994;19(17):1887-96.
18. Greenough CG, Fraser RD. Assessment of outcome in patients with low-back pain. Spine (Phila Pa 1976). 1992;17(1):36-41.
19. Manniche C, Asmussen K, Lauritsen B, Vinterberg H, Kreiner S, Jordan A. Low Back Pain Rating scale: validation of a tool for assessment of low back pain. Pain. 1994;57(3):317-26.
20. Daltroy LH, Cats-Baril WL, Katz JN, Fossel AH, Liang MH. The North American spine society lumbar spine outcome assessment Instrument: reliability and validity tests. Spine (Phila Pa 1976). 1996;21(6):741-9.
21. Williams RM, Myers AM. Functional Abilities Confidence Scale: a clinical measure for injured workers with acute low back pain. Phys Ther. 1998;78(6):624-34.
22. Williams RM, Myers AM. A new approach to measuring recovery in injured workers with acute low back pain: Resumption of Activities of Daily Living Scale. Phys Ther. 1998;78(6):613-23.
23. Muller U, Roder C, Greenough CG. Back related outcome assessment instruments. Eur Spine J. 2006;15 Suppl 1:S25-31.
24. Deyo RA. Measuring the functional status of patients with low back pain. Arch Phys Med Rehabil. 1988;69(12):1044-53.
25. Kopec JA. Measuring functional outcomes in persons with back pain: a review of back-specific questionnaires. Spine (Phila Pa 1976). 2000;25(24):3110-4.
26. Muller U, Roeder C, Dubs L, Duetz MS, Greenough CG. Condition-specific outcome measures for low back pain. Part II: scale construction. Eur Spine J. 2004;13(4):314-24.
27. Muller U, Duetz MS, Roeder C, Greenough CG. Condition-specific outcome measures for low back pain. Part I: validation. Eur Spine J. 2004;13(4):301-13.
28. McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med. 1997;127(8 Pt 2):743-50.
29. Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine (Phila Pa 1976). 2000;25(22):2940-52; discussion 52.
30. White LJ, Velozo CA. The use of Rasch measurement to improve the Oswestry classification scheme. Arch Phys Med Rehabil. 2002;83(6):822-31.
31. Baker C, Pynsent, PB, Fairbank, JCT. The Oswestry Disability Index revisited: Its reliability, repeatability, and validity, and a comparison with the St. Thomas's Disability Index. 1989(In: Roland MO, Jenner JR, eds. Back Pain: New Approaches to Education and Rehabilitation. Manchester, UK: Manchester University Press,):174-86.
32. Deyo RA, Battie M, Beurskens AJ, Bombardier C, Croft P, Koes B, et al. Outcome measures for low back pain research. A proposal for standardized use. Spine (Phila Pa 1976). 1998;23(18):2003-13.
33. Liang MH, Lew RA, Stucki G, Fortin PR, Daltroy L. Measuring clinically important changes with patient-oriented questionnaires. Med Care. 2002;40(4 Suppl):II45-51.
34. Velozo CA, Kielhofner G, Lai JS. The use of Rasch analysis to produce scale-free measurement of functional ability. Am J Occup Ther. 1999;53(1):83-90.
35. McHorney CA. Health status assessment methods for adults: past accomplishments and future challenges. Annu Rev Public Health. 1999;20:309-35.
36. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37(6):339-45.
37. Hambleton RK. Emergence of item response modeling in instrument development and data analysis. Med Care. 2000;38(9 Suppl):II60-5.
38. DeVellis RF. Classical test theory. Med Care. 2006;44(11 Suppl 3):S50-9.
39. Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Arch Phys Med Rehabil. 1989;70(12):857-60.
40. Velozo CA, Choi B, Zylstra SE, Santopoalo R. Measurement qualities of a self-report and therapist-scored functional capacity instrument based on the Dictionary of Occupational Titles. J Occup Rehabil. 2006;16(1):109-22.
41. Velozo CA, Wang Y, Lehman L, Wang JH. Utilizing Rasch measurement models to develop a computer adaptive self-report of walking, climbing, and running. Disabil Rehabil. 2008;30(6):458-67.
42. Velozo CA, Peterson EW. Developing meaningful Fear of Falling Measures for community dwelling elderly. Am J Phys Med Rehabil. 2001;80(9):662-73.
43. Weiss D. Improving measurement quality and efficiency with adaptive testing. Applied Psychological Testing. 1982;6:473-92.
44. Haley SM, Coster WJ, Andres PL, Kosinski M, Ni P. Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care. Arch Phys Med Rehabil. 2004;85(4):661-6.
45. Velozo CA, Lai JS, Mallinson T, Hauselman E. Maintaining instrument quality while reducing items: application of Rasch analysis to a self-report of visual function. J Outcome Meas. 2000;4(3):667-80.
46. Haley SM, Andres PL, Coster WJ, Kosinski M, Ni P, Jette AM. Short-form activity measure for post-acute care. Arch Phys Med Rehabil. 2004;85(4):649-60.
47. Bjorner J, Ware Jr., JE. Using modern psychometric methods to measure health outcomes. Med Outcome Trust Monitor. 1998;3:12-6.
48. Elhan AH, Oztuna D, Kutlay S, Kucukdeveci AA, Tennant A. An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain. BMC Musculoskelet Disord. 2008;9:166. PMCID: 2651163.
49. Haley SM, Ni P, Ludlow LH, Fragala-Pinkham MA. Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory. Arch Phys Med Rehabil. 2006;87(9):1223-9.
50. Hart DL, Cook KF, Mioduski JE, Teal CR, Crane PK. Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function. J Clin Epidemiol. 2006;59(3):290-8.
51. Haley SM, Siebens H, Coster WJ, Tao W, Black-Schaffer RM, Gandek B, et al. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes. Arch Phys Med Rehabil. 2006;87(8):1033-42.
52. Jette AM, Haley SM, Ni P, Olarsch S, Moed R. Creating a computer adaptive test version of the late-life function and disability instrument. J Gerontol A Biol Sci Med Sci. 2008;63(11):1246-56. PMCID: 2718692.
53. Haley SM, Gandek B, Siebens H, Black-Schaffer RM, Sinclair SJ, Tao W, et al. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes. Arch Phys Med Rehabil. 2008;89(2):275-83. PMCID: 2666330.
54. World Health Organization. Towards a common language for Functioning, Disability and Health. Geneva. 2002.
55. Jette AM. Assessing disability in studies on physical activity. Am J Prev Med. 2003;25(3 Suppl 2):122-8.
56. Picavet HS, Schouten JS. Musculoskeletal pain in the Netherlands: prevalences, consequences and risk groups, the DMC(3)-study. Pain. 2003;102(1-2):167-78.
57. Ware JE, Jr., Kosinski M, Bjorner JB, Bayliss MS, Batenhorst A, Dahlof CG, et al. Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Qual Life Res. 2003;12(8):935-52.
58. Hol A, Vorst, HCM, Mellenbergh, GJ. Computerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms. Applied Psychological Measure. 2007;31:412-29.
59. Flynn KE, Dombeck CB, DeWitt EM, Schulman KA, Weinfurt KP. Using item banks to construct measures of patient reported outcomes in clinical trials: investigator perceptions. Clin Trials. 2008;5(6):575-86. PMCID: 2662709.
60. Nunnally JC, Bernstein, I.H. Psychometric Theory. 1994(New York, NY: McGraw-Hill).
61. Kosinski M, Bayliss MS, Bjorner JB, Ware JE, Jr., Garber WH, Batenhorst A, et al. A six-item short-form survey for measuring headache impact: the HIT-6. Qual Life Res. 2003;12(8):963-74.
62. Deyo RA. Comparative validity of the sickness impact profile and shorter scales for functional assessment in low-back pain. Spine (Phila Pa 1976). 1986;11(9):951-4.
63. Carter WB, Bobbitt RA, Bergner M, Gilson BS. Validation of an interval scaling: the sickness impact profile. Health Serv Res. 1976;11(4):516-28. PMCID: 1071949.
64. Deyo RA, Carter WB. Strategies for improving and expanding the application of health status measures in clinical settings. A researcher-developer viewpoint. Med Care. 1992;30(5 Suppl):MS176-86; discussion MS96-209.
65. Deyo RA, Diehl AK. Measuring physical and psychosocial function in patients with low-back pain. Spine (Phila Pa 1976). 1983;8(6):635-42.
66. Follick MJ, Smith TW, Ahern DK. The sickness impact profile: a global measure of disability in chronic low back pain. Pain. 1985;21(1):67-76.
67. Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping DL, et al. The Quebec Back Pain Disability Scale. Measurement properties. Spine (Phila Pa 1976). 1995;20(3):341-52.
68. Stratford PW, Binkley FM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther. 1996;76(10):1109-23.
69. Stratford PW, Binkley J, Solomon P, Finch E, Gill C, Moreland J. Defining the minimum level of detectable change for the Roland-Morris questionnaire. Phys Ther. 1996;76(4):359-65; discussion 66-8.
70. Kopec JA, Esdaile JM. Functional disability scales for back pain. Spine (Phila Pa 1976). 1995;20(17):1943-9.
71. Hsieh CY, Phillips RB, Adams AH, Pope MH. Functional outcomes of low back pain: comparison of four treatment groups in a randomized controlled trial. J Manipulative Physiol Ther. 1992;15(1):4-9.
72. Beurskens AJ, de Vet HC, Koke AJ. Responsiveness of functional status in low back pain: a comparison of different instruments. Pain. 1996;65(1):71-6.
73. Fairbank JC. The use of revised Oswestry Disability Questionnaire. Spine (Phila Pa 1976). 2000;25(21):2846-7.
74. Fairbank J. Revised Oswestry Disability questionnaire. Spine (Phila Pa 1976). 2000;25(19):2552.
75. Bossons CR, Levy J, Sutterlin CE, 3rd. Reconstructive spinal surgery: assessment of outcome. South Med J. 1996;89(11):1045-52.
76. Frost H, Lamb SE, Stewart-Brown S. Responsiveness of a patient specific outcome measure compared with the Oswestry Disability Index v2.1 and Roland and Morris Disability Questionnaire for patients with subacute and chronic low back pain. Spine (Phila Pa 1976). 2008;33(22):2450-7; discussion 8.
77. Fairbank JC. Use and abuse of Oswestry Disability Index. Spine (Phila Pa 1976). 2007;32(25):2787-9.
78. Stewart AL. Conceptual challenges in linking physical activity and disability research. Am J Prev Med. 2003;25(3 Suppl 2):137-40.
79. Taylor SJ, Taylor AE, Foy MA, Fogg AJ. Responsiveness of common outcome measures for patients with low back pain. Spine (Phila Pa 1976). 1999;24(17):1805-12.
80. Page SJ, Shawaryn MA, Cernich AN, Linacre JM. Scaling of the revised Oswestry low back pain questionnaire. Arch Phys Med Rehabil. 2002;83(11):1579-84.
81. Hart DL, Wang YC, Stratford PW, Mioduski JE. Computerized adaptive test for patients with knee impairments produced valid and responsive measures of function. J Clin Epidemiol. 2008;61(11):1113-24.
82. Fliege H, Becker J, Walter OB, Rose M, Bjorner JB, Klapp BF. Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application. Int J Methods Psychiatr Res. 2009;18(1):23-36.
83. Hart DL, Wang YC, Stratford PW, Mioduski JE. A computerized adaptive test for patients with hip impairments produced valid and responsive measures of function. Arch Phys Med Rehabil. 2008;89(11):2129-39.
84. Shone CC, Quinn CP, Wait R, Hallis B, Fooks SG, Hambleton P. Proteolytic cleavage of synthetic fragments of vesicle-associated membrane protein, isoform-2 by botulinum type B neurotoxin. Eur J Biochem. 1993;217(3):965-71.
85. Andersson GBI. Epidemiological features of chronic low-back pain. Lancet. 1999;354(9178):581-5.
86. U.S. Department of Labor. Nonfatal occupational injuries and illness requiring days away from work. Bureau of Labor Statistics; 2005 [updated 2005; cited]; Available from: http://www.bls.gov/iif/oshwc/osh/os/osh05_01.pdf
87. Pai S, Sundaram, LJ. Low back pain: an economic assessment in the United States. Orthop Clin North Am. 2004;35:1-5.
88. Frymoyer JW, Cats-Baril WL. An overview of the incidences and costs of low back pain. Orthop Clin North Am. 1991;22(2):263-71.
89. Manchikanti L. Epidemiology of low back pain. Pain Physician. 2000;3(2):167-92.
90. Hambleton RK. Comparison of classical test theory and item response theory and their applications to test development. Educ Meas Issue Pract. 1993:38-47.
91. Crocker LA, J. Introduction to classical and modern test theory. 1986.
92. Thurstone L. Measurement of social attitudes. J Abnorm Soc Psychol. 1931;26:249-69.
93. Merbitz C, Morris J, Grip JC. Ordinal scales and foundations of misinference. Arch Phys Med Rehabil. 1989;70(4):308-12.
94. Shumway-Cook A, Woollacott, M, editor. Motor Control: Theory and Practical Applications. 2nd ed. Philadelphia: Lippincott Williams & Wilkins; 2000.
95. Bond TG, Fox, CM. Applying the Rasch model, Fundamental measurement in the human sciences. 2001;2nd edition.
96. Linacre JM. Detecting multidimensionality: which residual data-type works best? J Outcome Meas. 1998;2(3):266-83.
97. Smith E. Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. J Appl Meas. 2002;3:205-31.
98. Brown TA. Confirmatory factor analysis of the Penn State Worry Questionnaire: Multiple factors or method effects? Behav Res Ther. 2003;41(12):1411-26.
99. Brown TA. Confirmatory Factor Analysis for Applied Research. 2008.
100. Child D, editor. The Essentials of Factor Analysis. 3rd ed. Continuum; 2006.
101. Cattell RB. The scree test for the number of factors. Multivariate Behavioral Research. 1966;1:245-76.
102. Norman GR, Streiner, D.L. Biostatistics: The bare essentials. 1994(St. Louis: Mosby Yearbook Inc.).
103. Linacre JM. WINSTEPS Rasch measurement computer program. 2005.
104. Wright BD, Masters, G.N. Rating scale analysis. 1982.
105. Wang WC, Chen, C.T. Item parameter recovery, standard error estimates, and fit statistics of the WINSTEPS Program for the family of Rasch models. Educ Psychol Measure. 2005;65:376-404.
106. Wright BD, Linacre, J. M. Reasonable mean-square fit values. 1994;8:3 Autumn(Rasch Measurement Transactions Contents):370.
107. Linacre JM. What do Infit and Outfit, Mean-square and Standardized mean? Rasch Measurement Transactions. 2002;16(2):878.
108. Correlations: point-biserial, point-measure, residual. Special topics. http://www.winsteps.com/winman/index.htm
109. Balady GJ. Survival of the fittest--more evidence. N Engl J Med. 2002;346(11):852-4.
110. Blair SN, Haskell WL, Ho P, Paffenbarger RS, Jr., Vranizan KM, Farquhar JW, et al. Assessment of habitual physical activity by a seven-day recall in a community survey and controlled experiments. Am J Epidemiol. 1985;122(5):794-804.
111. Fletcher GF, Balady, G.J., Amsterdam, E.A. Exercise standards for testing and training: a statement for healthcare professionals from the American Heart Association. Circulation. 2001;104:1694-740.
112. Montoye HJ, Kemper, H.C.G., Saris, W.H.M., and Washburn, R.A. Measuring Physical Activity and Energy Expenditure. Human Kinetics. 1996(Champaign, IL):p 4-5.
113. Braith RW, Welsch MA, Mills RM, Jr., Keller JW, Pollock ML. Resistance exercise prevents glucocorticoid-induced myopathy in heart transplant recipients. Med Sci Sports Exerc. 1998;30(4):483-9.
114. Reckase MD. The difficulty of test items that measure more than one ability. Applied Psychological Measurement. 1985;Dec(9(4)):401-12.
115. Ware JE, Jr. A 12-Item Short-Form Health Survey: Construction of Scales and Preliminary Tests of Reliability and Validity. Med Care. 1996;34(3):220-33.
116. Box G, Draper, N. Empirical Model Building and Response Surfaces. 1987;New York(John Wiley and Sons).
117. Wilkinson L, The task force on statistical inference. Statistical Methods in Psychology Journals. American Psychologist. 1999;54(8):594-604.
118. Feldt LS, Brennan, R.L. Reliability. Educational Measurement. 1989;3rd ed.(New York: Macmillan.):pp. 105-46.
119. Mallinson T, Stelmack, J., Velozo, C. A comparison of the separation ratio and coefficient in the creation of minimum item sets. Med Care. 2004;42(1 suppl):I-17 I24.
120. Raykov T. Reliability if deleted, not "alpha if deleted": evaluation of scale reliability following component deletion. British Journal of Mathematical and Statistical Psychology. 2007;60:201-16.
121. Raykov T. "alpha if item deleted": A note on loss of criterion validity in scale development if maximizing coefficient alpha. British Journal of Mathematical and Statistical Psychology. 2008;61:275-85.
122. Thorndike RL, Hagen, E.P. Measurement and evaluation in psychology and education. 1977;4th ed.
123. Jette AM, Haley SM, Ni P. Comparison of functional status tools used in post-acute care. Health Care Financ Rev. 2003;24(3):13-24.
124. Wright BD, Masters, G.N. Number of Person or Item Strata. Rasch Measurement Transactions. 2002;16:3:888.
125. Mezzarane RA, Kohn, A.F. Postural control during kneeling. Experimental Brain Research. 2007;187(3):395-405.
126. Stineman MG, Goin JE, Granger CV, Fiedler R, Williams SV. Discharge motor FIM-function related groups. Arch Phys Med Rehabil. 1997;78(9):980-5.
127. Stineman MG, Jette A, Fiedler R, Granger C. Impairment-specific dimensions within the Functional Independence Measure. Arch Phys Med Rehabil. 1997;78(6):636-43.
128. George D, Mallery, P. SPSS for Windows step by step: A simple guide and reference. 11.0 update. 2003;4th ed.(Boston: Allyn & Bacon).
129. McHorney CA, Ware JE, Jr., Lu JF, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994;32(1):40-66.
130. Fisher RA. Theory of Statistical Estimation. 1925;Proc. Cambridge Phil(Soc. 22):700-25.
131. Erhard RE, Delitto A, Cibulka MT. Relative effectiveness of an extension program and a combined program of manipulation and flexion and extension exercises in patients with acute low back syndrome. Phys Ther. 1994;74(12):1093-100.
132. Davidson M. Rasch analysis of three versions of the Oswestry Disability Questionnaire. Man Ther. 2008;13(3):222-31.
133. Netemeyer RG, Bearden, W.O., Sharma, S. Scaling Procedures, Issues and Applications. Thousand Oaks, California: Sage Publications, Inc.; 2003.
134. Hart DL, Mioduski JE, Werneke MW, Stratford PW. Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function. J Clin Epidemiol. 2006;59(9):947-56.
135. Gallagher S. Trunk extension strength and muscle activity in standing and kneeling postures. Spine (Phila Pa 1976). 1997;22(16):1864-72.
136. Ainsworth BE, Haskell WL, Leon AS, Jacobs DR, Jr., Montoye HJ, Sallis JF, et al. Compendium of physical activities: classification of energy costs of human physical activities. Med Sci Sports Exerc. 1993;25(1):71-80.
137. Ainsworth BE, Haskell WL, Whitt MC, Irwin ML, Swartz AM, Strath SJ, et al. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc. 2000;32(9 Suppl):S498-504.

BIOGRAPHICAL SKETCH

Bongsam Choi received his Bachelor of Health Science degree in physical therapy from Yonsei University in February 1987. He completed his Master of Health Science degree in public health at Yonsei University, Seoul, Korea, in February 1989. Since coming to the United States in 1992, he has worked in a variety of inpatient, outpatient, rehabilitation hospital, and home health care settings over 23 years of practice in physical therapy. He was a rehab supervisor of a company that provides specialized outpatient rehabilitation services at a CORF (Certified Outpatient Rehabilitation Facility) in Tarpon Springs, FL. He is also an active member of the American Physical Therapy Association. Upon graduating with a PhD in Rehabilitation Science from the University of Florida, Gainesville, FL, he plans to remain active in both clinical practice and research to better measure the physical function of pain-related populations.