TOWARDS AN OPERATIONAL DEFINITION OF PHARMACY CLINICAL COMPETENCY

By

CHARLES ALLEN DOUGLAS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2011
© 2011 Charles Allen Douglas
To my daughter Sophia, to show her that with courage and perseverance one can achieve one's dreams at any age
ACKNOWLEDGMENTS

This dissertation study has been a wonderful, transformative experience. I would like to thank the members of my dissertation committee, Dr. Ried, Dr. Beck, Dr. Kimberlin, and Dr. Miller. I would like to thank my wife, Martha, for supporting me through the successes as well as the hard times that accompany the completion of a Ph.D. program. Without your love and support, this would not have been possible.
TABLE OF CONTENTS

ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 9
LIST OF FIGURES ..... 13
ABSTRACT ..... 14

CHAPTER

1 INTRODUCTION ..... 16
1.1 Background ..... 16
1.2 Statement of the Problem ..... 18
1.3 Purpose of Study ..... 20
1.4 Theoretical Frameworks ..... 21
1.4.1 Validity ..... 21
1.4.2 Reliability ..... 22
1.5 Research Questions ..... 22
1.6 Significance of Problem ..... 23

2 LITERATURE REVIEW ..... 25
2.1 Competency ..... 25
2.1.1 Defining Competency ..... 26
2.1.2 Competency and Performance ..... 28
2.1.3 Describing Competent Performance ..... 28
2.1.4 Competency Summary ..... 30
2.2 Assessment ..... 31
2.2.1 Assessment Function ..... 31
2.2.2 Assessment Structure ..... 33
2.2.3 Performance Domain ..... 36
2.2.4 Role of the Preceptor ..... 37
2.2.5 Assessment Summary ..... 39
2.3 Validity ..... 40
2.3.1 A Defensible Assessment ..... 40
2.3.2 Content Validity ..... 41
2.3.3 Reliability ..... 44
2.3.4 Accuracy ..... 46
2.3.5 Validity Summary ..... 46
2.5 SUCCESS Overview ..... 47
2.6 Research Questions ..... 49
2.6.1 Research Question 1 ..... 50
2.6.2 Research Questions 2 and 3 ..... 51
2.7 Literature Review Summary ..... 52

3 METHODS ..... 55
3.0 Methods Introduction ..... 55
3.1 Phase I Methods ..... 55
3.1.1 Study Participants ..... 55
3.1.2 Survey Instrument Development ..... 56
3.1.3 Data Collection ..... 59
3.1.4 Data Analysis ..... 60
3.1.5 Phase I Methods Summary ..... 60
3.2 Phase II Methods ..... 60
3.2.1 Study Participants ..... 61
3.2.2 Simulation Development ..... 62
3.2.3 Data Collection ..... 73
3.2.4 Data Analysis ..... 73
3.2.5 Phase II Methods Summary ..... 76

4 RESULTS ..... 79
4.0 Results Introduction ..... 79
4.1 Delphi Panel ..... 79
4.1.1 Round One Panel ..... 79
4.1.2 Domain Specification Results ..... 80
4.1.3 Performance Criteria Results ..... 86
4.1.4 Round Two Panel ..... 99
4.1.5 Domain Specification Results ..... 100
4.1.6 Performance Criteria Results ..... 101
4.1.7 Summary of Delphi Panel Results ..... 110
4.2 Phase II Video Simulation ..... 111
4.2.1 Expert Panel ..... 111
4.2.2 Expert Panel Results ..... 112
4.2.3 Summary Expert Panel Results ..... 115
4.2.4 Preceptor Panel ..... 117
4.2.5 Preceptor Panel Results ..... 118
4.2.6 Summary Preceptor Results ..... 127

5 DISCUSSION ..... 141
5.0 Overview ..... 141
5.1 Phase I Delphi Panel ..... 141
5.1.1 Domain Specification ..... 142
5.1.2 Performance Criteria ..... 145
5.1.3 The Delphi Panel ..... 146
5.2 Phase II Video Simulation ..... 150
5.2.1 Video Production ..... 150
5.2.2 Preceptor Panel ..... 152
5.2.3 Reliability Results ..... 153
5.2.4 Accuracy Results ..... 159
5.2.5 Rating Scale Comparison ..... 161
5.3 Future Research ..... 163
5.3.1 Content Validation ..... 163
5.3.2 Video Simulation Strategy ..... 165
5.3.3 Video Simulation Production ..... 166
5.3.4 Training ..... 167
5.3.5 Rating Scale Analysis ..... 169
5.3.6 Performance Levels ..... 170
5.3.7 Beyond Graduation ..... 171
5.3.8 National Validation Study ..... 171
5.4 Summary and Conclusions ..... 172

APPENDIX

A SUCCESS COMPETENCIES ..... 179
B DRUG THERAPY EVALUATION AND DEVELOPMENT ..... 180
C IRB DOCUMENTS ..... 183
D DELPHI PANEL ROADMAP ..... 188
E ANALYSIS PATH AND ASSESSMENT RUBRIC ..... 191
F SKILL STATEMENTS AND CHECKLISTS ..... 194
G PERFORMANCE CRITERIA GLOSSARY ..... 196
H DELPHI PANEL (ROUND ONE) RESULTS ..... 198
I DELPHI PANEL (ROUND TWO) RESULTS ..... 204
J CASE STUDY SUMMARIES ..... 209
K EXPERT PANEL SCRIPT TARGETS AND RESULTS ..... 212
L PRECEPTOR PANEL RESULTS ..... 215
M PRECEPTOR RELIABILITY RESULTS ..... 217
N PRECEPTOR ACCURACY RESULTS ..... 220
P SUPERVISION PROXY RATING SCALE ..... 223
Q RATING SCALE COMPARISON ..... 224

LIST OF REFERENCES ..... 227
BIOGRAPHICAL SKETCH ..... 244
LIST OF TABLES

Table ..... page

3-2 Video vignette times ..... 77
3 150 ..... 77
4-1 Delphi panel (round one) demographics ..... 133
4-2 Delphi panel (round one) practice site characteristics ..... 134
4-3 Delphi panel (round two) demographics ..... 135
4-4 Delphi panel (round two) practice site characteristics ..... 136
4-5 Expert panel demographics ..... 137
4-6 Expert panel practice site characteristics ..... 138
4-7 Preceptor panel demographics ..... 139
4-8 Preceptor practice site characteristics ..... 140
H-1 Responses for Skill A ..... 198
H-2 Responses for Skill B ..... 198
H-3 Responses for Skill C ..... 198
H-4 Responses for Skill D ..... 198
H-5 Responses for Skill E ..... 198
H-6 Responses for Skill F ..... 198
H-7 Responses for Skill G ..... 199
H-8 Responses to question 8 ..... 199
H-9 Responses to question A1 ..... 199
H-10 Responses to question A2 ..... 199
H-11 Responses to question B1 ..... 199
H-12 Responses to question B2 ..... 199
H-13 Responses to question C1 ..... 200
H-14 Responses to question C2 ..... 200
H-15 Responses to question D1 ..... 200
H-16 Responses to question D2 ..... 200
H-17 Responses to question E1 ..... 200
H-18 Responses to question E2 ..... 200
H-19 Responses to question F1 ..... 201
H-20 Responses to question F2 ..... 201
H-21 Responses to question G1 ..... 201
H-22 Responses to question G2 ..... 201
H-23 Responses to complexity question ..... 201
H-24 Responses to reliability question ..... 201
H-25 Responses to deficient performance question ..... 202
H-26 Responses to efficiency question ..... 202
H-27 Responses to entry-level performance question ..... 202
H-28 Responses to excellent performance question ..... 202
H-29 Responses to performance criterion question ..... 202
H-30 Responses to quality question ..... 202
H-31 Responses to supervision question ..... 203
I-1 Responses to question A1 ..... 204
I-2 Responses to question B1 ..... 204
I-3 Responses to question D1 ..... 205
I-4 Collapsed responses to question D1 ..... 205
I-5 Responses to question D2 ..... 205
I-6 Responses to question E1 ..... 206
I-7 Responses to question E2 ..... 206
I-8 Responses to question G1 ..... 206
I-9 Responses to question G2 ..... 207
I-10 Responses to complexity question ..... 207
I-11 Responses to deficient performance question ..... 207
I-12 Responses to excellent performance question ..... 208
K-1 Diabetes performance targets from the script ..... 212
K-2 Diabetes assessments by expert panel ..... 212
K-3 Heart failure performance targets from the script ..... 213
K-4 Heart failure assessments by expert panel ..... 213
K-5 Anticoagulation performance targets from the script ..... 214
K-6 Anticoagulation assessments by expert panel ..... 214
L-1 Diabetes assessments by preceptor panel ..... 215
L-2 Heart failure assessments by preceptor panel ..... 215
L-3 Anticoagulation assessments by preceptor panel ..... 216
M-1 Reliability: competent vs. not competent ..... 217
M-2 Reliability: excellent vs. entry level ..... 217
M-3 Reliability of scale items, Skill A ..... 218
M-4 Reliability of scale items, Skill B ..... 218
M-5 Reliability of scale items, Skill D/E ..... 219
M-6 Reliability of global assessment, competent vs. not competent ..... 219
N-1 Accuracy, Skill A ..... 220
N-2 Accuracy, Skill B ..... 220
N-3 Accuracy, Skill D/E ..... 221
P-1 Skill A: comparing rating scale with proxy scale ..... 224
P-2 Skill B: comparing rating scale with proxy scale ..... 224
P-3 Skill D/E: new rating scale vs. proxy supervision scale ..... 225
LIST OF FIGURES

Figure ..... page

2-1 … ..... 52
2-2 … ..... 53
2-3 …level assessment ..... 53
2-4 SUCCESS rating scale ..... 54
3-1 Number of video simulations for 40 preceptor raters ..... 78
3-2 CE course objectives ..... 78
3-3 2x2 student assessment tables ..... 78
B-1 Seven competency skills ..... 180
E-1 Delphi panel endorsement criteria ..... 191
E-2 Data collection and analysis pathway ..... 191
N-1 Skill A: comparing expert and preceptor panels ..... 221
N-2 Skill B: comparing expert and preceptor panels ..... 222
N-3 Skill D/E: comparing expert and preceptor panels ..... 222
P-1 Skill A: comparing rating scale with proxy scale ..... 225
P-2 Skill B: comparing rating scale with proxy scale ..... 226
P-3 Skill D/E: new rating scale vs. proxy supervision scale ..... 226
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TOWARDS AN OPERATIONAL DEFINITION OF PHARMACY CLINICAL COMPETENCY

By

Charles Douglas

2011

Chair: L. Douglas Ried
Major: Pharmaceutical Outcomes and Policy

The scope of pharmacy practice and the training of future pharmacists have undergone a strategic shift over the last few decades. The pharmacy profession now recognizes greater pharmacist involvement in patient care activities, and toward this strategic objective, pharmacy schools are training future pharmacists to meet these new clinical demands. Pharmacy students complete clerkships called Advanced Pharmacy Practice Experiences (APPEs), which account for 30% of the professional curriculum. APPEs provide the only opportunity for students to refine clinical skills under the guidance of an experienced pharmacist. Nationwide, schools of pharmacy need to evaluate whether students have successfully completed APPEs and are ready to treat patients. Schools are left to their own devices to develop assessment programs that demonstrate to the public and regulatory agencies that students are clinically competent prior to graduation. There is no widely accepted method to evaluate whether these assessment programs actually discriminate between competent and non-competent students. The central purpose of this study is to demonstrate a rigorous method to evaluate the validity and reliability of APPE assessment programs. The method introduced in this study is applicable to a wide variety of assessment programs. To illustrate this method, the study evaluated new performance criteria with a novel rating scale.

The study had two main phases. In the first phase, a Delphi panel was created to bring together expert opinions. Pharmacy schools nominated exceptional preceptors to join the panel; Delphi is a method to achieve agreement on complex issues among experts. The principal researcher recruited preceptors representing a variety of practice settings and geographical regions. The Delphi panel evaluated and refined the new performance criteria. In the second phase, the study produced a novel set of video vignettes that portrayed student performances based on recommendations of an expert panel. Pharmacy preceptors assessed the performances with the new performance criteria. These results can be used to establish benchmarks for future comparisons.

Findings from the first phase suggested preceptors held a unique perspective in which APPE assessments are grounded in relevance to clinical activities. The second phase analyzed assessment results from pharmacy preceptors who watched the video simulations. Reliability results were higher for non-randomized compared to randomized video simulations. Accuracy results showed preceptors more readily identified high and low student performances than average student performances. These results indicated the need for pharmacy preceptor training in performance assessment. The study illustrated a rigorous method to evaluate the validity and reliability of APPE assessment instruments.
CHAPTER 1
INTRODUCTION

1.1 Background

Medications are integral to modern health care and patient well-being. Advances in medicine have yielded a continually expanding body of therapies, many requiring complex administration and vigilant monitoring.1 According to the Institute of Medicine (IOM) report To Err Is Human: Building a Safer Health System,2 because of the immense variety and complexity of medications now available, the pharmacist has become an essential resource in modern practice, and thus access to his or her expertise must be possible at all times. The pharmacist is the most accessible health care provider, and patients visit community pharmacies more often than any other health care setting.3 Pharmacy comprises the third largest health profession in the United States.4,5 Pharmacists are considered an essential component of the health care system in both institutional settings and local community practices.

The public expects educational institutions to meet their social obligation to graduate competent professionals who can provide safe and effective health care. Rising concern over the competency of health care professionals has increased public pressure for accountability of educational institutions.6,7 This public outcry coincides with widely publicized reports of drug safety problems8 and the uneven delivery of quality health care across the nation.9,10 In response to growing public concern over the competency of health care professionals, the IOM report Health Professions Education: A Bridge to Quality11 concluded that training more health professionals alone will not address US health care safety issues. Rather, the IOM is promoting a change toward competency-based education for all new health care graduates. Competency-based education focuses on integrating competency into all facets of training and assessment.12

Institutional leaders in the pharmacy profession have taken initiatives toward adoption of competency-based education. The Accreditation Council for Pharmacy Education (ACPE) has been engaged in a decade-long process of fundamental reform of the educational model and its goals.13-15 Based on professional standards, the Center for the Advancement of Pharmaceutical Education (CAPE) Educational Outcomes report sets the standards for accreditation, guides pharmacy education, and establishes competencies for graduating pharmacy students. The 2004 CAPE Educational Outcomes report16 categorizes the practice of pharmacy into three broad dimensions, each containing a set of competencies:

Pharmaceutical care: Provide pharmaceutical care based upon sound therapeutic principles and evidence-based data

Systems management: Effectively manage and use health care system resources

Public health: Promote improvement in health, wellness, and disease prevention in the community

The first dimension, pharmaceutical care, reflects the ongoing change in the scope of pharmacy practice toward patient-centered drug therapy. According to the Commission to Implement Change in Pharmaceutical Education,13,14 pharmaceutical care is the central objective of the educational mission. According to Zlatic,17 the challenge of pharmacy education today is to design, implement, and assess curricula that integrate the general and professional abilities that will enable practitioners to be responsible for drug therapy outcomes and the well-being of patients. A key part of that challenge is determining if the student is sufficiently competent to enter the profession of pharmacy.18-20

1.2 Statement of the Problem

The goal of competency-based reforms is graduation of competent practitioners who provide safe and effective health care to the public. The ACPE shares the goal of graduating competent pharmacists, and each pharmacy program is required to assess student educational outcomes, including advanced pharmacy practice experience (APPE) performance. The 2008 American Association of Colleges of Pharmacy (AACP) president21 and the American College of Clinical Pharmacy (ACCP) Educational Affairs Committee have called for a standard APPE assessment instrument.22 In order to meet this need, competency assessment needs to be applicable not only across practice settings but also across geographical regions. However, several issues must be addressed in order to realize valid and reliable assessment of student performance during APPE rotations.

First, the ACPE does not provide any guidance for the validation of an APPE assessment. Many schools of pharmacy have developed proprietary APPE assessment instruments. Preceptors may assess students differently depending on which assessment instrument is used. This complicates any attempt by governing bodies to ensure the competency of graduating students across institutions. Developing guidelines for a rigorous process to establish the validity of a national standard assessment instrument is essential.

Second, the CAPE Educational Outcomes report defined the competencies expected of an entry-level practitioner. However, the report does not describe the behavior a preceptor would observe of a competent or incompetent student. This opens the issue
of how to operationalize performance criteria usable by preceptors. Pharmacy preceptors should assess students based on a comparison of observed behavior with professional performance standards.23-25 Findings suggest preceptors may instead simply apply their professional judgment, and this is a major reason for variation in standards.26,27 Performance criteria based on professional standards are a central element of a valid assessment instrument, and the pharmacy profession has not addressed this need.

Third, the ACPE does not provide any guidance on acceptable preceptor inter-rater reliability, nor does it define acceptable accuracy levels for an assessment instrument. Inter-rater reliability is an integral aspect of a valid assessment instrument, which has little value if it produces inconsistent results between preceptors. For example, a preceptor in one APPE rotation may assess students as not competent when in fact they are competent, while a different preceptor in a similar APPE rotation may assess students as competent when they are not. Thus, a standard method to estimate preceptor accuracy and inter-rater reliability needs to be established.

Finally, implementing a regional assessment instrument imposes the burden of learning different assessment programs on busy preceptors who are often volunteers.28 The American Physical Therapy Association (APTA) requires all students and preceptors to pass an examination demonstrating knowledge of the APTA assessment system.29 Taking a page from the APTA, the methods developed in this study to
estimate accuracy and inter-rater reliability can be turned around and used to train and test whether individual preceptors reliably and accurately assess APPE performances.

1.3 Purpose of Study

In response to the need for a single national assessment instrument for advanced pharmacy practice experiences, this study will take initial steps toward developing a validation method that is applicable for a national APPE assessment instrument. Based on the CAPE Educational Outcomes report, the SUCCESS instrument assesses student performance during APPE rotations30 and defines thirteen broad competencies for assessment. SUCCESS, an acronym for Universal Clinical Competency, is a continuing collaboration among all four Florida schools of pharmacy. A list of the thirteen broad competencies appears in Appendix A. The focus of this study is assessment of the Drug Therapy Evaluation and Development competency, which is one of the thirteen competencies. This competency comprises seven competency skills, and a complete description appears in Appendix B. Drug therapy is essential to pharmacy education and the pharmacy profession.

To illustrate the validation process, the study will demonstrate steps to introduce a multifactorial rating scale and other new performance criteria with the Florida assessment system. These steps will provide evidence toward content validity and measure preceptor inter-rater reliability and accuracy. First, a panel will identify relevant competency skills and evaluate and refine the performance criteria. Second, this study will develop methods to measure reliability and accuracy of clinical faculty assessments.
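To make the reliability and accuracy constructs concrete, the following is a minimal, hypothetical sketch (not the study's actual method): inter-rater reliability is estimated as an intraclass correlation, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a students-by-preceptors matrix of ratings, and accuracy as exact agreement with an expert reference rating. All function names and data are invented for illustration.

```python
# Hypothetical sketch: reliability as ICC(2,1) and accuracy as exact
# agreement with an expert reference rating. Data are invented.

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is a students x preceptors matrix of scores."""
    n, k = len(ratings), len(ratings[0])               # students, preceptors
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # student variance
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # preceptor variance
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

def exact_agreement(preceptor_scores, expert_scores):
    """Accuracy as the proportion of ratings matching an expert reference."""
    hits = sum(p == e for p, e in zip(preceptor_scores, expert_scores))
    return hits / len(expert_scores)

# Two preceptors rate three students on a 1-5 competency scale.
ratings = [[3, 3], [4, 5], [2, 2]]
print(round(icc_2_1(ratings), 2))                        # 0.9
print(round(exact_agreement([3, 4, 2], [3, 5, 2]), 2))   # 0.67
```

Because ICC(2,1) uses the absolute-agreement form, systematic preceptor leniency or severity lowers the estimate, consistent with the rater-effect concerns discussed above; a consistency-form ICC would ignore such offsets.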
1.4 Theoretical Frameworks

1.4.1 Validity

Based on Messick's (1995)31,32 unitary concept of construct validity, the Standards for Educational and Psychological Testing33 (Standards) describe validity as the degree to which evidence and theory support the interpretations of test scores. The Standards33 categorized five sources of validity evidence: 1) Content, 2) Response Process, 3) Internal Structure, 4) Relation to Other Variables, and 5) Consequences. These different categories of evidence are interdependent and may be complementary.

A content validation study is the first essential step in accumulating evidence of validity and represents a link between the hypothetical construct of competency and measurable indicators. Subject matter experts (SMEs) judge content validity. They evaluate the degree to which elements of an assessment instrument are relevant and representative of the hypothetical construct for a particular assessment purpose.34,35 Findings suggest appropriate selection of experts to participate in the content validation study contributes to the strength of the validity evidence. For the purpose of this study, relevance refers to the appropriateness of an instrument's elements for the targeted construct and function of assessment,34,36,37 and representativeness refers to the degree to which the instrument's elements are proportional to the facets of the targeted construct.32,34,35,38

Messick (1996)32 describes two major threats to validity: construct-irrelevant variance and construct underrepresentation. When descriptions are too narrow, the assessment may under-represent the construct of interest. On the other hand, when descriptions are too broad, the assessment may introduce construct-irrelevant variance.
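SME relevance judgments of this kind are often quantified with an item-level content validity index (I-CVI): the proportion of experts rating an item as relevant. The sketch below is an illustrative convention from the content-validity literature, not a procedure prescribed by the sources cited above; the function name, panel size, and ratings are hypothetical.

```python
# Hypothetical sketch: item-level content validity index (I-CVI).
# Each expert rates an item's relevance on a 4-point scale; ratings of
# 3 or 4 count as "relevant". Panel data are invented for illustration.

def item_cvi(ratings, relevant_min=3):
    """Proportion of experts judging the item relevant."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)

# Seven SMEs rate one performance criterion for relevance (1-4 scale).
panel = [4, 3, 4, 2, 4, 3, 4]
print(round(item_cvi(panel), 2))   # 0.86 -- six of seven experts agree
```

An item whose I-CVI meets a predetermined acceptance threshold (0.78 is a common convention for panels of this size) would be retained; items falling short could be refined for re-rating or dropped.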
The principal researcher will address issues of construct-irrelevant variance and construct underrepresentation.

1.4.2 Reliability

Reliability is integral to judging validity, since the reliability of assessment scores is a factor in the interpretation of assessment scores. Reliability is necessary but not sufficient evidence for a claim of validity. There are different types of reliability measures important in assessment measurement. This study is primarily interested in estimating the inter-rater reliability of preceptor assessment scores. Inter-rater reliability refers to the degree to which assessment scores are consistent and provide clear information to differentiate between individuals for the purpose of the assessment.39 Acceptable levels of reliability vary according to the objectives of the assessment. Estimates of inter-rater reliability are dependent on the population of preceptors, the population of students, the instrument, the number of times students are rated,39 and the sample size.40

Reliability is associated with variance observed in assessment scores. Based on classical test theory, variability in an observed score is due to true differences plus random and systematic error. The sources of variance in assessment scores can be separated into the variance attributed to students, preceptors, and random error individually. The proportion of variation attributed to preceptors is used to quantify the inter-rater reliability of assessments.

1.5 Research Questions

Research Question 1: What is the degree of agreement regarding the content validity of the Drug Therapy Evaluation and Development competency?
Research Question 2: Can preceptors use the modified SUCCESS instrument to make reliable assessments within the Drug Therapy Evaluation and Development competency?

Research Question 3: Can preceptors use the modified SUCCESS instrument to make accurate assessments within the Drug Therapy Evaluation and Development competency?

1.6 Significance of Problem

Literature suggests more time and resources have been spent on creating competency statements for various health care professions than have been allocated to developing a sound scientific foundation for determining the validity of assessment instruments.41,42 This study will address the question: Are we actually measuring performance consistent with the professional standards of performance for a given competency? The outcomes of the Delphi panel will address the content validity of the operationalized performance measures used by preceptors. The methods used in this study will demonstrate a scientifically sound process to conduct a content validity study and establish a standard for the content validation of other pharmacy competencies.

APPEs account for at least 30% of the professional curriculum.28,43,44 The APPEs provide the only opportunity for students to demonstrate their competency in a clinical setting, under the guidance of an experienced preceptor pharmacist. This comprises a unique opportunity for authentic assessment. This study will establish a standard method to estimate preceptor accuracy and inter-rater reliability for this critical training period. The degree of reliability and accuracy are of critical importance, since they have an effect on the interpretation of the assessment scores. The measurement of inter-rater reliability and accuracy from this
study may be used to create a benchmark for comparison and contribute to preceptor training and qualification standards nationwide. According to the Commission to Implement Change in Pharmaceutical Education,13 the profession's commitment to both society and the individual patient includes ensuring the fitness to practice of new graduates.45 The current fragmented and ad hoc assessment strategy does not adequately demonstrate to the public whether pharmacy graduates are actually fit for practice. An assessment instrument supported by strong evidence of validity will be a significant advancement towards the realization of the competency-based educational goals of the pharmacy profession. This study addresses essential elements related to the validation of such an instrument.
CHAPTER 2
LITERATURE REVIEW

2.1 Competency

Assessing fitness for practice helps maintain professional standards, and a key requisite of patient safety is ensuring fitness for practice of all health professionals.45 The public demands safe and effective healthcare from all pharmacists, including new graduates, a demand echoed by the Commission to Implement Change in Pharmaceutical Education.13 Eraut (1994)46 proposes that a profession define itself through the development and assessment of entry-level competency. Governing institutions have the responsibility to establish national professional standards for entry into the profession.

The Accreditation Council for Pharmacy Education (ACPE) is the institution responsible for accrediting pharmacy schools in the United States. The ACPE Board of Directors is comprised of major pharmacy organizations such as the AACP, the American Pharmacists Association (APhA), and the National Association of Boards of Pharmacy (NABP). In addition, the American Council on Education (ACE) has a pharmacy representative on their board of directors. Through the ACPE, the pharmacy profession establishes the professional standards for all pharmacy educational programs. Sponsored by the AACP, the CAPE educational outcomes have been incorporated into accreditation standards.44 The CAPE competencies represent professional standards and guide pharmacy education, establish competencies for new graduates, and represent an important consensus on entry-level practice.
Assessing fitness for award aims to differentiate high-performing students and reflects on the educational institution and individual academic achievement. High-quality programs are expected to graduate individuals who progress into leadership roles in the profession. Governing and educational institutions need to measure training programs in order to identify high-performing students and high-quality programs. In addition, measurement of educational outcomes helps educational institutions understand the learning and teaching process.47

There are two key reasons for assessing clinical competency prior to graduation. One is fitness for practice and the other is fitness for award.47 Assessing fitness for practice helps maintain professional standards and patient confidence. Assessing fitness for award helps identify high-performing students and high-quality programs.

2.1.1 Defining Competency

Clinical competency is a hypothetical construct that is useful in conceptualizing how well an individual performs in the clinical setting.24,48-51 Gonczi23 proposes three perspectives for conceptualizing competence: technical, underlying traits, and a combination of both. The technical competence perspective focuses on detailed and measurable performance tasks. Using this perspective, an assessment measures if the individual is capable of completing specific tasks related to their profession. The underlying traits perspective conceptualizes a set of underlying traits from which the capacity for competency is inferred. Competency is inferred to be present from observation of student behavior. These underlying traits are separate from any individual technical tasks. The final perspective combines the previous two and conceptualizes competency as an amalgam of both technical skills and underlying traits.
According to Eraut (1994),46 the ability to exercise a specific set of technical skills is based in a tradition of the skilled trades. However, this technical competency perspective has been criticized as being too narrow and failing to capture the full scope of a given profession.46,50 Decomposing pharmacy competencies into an exhaustive list of specific skills does not address the holistic character of a competent pharmacist, since these skills are not discrete or independent, and typically several of these technical tasks are performed simultaneously.24 Technical skills alone do not capture the holistic richness of professional practice.23,49,50 However, specific technical skills are easier to define and assess than underlying traits.52

Eraut (1994)46 argues the perspective of conceptualizing competence as a set of underlying traits is a development of management research, and this perspective focuses on the ability of an individual to perform in the workplace. Grounded in this perspective, affective qualities such as initiative and critical thinking are viewed as essential to competent performance in the practice setting.46 However, there are technical skills required of any competent pharmacist, and these technical skills need to be considered along with underlying traits in a definition of competency. Thus, competency can be conceptualized as a combination of specific technical skills and underlying traits.23

Eraut (1994)46 proposes two dimensions of competence for professionals. The first dimension is scope of practice; the second conceptualizes competence on a continuous scale ranging from novice to expert. A newly graduated pharmacist should be autonomous and competent within their scope of
practice within the profession.

2.1.2 Competency and Performance

Miller's (1990)53 widely cited model of clinical performance (Figure 2-1) implies there is a step-by-step progression towards competent clinical performance and illustrates how competence develops in the clinical setting. Miller suggests knowledge and skills are the foundation for competency; however, competent performance emerges in the presence of a patient.53 Literature suggests there is a distinction between competency and competent performance and proposes that competency is a necessary but insufficient requirement for competent performance in a clinical setting.18,20

Competence has been described as how individuals perform under ideal conditions, knowing that they are being assessed, when they have the requisite knowledge, skills, and attitudes.18 On the other hand, performance describes behavior in the actual clinical setting.18,54 Literature suggests that assessment should focus on performance and not competence.18,20 This proposition is consistent with the two main rationales for competency assessment: fitness for practice and fitness for award. However, this does not address the challenge of describing competent performance appropriate for entry-level pharmacists.

2.1.3 Describing Competent Performance

Miller (2000)55 suggests that a complete description of competent performance needs to be realistic, general, and representative of good practice. This study assumes that
professional practice standards are representative of good practice. The operationalization of the performance criteria and rating scales for assessment is the process of writing descriptions of observable behaviors indicative of competent performance. Performance is conceptualized in terms of knowledge, skills, and attributes in the context of realistic professional tasks.24

Bloom's taxonomy is a widely used classification system that describes educational objectives in terms of knowledge, skills, and attitudes, and is not related to any specific learning theory.56 Comparison with other taxonomies is not in the scope of this study. Three domains occupy the first level of this hierarchical system: cognitive, affective, and psychomotor.

The cognitive domain refers to the intellectual aspect of performance and includes knowledge, critical thinking skills, and recall of facts, procedures, and concepts. It has six major categories: knowledge, comprehension, application, analysis, synthesis, and evaluation. Each category in the cognitive domain operates in a developmental progression.56,57 The affective domain refers to the emotional and attitudinal aspects of competent performance and includes feelings, values, motivations, and attitudes. There are five major categories which also operate in a developmental progression: receiving phenomena, responding to phenomena, valuing, organizing, and internalizing values.56,57 Third, the psychomotor domain refers to the physical skills of competent performance. Psychomotor aspects include physical movement, coordination, and use of motor skills. These skills are measured according to speed, precision, and technique in execution.56-58

The CAPE Educational Outcomes report identifies competencies that encompass the scope of pharmacy practice. These CAPE outcomes are consistent with Eraut's46 dimension of
scope of practice, within which competence is expected. The objective of the operationalized performance criteria for each competency is to capture the holistic richness of professional practice.23,49 This is consistent with the view59 that competent performance is more than a demonstration of isolated technical skills. Michael Kane (1992)60 defined clinical competence as the degree to which an individual can use the knowledge, skills, and judgment associated with the profession to perform effectively in the domain of possible encounters defining the scope of professional practice. Assessments of student performance in APPEs are grounds to generalize professional competence upon graduation.

2.1.4 Competency Summary

Through the ACPE, the pharmacy profession has established competency standards for entry into the profession. APPE assessments are responsible for discriminating between students who are competent and students who are not competent, as well as identifying high-performing students and programs. APPE rotations are an important part of training future pharmacists. Eraut46 proposed two dimensions of competency, the first being scope of practice; the CAPE guidelines define the scope of pharmacy practice. However, the pharmacy profession does not provide guidance addressing the important issue of describing the observable student behaviors that are indicative of a competent entry-level pharmacist.
2.2 Assessment

2.2.1 Assessment Function

The terms measurement, evaluation, and assessment have similar meanings in everyday language. It is important to assign specific definitions to these terms for this study. Measurement is the process of empirically observing and assigning numbers to an observed attribute, characteristic, or phenomenon according to established rules. Here, the instrument is the examination or test used to collect information. Evaluation is a judgment on the information collected.61 Assessment integrates the measurement, examination, and judgment processes. Specifically, assessment scores provide information about the student, the preceptor, and the program.

2.2.1.1 Assessment Purpose

Assessments are either formative or summative. Each has a different perspective and generates different outcomes.62 The purpose of the assessment needs to be clearly established and consistent with the overall aim of the APPE program.

Formative assessment aims to support the student in the learning process.63 Preceptors provide feedback on student performance compared to predetermined educational goals. The feedback should identify areas where the student excels and those that require remediation. Formative assessments are used when the educational aim is to further skill development.63

Summative assessment aims to render a judgment of student competence. Summative outcomes address the presence or absence of competent performance. This function is consistent with the gate-keeping responsibility of APPE
assessments. Summative assessments are consistent with the purpose of assessment for fitness to practice and fitness for award. Formative assessments focus mainly on teaching responsibilities, while summative assessments deal primarily with gate-keeping. Findings suggest mixing formative and summative functions within a single assessment process may create conflicts.26 Preceptors may experience a conflict between the relationships developed with students through teaching and their responsibilities for providing a definitive assessment of performance.47,64 Students may become overly focused on demonstrating performance and, as a result, overlook learning opportunities.26,63

2.2.1.2 Assessment Reference

Interpretation of assessment scores varies depending on whether the assessment is norm-referenced or criteria-referenced. The reference needs to be clearly stated and consistent with the overall aim of the APPE program. Norm-referenced assessments compare a student's score with the average score of a reference group. Interpretation of norm-referenced assessment scores will inform how well the student performed compared to the reference group. This approach may help motivate students to assess their strengths and weaknesses from feedback in the norm-referenced assessment. This approach may help students develop the capacity for self-assessment. However, norm-referenced assessments will not provide information on whether students have mastered a particular competency.

Criteria-referenced assessments compare student performance against external criteria. This external reference may be comprised of descriptions of competent
performance. Criteria-referenced assessments may be used to select students for advanced educational programs and to predict future performance. Sax65 proposes that criteria-referenced assessments can determine whether a student has demonstrated competent performance. The purpose of the APPE assessment is to ascertain fitness for practice and fitness for award. This purpose suggests that criteria-referenced assessments are appropriate for APPEs.

2.2.2 Assessment Structure

2.2.2.1 Checklists

Findings suggest that checklists improve the reliability of assessments. The reliability of scores from medical and nursing preceptors who used performance criteria improved compared to those who used an open-ended evaluation form.66-69 A substantial portion of errors by medical students in the clinical setting are errors of omission, and the use of performance criteria may help preceptors identify omissions.70 Holmboe (2008)71 argues performance criteria help preceptors frame the observation of student behaviors. Checklists used in this study list the activities that a competent pharmacist would be able to complete for a given competency, and expert preceptors will evaluate and refine these checklists.

2.2.2.2 Rating Scales

Norman (2005)72 cautions that checklists are not a replacement for rating scales in assessment instruments for medical students. Norman argues that checklist scores do not directly assess competence. Several rating scale formats are
reported in the literature, including visual analog scales (VAS), behavioral observation scales (BOS), global scales, and behaviorally anchored scales (BAS).73 Using a VAS, the preceptor marks a point along a line to indicate the student's level of performance. Descriptions may be included along the VAS line.73 Using BOS, preceptors would use the simple behavior descriptions contained in the scale and rate the frequency of the behavior on a Likert-type scale.73 With global scales, the preceptor rates the degree to which a student demonstrates a general characteristic.74 BAS scales have anchors, which are behavioral descriptions for the preceptor to observe. A behavioral description is available for each performance level for a particular competency; these descriptions are usually developed through a consensus process.73

Literature discusses several advantages of BAS scales as compared to other assessment scales. The BAS performance criteria are descriptive, specific to the position, and introduce less construct-irrelevant variance.74-76 Kingstrom and colleagues (1980)75 concluded BAS scales do not have superior validity compared to other rating scale formats. However, Gomez-Mejia (1988)74 proposes that the lack of difference observed between other scale formats and the BAS scale in the Kingstrom et al. study may be due to the use of overly complicated BAS performance criteria. The resulting preceptor burden may have resulted in less discriminating assessments. Gomez-Mejia's explanation is consistent with research related to the assessment of student nurses in the clinical setting. Findings from two Bondy studies (1983, 1984)66,67 of nursing assessment suggest that clarity of BAS behavior definitions is a significant source of error.66,67
Literature also suggests that preceptors may use the BAS performance criteria as a simple checklist, and not as the criteria-referenced assessment intended, when presented with low-quality descriptors.76,77 There is evidence that when the BAS performance criteria are changed to observed qualities of performance, rather than a performance criterion or simple checklist, the validity of assessments increases.77 The advantage of BAS may result from how the behavior descriptors provide preceptors with a clearer understanding of the rating task and what behaviors to observe. Wolf (1995)77 showed behavior descriptions were interpreted variably unless supported by illustrations of what they actually look like in practice.

Landy and Farr's (1970)73 seminal article reviewed performance assessment literature and argued that there is ample evidence that BAS has advantages in validity and reliability compared to numerical scales or adjective phrases. Landy and Farr suggest that, in addition to rigorously developed anchor descriptions, effective performance scales may also reflect participation by representatives of those who will actually use the assessment scales.

2.2.2.3 Rating Levels

One of the observations established by the literature is that too few or too many categories can negatively affect scale reliability.73,78 Bending (1954)79,80 investigated this problem and indicated a decrease in reliability with fewer than three or more than seven categories. Considering the effects on both scale reliability and rater reliability, Bending argues that there is no gain in efficiency when the number of categories is increased from five to nine.73,79,80

Evidence suggests that preceptors avoid rating students on either extreme of a rating scale. Due to end-aversion bias, researchers39 argue that a 5-point scale might
actually be used as a 3-point scale in practice. There is a belief that loss of response categories tends to decrease both efficiency and reliability.39

The purpose of the APPE assessment is to ascertain fitness for practice and fitness for award. Assessments for fitness for practice necessitate a scale of at least two categories. These two categories would differentiate between students who are competent and students who are not competent. Likewise, assessments of fitness for award necessitate a scale of at least two categories for competent students, which would differentiate between high-performing students and other competent students. Addressing both fitness for practice and fitness for award implies performance assessments need to differentiate at least three levels of performance: non-competent, competent, and high-performing. This suggests a scale of at least three but no more than five categories.

2.2.3 Performance Domain

According to Jette et al. (2007),81 literature suggests common themes across the medical, veterinary, occupational therapy, and physical therapy healthcare professions for identifying attributes that qualify students for entry-level practice and for the preceptor decision-making process. Preceptors approached assessment in a holistic manner and without prioritizing any particular student attribute. Jette et al. (2007)81 propose that the preceptor decision process is multifactorial. Preceptors synthesize a number of student attributes and decide whether a student has demonstrated entry-level performance.

Jette et al. commented that the student attributes revealed in their study of the physical therapy preceptor decision-making process were similar to the American
Physical Therapy Association's Clinical Performance Instrument (CPI), which several of the participants used for student assessment. The CPI is the product of a national effort by the APTA and has been in use since 1997. Following a major update in 2006, 152 out of 212 physical therapy schools have adopted the CPI.29 The CPI contains 18 competencies, each containing multiple skill statements. Each competency is rated relative to entry-level practice. The CPI uses BAS with six performance levels: Beginning Performance, Advanced Beginner Performance, Intermediate Performance, Advanced Intermediate Performance, Entry-level Performance, and Beyond Entry-level Performance.

The CPI multidimensional scale is based on unpublished analysis29 of student data, suggesting that performance is comprised of five dimensions (Figure 2-2). Scores reflect percentages (%) of necessary behavior students are capable of performing. These scores determine on which of the six performance levels the student should be assessed (Figure 2-3). If a student does not complete all of the dimensions for a particular performance level (e.g., Entry-level), then the student will not be assessed at that level. Unpublished findings from the APTA29 suggest that the psychometric properties of this multidimensional scale were strong. Preceptors were able to discriminate among all six performance levels. There was no evidence of clustering or ceiling effects.

2.2.4 Role of the Preceptor

Pharmacy preceptors are the practicing professionals who assess student performance and serve as the profession's gatekeepers. Pharmacy preceptors are expected to assess students based on a
comparison of observed behavior with professional performance standards as outlined in the CAPE Educational Outcomes report.23-25 According to the literature, preceptors may apply personal standards to their evaluations of individual students, and this is a major reason for variation between preceptors, since assessments are not grounded in professional standards.26,27

The Jette et al. (2007)81 study found that preceptors commonly use an intuitive decision-making process that represents their cognitive integration of student characteristics into a decision about the student's readiness for entry-level practice.81 Cross and Hicks (1997)82 conclude that preceptors commonly use implicit criteria in the decision-making process. Preceptors would ask questions within an implicit decision-making framework rather than use clinically based objective measures. Findings in a study by Alexander et al. (1997)83 suggest that preceptors form a decision-making framework around desirable characteristics of an entry-level practitioner. However, this decision-making framework is affected by impressions of previous students and a personal perception of what constitutes an entry-level practitioner.

Preceptor experience positively affects the quality of performance ratings, and those preceptors who are judged as better clinicians are better at rating the job performance of others.73 In addition, findings suggest preceptors with little experience or substandard clinical skills have greater idiosyncratic assessment scores that increase score variation.71,84 Findings showed different student assessments between nursing educators and nursing clinical preceptors.66,85 This suggests that the educators
who often develop assessment instruments have different values and conceptualizations of competency compared to practicing clinicians.

2.2.5 Assessment Summary

Assessment integrates the measurement, examination, and judgment processes. Researchers have concluded that APPE assessments should be criteria-referenced rather than norm-referenced.7 The assessment process should provide a summative judgment of whether a student qualifies for entry-level practice.7 There are a number of professional attributes to evaluate in the assessment process, and these are consistent with the five performance domains used in the CPI instrument. The five performance domains used in the CPI are based on analysis of physical therapy students. Some of these attributes may be of value in APPE assessment and should be reviewed by pharmacy experts.

Preceptors bring considerable experience, and their participation is crucial to the assessment process. The assessment process is multifactorial, and preceptors are faced with the difficult task of synthesizing a number of student behaviors. The assessment instrument should assist the preceptors in conducting reliable assessments based on professional standards. The structure of the instrument affects the accuracy and reliability of assessments. Findings suggest the use of checklists improves assessment reliability; however, checklists are not a replacement for rating scales. APPE rating scales are most reliable when they range from three to five levels. There appear to be advantages of BAS over other scale formats.73,86 Expert clinician preceptors should be recruited to develop the BAS behavior anchors. Literature also suggests that student participation is beneficial. Behavior descriptions need to be clearly written and unambiguous.35 The development
of a glossary of terms may also promote a more uniform understanding of the behavior descriptions used in the instrument,87-89 such that these different items help reduce variation due to differing interpretations of behavior descriptions.12,90 Assessment variation due to preceptor rater effects decreases inter-rater reliability and accuracy.

2.3 Validity

2.3.1 A Defensible Assessment

Assessments in healthcare education require evidence of validity to possess meaningful interpretations.91 Assessment instruments unto themselves are not strictly valid or invalid; rather, the assessment scores have more or less evidence to support a specific interpretation, such as passing or failing an APPE competency. The collection of evidence in support of validity must form a structured and coherent argument based on accepted standards of test measurement.18,92

Based on Messick's (1995)31,32 unitary concept of construct validity, the Standards for Educational and Psychological Testing33 (Standards) describe validity as the degree to which evidence and theory support the interpretations of test scores.33 The Standards categorized five sources of validity evidence: 1) Content, 2) Response Process, 3) Internal Structure, 4) Relation to Other Variables, and 5) Consequences. These different categories of evidence are interdependent and may be complementary. Briefly stated, the categories of evidence are:

Content: The relationship between the content of the assessment and the construct it is intended to measure.
Response Process: The analysis of the fit between the targeted construct and the response strategies and thought processes of individual respondents or observers. Differences in response processes may reveal sources of variance that are irrelevant to the targeted construct. 93

Internal Structure: The degree to which the relationships among assessment items conform to the construct on which the score interpretations are based. 93

Relations to Other Variables: The relationship of assessment scores to variables external to the assessment, such as other measures of the same construct. 93

Consequences: The evaluation of unintended effects that can support or challenge the validity of score interpretations. 93

In addition, Messick (1996) 32 describes two major threats to validity: construct-irrelevant variance and construct underrepresentation. When assessments are too narrow, the assessment may underrepresent the construct of interest. On the other hand, when assessments are too broad, the assessment may introduce construct-irrelevant variance. These should be taken into consideration in developing a case for validity. Evidence from multiple sources is required to support or challenge a case for validity. 91

2.3.2 Content Validity

A content validation study is the first essential step in accumulating evidence of validity and is a major focus of this study. Evidence supporting content validity is not grounded in empirical data but rather in the degree to which the scores can be assumed to tap the targeted construct. 70 An important purpose of a content validation study is to
minimize potential error variance due to poorly defined constructs of clinical competency or poorly operationalized student behaviors. Several descriptions of content validity are available in the literature and hold the same essential meaning. 33, 35, 94, 95 Haynes et al. (1995) 34 provide the following definition: content validity is the degree to which the elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose. This definition calls for evaluation for relevancy and representativeness. The relevance of an assessment instrument refers to the appropriateness of its elements for the targeted construct and function of assessment. 34, 94, 96, 97 The representativeness of an assessment instrument refers to the degree to which its elements are proportional to the facets of the targeted construct. 32, 34, 35, 38 This study focuses on important elements associated with assessment of the Drug Therapy and Development competency within the SUCCESS instrument (Appendix B). Haynes et al. (1995) 34 point out that, within this definition of content validity, elements should be evaluated quantitatively by multiple experts. 97-99 A panel of preceptors who are subject matter experts (SMEs) in a particular area would evaluate a specific part of an assessment instrument; when panel agreement meets a predetermined threshold, the item is accepted. In failing to meet this threshold, the expert panel may refine the item for reevaluation or remove it from consideration entirely. The appropriate selection of experts to participate in the content validation study contributes to the strength of the validity evidence. The panel should be comprised of recognized SMEs from different practice settings and geographical regions in order to provide confidence
that the assessment scores and interpretations can be generalized across practice settings and geographical regions. The tasks of the expert panel can be broken down into two parts: 1) domain specification and 2) evaluation of the behavior domains. Literature suggests construct specification is the first task in a content validation study. 34, 100-103 The targeted competency should be unambiguously defined. The expert panel will evaluate whether all seven skill statements are relevant and represent all the important facets of the Drug Therapy and Development competency in the SUCCESS instrument. Guion (1977) 97 assumes that the panel will understand the measurement issues and recognize when operationalized performance indicators are inside or outside those boundaries. Haynes et al. (1995) 34 affirm the importance of this step and propose, "a construct that is poorly defined, undifferentiated, and imprecisely partitioned will limit the content validity of the assessment instrument." Two key objectives for the expert panel are the evaluation of behavior domains within the instrument for relevancy and representativeness. The panel needs to assess the relevancy of the scores for the student behaviors described in the instrument for targeted skill statements. Are all the scores from the student behaviors described in the instrument necessary to the targeted skill statement? On the other hand, are the scores for the student behaviors described in the instrument comprehensive, and do they account for the necessary knowledge, skills, and personal attributes for the targeted skill statement?
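Panel judgments of relevancy of this kind are often quantified as the proportion of experts endorsing each element, sometimes called an item-level content validity index. The sketch below is illustrative only: the ratings, the 4-point relevance scale, and the 80% acceptance threshold are assumptions for demonstration, not the procedure prescribed by this study.

```python
# Sketch: quantifying expert-panel relevancy judgments for one item.
# Ratings, scale, and threshold are hypothetical.

def item_cvi(ratings, relevant_min=3):
    """Proportion of experts rating an item relevant
    (>= relevant_min on an assumed 1-4 relevance scale)."""
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)

def endorse(ratings, threshold=0.80):
    """Accept the item if the proportion of relevant ratings meets the threshold."""
    return item_cvi(ratings) >= threshold

# Hypothetical ratings from a 10-member panel for one skill statement
ratings = [4, 4, 3, 2, 4, 3, 4, 3, 4, 4]
print(item_cvi(ratings))  # → 0.9
print(endorse(ratings))   # → True
```

An item falling below the threshold would, as described above, be refined for reevaluation or removed from consideration.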
Guion (1977) 97 argues that the behavior observed in the measurement procedure should represent the whole class of behavior that falls within the targeted construct. APPE assessment entails a preceptor observing samples of student behavior described in an instrument. A representative sample of student behaviors should contain the essential characteristics of the universe of possible behaviors, in their proper proportion and balance. 104 Lennon (1956) 104 argues that what we are really after is test score variance that is attributable to the same sources as the variance in the performance criteria. The matching of specificity appears to be a key aspect in evaluating representativeness. 102

2.3.3 Reliability

Reliability is an integral aspect of validity evidence since the reliability of assessment scores is a factor in the interpretation of assessment scores. There are different types of reliability measures that are important in assessment measurement. Downing (2005) 105 argues that inter-rater reliability is an essential component of validity evidence for all assessments using raters. 105 This study will estimate preceptor inter-rater reliability. Inter-rater reliability refers to the degree to which assessment scores are consistent and provide clear information useful for differentiating between individual students for the purpose of the assessment. 39 Literature suggests that measures of inter-rater reliability in the clinical setting are often low and estimates of .80 or above are rarely achieved. 106 Acceptable levels of reliability vary according to the objectives of the assessment, and the objective of this study is to establish benchmarks for comparison. APPE assessment needs to discriminate between students who are competent and students who are not, as well as distinguish high
achieving students from the remaining student cohort. APPE assessments are based on an external criterion, and the numerical differences among the preceptor scores are important in comparing inter-rater reliability benchmarks. 39 A student's true score is conceptualized as the average of all the observed student scores if there were an infinite number of observations and scoring was error free. 39 Error is commonly categorized into two groups, random error and systematic error. 107 Random error is measurement error and may be due to inattention or tiredness. This error may equally increase or decrease the score. Random error is conceptualized such that these score increases and decreases over an infinite number of observations are self-canceling. 107 Systematic error is one type of construct-irrelevant variance. One source of systematic error occurs when scores display excess reliable variance associated with measuring constructs other than the construct targeted by the assessment. 108 Preceptors may also introduce systematic error in the form of leniency/severity error, halo error, central tendency error, or other forms of bias. 71 The different sources of variance in the assessment scores may be separated into the variance attributed to students, preceptors, and random error. The ratio of the variance attributed to students to the total variance is used to quantify the inter-rater reliability of preceptor assessments. Estimates of inter-rater reliability are dependent on the population of preceptors, the population of students, the assessment instrument, and the sample size. 39, 40
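This variance-ratio view of inter-rater reliability can be made concrete with an intraclass correlation computed from a students-by-preceptors score matrix. The sketch below uses the two-way random-effects, absolute-agreement, single-rating form, ICC(2,1); the score matrix is hypothetical, and the specific ICC form used in this study is not implied here.

```python
# Sketch: ICC(2,1) from a fully crossed students x preceptors matrix.
# The data are hypothetical 1-3 ratings (deficient/competent/excellent).

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rating."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]                    # per-student means
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]  # per-preceptor means
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)          # student variance
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)          # preceptor variance
    ss_tot = sum((v - grand) ** 2 for row in scores for v in row)
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = (ss_tot - ss_rows - ss_cols) / ((n - 1) * (k - 1))        # residual error
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical: 4 students (rows) scored by 3 preceptors (columns)
scores = [[3, 3, 2],
          [2, 2, 2],
          [1, 1, 2],
          [3, 2, 3]]
print(round(icc_2_1(scores), 3))  # → 0.55
```

When preceptors agree perfectly on students who truly differ, the preceptor and error components vanish and the ICC equals 1; disagreement among preceptors inflates the denominator and lowers the estimate.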
2.3.4 Accuracy

A study by Noel et al. (1992) 68 measured the accuracy of student assessments by medical faculty (n=203). All participants viewed two clinical evaluation exercise (CEX) case simulations on tape. Using a validated scoring system, overall accuracy was calculated at 32% for the group using the open-ended evaluation form and 60% for the group using the structured evaluation form. When all participants were ranked by accuracy scores, 19 of the 20 most accurate participants were from the group that used the structured evaluation form. Of the 20 least accurate participants, 15 used the open-ended form and 5 used the structured evaluation form. Over half of all the participants rated the student performances in the two scenarios as satisfactory or superior. In both case simulations, however, enough errors were purposely included such that all participants should have rated the student performances as less than satisfactory. Thus, accuracy was less than desirable even with the improved inter-rater reliability measures for the group using a structured evaluation form. This study illustrates two points. First, a higher inter-rater reliability measure (in this case, one group compared to the other) by itself does not support a case for validity. Second, this study demonstrates the use of videotaped simulations as a sound method to measure preceptor inter-rater reliability and accuracy.

2.3.5 Validity Summary

In a review of the development of content validity (1998), 36 the author argued that early approaches to instrument validation were insufficient for supporting the use of a test; by the same logic, content validation alone is also insufficient. To support a case for validity, an assessment
process needs a balance of evidence and should include content validity data and empirical data. Reasonably high preceptor inter-rater reliability estimates are required to support the meaningful interpretation of ratings. However, high reliability measurements alone do not necessarily provide evidence for an equally high level of validity. Reliability is necessary but does not provide sufficient evidence to claim validity; one reason is that even preceptors who agree may be completely wrong. 105 The use of video simulations is generally accepted as an effective and economical method to evaluate assessment strategies. 109, 110 Videotapes may be supplemented with simulated physician orders, laboratory reports, and other documents. The objective is to simulate an authentic encounter with a student during APPE rotations and provide the types of evidence a preceptor needs to make a valid assessment. The aim is to generate evidence supporting validity that is congruent with the APPE environment. Healthcare education experts argue the need for reliable and valid student assessments. 111 Assessment instruments need to be defended with rigorous scientific methods, which demonstrate that students are capable of performing in real-world clinical settings. 112-114 Despite this widely held position, the pharmacy profession has yet to establish a national policy outlining acceptable validation criteria for APPE assessment instruments. Establishing a sound process for validity that is generalizable across practice settings and geographical regions will advance the practice of evidence-based education in the pharmacy profession.

2.5 SUCCESS Overview

SUCCESS is the result of collaboration among all of the Florida schools of pharmacy. In 2001, representatives from the University of Florida, Florida A&M, and Nova
Southeastern University (there were three Florida pharmacy schools in 2001) attended the AACP spring training institute to jointly develop a Web-based APPE assessment instrument. SUCCESS is an acronym for the System of Universal Clinical Competency Evaluation in the Sunshine State. The ASHP residency evaluation system served as a model for the development of SUCCESS. Beta testing was initiated in 2004 by preceptors from two of the four Florida schools. 115 Based on the CAPE Educational Outcomes guidelines, the Florida schools developed a set of 13 competency statements that reflect an overarching category of knowledge, skills, attitudes, and behaviors that qualify a student for entry into the profession (Appendix A). Within each of the thirteen competency statements are a number of skills. There are 96 skill statements in the system. An assessment scale with behavior anchors for three levels of performance is described for each skill. The scale assesses one dimension of behavior: the level of supervision relative to an entry-level pharmacist. 115 The rating scale is illustrated in Figure 2-4. Preceptors select from among the list of skill statements they expect the students to encounter during the APPE rotation. Preceptors score student performances for each skill statement based on the behavior description contained in the BAS scale. Preceptors score students at one of three levels of performance: deficient, competent, or excellent. Preceptor grades for the rotation are recorded for comparison. The SUCCESS system assigns the student a letter grade for the clerkship rotation. In addition, for safety concerns, certain skills were flagged to indicate the potential need for remediation. 115
Since its implementation, the SUCCESS instrument has been continuously used by all the Florida schools of pharmacy. In addition, the instrument documented University of Florida student experiences in the United Kingdom, Spain, Ecuador, and Mexico. Findings from the Ried et al. (2007) 30 study suggest that the letter grades for the clerkship rotation provided by the preceptor correlated with the SUCCESS system-assigned grade. In addition, students incrementally achieved higher competency scores as they progressed through their scheduled rotations.

2.6 Research Questions

This literature review has drawn together information regarding the nature of competence and assessment. Wolfe and Gitomer (2001) 77 remind us that the principles for the design of performance assessments are very much in their infancy. Healthcare education experts argue the need for reliable and valid student assessments. 111 However, the pharmacy profession has yet to establish a national policy outlining validation criteria for APPE instruments. Evidence supporting or challenging assessment instruments needs to be grounded in rigorous scientific methods. The aim of this study is to advance evidence-based education by establishing a sound process for validation of a national APPE assessment instrument. The CAPE Educational Outcomes guidelines outline competency standards for entry into the profession. Preceptors assess student performance compared to the operationalized performance criteria from the instrument. However, the CAPE guideline does not describe the operationalized performance criteria that predict competent entry-level practice. The instrument should assist preceptors in conducting reliable and valid
summative assessments as to whether a student is qualified for entry into the pharmacy profession. Assessment of student APPE performance in drug therapy is the focus of this study. The Drug Therapy Evaluation and Development competency in SUCCESS is an essential aspect of both pharmacy education and the pharmacy profession. Literature suggests that a valid national APPE assessment instrument should incorporate certain key features and processes.

2.6.1 Research Question 1

The SUCCESS assessment instrument has been successfully used in the state for over four years. However, there is a need to evaluate use beyond the state of Florida. This study will facilitate an expert panel to evaluate and refine the operationalized performance criteria for relevance and representativeness based on professional standards that cross practice settings and national regions. Findings suggest that the addition of a checklist improves the reliability of assessment among preceptors. An expert panel will evaluate the process of incorporating a checklist into the existing performance criteria. Currently, SUCCESS employs a one-dimensional scale, and student performances are assessed by the degree of supervision required compared to that of an entry-level practitioner. Findings suggest additional performance dimensions may be indicative of competent entry-level practice. The expert panel will evaluate incorporating multiple dimensions into the existing scale. Findings suggest variation in scores is attributable to differing preceptor interpretations of student performance descriptions. Literature suggests that the development of a glossary will promote a uniform understanding among preceptors. 87-89 The expert panel will
develop a glossary that will contribute to uniform interpretation of the performance criteria. These processes will contribute evidence to support content validity. The process will refine the operationalized performance criteria so that they may be generalizable across practice settings and national regions. This aspect of the study will establish guidelines for content validity studies of a national APPE assessment instrument.

Research Question 1: What is the degree of expert panel agreement regarding the Drug Therapy Evaluation and Development competency?

2.6.2 Research Questions 2 and 3

Second, literature suggests that measurement of inter-rater reliability is an essential component of validity evidence, 105 and measurement of accuracy would contribute evidence of validity. There are no published measures of preceptor inter-rater reliability or accuracy in the body of pharmacy literature. This study will include an estimate of preceptor inter-rater reliability and accuracy using video simulations. Video simulations are generally accepted as an effective and economical method to evaluate assessment instruments. 109, 110 The aim is to generate evidence supporting validity that is congruent with the APPE environment. This aspect of the study will establish guidelines for measuring preceptor inter-rater reliability and accuracy. These measures would contribute validity evidence for a national APPE assessment instrument.

Research Question 2: Can preceptors use the modified SUCCESS instrument to make reliable assessments with the Drug Therapy Evaluation and Development competency?
Research Question 3: Can preceptors use the modified SUCCESS instrument to make accurate assessments with the Drug Therapy Evaluation and Development competency?

2.7 Literature Review Summary

This study will accumulate evidence to support the inferences for competency in Drug Therapy Evaluation and Development of student APPE performance with the SUCCESS instrument. Research questions that address research gaps have been presented. An expert panel will provide content validity evidence that will represent various practice settings and geographical regions. The analysis of preceptor scores will demonstrate whether SUCCESS can produce reliable and valid assessments, and whether the operationalized performance criteria are readily usable.
Figure 2-2.
Figure 2-3. Level assessment
COMPETENCY: Overall knowledge, skills, and attitudes
Skill statement: What should the student know? What should the student do? What does the student value?
Excellent: Practices at a level above an entry-level pharmacist in the profession (an excellent student).
Competent: Practices at the level of an entry-level pharmacist in the profession (an average student).
Deficient: Cannot practice at the entry level.

Figure 2-4. SUCCESS rating scale. Adapted from the SUCCESS web site (2009)
CHAPTER 3
METHODS

3.0 Methods Introduction

This chapter will address the motivations for the research questions and will specifically describe the participants, instrument development, data collection, and analysis process. There are two phases associated with this study. Each phase addresses distinct research questions and samples different populations. Both phases include a pilot study.

3.1 Phase I Methods

Phase I has a descriptive purpose with a survey design. A non-probabilistic, purposeful sample of expert preceptors was recruited. The purpose was to collect evidence to support content validity for the modified Drug Therapy and Development competency within the modified SUCCESS instrument.

3.1.1 Study Participants

Literature suggests a Delphi panel of professionals requires the participation of 15-30 individuals. 116-121 Directors of experiential programs from ten schools of pharmacy will be asked to nominate ten preceptors with recognized expertise in drug therapy who also have experience as pharmacy preceptors. Schools were selected due to some degree of experience with SUCCESS. This previous experience took different forms and included partial implementation of SUCCESS for APPE assessment or participation in a SUCCESS workshop. Nominated preceptors represented various regions across the nation and practiced in both public and private institutions. The objective was to build a non-probabilistic, purposeful panel of expert clinicians with a mix of Board of
Pharmaceutical Specialties (BPS) and non-BPS practitioners representing a variety of practice settings, regional locations, and educational institutions. Literature suggests the panel should be comprised of the types of individuals who will use the instrument, namely practicing clinicians who are preceptors. 73 Literature also suggests the quality of the panel determines the quality of the results. The nomination process defines the inclusion criterion and will ensure high-quality expertise within the panel. 122 A high dropout rate is anticipated due to the perceived burden placed on the participation of busy professionals, and the study will attempt to recruit thirty participants. Participation notices and reminders were adapted from the contact scheme outlined in the Tailored Design Method by Dillman et al. (2009) 123 and used to promote a high recruitment rate. If funding becomes available, compensation will be used to encourage participation. Demographic characteristics were collected and summarized to describe participants and the practice setting. Questions concerning gender, age, degree, years of practice, years precepting students, primary role with students, institution type, pharmacy practice setting, and population (of practice setting) were adopted from the AACP Annual National Survey of Volunteer Pharmacy Preceptors. 28 However, comparison with the AACP survey is limited due to a low response rate by the convenience sample.

3.1.2 Survey Instrument Development

The Delphi method was originally developed by the Rand Corporation in the 1950s as a means of extracting opinion from a group of experts. 124 According to Linstone and Turoff (1992), 124 there are four key elements of this method. The first element is anonymity; panel members are unaware who contributed a given statement or opinion.
This is to avoid peer influences and to allow panel members to change their opinions without fear of social consequences. The second element is iteration. Panel member interaction is carried out through a series of questionnaires. Discussion items are refined in each successive round. There may be several survey rounds, and the panel members will complete essentially the same survey each round. The third element is group feedback. Each successive survey round contains a summary of the responses and opinions from the previous round. Group feedback reports the percent supporting each opinion in the survey. The final element is consensus, defined by a predetermined agreement threshold. When this threshold is achieved, the item is adopted. An item under discussion will be continually refined, based on panel input, until the approval threshold is achieved. Conducting a Delphi panel appears to be more art than science. There are no concrete guidelines for the panel inclusion criteria, such as panel size, number of rounds, feedback, or agreement definitions. 116-120, 124-128 It appears that there are no published accounts of a Delphi panel with the exact same purpose as this study, although there are several similar studies. 129-137 The Delphi method provides for structured communication among panel members and is designed to exploit the advantages of group-based work while overcoming its disadvantages. 124 This goal is accomplished by focusing attention on the free exchange of ideas while hindering any single group or individual from dominating the conversation. The design of the Delphi survey instrument is based on the Reactive Delphi model. The Reactive Delphi method, as defined by McKenna (1994), 118 is a method that requests panel members to react to previously prepared material rather than asking them to generate original material.
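The four elements above combine into a simple control loop: vote anonymously, report group feedback, refine, and repeat until consensus or a round limit. The sketch below is a schematic of that cycle only; the voting function, item names, three-round limit, and 80% threshold are illustrative assumptions, not the study's protocol.

```python
# Sketch: the Delphi iterate-feedback-consensus cycle.
# collect_votes stands in for a real anonymous survey round.

def run_delphi(items, collect_votes, threshold=0.80, max_rounds=3):
    """Iterate voting rounds with group feedback until each item reaches
    the agreement threshold or the round limit is hit.
    collect_votes(item, round_no, prior_agreement) -> list of bool votes."""
    adopted = {}
    feedback = {item: None for item in items}   # group feedback carried between rounds
    for round_no in range(1, max_rounds + 1):
        for item in items:
            if item in adopted:
                continue
            votes = collect_votes(item, round_no, feedback[item])
            agreement = sum(votes) / len(votes)
            feedback[item] = agreement          # reported back to the panel next round
            if agreement >= threshold:
                adopted[item] = (round_no, agreement)
    return adopted

# Hypothetical stand-in: fixed votes from a 10-member panel
def collect_votes(item, round_no, prior):
    panel = {"skill A": [True] * 9 + [False],       # 90% endorsement
             "skill B": [True] * 7 + [False] * 3}   # 70%, never adopted
    return panel[item]

print(run_delphi(["skill A", "skill B"], collect_votes))  # → {'skill A': (1, 0.9)}
```

In a real panel, `collect_votes` would distribute the questionnaire with the previous round's feedback attached and return the anonymized responses.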
With this strategy, panel members were asked for their professional opinion on the materials presented. In this study, the Delphi panel was asked for their opinion on specific competency skills for evaluation and the performance criteria for assessment. Some questions are associated with more than one target subject and reflect the cross-method strategy of the Delphi survey. This gave the investigator an opportunity to evaluate major themes and discrepancies and helped provide a robust description of the study question. The competency skills were adapted from the Drug Therapy and Development competency. The performance criteria included the checklist, rating scale, and performance levels. A physician at the University of Florida Shands Medical Center developed the checklist. The rating scale and performance levels were adapted from the physical therapy assessment instrument. The intention was to seed the discussion with the new items in the context of Drug Therapy and Development competency assessment. The principles of survey development outlined in the Dillman et al. (2009) 123 Tailored Design Method were used to create the survey instrument. In addition, literature suggests participants should be provided with materials related to the conceptual basis for competency and the operations of the survey. 122 A brief orientation document was provided to panel members. Literature also suggests that the quality of the feedback from previous survey rounds is a crucial feature for optimal participation. The objective of the feedback process is to provide relevant data that the panel can readily interpret. This feedback is both quantitative and qualitative. The resulting summary of the percent response for a single survey question from previous Delphi survey rounds is an example of quantitative feedback. On the other hand, an
opportunity for panel members to contribute comments is provided for every survey question, and the researcher summarizes comments, which are coded according to key words or phrases. This summary comprises the qualitative feedback. A pilot test was conducted to uncover any operational issues and assess the readability of the instrument. The University of Florida Director of Experiential Programs and the AACP Director of Academic Affairs and Assessment were asked to nominate preceptors for the pilot test. Three experienced preceptors were recruited for the pilot study.

3.1.3 Data Collection

The UF Institutional Review Board (IRB-02) approved the study, and the approval letters are presented in Appendix C. The principal researcher used the third-party web service SurveyMonkey 138 to administer the online survey. The online survey contained the informed consent form, an explanation of the study, and the actual survey. When a survey round was distributed, panel members were asked to submit their responses within 10 calendar days. Distribution of the next survey with feedback was scheduled within three calendar days of closing the previous round. This cycle was scheduled to continue for up to three survey rounds or completion of the Delphi panel tasks. Literature suggests that at least two survey rounds will be necessary for consensus. This is consistent with several similar Delphi studies with a greater participation burden that required three Delphi rounds. 131, 135-137, 139, 140
3.1.4 Data Analysis

The definition of panel endorsement was set prior to starting the Delphi process. Several methods are cited in healthcare literature to quantify content validity evidence. 141-143 Literature suggests that percent agreement is a reasonable definition of panel endorsement for this application. This study used an 80% agreement threshold for panel endorsement of an item.

3.1.5 Phase I Methods Summary

The intention of this study was to collect content validity evidence for the modified SUCCESS competency Drug Therapy Evaluation and Development. The principal researcher used an online Delphi method since it is an effective and economical way to facilitate discussion among experts. Preceptors were nominated by schools of pharmacy to form a diverse group of recognized experts. The Delphi panel evaluated and refined the domain specification and performance criteria (Figure 3-1). The aim of Phase I is to demonstrate a scientifically sound process that could be replicated by other academy members for conducting a content validation study for a national APPE assessment instrument.

3.2 Phase II Methods

Phase II used a non-experimental, descriptive design. Data collected in Phase II addressed Research Questions 2 and 3, namely establishing benchmarks of reliability and accuracy for future comparisons. The objective of the second phase is to collect data from preceptors to estimate assessment reliability and accuracy. Preceptors observed simulated student performances and used the new rating scale to assess simulated student clinical performances.
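Phase II's accuracy estimate involves comparing categorical outcomes, for which an exact test on a 2x2 table is suitable. The sketch below computes a two-sided Fisher exact p-value directly from the hypergeometric distribution; the second table (counts of preceptors whose rating matched an expert consensus versus not, under two hypothetical conditions) is illustrative data only, not study results.

```python
# Sketch: two-sided Fisher exact test for a 2x2 table, computed from
# the hypergeometric distribution with fixed margins.
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def p_table(x):  # hypergeometric probability of cell value x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # two-sided: sum probabilities of all tables as or more extreme
    return sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs + 1e-12)

# Classic check (Fisher's tea-tasting table)
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # → 0.4857

# Hypothetical accuracy table: matched vs. did not match expert consensus,
# for two preceptor groups
print(round(fisher_exact_2x2(28, 12, 18, 22), 4))
```

Because the test is exact, it remains valid at the small cell counts that arise when a vignette is rated by only a handful of preceptors, which is why it suits small samples.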
3.2.1 Study Participants

A non-probabilistic convenience sample of typical preceptors was recruited to participate in an online Continuing Education (CE) course. In addition to ACPE credit, participants were offered a $100 eGift certificate from AMAZON.COM. The Texas Pharmacy Association and the principal researcher marketed the CE course. The Texas Pharmacy Association posted promotional material on the TPA Web site and included it in regular email lists. The principal researcher contacted a number of schools to distribute promotional materials. Except for two Florida-based schools, all schools declined to forward promotional materials. These two schools provided the researcher with preceptor mailing lists. Citing limited resources, the schools asked the researcher to distribute promotional materials directly to preceptors. The target number of preceptors to recruit was based on the requirements of the analytical methods. This study used the Intra-Class Correlation (ICC) to estimate reliability and the Fisher Exact Test to estimate accuracy. The Fisher Exact Test is an exact test and is not sensitive to small sample sizes. However, the ICC is sensitive to a number of factors and requires sample size estimation. The classical approach to sample size calculations is based on statistical power and minimal detectable difference. There is a growing trend to estimate sample size based on confidence intervals rather than classical hypothesis testing, and this is the strategy used in this study. 144 Sample size was calculated based on a modification of a formula described by Bonett (2002). 145 The study selected an alpha of 0.05, a beta of 0.2, and a confidence interval of 0.5. Figure 3-2 illustrates the number of video simulations needed with 40 preceptor raters and an ICC ranging from 0.5 to 0.9 in order to assess 9 examples of each skill. The unit of analysis is the assessment score based on the new rating scale. The
objective of the sample size estimate is to allow the study to compare and contrast assessment scores based on the new rating scale between all skills for all students. This would allow analysis of the results of each competency skill illustrated by each of the nine student vignettes.

3.2.2 Simulation Development

3.2.2.1 Expert Panel

A non-probabilistic convenience sample from the Delphi panel was recruited. Invitations were sent to 12 preceptors based on the quality and quantity of comments in the Delphi panel. The study needed preceptors who were able to provide usable advice with a rationale for their opinions and did not shy away from giving advice. Participants were offered $400 compensation for their time and effort. Ten preceptors accepted the invitation to join the expert panel; however, only seven completed the project. The expert panel had two main tasks. The first task was to review and edit scripts prior to filming. The objective of each video vignette was to simulate an authentic encounter between a preceptor and a student during APPE rotations. The filmed simulations and supporting materials were evaluated for authenticity and appropriateness. The video script provided the evidence a preceptor needs to make a valid assessment. In the second task, the expert panel watched the video vignettes and scored the performances using the new performance criteria. These assessment scores were compared with those of a group of typical preceptors.

3.2.2.2 Case Study Development

The objective of the study was to evaluate a new performance criterion for the assessment of the seven skills under the SUCCESS Drug Therapy and Development competency. The Delphi panel evaluated and revised the competency skills and
performance criteria. However, producing video vignettes that illustrate all the competency skills would have made the videos prohibitively long. This would hamper recruitment of typically busy preceptors in the next step of the study. In the principal researcher's opinion, illustrating only Skills A, B, and D/E would provide the most complete opportunity for assessment within a reasonable time limit. The expert panel was asked to focus on the quality of the scripts and not to be concerned about any time limitations. The expert panel was sent summaries of six case studies and asked to select their top three preferences. Panel members were asked for their opinion on which case studies held the best potential as vehicles to illustrate Skill A, Skill B, and Skill D/E. The poll was informal and intended to start a dialog. The expert panel preferred case studies on diabetes, heart failure, and anticoagulation. The first two cases focus on chronic conditions, and the third case offered an opportunity to consider an inpatient setting. The three cases not selected dealt with hypertension, chronic obstructive pulmonary disease, and pain management. The expert panel was consulted regarding an appropriate setting for the video vignettes. Panel members who were available for consultation suggested capturing the give-and-take dialog between a student and their preceptor. The vignette would capture the responses of the student to preceptor questions. One preceptor suggested a particular framework of preceptor questions. Developed by the American Society of Health-System Pharmacists, this strategy outlines a series of inquiries to help pharmacists evaluate drug therapy. The objective of the video vignettes was to show different student behaviors for assessment. By design, the variation in dialog within each
vignette illustrates different levels of student clinical performance and not the performance of the preceptor. To accomplish this goal, the preceptor prompts for each case study were the same regardless of the student or the performance level illustrated.

3.2.2.3 Script Development Process

Scripts were distributed to the expert panel with a period allotted for completion. After the responses were compiled, the new version was sent back to the expert panel for review. The principal investigator called several members of the panel for clarification of terms and comments. At the end of each script, the expert panel was asked to respond to four questions. A summary and sample of comments follows.

Question 1: Do these vignettes illustrate a typical interaction between the preceptor and the student?
Question 2: Is there enough information illustrated in the script to judge student performance?
Question 3: How do you tell the difference between a competent student performance (e.g., Excellent and Entry Level) and a not competent student performance (e.g., Deficient)?
Question 4: How do you tell the difference between an Excellent student performance and an Entry Level student performance?

Question 1: All members of the expert panel agreed that the scripts, for the purpose of this study, represented a typical interaction between a student and a pharmacy preceptor. Many comments suggested that preceptors spend considerable time asking questions to elicit responses from students. This behavior of the preceptor is not depicted in the vignettes. A sample of comments follows.
1. I have to prompt students to get them to talk.
2. For an [Entry level] student I would expect a whole lot more prompting from the preceptor.
3. Typically [even for Entry level] adequate students are not confident in what they know, so I have to ask tons of questions, but after they realize they really did know the info; they just didn't put it all together in the beginning.

Question 2: All members of the expert panel agreed that the scripts, for the purpose of this study, presented enough information for a typical preceptor to assess performance. The panel agreed that the scripts were ready for filming. There was some reservation that assessment of one example of a student performing a skill may be difficult. The first draft of the script included a scene where the preceptor gives a history of student performance that is not shown in the video. This was an attempt by the researcher to give the impression of multiple performances to help make assessment feel more realistic. However, the panel uniformly rejected this scene. The panel explained that this prevented the preceptor from basing their assessment on their own observations and introduced bias. A sample of comments follows.
1. I think there is enough info illustrated in the script to judge student performance. However, one caveat is, the preceptor will have judged multiple cases to determine their level by the end of a rotation. Sometimes only one is hard.
2. Yes with modifications

Question 3: The expert panel related several behaviors they want to observe when evaluating whether a student is competent or not. First, the competent student should understand the therapeutic objective. Medication therapy is seen as a balance of benefits and risks based on the principles of evidence-based medicine. Knowledge of medication should be consistent and detailed. Beyond a
technical understanding of therapeutics, the competent student has a patient-centered perspective. They will inquire about patient compliance and investigate barriers to access to medication. A small sample of comments follows.
1. At some point (either history or in discussion later) I would like [the competent student] to make . . . compliance . . . No point in adding meds if the patient isn't going to take them.
2. [For a competent student], I would also want to see her bring in her glucometer records to download them and follow up with her that she is monitoring at home at least monthly and not wait 3 months for the HgbA1C.
3. I would expect [the non-competent student] just to hand out the pamphlets.
4. When the students are giving their recommendation, I never let them get away with "intermediate acting insulin"; they would need to specify which product the patient will be receiving.
5. [A competent student] . . . understands what the medications are and what they are for.
6. . . . I drive for understanding of the risks and benefits. And also include compliance and financials.
7. Have him [the non-competent student] prioritize a different problem . . .
8. Have him [the non-competent student] pick a very expensive insulin (like Lantus) and not have considered if her insurance would pay for it or what her copay would be, or justify it by him stopping the other generic meds which have low copays.
9. The student [the non-competent student] has the right meds but an out-of-this-world dose. Have him recommend a much higher dose since he is stopping all of her meds (like, umm, I think 1 unit per kilogram BID) but not know why he chose that dose.
10. . . . giving me [by the non-competent student] a long list of factors that may affect INR [laboratory results] but are not relevant to this case.
Question 4: Every member of the expert panel expressed difficulty describing behavior that differed between Excellent and Entry Level performance. Both the Excellent and Entry Level students understand the therapeutic objective. They perceive medication therapy as a balance of benefits and risks based on the principles of evidence-based medicine. Knowledge of medication should be consistent and detailed. However, there were several suggestions to show examples in which Entry Level students ask for assistance, or in which the Excellent student recognizes when medical documentation does not contain all the answers and one needs to ask the patient. It was also suggested that the script show how the Excellent student reached out to connect with the patient to ensure compliance, verify educational needs, and recognize barriers. Comments suggested that Excellent students should have a grasp of the strengths and weaknesses of laboratory tests and an advanced understanding of practice protocols as compared to other students. A small sample of comments follows:
1. . . . product costs (i.e., if deciding between 2 insulins, which would be better for that reason).
2. An excellent student would verify whether the patient needs diabetic education . . . just because Joyce is a nurse, I would assume she would need more diabetic education; she may think she knows what to eat, but nurses tend to be the worst, and she would benefit from education.
3. . . . I may even suggest [the excellent student] phone the patient at home after 1 week to see how she is doing with her new regimen.
4. [For an excellent student], I would also want to see her bring in her glucometer records to download them and follow up with her that she is monitoring at home at least monthly and not wait 3 months for the HgbA1C.
5. When talking about self-management activities, excellent students may mention compliance and written info, but perhaps also suggest other compliance devices; a pill box, or a chart to record his meds as he takes them, may be of benefit also for an excellent student.
6. [For an excellent student] . . . he has CAD, I would probably continue his aspirin, but there has been some controversy surrounding ASA recently. "Is that what you are currently recommending here at your clinic?"
7. Considers impact of non-medication factors . . . consistent diet is the key [in the anticoagulation case].
8. . . . focusing on the patient at hand. He is not significantly elderly [in the anticoagulation case]. After the patient details, I as a pharmacist would be asking her, why do you think today his INR is 4.6?
9. He [an excellent student] needs to talk more about diet, signs and symptoms of clotting, lifestyle issues with warfarin, compliance, etc. What questions do we need to ask him?

3.2.2.4 Rating Scale and Student Behavior

The expert panel was asked for input on student behaviors that would illustrate different levels of performance for each of the five dimensions (see Appendices F and G) of the new rating scale. The poll was informal and intended to help the researcher effectively use the feedback from the panel. Brief summaries of the strategy used to illustrate behavior for the different levels of performance on each rating scale dimension follow.

Supervision: This is the level and extent of assistance required by the student to achieve entry-level performance. The rating has four levels of assessment (see Appendices F and G). The supervision rating is different from all the other scale behaviors. The preceptor bases the degree of supervision on behaviors captured in the other rating scale items. For example, the preceptor may observe a student recommending the wrong dose or missing important information in the medical records. In this case, the
preceptor would make the appropriate assessment for consistency or complexity in the rating scale based on these observations. The preceptor may also assess the necessary degree of supervision differently based on these observations, which is, in effect, a global assessment of the student. Many comments suggested that preceptors spend considerable time asking questions in order to elicit certain responses from students, and that the degree of prompting is strongly associated with the degree of supervision needed by the student. As previously explained, preceptor prompts were identical regardless of the student performance level. This is unfortunate, since many preceptors use the degree of prompting to gauge supervision. A sample of comments follows:
1. Typically my worst students take the longest to get through a case because they miss so much or I have to drag the information out of them.
2. Students who are adequate are likely looking for positive reinforcement and will ask me what I would normally do if they are not confident in their response, and this would indicate a little more supervision is being required. Perhaps a response from the student such as "Can you explain what you mean by this?"
3. . . . I have to prompt students to get them to talk.
4. For an [entry level] student I would expect a whole lot more prompting from the preceptor.
5. Typically adequate students are not confident in what they know, so I have to ask tons of questions, but after they realize they really did know the info; they just didn't put it all together in the beginning.

Quality: The objective of this scale is to rate the degree of knowledge and skill (see Appendices F and G). As previously mentioned, the competent student
understands what the therapeutic objective is for a patient. It is not enough that a student can recite textbooks; preceptor comments suggest several key behaviors. First, the student should understand the medication's therapeutic objective. Second, the student should understand the benefits and risks of medication therapy. Lastly, the student should understand the therapeutic objective of the medication for the patient. A sample of comments follows:
1. I typically don't care for [students] to tell me statistics if they can tell me why they need to do x or y for a patient.
2. I drive for understanding of the risks and benefits.
3. When the students are giving their recommendation, I never let them get away with "intermediate acting insulin"; they would need to specify which product the patient will be receiving.
4. So when discussing the problems I would expect some statement from the student that the [patient] has signs and symptoms which require drug therapy . . .

Complexity: The objective of this scale is to rate proficiency with simple or complex cases. The number of elements of the case (e.g., simple or complex) must be considered relative to the patient, task, and environment. The rating scale has two levels of assessment (see Appendices F and G). It was suggested that a deficient student would fail to address all the relevant factors in a question about a complex patient. A sample of comments follows.
1. When Joseph can't list all the drugs that exacerbate heart failure, the preceptor may want to encourage him but then respond with the other classes, or may request he look into this and report back more thoroughly by the end of the day.
2. I may include some encouraging statements throughout (e.g., when she is listing all the diff meds that exacerbate heart failure, to indicate she got most if not all of them).
3. Have him talk about hypoglycemia for the question about considerations and completely miss the weight gain.

Consistency: The frequency of occurrence of desired behaviors (e.g., infrequently, occasionally, and routinely). The rating scale has three levels of assessment (see Appendices F and G). One suggestion for showing this behavior was to have the student give a good rationale for a medication and then select the wrong dose without any rationale. A sample of comments follows:
1. [Have him recommend a] much higher dose . . . (like, umm, I think 1 unit per kilogram BID) but not know why he chose that dose.
2. Have him [student] talk about hypoglycemia for the question about considerations and completely miss the weight gain.
3. Have him skip meds that he doesn't think are related (like ASA or statin) and have the preceptor ask if that is all the meds.

Efficiency: This is defined as the ability to perform in an effective and timely manner (e.g., inefficient/slow, efficient/timely). The rating scale has two levels of assessment (see Appendices F and G). Timely management decisions were suggested as one way to demonstrate efficiency in the video vignettes. Another example is to have the student give a messy and disorganized presentation to the preceptor. The following is a sample of participant comments:
1. Also, these students [the non-competent student] tend to make a lot of excuses, so instead of him just saying he needs extra time for the other patients, I would expect him to say something like "I haven't gotten to them as I've been working on the journal club article for next week" when the patients are due for an appointment in a few minutes (terrible prioritization skills).
2. . . . these messy students show how unorganized they are . . .
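The five dimensions summarized above could be captured in a small lookup structure. The sketch below is illustrative only: the level names, and the number of levels assumed for the quality dimension, are placeholders rather than the wording of the actual instrument (which lives in Appendices F and G).

```python
# Illustrative sketch of the five rating-scale dimensions described in
# section 3.2.2.4. Level names are assumptions, not the instrument's wording;
# the level count for "quality" is also assumed.
RATING_SCALE = {
    "supervision": ["constant", "close", "occasional", "minimal"],  # 4 levels
    "quality": ["deficient", "entry_level", "excellent"],           # assumed 3
    "complexity": ["simple_only", "simple_and_complex"],            # 2 levels
    "consistency": ["infrequently", "occasionally", "routinely"],   # 3 levels
    "efficiency": ["inefficient_slow", "efficient_timely"],         # 2 levels
}

def validate_rating(rating):
    """True if `rating` scores every dimension with one of its legal levels."""
    return (set(rating) == set(RATING_SCALE)
            and all(level in RATING_SCALE[dim] for dim, level in rating.items()))
```

A structure like this makes it easy to check that every submitted assessment covers all five dimensions before it enters the reliability analysis.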
3.2.2.5 Randomization

The final step prior to filming was randomization. During script development, the performance level for each student was the same for every skill. For example, the performance level for Mary, Patricia, and Linda was Excellent for all skills. This was to make the job of editing easier. More importantly, preceptors and assessment instruments should reliably and accurately assess competency skills on the merits of the particular performance. A given assessment should be based on the performance criterion (e.g., skill description and the checklist) and not influenced by prior behavior of this or other students. The first 3 video vignettes were not randomized. This was done to help evaluate the effect of randomization in the study. The remaining 6 video vignettes were randomized by skill, resulting in an almost equal number of examples of Excellent, Entry Level, and Deficient performances (Table 3-1). The expert panel was blinded to the randomization.

3.2.2.6 Video Production

Nine students from the School of Pharmacy at Southwestern Oklahoma State University (SWOSU) were recruited. The principal researcher obtained the approval of the SWOSU IRB and used the consent documentation approved by the University of Florida IRB (Appendix C). These third-year students were compensated $100 for their time. Students completed rehearsals and filming in one day. An experienced SWOSU faculty member played the role of the preceptor. The longest vignette was over 10 minutes in length; the shortest was 5 minutes and 10 seconds long (Table 3-2). It took 70 minutes to play all the vignettes consecutively. This was much longer than originally expected. However, no time limit was placed on development of the scripts.
3.2.2.7 Expert Panel Data Collection

The principal researcher used the third-party web service Survey Monkey138 to administer the online survey. The online survey contained the informed consent form, an explanation of the study purpose, access to the video vignettes, and the survey itself. Data collected from the expert panel are presented in the RESULTS chapter and are compared with the assessments collected from the preceptor panel.

3.2.3 Data Collection

The CE course was delivered as a webinar and, as mentioned earlier, Survey Monkey138 was used to show the video vignettes and collect assessment data. Preceptors were given CE credits and a $100 Amazon.com eGift certificate for participation. The University of Florida Institutional Review Board (IRB-02) approved the study, and the approval letters are presented in Appendix C. The CE program employed a variety of teaching formats, including topic presentations, video illustration of APPE performance, and expert discussion. The teaching objectives are shown in Figure 3-2.

3.2.4 Data Analysis

Figure G-1 in Appendix G illustrates the data collection and analysis pathway. Both the expert panel and the preceptor panel viewed the same video vignettes. Each of the nine video vignettes illustrated student performance of Skills A, B, and D/E. Performance evaluations based on the new performance criteria were collected. Performance levels for each skill were based on the assessment rubric discussed previously. Assessment data were collapsed into two 2x2 tables for analysis. One table compares Competent vs. Not Competent assessments and the other table compares
Excellent vs. Entry Level assessments. For the evaluation of reliability, the single-measure Intraclass Correlation Coefficient (ICC) is estimated for both 2x2 tables. The estimates of reliability from the ICC address research question two. For the evaluation of accuracy, the Fisher Exact Test is estimated for both 2x2 tables. The estimates of accuracy from the Fisher Exact Test address research question three.

Research Question 2: Can preceptors use the SUCCESS instrument to make reliable assessments of Drug Therapy Evaluation and Development competency?
Research Question 3: Can preceptors use the SUCCESS instrument to make accurate assessments of Drug Therapy Evaluation and Development competency?

3.2.4.1 Reliability test

The ICC is a widely used statistical test to estimate the degree of inter-rater reliability.146-149 Shrout and Fleiss (1979)149 describe six forms of the ICC and give a decision rubric for selecting the appropriate form. The first decision is specifying whether the study should analyze the data using a one-way or a two-way model. The researcher would like to separate error into systematic and random error in order to measure single-measure reliability, and this objective requires a two-way model. Each preceptor needs to rate all nine video vignettes. The single-measure ICC captures both the reliability of raters and the agreement between raters for a criterion-based rating scale as used in this study.146 The second decision is whether the study conceptualizes the preceptor raters as a fixed effect or a random effect. The aim of this study is to produce results that may be generalizable; this is why a random effects model will be utilized in the study. The selection of the unit of analysis is the third decision. In practice, only one preceptor rates student performances during a clerkship rotation, as compared to other healthcare
professions in which multiple preceptors assess student performance. The unit of analysis is therefore the single preceptor rating. For the measurement of inter-rater reliability, preceptors and students are treated as random factors in a two-way random effects model. Since the three performance levels (e.g., Excellent, Entry Level, and Deficient) are hierarchical, and this is a violation of ICC assumptions, the study will collapse the scores into 2x2 tables for comparison. Competent students will be compared to Not Competent students, and Excellent students will be compared to all other students, as shown in Figure 3-3. The ICC estimates the consistency of student scores. The null hypothesis is H0a: ICC = 0. This test will demonstrate whether the reliability of assessment among preceptors is greater than by chance alone. For the purpose of discussion, Landis and Koch150 created benchmarks for strength of agreement (Table 3-3). This table can be used to interpret ICC results. According to the authors, these divisions are arbitrary. However, they are useful for discussion.

3.2.4.2 Accuracy test

The Fisher Exact Test is a non-parametric test to compare differences in the distribution of scores between groups whose matrix may contain cells with values less than five. As previously mentioned, assessment data are collapsed into two 2x2 tables for analysis. One table compares Competent with Not Competent assessments and the other table compares Excellent with Entry Level assessments. The Fisher Exact Test treats this collapsed data as nominal data.
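The accuracy test just described operates on a collapsed 2x2 table. A minimal, stdlib-only sketch of a two-sided Fisher Exact Test follows; the counts in the example are invented for illustration, and in practice a statistics package would be used rather than this hand-rolled version.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher Exact Test p-value for the 2x2 table [[a, b], [c, d]]:
    sum the probabilities of all tables with the same margins whose
    hypergeometric probability does not exceed that of the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # P(top-left cell = x | fixed row and column totals)
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

# Hypothetical collapsed table (counts invented for illustration):
#                 Competent   Not Competent
# Preceptor panel     28            12
# Expert panel        16             5
p_value = fisher_exact_2x2(28, 12, 16, 5)
```

A large p-value would indicate no detectable difference in how the two panels distribute their Competent and Not Competent calls.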
The null hypothesis is H0: there is no difference (within 10%) in the scores between preceptors and experts. The alternative hypothesis is Ha: there is a difference greater than 10% between preceptors and experts. This test will show whether typical preceptors assess students with the same accuracy, within a 10% range, as the expert panel, beyond chance alone. However, there are no widely accepted criteria defining acceptable accuracy for assessment. The principal researcher will note results up to the 30% level as a form of sensitivity analysis.

3.2.4.3 Rating Scale Analysis

With the SUCCESS assessment instrument, preceptors indicate the degree of supervision required for each competency skill. This supervision scale rates student performances on three levels (Figure 2-4). The rating scale introduced in this study rates student performances on five domains of behavior. To facilitate comparison between the two scales, the principal researcher created a surrogate scale to approximate the SUCCESS instrument. This approximation was accomplished by collapsing the responses for levels of supervision from the rating scale. This proxy scale will be used to estimate the three supervision levels currently used in SUCCESS (see Appendix O). The Fisher Exact Test will quantify differences between the two scales. Results for the Competent vs. Not Competent and the Excellent vs. Entry Level comparisons are shown in Tables P-1, P-2, and P-3 for competency skills A, B, and D/E, respectively.

3.2.5 Phase II Methods Summary

The purpose of Phase II was to collect quantitative evidence of validity from preceptors for the SUCCESS competency Drug Therapy Evaluation and Development. Given our goal of establishing a scientifically sound method to measure reliability and
accuracy of APPE assessment instruments, our results should be generalizable to the clinical setting and establish a benchmark for comparison.

Table 3-1. Performance targets from the script

  Diabetes cases          Mary         Thomas       Susan
  Skill A                 Excellent    Entry Level  Deficient
  Skill B                 Excellent    Entry Level  Deficient
  Skill D/E               Excellent    Entry Level  Deficient

  Heart failure cases     Patricia     Joseph       Dorothy
  Skill A                 Entry Level  Excellent    Deficient
  Skill B                 Entry Level  Deficient    Excellent
  Skill D/E               Deficient    Excellent    Deficient

  Anticoagulation cases   Linda        David        Barbara
  Skill A                 Excellent    Entry Level  Deficient
  Skill B                 Entry Level  Excellent    Deficient
  Skill D/E               Excellent    Deficient    Excellent

Table 3-2. Video vignette times (minutes:seconds)

  Diabetes          Mary       Thomas   Susan
                    10:13      8:19     5:17
  Heart failure     Patricia   Joseph   Dorothy
                    5:31       9:05     7:35
  Anticoagulation   Linda      David    Barbara
                    6:32       7:39     5:10

Table 3-3. Landis and Koch150 strength of agreement

  ICC            Strength of Agreement
  < 0.00         Poor
  0.00 - 0.20    Slight
  0.21 - 0.40    Fair
  0.41 - 0.60    Moderate
  0.61 - 0.80    Substantial
  0.81 - 1.00    Almost Perfect
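The single-measure, two-way random-effects ICC selected in section 3.2.4.1 (ICC(2,1) in Shrout and Fleiss's notation) can be computed directly from the ANOVA mean squares. The sketch below is illustrative, not the study's analysis code, and uses only the standard library.

```python
def icc_2_1(ratings):
    """Single-measure, two-way random-effects ICC, i.e. Shrout and Fleiss's
    ICC(2,1), computed from the ANOVA mean squares of a subjects-by-raters
    table of scores (plain nested lists)."""
    n, k = len(ratings), len(ratings[0])          # n subjects, k raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

On the illustrative 6-subject, 4-rater table from Shrout and Fleiss (1979) this returns roughly 0.29, which the Landis and Koch benchmarks in Table 3-3 would label "fair" agreement.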
Figure 3-1. Number of video simulations for 40 preceptor raters
Figure 3-2. CE course objectives
Figure 3-3. 2x2 student assessment tables

CE course objectives (Figure 3-2), as recoverable from the figure:
- Compare and contrast how preceptors assess competency in other health professions.
- Discuss Accreditation Council for Pharmacy Education (ACPE) experiential education requirements.
- Describe the factors indicative of pharmacy competency.
- . . . (APPE) performance.
- . . . APPEs.
CHAPTER 4
RESULTS

4.0 Results Introduction

There are two study phases, and each phase addresses different research questions and samples different populations. Phase I recruited a Delphi panel to obtain expert opinion from a cross-section of clinical faculty. The Delphi panel evaluated and refined the competency skills and performance criteria. The Delphi panel outcomes address Research Question 1 and provide evidence of content validity. Phase II examined evidence for Research Questions 2 and 3. With the assistance of an expert panel, the principal researcher developed video simulations illustrating selected competency skills and performance criteria as a pharmacy student interacts with their preceptor. A preceptor panel assessed student performances with the new performance criteria. These data were used to estimate assessment reliability and accuracy.

4.1 Delphi Panel

4.1.1 Round One Panel

Pharmacy schools were selected according to familiarity with the SUCCESS assessment system. Of the 10 schools contacted, the principal researcher received responses from 7 schools to the request to email invitations to their best-performing preceptors. Citing resource constraints, two schools requested that the principal researcher contact preceptors directly and provided contact information. The principal researcher asked the remaining 5 participating schools to resend the email invitations every two weeks, for a total of three follow-up contacts. Of the 26 individuals who completed the online consent form, 19 panelists
completed the entire survey and one panelist partially completed the survey. This represents a 73% response rate based on the number of individuals who completed the online consent form. The survey responses and demographic data reported include the usable responses from the 20 participants of the Delphi panel and the two pilot study panelists. Panel demographics are listed in Table 4-1. Panelists' ages ranged from 29 to 63 years, with a majority (54%) reporting 41 years or older. A slight majority (52%) of the panelists were male, and almost half (47%) held a Doctor of Pharmacy (PharmD) degree. Several panelists possessed advanced degrees beyond the pharmacy BS and PharmD degrees, including an MBA, an MPH, and a PhD. The majority (64%) of the panel reported that they had practiced pharmacy for 11 years or more. The number of students precepted varied. The majority of panelists (73%) indicated that their primary role with students was as a preceptor. The site demographics are listed in Table 4-2. Most panelists (73%) were affiliated with public institutions, while the remainder were affiliated with private institutions. Approximately half (53%) worked in hospitals, and the remainder worked in various settings. Panelists reported a variety of population settings. Panelists represented 8 states; most respondents were from Florida (43%), followed by Pennsylvania (24%).

4.1.2 Domain Specification Results

The first series of questions addresses the domain specification of the construct Drug Therapy Evaluation and Development. This line of inquiry asks the Delphi panel to evaluate whether a competency skill is relevant for assessment. The panel reviewed the seven competency skill statements (see list in Appendix B) and were asked to respond to these two questions:
Domain question: Is competency Skill X relevant in defining the Drug Therapy Evaluation and Development competency?
Follow-up domain question: Are there any aspects of pharmacy knowledge, attributes, or skills missing in any of the skill statements?

Skill A domain question: All 22 panel members indicated that competency Skill A is clinically relevant (Table H-1). Comments suggested incorporating patient input within the process of identifying problems rather than relying solely on data gathered from medical records and other documents. It was suggested that the description be modified by reducing the wording to limit the focus to identifying medication problems. The following is a sample of comments.
1. . . . MEDICATION history and the student's ability to perform a thorough medication history by interviewing the patient. This statement could be . . .
2. . . . existing co-morbidities and no meds prior to admission. Or there are the wrong meds . . . this is simple . . .
3. . . . does not include complete current medications and past medications. Would consider specifically mentioning these as they . . .
4. . . . thorough patient history is much different from using a chart to look up lab data. The third skill then is summarizing into important and relevant to determine the problems. It would be nice to have more areas to assess and not lump so many together, where a student can be good at one part so you don't want to give them a low score but could use work on the . . .
82 . see above. If they are in decomp'd CHF, some decision has to be made on a hospital admission as to whether we keep them on the very important to know the rationale, particularly when dication needs with other health care providers. Skill C domain question : Nineteen ( 86%) panelists responded that Skill C was relevant and 14% (3) in d icated it was (Table H 3). C omments suggested this description is actually Skill B that is the process of prioritizing drug related problems and this competency skill actually describes a high priority event The following is a sample of panel comments. but perhaps impossible to truthfully answer as on most ro tations, students don't encounter these events so preceptors end up assuming the student has this ability through discussions. This may not be reflective of the student s competence in a real life situation. Instead of just emergency medical attention, per haps it would be more relevant to evaluate their ability to recognize emergency and immediate (ie. blood glucose of 1000, potassium of 6.5 opposed to just EMERGENCIES such 2 3 Not sure that this should be a priority for pharmacy students. If it implies emergency drug related problems I would assume that is incorporated into skill B "prio Skill D domain question : Competency Skill D was clinically relevant for 86% (19) of the panelists and 9% (2) panelists indicated Skill (Table H 4). Panelists suggested including patient needs, financial implications, and approved guidelines since these are factors used in evaluating treatment plans. Comments recommended the skill should not be limited to pharmacokinetic data and drug formulation data and include other factors such as
pathophysiology and pharmacodynamics. Comments also suggested this competency skill should be written in the form of a therapeutic monitoring plan. The following is a sample of comments.
1. "… changed. The formulation (i.e., tablet, capsule, suspension, etc.) is not relevant to Drug Therapy Evaluation and Development. Instead it would be more relevant if the statement included the phrase … rational drug therapy based on known pathophysiology of disease as well as pharmacodynamics of drugs used to treat the disease(s) …"
2. "… appropriate solution to each of the patient's problems is important. I think more goes into a decision than just PK and drug formulation data. The solution also needs to incorporate patient needs, financial implications, approved guidelines, etc. This should be in the form of a therapeutic …"
4. "… statement is too limited in that it only includes pharmacokinetics and pharmaceutics. There are many more basic and clinical sciences involved in the ability to make good decisions about optimal drug therapy. These include pharmacology, medicinal chemistry, physiology, pathology, …"
Skill E domain question: Of the 21 panelists who answered this question, 86% (18) indicated competency Skill E as relevant, 10% (2) panelists indicated that Skill E …, and 5% (1) respondent indicated … (Table H-5). A number of panelists indicated the need to consider all the factors in competency Skill E and the factors described in Skill D before a therapeutic plan could be developed. A sample of panel comments follows.
3. "… solutions are determined and the best solution becomes the recommendation and includes a monitoring plan for that solution. All of
these items need to be determined before the recommendation is …"
4. "This implies importance of rationale."
Skill F domain question: Only fifteen (68%) panelists indicated this competency skill as relevant, and 32% (7) panelists indicated it was … (Table H-6). Panel members suggested that a "back-up" plan is not the process that occurs in the clinical setting. When the primary plan does not work, the pharmacist needs to reassess the reasons why the plan did not work and use this information to develop an alternative plan. The following is a sample of comments.
1. "… If a medical situation changes, then an alternate plan is determined. I don't think I've ever included backup plans in a progress note to a physician, although by nature I have alternate plans. They are rarely communicated verbally unless necessary; therefore it is difficult for me to remember to encourage …"
2. "… up to the monitoring plan, because the monitoring plan will take into consideration what adverse effects or lack of efficacy … and if it occurs one starts at the beginning of …"
3. "… not work, pharmacist needs to reassess reasons why the plan(s) did not work and then be willing to develop alternative plan(s). This also should include alternative treatments should the recommended therapy not be …"
Skill G domain question: Twenty (91%) panelists reported this competency skill as relevant and 9% (2) panelists indicated it was … (Table H-7). Panelists recommended that the statement needed to identify the target audience. One panelist suggested three audiences: 1) documentation for the patient, 2)
documentation for the chart, and 3) documentation at the pharmacy. The following is a sample of panel comments.
1. "… effective manner regardless of the educa…"
2. "… not occur, then payment does not follow. It is necessary and vital to …"
3. "… documentation for the patient, documentation for the chart and other HCP, and documentation at the …"
Follow-up domain question: A slight majority, 55% (12), indicated there were no missing aspects in the competency skill statements. However, 45% (10) panelists indicated there were missing items (Table H-8). Comments suggested that patient-centered factors were missing in the skill statements, and several recommended incorporating drug information from the patient and the patient's values within the treatment plan. Comments indicated that the skill statements lacked factors important for making evidence-based clinical decisions. Several panelists suggested skill statements D and F should be combined, since in their practice these factors are evaluated together when developing a treatment plan. The following is a sample of panel comments.
1. "… changed. The formulation (i.e., tablet, capsule, suspension, etc.) is not relevant to Drug Therapy Evaluation and Development. Instead it would be more relevant if the statement included the phrase … rational drug therapy based on known pathophysiology of disease as well as pharmacodynamics of drugs used to treat the disease(s) …"
2. "… patient, respect patient's time and values … patient history, review of systems, being thorough, appropriately respond to patient concerns … prioritizing drug therapy problems … quality of recommendations … identifying alternatives and then making the recommendation with care plan including goals of therapy, what requires monitoring and appropriate time frames … Follow up with the patient to
monitor the plan and recommend adjustments. Documentation in a language understandable by the patient, appropriate for other HCP, and …"
3. "… plan (implied efficacy and toxicity) and communication of that plan to the …"
4. "… do not include evidence-based decision making and …"
4.1.3 Performance Criteria Results
This series of questions asks the Delphi panel to evaluate the performance criteria (e.g., checklist, rating scale, and performance levels). These items associate observed student behavior with a degree of competent performance. The first set of questions relates to the relevance and representativeness of the performance criteria for each competency skill statement. The following and concluding series of questions address descriptions of the individual elements of the rating scale and definitions of the performance levels. Descriptions of the performance criteria are shown in Appendices F and G.
Relevant question: Are all the elements of the checklist and the rating scale behaviors relevant for assessing performance in Skill X?
Representative question: Considering the universe of performance measures, are the elements of the checklist items and rating scale behaviors representative of the skills, knowledge, and attitudes required to assess performance in Skill X?
The Delphi panel was given the competency skill statement, checklist, rating scale, and performance descriptions for each competency skill and an explanation of their purpose. The checklist is a description of activities the
student needs to perform to complete an individual competency skill. The rating scale describes student behaviors for a particular performance level (e.g., excellent, competent, and deficient). Student performance is to be rated across five performance dimensions.
Supervision: refers to the degree of assistance.
Quality: refers to the degree of knowledge and skill demonstrated.
Complexity: refers to the number of elements that must be considered.
Consistency: refers to the frequency of desired behaviors.
Efficiency: refers to the ability to perform effectively and in a timely manner.
Skill A relevance question: All panelists (22) indicated the performance criteria were relevant for assessing performance of competency Skill A (Table H-9). Panelists commented that data gathered from the patient was important to Skill A and suggested that including data gathered directly from patients would improve the checklist. A panelist commented that maintaining a complete caseload is not indicative of excellent performance and should be excluded from the rating scale.
1. "'Maintaining a complete caseload' may not be needed to achieve an excellent rating. The quality of the skill is more important than the quantity. Students will acquire the ability to do multiple tasks as they gain more …"
2. "… students, particularly 4th-year students, to go see the patients and obtain a basic (pertinent) physical exam face to face, they also consider the severity of disease, as well as the comprehensiveness of the care
and not relying on what others have in the chart or medical record. Pharmacists can often get more thorough home med lists from patients and may see side effects that other HCP miss when speaking with the patient, as we are focused on these issues. Students should be able to interpret lab data and test results so this …"
Skill A representative question: All 22 panelists indicated that the performance criteria were representative as defined (Table H-10). Comments suggested a need for greater emphasis on patient interaction. Panelists suggested excluding "maintains a full-time caseload," "seeks to assist others where needed," and "capable of supervising others" from the rating scale. The following is a sample of panel comments.
2. "… labs before forming a problem list, I would also like them to look at the home meds, as well as the chief complaint, history of present illness, ht, wt, allergies (which aren't always in the history or physical), family history, social history including alcohol, drugs, tobacco, caffeine, social setting. Regarding the labs, I would also expect the students to take into consideration the exams and studies prior to forming a problem list (CT, MRI, chest X-ray, EKG, etc.). Specific edit suggestions: 'previous history of illness' is more precisely put 'past medical history'; 'sex' should be 'gender'; should also include in this section ht/wt/allergies; 'complain' should be 'complaint'; 'synthesizes …' should include the other components as discussed above; 'expansive view of the profession' … this …"
3. "… (or at least should be optional): … time caseload and seeks to assist others where needed."
4. "… experiencing that may be caused by drug therapy or the lack of drug …"
Skill B relevance question: Seventeen (81%) panelists indicated the performance criteria were relevant and 19% (4) indicated a need for revisions (Table H-11). Comments suggested that the problems listed need to be prioritized on more factors than morbidity and mortality in the checklist. The following is a sample of panel comments.
"… of the patient. A condition that does not appear to be clinically important may be v…"
"… Organize to Prioritize. …"
"… not just morbidity and mortality! Patient priority should also have some bearing on this, and team priority may also come into play. Looking at the current medication list also needs to be expanded to look for inappropriate frequencies, inappropriate doses (too …"
Skill B representative question: Fourteen (76%) panelists indicated the performance criteria were representative, while 33% (14) of the panelists indicated a need for revisions (Table H-12). Comments suggested that the difference between deficient and competent performance was too great, and some panel comments suggested that there was not enough of a difference between competent and excellent performance in the rating scale. Panelists suggested excluding caseload from the rating scale. The following is a sample of panel comments.
1. "The difference between competent and deficient is too different. Students can't all carry a full caseload … even I have difficulty with this sometimes. This should be reworded to 'the assigned number of patients.' All students are not created equal; some can carry a full load and others cannot; however, that one element makes it difficult to choose competent …"
2. "… did they come in on?? If the patient isn't doing well, have we asked if there were meds taken prior to admission that could account for this change? Has a formulary substitution been made that has been a therapeutic failure for this …"
3. "… excellent is relevant in this skill. This category seems too stringent for most students to achieve. The checklist contains items that really only a qualified …"
4. "… too small to determine. It would be nice if these were a little more defined to more easily see the difference. What is a caseload defined as? It would be nice if there were more guidance, like each student in an ambulatory care rotation had to follow 10 patients in a 2-month time frame, so I can …"
Skill C relevance question: Seventeen (81%) panelists indicated the performance criteria are relevant and 19% (4) panelists recommended revisions (Table H-13). Comments suggested consolidating Skill C within Skill B, since this is part of prioritizing patient problems. The following is a sample of comments.
"… falls under the prioritizing of the problems of the last skill set and can be …"
"… the care is provided by certified nurse practitioners in addition…"
Skill C representative question: Fourteen (76%) panelists indicated the performance criteria were representative, while 19% (4) of the panelists indicated a need for revisions and 5% (1) panelists …
(Table H-14). Panelists suggested consolidating Skill C within Skill B, since this is prioritizing patient problems. The following is a sample of panel comments.
"… the last skill set, so one statement is on defining the problems and one is on prioritizing, but I don't believe emergency situations need a separate statement."
"… capable of deliveri…"
Skill D relevance question: Thirteen (62%) of the panelists indicated all the performance criteria elements were relevant and 38% (8) panelists indicated a need for revisions (Table H-15). Comments suggested consolidating Skill D with Skill F, since more factors are considered than pharmacokinetics and formulation. Comments mentioned that the checklist does not mention financial implications, patient values, and health literacy. Panelists suggested excluding caseload and supervision from the rating scale. The following is a sample of panel comments.
1. "… capable of supervising others is relevant to performance in Skill D …"
2. "… education as an option. Also, more goes into the decision than PK and formulation; either leave this part out or expand and include all factors in …"
3. "… economic factors including cost, patient's readiness to accept/use therapy/dosage form, health literacy, …"
Skill D representative question: Twelve (57%) panelists indicated the performance criteria are representative, while 38% (8) of the panelists indicated a need for revisions or … the checklist, and 5% (1) panelists indicated that the rating …
(Table H-16). Comments recommended excluding from the rating scale: 1) maintaining a full-time caseload, 2) seeks to assist others, and 3) managing patients with conditions beyond those expected of an entry-level practitioner. The following is a sample of panel comments.
1. "State of Health is too narrowly described. … gets a scratch that gets red and is deemed to be community MRSA, and is given a prescription for clinda for a week, versus grandma who's 75 and is in septic shock on pressors & CRRT with a VAP … on vanco q12 with a …"
3. "… Deficient needs to be redefined. A student coming out of 3 years of a pharmacy program had better be able to design and evaluate treatments or they shouldn't be on rotation. Deficient should be seen as not examining all options or not taking into account patient factors (only looking at guidelines, etc.). Competent would be all options included but the best one not chosen, and Excellent would be choosing the most …"
4. "… other disciplines beside pharmacokinetics and …"
Skill E relevance question: Fifteen (71%) panelists indicated the performance criteria were relevant and 29% (6) panelists indicated a need for revisions (Table H-17). Comments suggested consolidating Skill D with Skill E. Panelists mentioned that the checklist does not mention financial implications, patient values, and health literacy. Several suggested excluding caseload and supervision from the rating scale. The following is a sample of panel comments.
2. "… that should be reflected in the checklist per the following … evaluates treatment regimens for optimal outcomes using disease states and previous or current drug therapy as well as …"
3. "… for a problem/disease state based on clinical data (i.e., study results, guidelines) … it only focuses on patient psychosocial and healthcare system factors. In the rating scales I don't feel that being capable of supervising others …"
4. "This should be included in the last skill statement (D) …"
Skill E representative question: Thirteen (62%) panelists indicated the performance criteria were representative, while 38% (8) panelists indicated a need for revisions (Table H-18). The following is a sample of panel comments.
"… might be considered here too. Jehovah's Witness people and the no blood product (or albumin, like in LVP or epo products) …"
"… based on disease-state-specific criteria (i.e., current standard of care, …"
Skill F relevance question: Sixteen (76%) panelists indicated all the performance criteria are relevant and 24% (5) panelists indicated a need for revisions (Table H-19). Comments suggested the checklist could be improved by having the student identify patient-specific factors that would necessitate a change in drug therapy. Panelists suggested including "checkpoints" in the checklist for monitoring. The following is a sample of panel comments.
1. "Checklist #2 implies this is an ambulatory situation where information is directly shared with a patient; in an inpatient (i.e., ICU) rotation statement #2 would not imply … and should perhaps read patient/healthcare provider …"
2. "… capable of supervising others …"
3. "… be aware of all the clinical parameters and/or lab values to monitor, and when and if they are not in the range … what should be done, which may include adding a drug, increasing or decreasing a drug, discontinuing a drug, etc. Also, education is important, as there may be diet or lifestyle activities that can impact the plan. This needs to be made more …"
4. "… back-up plans and more about monitoring therapy. Although some up-front discussion of medication issues and/or checkpoints need to be discussed as foundation to monitoring, this skill seems more related to developing monitoring plans …"
5. "… guidelines to formulate their plan B …"
Skill F representative question: Sixteen (76%) panelists indicated that the performance criteria were representative, while 24% (5) panelists indicated a need for revisions (Table H-20). The following is a sample of panel comments.
1. "Describe patient-specific factors that would necessitate a change in …"
"… There are so many things that go into a back-up plan. Explain to the patient? Not exactly gonna happen in … the assessment. The student must be able to identify appropriate therapeutic outcomes and eval…"
Skill G relevance question: Thirteen (62%) panelists indicated that all the elements of the performance criteria were relevant and 38% (8) indicated a need for revisions (Table H-21). Comments suggested the need to state where the documentation should occur and that the checklist would need major revisions to be appropriate for the hospital setting. The following is a sample of comments.
1. "… other health care professionals, then the elements are too involved. If it is for patients, why bother? All of the information on the checklist is readily available in on…"
2. "… realize this was just referring to the plan for patients (written in LAY language). I have always graded this regarding progress notes to providers. This should be changed to include BOTH. And it needs to be rewritten to be clear on what is actually being graded …"
4. "… of resources at the site. While it may be nice to have written the above details, I am afraid this is probably being accomplished by students printing out drug info pamphlets from Clinical Pharmacology or other software applications. I would like to see this in giving the patient the monitoring plan with written drug information, and take out what it needs to include, as there are many items missing from the above and others not relevant. I would want the above information covered in a verbal education …"
Skill G representative question: Fourteen (67%) panelists indicated the performance criteria were representative as defined, while 33% (7) panelists indicated a need for revisions (Table H-22). Comments suggested that the checklist was appropriate for patient communication; however, the checklist lacked the essential elements for communicating with physicians and other health care professionals. The following is a sample of panel comments.
2. "… which is probably the highest dollar …"
3. "… HCP, patient, agent, or documentation at the rotation site so other pharmacy students or preceptors can continue care. Also needs to include follow …"
4. "… communication, they essentially ignore the need to effectively communicate with physicians and other health care professionals."
Complexity question: Sixteen (76%) panelists accepted the definition as written and 24% (5) indicated revisions were needed. No panelists indicated that the definition was not helpful (Table H-23). Comments suggested the description was not clear and made a number of suggestions to improve understanding. Several panelists suggested that examples would illustrate the behavior for assessment. The following is a sample of panel comments.
"… seem to be just … E.g., is 5 comorbid conditions too many for any student? At what point should we be expecting these advances …"
"… student derived his or her answers to problems …"
Consistency question: Nineteen (90%) panelists accepted the definition as written and 10% (2) panelists indicated revisions were needed (Table H-24). Following is a sample of panel comments.
"… frequency, as many students may obsess over how fast … that they need to improve. Words like routinely …"
Deficient performance question: Thirteen (62%) panelists accepted the definition as written and 38% (8) indicated that revisions were needed (Table H-25). Panelists suggested that the performance as described was too narrow and made a number of suggestions to improve its utility. Several panelists suggested that examples would illustrate the behavior for assessment. The following is a sample of comments.
1. "… feedback might be changed to something like correction … Everyone gets feedback. When the feedback isn't positive … there's the …"
2. "… patients (sentence one) to tasks …"
3. "… what 'clinical reasoning is performed in an inefficient manner' really means. What is an inefficient manner as it relates to reasoning? Actions can be inefficient, but it's not clear to me how reasoning can be inefficient. Wording that comes to my mind includes reasoning below expectations or lack of clinical reasoning or clinical reasoning that is consistently incorrect …"
4. "… You may wish to include poor …"
6. "… Deficient with Novice …"
Efficiency question: Twenty (95%) panelists accepted the definition as written and 5% (1) panelists indicated that revisions were needed (Table H-26). There were no comments from the panel.
Entry-level performance question: Sixteen (76%) panelists accepted the definition as written, while 19% (4) panelists indicated that revisions were needed and 5% (1) panelists indicated the definition was not helpful (Table H-26). Panelists made a number of suggestions.
"… level performance should include complex patients. In most cases, entry level would indicate that the student can …"
"… student with pharmacist … Entry level … or use another term."
"… students at any level to have no clinical supervision. This needs to be reworded to 'A student whose recommendations are thorough, appropriate, and in agreement with the preceptor's without coaching.'"
Excellent performance question: Sixteen (76%) panelists accepted the definition as written and 24% (5) panelists indicated revisions were needed (Table H-27). Panelists made a number of suggestions, in particular the removal of … Several panelists suggested the performance criterion was too high. The following is a sample of comments.
"… with simple or highly complex and … is consistently proficient and is capable of … They are NOT capable of supervising others …"
"… sentence in particular is the most troubling (student is able to maintain a full-time caseload and seeks to assist others where needed)."
5. "I don't think it is appropriate for students at any level to have no clinical supervision. This needs to be reworded to 'A student whose recommendations are thorough, appropriate, and in agreement with the preceptor's without coaching.'"
Performance criterion question: All panelists accepted the definition as written (Table H-28). There were no panel comments.
Quality question: Seventeen (81%) panelists accepted the definition as written and 10% (2) panelists indicated revisions were needed (Table H-29). The following is a sample of comments.
"… wrong term, wrong place, wrong definition. No skill, limited skill, appropriate skill, excellent skill, or something …"
Supervision question: Twenty (95%) panelists accepted the definition as written and 5% (1) indicated revisions were needed (Table H-30). The following is a sample of panel comments.
"… one size fits all … licensed and still require a degree of supervision until they graduate. We are preparing them for the world after graduation, but they are not there yet; even in their last rotation they still have requirements to meet before …"
4.1.4 Round Two Panel
From the 22 panelists in round one, only eight participated in round two. This low number of panelists for the second round was well below the recommended sample size discussed in the literature. The principal researcher therefore deviated from the original plan and took the unorthodox step of recruiting additional preceptors directly into round two. The principal researcher contacted 28 pharmacy schools; 14 schools responded positively and agreed to participate in the study and to email invitations to their best preceptors. The researcher requested that schools resend the email invitations to the selected preceptors every two weeks for a maximum of three follow-up contacts.
Reminder invitations were sent directly to these preceptors every two weeks for a maximum of three follow-up contacts. Of the 71 individuals who completed the on-line consent page, 66 panelists completed the entire survey and one panelist partially completed the survey. This represents a 93% response rate based on the number of individuals who completed the on-line consent form. The number of preceptors who were nominated by pharmacy schools to participate and chose not to participate in the Delphi panel is unknown.
Panel demographics are listed in Table 4-3. Ages ranged from 29 years to 63 years, with a slight majority (52%) reporting 40 years or younger. A majority (62%) of the panelists were female, and 77% earned a doctor of pharmacy (PharmD) degree. Several panelists possessed advanced degrees, including MBAs, MPHs, and PhDs. The majority (54%) of panelists reported that they had practiced pharmacy for 11 or more years. A majority (60%) reported precepting students for 10 or fewer years. The majority of panelists (56%) indicated that their primary role with students was as a preceptor, followed by 24% who were Educational Coordinators.
Panel site demographics are listed in Table 4-4. Seventy-three percent of panelists were affiliated with public institutions, while the remainder were affiliated with private institutions. A majority (60%) worked in hospitals and the remainder worked in various settings. The majority (77%) of panelists reported working in an urban setting. Panelists represented 18 states, and most were from Texas (20%) followed by Florida (15%).
4.1.5 Domain Specification Results
The first round of the Delphi panel gave strong results in addressing the domain specification of the construct Drug Therapy Evaluation and Development. Except for
competency Skill F, panel agreement was greater than 86%. These results meet the threshold of panel endorsement, and the domain questions were not carried over to the second round.
4.1.6 Performance Criteria Results
Items that failed to reach panel agreement of at least 80% were presented again with the prior panel rating scores and informative comments. As previously described, this series of questions asks the Delphi panel to evaluate the performance criteria (e.g., checklist, rating scale, and performance levels). These items relate observed student behavior to a degree of competent performance. The first set of questions relates to the relevance and representativeness of the performance criteria for each competency skill. The next and concluding series of questions ask for evaluation of the rating scale descriptions and definitions of the performance levels.
Relevant question: Are all the elements of the checklist and the rating scale behaviors relevant for assessing performance in Skill X?
Representative question: Considering the universe of performance measures, are the elements of the checklist items and rating scale behaviors representative of the skills, knowledge, and attitudes required to assess performance in Skill X?
Descriptions of the performance criteria are shown in Appendices F and G. The Delphi panel was given the checklist and a rating scale for each competency skill. The checklist is a description of activities the student needs to perform to complete an individual skill statement. The rating scale describes student behaviors for a particular performance level (e.g., excellent, competent, and
deficient). Student performances are to be rated across five performance dimensions:
Supervision: the degree of assistance.
Quality: the degree of knowledge and skill demonstrated.
Complexity: the number of elements that must be considered.
Consistency: the frequency of desired behaviors.
Efficiency: the ability to perform effectively and in a timely manner.
Skill A relevant question: Of the 67 panelists in round two, 66% (44) indicated that the performance criteria were relevant, 33% (22) panelists recommended revisions, and 2% (1) respondent rejected the checklist items and the rating scale behaviors (Table I-1). There were a number of recommendations; generally, that the rating scale should reflect a greater emphasis on the quality of data collected from the student-patient encounter and recognition of the patient's chief complaint. One panelist suggested modifying the checklist to emphasize the identification of problems based on collecting an accurate and comprehensive patient history and physical exam. The following is a sample of comments.
1. "… would meet all of the criteria for Excellent. If that is the goal, it is probably fine. But, if the goal is to make Excellent attainable, you may want to reconsider some elements (e.g., most new grads couldn't manage a full …"
2. "… everything, but not understand what is important or relevant to the main …"
3. "… which is under your descriptor. I would prefer changing the excellent section to competent, the competent descriptor to above average. I think this is beyond what the typical student can accomplish in
the typical APPE rotations unless the student has the good fortune of having the best rotations and best rotation preceptors. For this example, I think residency is required to meet the competent …"
4. "Excellent: would a student really be expected to, or have the ability of, supervising others? I think that statement is unnecessary. I realize that it says capable of supervising others, but I think this is asking …"
5. "I would include social, cultural, and economic factors in the checklist for the history. The patient's ability to pay for a medication (i.e., insurance status) may contribute to the problem list and the plan. Additionally, social and cultural factors may need to be considered in a patient …"
6. "Rating Scale: Our experience with our own eval process, using a different instrument but similar scale, is that preceptors avoid using the rating of Deficient. We ended up with really just a 2-part rating scale (Developing and Proficient). We changed our wording to Novice so that it would not sound so derogatory as Deficient."
Skill B representative question: Fifty-three (79%) panelists indicated the performance criteria were representative, while 19% (13) panelists recommended revisions and 2% (1) respondent rejected the performance criteria (Table I-2). Panelist comments suggested a number of drug-related problems to add to the checklist, including dosage regimen, drug interactions, adherence, and compliance issues. They recommended that the checklist include a medication history taken by the pharmacy student. The following is a sample of panel comments.
"… reconciliation …"
2. "… are good expectations for APPE students. Unfortunately, many of the APPE rotations at our local pharmacy schools do not emphasize these skills enough. Congrats if your rotations do so!"
3. "… admission? With as big as medication reconciliation is right now, some significant drug-related problems actually occurred prior to admission.
The student being able to identify that, so the resumption of home meds … perform a medication history if not evaluated elsewhere, as it is impossible to identify all potential medication problems without finding out …"

Skill D relevant question: Fifteen (23%) panelists indicated that the performance criteria were relevant, while 55% (36) recommended merging Skills D and E into a single skill. Within the combined D and E skill statement, 20% (13) panelists endorsed including severity of illness in the checklist, and three percent recommended including pharmacodynamic data in the skill statement (Table I-3). The responses were collapsed into two outcomes: 1) panelists accepted the performance criteria as written, and 2) panelists recommended merging Skill D and Skill E into a single skill for assessment (Table I-4). Of the 66 panelists who answered the question, 23% (15) accepted the checklist items and rating scale as written, and 77% (51) recommended merging Skill D and Skill E into a single competency skill. Panelists made a number of suggestions to improve the checklist; comments suggested changes to preceptor guidance and caseload in the rating scale.

1. "How about considering therapeutic alternatives (i.e., the person should be able to identify all feasible pharmacotherapeutic alternatives available for achieving the therapeutic outcomes)? This should include both pharmacologic and non-pharmacologic therapies. It seems that Skill set E should include a combination of drug allergy, drug-drug, drug-disease interactions, patient age, organ impairment (renal/hepatic), adverse …"
2. "… get past your definition of competent to include the need for preceptor's guidance. Competence to me means able to complete independently. I agree with adding both severity of illness and pharmacodynamic data. The descriptions for this section are pretty vague compared to the previous skills and thus open to interpretation.
I do not agree with combining this one with Skill E."
3. "… clarified to suggest that alternative plans need to be considered when developing a treatment plan. For example, with the revised skill statement, what if a patient cannot afford the medication proposed in the initial plan?"

Skill D representative question: Thirty-one (47%) panelists indicated the performance criteria were representative, while 52% (34) recommended removing the last three items in the Excellent rating scale. Two percent (1 panelist) rejected the checklist (Table I-5). Comments suggested that the Excellent level was narrowly defined and would be applicable to a small minority of students. Some panelists suggested the criteria described residents rather than pharmacy students. The following is a sample of panel comments.

1. "… level practitioner' for any of the excellent ratings for any of the skills. Very few students (less than 5%) graduate with skills at this level and most require time on the job or a residency to achieve ratings as you describe under the excellent rating."
2. "Let's face it: economics play a part in therapy! We must be aware of …"
3. "… and beyond entry … dealing with pharmacy residents … I believe those last three items are relevant, but for a student it may be too much. You may have some dynamic students, but probably not as a standard for evaluating all …"

Skill E relevance question: Twelve (18%) panelists indicated the performance criteria were relevant, while 80% (53) recommended merging Skills D and E
into a single skill. One (2%) panelist rejected the performance criteria (Table I-6). Comments suggested adding cultural competence and patient preference to the combined skill statement. Panelists again noted that excellent performance is too narrowly defined and would be applicable to a small minority of students. The following is a sample of comments.

1. "… for excellent, I would also … the reviewer is exploring any non-pharmacological measures that may be used or added to improve …"
2. "… get enough of these experiences to achieve any level of competence at Excellent and Deficient …"
3. "… beyond expected of entry-level practitioner for any of the excellent ratings for any of the skills. Very few students (less than 5%) graduate with skills at this level and most require time on the job or a residency to achieve ratings as you describe under the excellent rating."
4. "… social does not inherently include the concept of cultural competence, which should be considered for inclusion in this skill. Treatment regimens need to factor in patient beliefs and expectations, their own healing traditions, and other cul…"

Skill E representative question: Thirty-one (47%) indicated the performance criteria were representative, while 50% (33) panelists recommended removing the last three items in the Excellent rating scale (i.e., full-time caseload, supervising others, beyond expected of entry-level practitioner). Three percent (2) panelists rejected the checklist (Table I-7). Comments provided rationale for retaining or modifying the rating scale.

1. "…"
2. "… and beyond entry-level part. I think you should leave the supervise others …"
3. "… as they are now repetitive across this set of …"
4. "… patient in a day is no success. Them doing it for 15-20 in a day (i.e., full caseload) is excellent."

Skill G relevant question: Twenty-two (36%) panelists indicated the performance criteria were relevant and 64% (42) recommended revisions (Table I-8). A number of comments suggested the need for students to learn how to create written documentation for patients and healthcare professionals. The following is a sample of comments.

1. "… with students is when to use appropriate terminology (lay terms versus medical terminology) with patients and health care providers. I think they should be able to communicate with each group in a way that the patients and other health care providers will think they are knowledgeable, educated, and …"
2. "… document, and then at the end of the rotations we assess what they actually did with our teaching, then we're getting somewhere. The multidisciplinary approach isn't additional, it's foundational. I think adding 'other healthcare professionals' should come right after adding that they should use a pencil or pen."
3. "… for the patient and writing a true pharmacy care plan are two different things. We want the care plan to reflect the full drug therapy assessment completed by the student and to include all medical conditions being treated with medications. This is completely different from writing a patient information page for a patient …"

Skill G representative question: Thirty-three (50%) indicated the performance criteria were representative, while 48% (32) panelists recommended revisions. Two
percent (1 panelist) rejected the checklist (Table I-9). Comments provided rationale for retaining or modifying the checklist and rating scale to include documentation for healthcare professionals. The following is a sample of panel comments.

2. "… write a note appropriate for inclusion in the medical record, i.e., communicates the plan to other healthcare professionals. However, if skill E is not achievable for students …"
3. "… boss that they can do great … but only in one or two patients??? No, the …"
4. "… time caseload/supervises others/beyond expected entry-level practitioner, and fully agree. I would never expect a student or even a new graduate to carry a full-time caseload, nor would I expect them to be able to supervise others. I do not allow my residents to supervise students until I am comfortable with their abilities. I concur that all 3 of these items would be more appropriate in the …"

Complexity question: Forty (61%) panelists accepted the definition as written, while 38% (25) recommended adding examples. One panelist (2%) indicated the definition was not helpful (Table I-10). One panelist suggested that the degree of complexity should be assessed for the duration of the individual rotation rather than over the course of all the clerkship rotations. The following is a sample of comments.

"… as long as they are not assumed to … patients/tasks/environment as a student moves through the APPEs? Our system is set up with a somewhat random match, so it is possible a student may have a BMT rotation first and a community pharmacy rotation last; this would not necessarily allow for increased complexity as defined above. Instead, perhaps complexity should be over the course of a given
rotation, with the student moving from simple to complex within each rotation …"
3. "… evaluating in a consistent manner …"

Deficient performance question: Twenty-two (33%) panelists accepted the definition as written and 65% (43) recommended revisions to the description (Table I-11). The following is a sample of comments.

1. "… would strongly recommend getting rid of or modifying the statement about performance reflecting little or no experience … keep the first sentence from the original (A student who requires close clinical supervision with constant monitoring and feedback, even with …"
2. "… Deficient to Novice to Excellent. These statements are very well written and accurately describe what is deficient performance. Well …"

Excellent performance question: Twenty-four (36%) panelists accepted the definition as written and 64% (42) recommended changes to the performance description (Table I-12). The following is a sample of comments.

1. "… accomplish. I think too many of your excellent performance criteria are tho…"
2. "We should be careful when using the term no clinical supervision …"
4.1.7 Summary of Delphi Panel Results

The first line of questions addressed domain specification: gathering expert agreement on whether a particular competency skill is relevant in assessing Drug Therapy and Development competency. This first Delphi panel round gave strong results, with agreement greater than 86% for all competency skills except competency Skill F. The Delphi panel commented that the backup plan described in Skill F does not play a relevant role in the clinical setting.

The panel discussed how the activities described for Skills D and E are intermixed and recommended combining these competency skills into a single skill for assessment. Activities that consider drug performance characteristics (i.e., Skill D) and activities that consider patient preferences and socioeconomic factors (i.e., Skill E) tend to be performed jointly.

Consistent with a clinical perspective of performance assessment, panel members inquired when and how the student would incorporate data obtained directly from the patient. That is, how will the student integrate patient-obtained data? Synthesizing medical information as described in Skill A is incomplete without including the role of patient-acquired data.

There was strong objection to some elements of the performance-level descriptions. Panel members concluded that items such as carrying a full-time caseload and supervising others exceed typical student duties in APPE rotations.
There was a strong theme across the Delphi panel of whether an item under evaluation was relevant in the clinical setting. This viewpoint reflects the unique perspective of clinical faculty, in which performance assessment is intertwined with relevant clinical activity. Many panelists felt that descriptions of many of the performance criteria were written in broad terms and that examples are needed. In other words, training materials need to show the performance criteria in the context of the clinical setting. Video vignettes and case studies would help train preceptors in performance assessment.

4.2 Phase II Video Simulation

4.2.1 Expert Panel

An expert panel of preceptors was recruited from the Phase I Delphi panel. Of the nine preceptors who agreed to join the expert panel, seven completed all the tasks. Demographics of the expert panel are listed in Table 4-5. Participant ages ranged from 30 to 66 years, with a majority (57%) reporting 40 years or younger. A majority (71%) of the participants were female, and all (100%) held a Doctor of Pharmacy (PharmD) degree. All participants possessed advanced degrees or BPS certifications. The majority (86%) of participants reported that they had practiced 10 years or fewer, and a majority (86%) reported precepting students for 10 years or fewer. The majority of participants (86%) indicated that their primary role with students was as a preceptor; one participant (14%) reported Clinical Coordinator as their primary role with students.

The site demographics are listed in Table 4-6. All the participants (100%) were affiliated with public institutions. The largest group (71%) worked in hospitals, and the remainder worked in various settings, including Clinical Decision Support services,
representing organizations covering 2.5 million beneficiaries. The majority (86%) of participants reported working in an urban setting. Participants represented five states, and most respondents were from Alabama (29%) and Texas (29%).

4.2.2 Expert Panel Results

Members of the expert panel watched the video vignettes using an internet-based web site 138 and evaluated the performances using the new rating scale. These evaluations took place two to three months after the last round of script edits. Based on the rating scale scores, performance levels were derived using a previously described rubric (see Appendix G).

4.2.2.1 Diabetes Case Study Results

The target performance levels for the Diabetes cases are presented in Table K-1. Student pharmacist Mary in video Vignette #1 was intended to illustrate excellent performance for all three skills. For student pharmacist Thomas in video Vignette #2, the goal was to illustrate entry-level performance for all three skills. In video Vignette #3, student pharmacist Susan was scripted to illustrate deficient performance for all three skills.

The assessments by the expert panel are in Table K-2. The bookends of the targeted performance levels matched the expert panel rating scores; however, there was less agreement on the entry-level performance. One panelist commented that the videos effectively illustrated the difference between excellent and entry-level performances, as compared to simply reading the scripts. Panelists commented that student pharmacist Thomas (the entry-level performance) was still well
above average compared to their pharmacy students. The following is a sample of participant comments.

1. [Mary, Vignette #1] "I rarely/never have students perform at this level. It is challenging to get residents to perform at this level. I sure would love it if students could perform this well, and residents too!"
2. [Thomas, Vignette #2] "I think you made the difference between excellent and entry clearer in the videos compared to the scripts. I remember that the entry students knew almost every answer. Having them not know some things is more entry level."
3. [Thomas, Vignette #2] "… still well above average for students …"
4. [Susan, Vignette #3] "This is still better than about half the students we get, and I consider this level of performance to be lower than acceptable."
5. [Susan, Vignette #3] "As with skill B, I think this was supposed to be novice, but in this area I felt like the student was not as good as Thomas in Video #2 … maybe he should have been more 'excellent' in my ratings. I thought the student here just continued to demonstrate a lack of confidence and some lapses in judgment/information."

4.2.2.2 Heart Failure Case Study

In the Diabetes cases, the target performance level was uniform (excellent, entry level, or deficient) for each competency skill portrayed by the student actor. However, the Heart Failure vignettes were randomized, and the target performance levels for the Heart Failure cases vary for each skill illustrated by a given student. The panel was blinded to this randomization. The expert panel assessments are in Table K-4 and show mixed results among all the Heart Failure video vignettes. One panelist commented that they were able to recognize differences in performance levels between the three competency skills illustrated within a given student vignette. However, there were several comments about how panelists struggled
to make fair assessments based on a single case presentation. One expert panelist acknowledged how previous student performances influenced their rating scores of following students. The following is a sample of participant comments.

1. [Patricia, #4] "… inconsistent level of understanding … she often asks for help, which indicates to me she is aware of deficiencies and is trying to correct them."
2. [Patricia, #4] "I had a harder time distinguishing this [Skill A]. It seemed to be on the entry/novice line."
3. [Patricia, #4] "I think efficiency is difficult to judge sometimes from these limited interactions (one case presentation; however, the preceptor should be able to use overall experiences to rate that too)."
4. [Joseph, #5] "I think this was done very well overall. The only reason I hesitated with this was sometimes the student provided too much information, which seemed to make the message of what he was saying slower, so I was struggling with whether that was entry level instead of excellent."
5. [Joseph, #5] "The change in performance/knowledge between skills was recognizable."
6. [Dorothy, #6] "This student did not seem as proficient in this section. Again, as with the first section, I think it was the way the student delivered the information that made it seem entry level or even novice; however, I feel I chose entry level because the skills here were blurred some (in my mind) vs. the other sections, so I gave the benefit of the doubt. This is what I actually do on rotations."

4.2.2.3 Anticoagulation Case Study

Like the Heart Failure case study, the Anticoagulation case study vignettes were randomized and the target performance levels for the three competency skills vary. The performance targets are presented in Table K-5. The assessments by the expert panel are presented in Table K-6 and again show mixed results. As with the Heart Failure cases, there are several comments relating to the struggle experienced in making the
assessments. One panelist acknowledged referencing a copy of the scripts while scoring the performances.

1. [Linda, #7] "Although not inefficient in this category, it was not as efficient as the presentation of the pt initially." [using the performance of Skill A to influence the assessment of Skills B and D/E]
2. [Linda, #7] "I felt this was in between entry level and excellent. I felt at times it was 'excellent,' but maybe I leaned toward entry because of the performance in the previous skill."
3. [Barbara, #9] "I'm not sure this was completely inefficient, but maybe not as focused or fluid as it should have been, so I thought it on the verge of entry/novice."

4.2.3 Summary of Expert Panel Results

Twelve of the twenty-seven competency skills illustrated in the video vignettes had unanimous agreement or a single dissenting panelist. These twelve competency skills matched the intended performance level from the script. The following is a list by competency skill:

Skill A: Mary (#1), Susan (#3), Dorothy (#6), Linda (#7), Barbara (#9)
Skill B: Mary (#1), Susan (#3), Barbara (#9)
Skill D/E: Mary (#1), Susan (#3), Patricia (#4), David (#8)

The assessments for all three competency skills for Mary were unanimous, as were the assessments for Susan in the Diabetes cases: Mary's performances were assessed excellent and Susan's deficient.
Several factors may have contributed to different assessments within the expert panel and deviations from the performance levels the videos were intended to portray. There were several comments about how panelists struggled to make fair assessments based on a single case presentation. These panelists were intimately involved with editing the scripts and devising the behaviors that represent each element in the rating scale. However, several months passed between the last round of script edits and assessment of the video performances. This time span would have diminished an intimate understanding of the performance criteria and rating scale.

The video vignettes were long and played continuously, taking over 70 minutes to view. Panel members were able to view the videos at their convenience. Nevertheless, there were 27 individual competency skills for evaluation, and this is a large number of video vignettes to watch and assess for a group of busy professionals. Some panelists mentioned that they experienced difficulty recognizing when one competency skill ended and the next began. Other panel members commented that they were readily able to observe different levels of performance between competency skills. The principal researcher did not perform a debriefing to collect detailed information linking specific student behaviors illustrated in the videos with rating scale items or recognition of individual competency skills.

There were several comments suggesting that performances of previous skills within the same vignette influenced rating scores. Each competency skill presented should have been scored based on its own performance criteria. In fact, one panelist used a copy of the script to aid in scoring the performances. However, the panel was blinded to the randomization, and any panelist who used their copy of the script would
have given inaccurate rating scores. The rating scores for this panelist were included in the analysis, since it is not known how many other panelists used their copy of the scripts while scoring the video vignettes.

4.2.4 Preceptor Panel

Pharmacy preceptors participated in the study through an internet portal hosted by the Texas Pharmacy Association (TPA). The Accreditation Council for Pharmacy Education (ACPE) accredits TPA as a provider of continuing pharmacy education, and this three-hour program was accredited for 3.0 Contact Hours (0.3 CEUs). The CE course started with a 20-minute introduction to assessment concepts and the new performance criteria. Participants then watched nine video vignettes and scored the performances using the new performance criteria.

Participant demographics are listed in Table 4-7. Forty-two participants completed the program out of the seventy-four who registered. Only data collected from participants who scored all the video vignettes were included for analysis. Participant ages ranged from 26 to 61 years, with a majority (62%) reporting 40 years or younger. A majority (67%) of the participants were female, and 71% held a Doctor of Pharmacy (PharmD) degree. Several participants possessed advanced degrees and certificates, including MSs and PhDs. The majority (53%) of participants reported that they had been practicing pharmacy for 10 years or fewer. A majority (84%) reported experience precepting students for no more than 10 years. The majority of participants (76%) indicated that their primary role with students was as a preceptor, followed by 5% who were Educational Coordinators.

Preceptor site demographics are listed in Table 4-8. Sixty-nine percent of participants were affiliated with public institutions, while the remaining participants were
affiliated with private institutions. The largest group (41%) practiced in hospitals, and the remainder practiced in various settings. The majority (67%) of participants reported that their practice resided in an urban setting. Participants represented five states; the majority practiced in Florida (81%), followed by Texas (12%).

4.2.5 Preceptor Panel Results

Figure F-2 illustrates the data collection and analysis pathway previously described. Participants scored student video performances based on the new performance criteria. A previously described assessment rubric in Appendix G shows how student performance levels are calculated, and assessment results are shown in Appendix L. Reliability and accuracy estimates comparing competent vs. not competent assessments and competent vs. excellent assessments are evaluated, along with a comparison between the rating scale and a proxy for the supervision scale.

4.2.5.1 Estimate of Reliability

The sample size calculation used in this study assumed ICC estimates would be equal to or greater than 0.5. Unfortunately, this assumption was not realized, and the researcher was unable to estimate the reliability of each of the three competency skills for each student video vignette as originally planned. However, reliability estimates of the three competency skills by case study offer an opportunity to observe important ICC trends. The ICC estimates the degree of agreement among preceptor assessments; higher ICC values reflect greater inter-rater reliability among preceptor assessments. The Landis & Koch 150 benchmarks are used to interpret the results. However, these divisions are arbitrary and are used only for the purpose of discussion.
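To make the ICC concrete, the following is a minimal sketch of a one-way random-effects intraclass correlation, ICC(1), computed from a matrix of ratings. The scores below are hypothetical placeholders, not study data, and the dissertation's analysis may have used a different ICC model and statistical software; the sketch only exposes the underlying analysis-of-variance arithmetic.

```python
def icc1(ratings):
    """One-way random-effects intraclass correlation, ICC(1).
    ratings: list of rows, one row per rated performance,
    one column per rater (preceptor)."""
    n, k = len(ratings), len(ratings[0])
    row_means = [sum(r) / k for r in ratings]
    grand = sum(row_means) / n
    # one-way ANOVA mean squares: between performances and within performances
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for r, m in zip(ratings, row_means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# hypothetical data: 5 video performances, each scored 1-4 by 4 raters
scores = [[4, 4, 3, 4],
          [2, 3, 2, 2],
          [1, 1, 2, 1],
          [3, 3, 3, 2],
          [2, 2, 1, 2]]
print(round(icc1(scores), 2))  # -> 0.78
```

With perfect agreement the estimate approaches 1.0; values in the 0.3 to 0.4 range, as observed in this study, reflect substantially more disagreement among raters.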
Table M-1 shows the competent vs. not competent reliability estimates for each competency skill, broken down by case study. The ICC point estimates are 0.37, 0.31, and 0.30 for competency Skills A, B, and D/E, respectively. According to previously described guidelines, 149 these values indicate fair inter-rater reliability. Each point estimate has a similarly wide 95% confidence interval (CI), ranging from 0.15, representing slight reliability, to 0.69, representing substantial reliability.

The ICC point estimates and 95% CIs for the excellent vs. entry-level comparison (Table M-2) are generally lower. The ICC point estimates of 0.24, 0.19, and 0.19 indicate fair to slight inter-rater reliability for Skills A, B, and D/E, respectively. The 95% confidence intervals are wide, ranging from a low of 0.09, indicating slight reliability, to 0.55, representing moderate reliability.

4.2.5.2 Influence of Randomization

The video vignettes were presented in the same order shown in the tables: Diabetes cases first, followed by Heart Failure, and concluding with the Anticoagulation cases. Although there may have been some weariness due to the three-hour program, the inter-rater reliability trend is inconsistent with participant fatigue. There were noticeable differences depending on the case study. The Diabetes cases have the highest inter-rater reliability estimates and the slimmest 95% CI ranges for all three competency skills. The lowest reliability estimates were from the Heart Failure cases, and some ICC results were not statistically significant. The ICC results increased slightly with the Anticoagulation cases. This trend is consistent for both the competent vs. not competent and the excellent vs. entry-level comparisons. The CI ranges are wide, and none of the ICC results is significantly different from any other.
The results suggest preceptor panelists were able to assess performances in the non-random video vignettes (Diabetes) with a greater degree of reliability than the randomized vignettes (Heart Failure and Anticoagulation).

4.2.5.3 Rating Scale Items

The rating scale scores student performances on five domains of behavior (see Appendix G). Specific scores are associated with different levels of performance (e.g., excellent, entry level, and deficient) by an assessment rubric. Tables M-3, M-4, and M-5 give the ICC values for each rating scale item for Skill A, Skill B, and Skill D/E, respectively. The rating scale items for the Diabetes cases have the highest inter-rater reliability estimates and the slimmest 95% CI ranges. The lowest reliability estimates were from the Heart Failure cases, and some results were not statistically significant. The rating score reliability results increased for the Anticoagulation cases. In all cases, the 95% CI ranges were wide, and none of the rating scale items was significantly different from any other. No particular rating scale item showed observable trends across the three competency skills.

4.2.5.4 Global Assessment

The principal researcher asked the preceptor panel to give a global impression of each performance: specifically, the level of performance (e.g., excellent, entry level, or deficient) for each of the 27 competency skills they watched. Table M-6 shows the global assessment inter-rater reliability estimates comparing the competent vs. not competent assessment. The inter-rater reliability was higher for the non-random Diabetes performances than for the randomized cases. The previous reliability estimates also show this general trend. This similarity suggests that the rating scale, to some degree, reflects the preceptors' global impressions of performance.
4.2.5.5 Estimating Accuracy

The Fisher Exact test is used to estimate accuracy by comparing the preceptor panel's assessments with those of the expert panel. 151 Low p-values indicate that the preceptor panel assessed the performances differently than the expert panel. The principal researcher selected p-values equal to or less than 0.10 as the significance threshold. Given that the 0.10 value is arbitrary, the study also notes p-values up to the 0.30 level. Accuracy results are shown in Tables N-1, N-2, and N-3 for competency Skills A, B, and D/E, respectively.

4.2.5.6 Competency

The expert and preceptor panels appear to assess competency differently. Figures N-1, N-2, and N-3 show that the expert panel rating scores resulted in tougher assessments compared to the preceptor panel: 49% in Skill A, 48% in Skill B, and 52% in Skill D/E were assessed deficient by the expert panel. In contrast, more student performances were found competent by the preceptor panel; rating scores from the preceptor panel resulted in 22% in Skill A, 29% in Skill B, and 30% in Skill D/E assessed not competent.

The Fisher Exact test results for the competent vs. not competent comparisons are shown in the middle column of Tables N-1, N-2, and N-3. For competency Skills A, B, and D/E, the p-values for Thomas range from 0.06 to 0.15. All seven (100%) members of the expert panel considered Thomas's performance competent for all three skills, in contrast to the mixed assessments from the preceptor panel. There were negative comments about the Metformin dose recommended, in contradiction with the expert panel, which approved the script.
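A panel comparison of this kind can be sketched as a two-sided Fisher Exact test on a 2x2 table of assessment counts. The counts below are hypothetical placeholders rather than the study's data, and the study presumably used standard statistical software; the hand-rolled function simply makes the hypergeometric computation explicit.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher Exact p-value for the table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def prob(x):  # P(top-left cell = x) under fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

# hypothetical counts: competent vs. not competent assessments by panel
#              competent  not competent
# expert            7           0
# preceptor        26          16
print(round(fisher_exact_2x2(7, 0, 26, 16), 3))
```

A small p-value would indicate that the two panels' assessment proportions are unlikely to differ by sampling chance alone, which is how the study interprets p-values at or below its 0.10 threshold.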
For competency Skill D/E, the low p-value (0.03) for Patricia gives strong evidence that the preceptor panel's assessments are different from the expert panel's. All seven (100%) members of the expert panel considered Patricia's performance not competent; only twenty-four (57%) of the forty-two preceptors gave the same negative assessment. Unlike the reliability results, no trends across case studies were observed, nor were any trends across the three skills. Summary comments on the competency accuracy results are the same as for the excellent accuracy results and are found at the end of the following section.

4.2.5.7 Excellence

Figures N-1, N-2, and N-3 show little overall difference between the expert and preceptor panels in assessing excellent vs. entry-level performances. In Skill A, 31% and 30% of performances were assessed as excellent by the expert and preceptor panels, respectively. For competency Skill B, 26% and 27% were assessed as excellent by the expert and preceptor panels, respectively. For competency Skill D/E, 23% were assessed excellent by both panels.

The Fisher Exact test results for the excellent vs. entry-level comparison are shown in the far left column of Tables N-1, N-2, and N-3. For competency Skills A, B, and D/E, the p-values for Mary ranged from 0.14 to 0.18. All seven (100%) members of the expert panel considered Mary's performance excellent, in contrast to the mixed assessments from the preceptor panel. One preceptor rated the performance excellent with
the exception of the drug therapy recommendation, strongly suggesting that differing assessments may be related to disagreements on technical aspects of the drug therapy.

For competency Skill A, a p-value of 0.10 gives evidence that the preceptor assessments differ from the expert panel's beyond chance of sampling alone. Six (86%) of the seven expert preceptors determined the performance was excellent; in contrast, only 25 (60%) preceptors gave the same outstanding assessment. The low p-value (0.07) for Dorothy gives evidence that the preceptor assessments are different from the expert panel for competency Skill B. One (14%) expert determined Dorothy's performance was excellent, one (14%) assessed it as entry level, and the assessments overall show a wide variation between the expert and preceptor panels.

For both the competency and excellence comparisons, many competency skills show a wide variation in the expert and preceptor panel assessments. This variation suggests the preceptor panel experienced the same difficulty assessing the performances in the video vignettes as the expert panel. In some cases, the video vignettes may not effectively portray the intended performance level, which limits interpretation of the results. On the other hand, some assessment variation appears related to different standards of medical care among the preceptors and between the preceptors and the expert panel. These comments present strong evidence of rater inconsistency. 152, 153

4.2.5.8 Rating Scale Comparison

The rating scale scores student performances on five domains of behavior. The SUCCESS instrument asked the preceptor to score the level of supervision required for
each competency skill. The principal researcher created a surrogate scale to approximate the SUCCESS instrument by collapsing the responses for levels of supervision from the rating scale. This proxy scale will be used to estimate the three supervision scale items currently used in SUCCESS. The Fisher Exact test will quantify differences between the two scales.

Results for the competent vs. not competent and the excellent vs. entry level comparisons are shown in Tables O-1, O-2, and O-3 for competency skills A, B, and D/E respectively. The Fisher Exact test shows that most of the competent vs. not competent comparisons are significantly different. The two scales appear to assess competency differently. Figures O-1, O-2, and O-3 show the proxy scale was more lenient than the new rating scale. Seventy-eight percent (78%) in Skill A, sixty-six percent (66%) in Skill B, and seventy-one percent (71%) in Skill D/E were assessed competent with the supervision proxy scale. On the other hand, fewer student performances were found competent according to the new rating scale. With the new rating scale, fifty-six percent (56%) in Skill A, fifty-three percent (53%) in Skill B, and fifty-six percent (56%) in Skill D/E were assessed as competent.

Most of the excellent vs. entry level comparisons are not significantly different. The assessment results in Figures O-1, O-2, and O-3 show little difference in the percentage of excellent assessments. However, the Fisher Exact test and the graphs show that entry level and deficient assessments are different between the two scales.

4.2.5.9 Preceptor Comments

Although the Diabetes vignettes may have more effectively illustrated student performance compared to the remaining randomized cases, there are other factors to
consider. In the Diabetes cases, the target performance levels were purposely selected. In each Diabetes vignette, all the competency skills were presented at the same level (e.g., excellent, entry level, or deficient). On the other hand, the Heart Failure and Anticoagulation vignettes were randomized, and preceptors could observe a student with an excellent performance for competency Skill A followed by a deficient performance for competency Skill B. Participants were verbally notified whether they were about to watch a video vignette with randomized performance levels. However, if participants were overwhelmed by the great deal of information presented in the webinar or simply misunderstood the directions, this could have led to assessment errors.

The expert panel previously mentioned difficulty identifying when an activity for one competency skill was completed and an activity for the next competency skill started in a video. Although this is a real life challenge faced by all preceptors, participants in this study may have experienced similar difficulty untangling student behavior associated with a specific competency skill. To help mitigate this issue, the first line of dialog for each skill was given to the preceptor panel. Difficulty recognizing behavior associated with any specific competency skill would increase assessment variation. These errors would affect assessment results with the randomized cases to a greater degree than the non-randomized cases.

There were several comments suggesting that performances of previous skills within the same vignette influenced the rating of subsequent performances by the same student. These comments strongly suggest the presence of the halo effect. 153, 154 The halo effect has
assessment on other performances, resulting in a failure to discriminate among different competency skills. 153, 155, 156

Rater inconsistency occurs when a preceptor's application of the performance criteria is inconsistent with other preceptors. 152, 153 There were contradictory comments between preceptors on student performance within the same competency skill, and some preceptors seemed unsure of some of their assessments. These comments strongly suggest the presence of inter-rater inconsistency.

There were many comments indicating different standards of medical care among the preceptors. One preceptor, however, disagreed with the drug therapy recommendation. There were negative comments about the Metformin dose recommendation by Thomas. Both comments are in contradiction with the expert panel that approved the script. Another example of contradictory comments: one preceptor praised the student's "information gathering skills, very thorough analysis, demonstrates advanced knowledge," while another observed that the student missed that "SrCr was an issue or at least didn't mention it; also stated ASA was being used as anti..." One preceptor complimented the thoroughness of the student (David) in identifying the key role of
herbal interactions with the Anticoagulation case. This was followed by another comment on the carelessness of the student (David) for again missing the drug interaction of Coumadin/Amiodarone. Preceptor comments suggested there were disagreements on the technical aspects of the drug therapy shown in the vignette and strongly suggested different standards of medical care between preceptors and with the expert panel. This is an important source of inter-rater inconsistency.

Rater inconsistency occurs when a preceptor's application of the performance criteria is inconsistent with other preceptors. 152, 153 Preceptors may have assigned high ratings to students who deserve low ratings and low ratings to those who deserve high ratings. This would increase assessment variation, reduce inter-rater reliability results, and limit interpretation of accuracy results.

4.2.6 Summary Preceptor Results

The ICC measures agreement among preceptor assessments of student scores. This test quantifies whether assessment reliability among preceptors is greater than by chance alone. In this study, the null hypothesis is H0: ICC = 0 and the alternative hypothesis is Ha: ICC > 0. The competent vs. not competent ICC point estimates are 0.37, 0.31, and 0.30 for competency Skills A, B, and D/E respectively (Table M-1). Each competency skill has a similarly wide 95% confidence interval, ranging overall from 0.15 to 0.69. The ICC point estimates of the excellent vs. entry level comparison (Table M-2) are lower at 0.24, 0.19, and 0.19 for Skills A, B, and D/E respectively. The 95% confidence intervals are wide and range from a low of 0.09 to a high of 0.55. These results are greater than zero and indicate that agreement among assessments is greater than chance alone.
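The two-way random effects, single measure ICC used throughout these comparisons can be illustrated with a short computation from the analysis-of-variance mean squares. The sketch below is a minimal pure-Python illustration of ICC(2,1), not the study's actual analysis code; the sample score matrix is the classic worked example from Shrout and Fleiss (1979), not study data.

```python
def icc_2_1(scores):
    """Single measure ICC for a two-way random effects model, ICC(2,1).

    `scores` is a list of rows (one per rated target, e.g. a student
    performance), each row holding one score per rater (preceptor).
    """
    n = len(scores)       # number of targets
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # mean square, targets
    msc = ss_cols / (k - 1)                 # mean square, raters
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Shrout & Fleiss (1979) worked example: 6 targets rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # ICC(2,1) for this data is about 0.29
```

Because the single measure ICC charges rater mean differences (the MSC term) against agreement, a panel of systematically lenient and strict preceptors lowers the estimate even when their rank orderings agree.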
The global assessment ICC estimates were higher for the non-random Diabetes performances compared to the randomized cases. The ICC estimates with the rating scale also show this trend. This similarity suggests that the rating scale, to some extent, captured the same overall judgment as the global assessment. Halo effect, rater inconsistency, the large number of video vignettes, and the degree of familiarity with performance criteria would increase assessment variation. This would lower ICC point estimates and widen the 95% CI for all competency skills assessed. These factors diminished the ability of the preceptors to reliably differentiate between competent vs. not competent student performances and between excellent vs. entry level student performances.

One of the aims of this study was to establish benchmarks to facilitate comparison of alternate assessment instruments. Literature suggests measures of inter-rater reliability in the clinical setting are often low, and estimates of 0.80 or above are rarely achieved. 106 Studies of medical residents 157, 158 have shown inter-rater reliability estimates ranging from 0.79 to 0.87, and from 0.29 to 0.42 for medical students. 159-162 Two physical therapy studies reported inter-rater reliability estimates ranging from 0.50 to 0.84. 163, 164 This study reported single measure reliability estimates with a two-way random effects ICC model. Study data are from summative assessments of cross-sectional snapshots of student performances. Examining a sample of inter-rater reliability studies illustrates the challenges faced when comparing reliability estimates among studies.

The Maxim and Dielman (1997) 157 study of third and fourth year medical students used the kappa-type statistics described by Landis and Koch. 150 In another study of
medical students, Daelmans and colleagues (2005) 158 reported a one-facet design with preceptors nested within students to estimate variance. No details of the computational algorithms were included. Another study 159 of internal medicine residents used a standardized mean agreement to estimate inter-rater reliability. The paper describes the standardized mean agreement as derived by calculating all variables without weights and by weighting each subject's contribution by the square root of the number of evaluations. These studies use fundamentally different computational methods, and these differences limit the ability to draw meaningful comparisons.

Several medical resident studies use ICCs to estimate inter-rater reliability of preceptor assessments. Kwolek and colleagues (1997) 160 evaluated clinical performance of surgical residents. An average of seven preceptors rated 72 residents; the inter-rater reliability of the mean overall performance was 0.82. During and colleagues (2003) 162 evaluated reliability among preceptors who used the American Board of Internal Medicine Monthly Evaluation Form. Preceptors evaluated 15 competency skills every month for one year. Reliability among preceptors for each competency skill assessed for a particular resident was evaluated, and the study reported ICC results as high as 0.80. These two studies used repeated assessments of student performances. These results are difficult to compare since the studies do not describe the ICC models used, lack results at the individual competency skill level for comparison with this study, and their repeated measures design limits the ability to make useful comparisons.
In 2002, the APTA published an evaluation of the student clinical performance instrument (CPI). 163 Two preceptors assessed student performances on a visual analog scale representing degree of supervision. The single measure ICC was reported for each of the 22 competency skills. ICC point estimates ranged from 0.35 to 0.89, and confidence intervals were not reported. There are a number of competency skills similar to the three competency skills evaluated in this study. Student performances were observed over the four-week clerkship. A pair of preceptors gave a single summative assessment of the competency skills at the end of the clerkship. The number of students rated is included below.

Applies principles of logic and scientific method to the practice of physical therapy (ICC=0.64, students rated=23)

Screens patients using procedures to determine the effectiveness of and need for physical therapy services (ICC=0.41, students rated=17)

Evaluates clinical findings to arrive at a physical therapy diagnosis (ICC=0.64, students rated=25)

Designs a physical therapy plan of care that integrates goals, treatments, and discharge plan (ICC=0.61, students rated=25)

Meldrum and colleagues (2008) 164 evaluated the inter-rater reliability of student assessments by physiotherapy preceptors in Ireland. Student clinical performances were assessed using a standardized assessment form. One summative assessment by two preceptors was completed for each of the 12 rotation periods. Scores were based on behaviorally anchored rating scales. Competencies covered three broad areas, and the patient management area contains many competency skills
similar to this study. The study reported an ICC of 0.75 for patient management but did not report the 95% CI. The two physical therapy studies used the single measure reliability for a two-way random effects ICC model, as used in this study. The inter-rater reliability results in this study are comparatively lower.

The Fisher Exact test is used to estimate accuracy and gives the probability that assessments were unevenly distributed between the preceptor and expert panels. 151 This test shows whether typical preceptors assess students differently from the expert panel beyond chance alone. The null hypothesis was H0: the probability of observing a difference of no more than 10% in the scores between preceptors and experts. The alternative hypothesis is Ha: the probability of observing a difference > 10% between preceptors and experts. Results are in Tables N-1, N-2, and N-3 for competency skills A, B, and D/E respectively. Unlike the reliability results, no discernible trends were observed. Interpreting the accuracy results is related to the degree of agreement among the expert panel themselves. Descriptive statistics show preceptor assessments were more lenient than the expert panel. However, the wide variation of expert panel assessments dilutes the use of these assessments as a standard for comparison. This limits the interpretation of the accuracy estimates of the preceptor panel assessments.

The study compared the rating scale with a proxy for the SUCCESS assessment strategy. Using the responses from the same preceptors, this line of analysis gave the opportunity to compare the effect of the scale on assessment outcomes. The two scales appear to assess competency differently. The Fisher Exact tests showed the proxy scale was more lenient when assessing competency as compared to the new rating
scale. On the other hand, most of the excellent vs. entry level comparisons were not significantly different.
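The Fisher Exact comparisons described above operate on 2x2 tables (e.g., expert vs. preceptor panel by competent vs. not competent). The p-value can be computed exactly from the hypergeometric distribution over tables with the same margins. The sketch below is a minimal two-sided implementation for illustration, not the study's analysis code, and the example table is the classic tea-tasting 2x2, not study data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probability of every table with the same
    row and column totals whose probability is <= that of the observed
    table (the conventional two-sided definition).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)  # all tables with these margins

    def prob(x):  # probability of a table with x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-12))

# Classic "lady tasting tea" table [[3, 1], [1, 3]]: two-sided p = 34/70.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 3))  # 0.486
```

Because the test is exact rather than asymptotic, it remains valid for the small panel sizes here (seven experts against a few dozen preceptors), where a chi-square approximation would be unreliable.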
Table 4-1. Delphi panel (round one) demographics
Table 4-2. Delphi panel (round one) practice site characteristics
Table 4-3. Delphi panel (round two) demographics
Table 4-4. Delphi panel (round two) practice site characteristics
Table 4-5. Expert panel demographics
Table 4-6. Expert panel practice site characteristics
Table 4-7. Preceptor panel demographics. Participants may have more than one degree and the percent column may be greater than 100%.
Table 4-8. Preceptor practice site characteristics
CHAPTER 5
DISCUSSION

5.0 Overview

This study sought to provide a proof of concept for a method to gather evidence demonstrating whether an APPE assessment program has content validity and whether clinical faculty are able to make valid and reliable assessments. In this study of the content validity of these programs, nominated preceptors evaluated and refined the domain specification and performance criteria for competent performance in Drug Therapy and Development. This study successfully outlined a process to collect and analyze assessment data within a controlled environment. In the process of estimating reliability and accuracy, the study created performance benchmarks for future comparisons. The results demonstrated a need to continue review and refinement of relevant competency skills and performance criteria.

5.1 Phase I Delphi Panel

A content validation study is the first step in accumulating evidence of validity and links the hypothetical construct of competency with observable student behaviors. Haynes and colleagues (1995) 34 affirm the importance of content validation and argue, "A construct that is poorly defined, undifferentiated, and imprecisely partitioned will limit the content validity of the assessment instrument." The Delphi panel conducted a content validation study and addressed two main subjects: 1) domain specification and 2) performance criteria. Specifically, the panel identified competency skills relevant for the assessment purpose. Then the panel evaluated and refined the performance criteria used to assess these competency skills (e.g., checklists, rating scales, and performance levels).
Three themes emerged from the panel comments. The first theme was evaluating whether each element of the domain specification and performance criteria was relevant in a clinical setting. The second theme was the role of the patient in providing clinical information, such that the pharmacist may integrate this information into an evaluation of the current and proposed medication regimens. Third, panelists felt descriptions of many of the performance criteria were written in broad terms. The panel recommended describing performance criteria in the context of the clinical setting. This study showed that preceptors were able to evaluate and refine hypothetical constructs of competency and associated performance criteria.

5.1.1 Domain Specification

Domain specification is the first task in a content validation study. 34, 100-103 The Delphi panel evaluated the relevancy of the seven competency skills in Drug Therapy and Development (see Appendix B). The panel recommended joining Skill D and Skill E together. However, the panel did not recommend retaining Skill F. The Delphi panel commented that the backup plan described in Skill F does not play a relevant role in the clinical setting; there is no clinical requirement to speculate about possible future problems arising from the primary therapeutic plan. Future problems are considered, but this process is an evaluation of benefits and risks towards the therapeutic objective. The panel asserted that alternative therapeutic plans tend to rely on important information that is usually unavailable during the development of the primary therapeutic plan. This is because symptoms and signs that account for the failure of the primary therapeutic plan would not be evident before
implementation of the primary plan in the first place. Theoretical backup plans and identification of likely problems may be more appropriate in the classroom setting.

The panel discussed how the activities described for Skills D and E are intermixed and recommended combining these competency skills into a single skill for assessment. Activities that consider drug performance characteristics (i.e., Skill D) and activities that consider patient preferences and socioeconomic factors (i.e., Skill E) tend to be performed jointly. This viewpoint reflects the unique perspective of clinical faculty, in which performance assessment is intertwined with clinical relevance. Studies by Bondy (1983) 66 and Lankshear (1993) 85 suggest that educators who develop assessment instruments conceptualize competent performances differently than practicing clinicians. This difference reflects differing values. Educators and clinicians have similar concepts of passing students. However, clinicians distinguish relevancy by evaluating whether the competency skills or performance criteria are necessary in the clinical setting.

The performance criteria are meant to quantify performance of competencies outlined in the initial CAPE Educational Outcomes report published in 1998. 165 This document lists the required competencies of graduating pharmacy students from all schools of pharmacy. Leading figures in the pharmacy profession developed this document after nine years of diligent work. This report categorizes the practice of pharmacy into five general Professional Practice Based Outcomes and seven General Ability Based Outcomes. The report also included 34 skill statements describing the Professional Practice Based Outcomes and the General Ability Based Outcomes, with an additional 84 specifications describing these skill statements. These competencies
formulate the attributes of a competent pharmacist. The large number of highly specified competency skills in the CAPE Educational Outcomes report is consistent with the large number of highly specified competency skills created by leaders in other healthcare professions. 41, 166-168

Huddle and Heudebert (2007) 169 disagree with the use of highly specified competency skills in clinical performance assessment, a practice that is common in the healthcare professions. The authors claim that highly specified skill statements tend to disregard the holistic connection between the practitioner and any particular clinical activity. 169 Huddle and Heudebert argue that healthcare practitioners are engaged in a process of responsiveness to a clinical situation in which perception may lead to the appropriate intervention or may lead to propositional reflection. This act of reflection shapes the response to the clinical problem. Consistent with this focus on clinical activity, ten Cate and Scheele (2007) 170 proposed that students demonstrate their degree of clinical competency as the preceptor entrusts them with increasingly demanding clinical duties.

The recommendations of the Delphi panel reduced the number of competency skills from seven to five. Pharmacy schools assess students for many more competency skills than evaluated in this study. This researcher claims that evaluating competency skill definitions according to a clinical activity viewpoint may reduce the total number of activities needed for APPE assessment. This clinical viewpoint is consistent with the recommendations of the Commission to Implement Change in Pharmaceutical Education. 13, 14
5.1.2 Performance Criteria

The Delphi panel evaluated the performance criteria, which include checklists, rating scales, and performance levels. Although there was spirited discussion of specific elements, the checklists and rating scale were generally accepted by the panel. Although panel members shared diverse recommendations, a number of important recommendations were suggested.

Consistent with a clinical perspective of performance assessment, panel members inquired when and how the student would incorporate data obtained directly from the patient. That is, how will the student integrate patient-obtained data into their knowledge of the patient history and examination results? Panel members pointed out the necessity for students to realize the limitations in medical documentation and recognize the need to incorporate information from the patient or proxy. One comment noted that home medication reconciliation is important in synthesizing medical information and that medical documentation might have an incomplete home medication history. Therefore, a synthesis of medical information as described in Skill A is incomplete without including the role of patient-acquired data.

There was strong objection to some elements of the performance levels, and one performance criterion was widely rejected as inappropriate for students. Panel members concluded that these elements were not relevant to student pharmacists. The panel asserted that these performance level descriptions were more appropriate for experienced practitioners than for student pharmacists.
Not a single panel member voiced an objection to using checklists, and there were many helpful suggestions for improvement. These comments suggested a need for fundamental changes to the checklist format. As it is, the checklist describes a systematic process that a student should complete: a list of necessary steps the student must complete to demonstrate competency in a specific skill. Instead of outlining every activity, panel members suggested a benefit-and-risk approach to rational drug therapy, referencing appropriate evidence based guidelines. These suggestions are consistent with the clinical viewpoint observed by this Delphi panel, wherein each activity for assessment is related to activities actually performed in the clinical setting.

Performance levels describe behavior that is less than expected, expected, or beyond the expectations of an entry level pharmacist. Descriptions of rating scale items are written in broad terms, and many panel members requested materials to help them understand the rating scale in the context of the clinical setting. Video vignettes and case studies describing differences in performance levels and associated rating scale scores would help train preceptors for student assessment.

5.1.3 The Delphi Panel

The Delphi method was used to facilitate agreement on a complex subject among a panel of experts. 124 There are no rigorous guidelines for inclusion criteria, panel size, and agreement criteria. 116-120, 124-128 The principal researcher recruited 22 preceptors from the original list of 10 pharmacy schools in round one. The principal researcher was unable to increase the panel size; nevertheless, the number of panel members was still within the 15 to 30 members planned. However, only eight panel members chose to participate in the second round. This number of participants was well below the recommended panel size. With these constraints in mind, the principal researcher took the unorthodox
step to recruit additional panel members directly into round two. The principal researcher contacted additional pharmacy schools to nominate additional expert preceptors and welcomed 58 new panel members directly into the second round. The panel for the second round had 66 members. Since the study used the reactive Delphi model, new members were readily incorporated into the existing panel. New and existing panel members were able to contribute valuable insights.

The principal researcher sought to recruit pharmacy preceptors with expertise in drug therapy and proficiency precepting students. The study sought to attract a mix of Board of Pharmacy Specialties (BPS) certified and non-BPS practitioners representing a variety of practice settings, regional locations, and educational institutions. Panel demographics and practice site characteristics were collected. Questions collecting information about gender, age, degree, years of practice, years precepting students, primary role with students, institution type, practice site characteristics, and population (of practice setting) were adopted from the 2008 AACP Annual National Survey of Volunteer Pharmacy Preceptors. 28

There were no significant demographic differences between round one panel members (Table 4-1) and preceptors who participated in the AACP survey. 28 On the other hand, there was one significant demographic difference with the second Delphi panel (Table 4-3), which had a greater proportion of female preceptors (p=0.02) compared to the AACP survey. Clinical experience is typically measured by age or number of years in practice. 171 The principal researcher expected panel members to be older, have greater clinical and preceptor experience, and hold higher degrees and certifications than the typical pharmacy preceptor. However, these measures were not significantly
different from AACP survey participants. The principal researcher did not seek to recruit the average preceptor, and the impact of these demographic differences, or lack of differences, is difficult to measure. The low response rate of 26.5% for the AACP survey may limit the value of the comparisons made. 28

The principal researcher sought to recruit panel members representing a variety of practice settings, regional locations, and educational institutions. However, there were some differences in practice site characteristics compared with the 2008 AACP preceptor survey. 28 In the first Delphi panel, there was a greater proportion of panel members practicing in public institutions (p=0.02) than private institutions. More panelists practiced in hospitals and clinics (p=0.02), and more of these practices were located in small towns and rural locations (p=0.02); see Table 4-2. There were two differences in the practice site characteristics of the second Delphi panel (Table 4-4) compared with the 2008 AACP preceptor survey. 28 This panel had a greater proportion of panelists practicing in public institutions (p=0.05) than private institutions and more practicing in hospitals and clinics (p>0.00). Panelists in the second round represented 18 states, and most were from Texas (20%) followed by Florida (15%). Six percent of the panel members practiced in the states of California and Oregon. The low representation of west coast states is disappointing. These proportions reflect a high level of participation by pharmacy schools in Texas and the direct marketing to preceptors in Florida by the principal researcher.

The quality of the panel influences the quality of the outcomes. 122 The principal researcher sought to recruit pharmacy preceptors with outstanding qualities and relied on the judgment of experienced directors to nominate
participants with these qualities. Experienced directors have demanding schedules, and it is uncertain whether and to what extent these directors adhered to the selection criteria. Guion (1977) 97 suggests that panel members need to understand the boundaries of the competency skill and recognize when operationalized performance criteria are inside or outside those boundaries. 97 Therefore, training materials were used to familiarize the panel with conceptual foundations of performance assessment and descriptions of the operationalized performance criteria. However, whether these materials effectively prepared panel members is unknown, and there was no follow-up testing of professional expertise in drug therapy or performance assessment knowledge. Independent evaluation may have ensured panelists met the desired qualifications. For example, the study could have restricted participation to recipients of preceptor awards. Zatas (1999) 172 reported a self-administered questionnaire that identifies effective physician assistant preceptors. A similar instrument for pharmacy preceptors could be used to ensure preceptors have the desired expertise in drug therapy and performance assessment, an instrument that may improve this line of research in future studies.

The principal researcher provided materials outlining the competency skills and performance criteria. However, many panelists commented that these items were written in broad terms and requested examples of these skills as they occur in the context of the clinical setting. Materials demonstrating examples of elements of the performance criteria in a typical clinical environment would be an effective training tool for preceptors. Video vignettes and case studies would help future participants understand how student behavior relates to the rating scale and performance levels.
Recruitment and retention was a major challenge for this study. Financial incentives and AACP support may have helped increase participation of preceptors with the desired expertise, as well as broaden regional representation. The nomination process, training, demographic makeup, and regional representation of the panel may limit the generalizability of the evidence collected. However, the panel met the objectives of the first study phase and permitted the study to continue with the next phase.

5.2 Phase II Video Simulation

5.2.1 Video Production

The video vignette scripts were based on the recommendations of preceptors recruited from the Delphi panel. Although the expert panel received financial compensation, their professional commitment was outstanding. The principal researcher benefited from the many phone calls and email exchanges between the researcher and panel members. The enthusiasm for this research study was significant and appreciated.

The expert panel was recruited from members of the Delphi panel. The principal researcher sought to attract preceptors with considerable clinical expertise and preceptor experience. There were a number of demographic differences between the expert panel (Table 4-5) and preceptor demographics of the 2008 AACP survey. 28 There was a greater proportion of female preceptors (p>0.00). Although there were no significant differences in age, years of practice (p>0.00) and years precepting students (p>0.00) were lower. The proportion of PharmD and advanced degrees was greater (p>0.00). The proportion of the expert panelists who reported their primary role as precepting students (p>0.00) was greater than reported in the AACP survey.
There were also differences in the practice site characteristics (Table 4-6) compared to the AACP survey. 28 There was a greater proportion of panel members practicing in public institutions (p>0.00) than private institutions. More panelists practiced in hospitals and clinics (p>0.00), and more of these practices were located in urban locations (p>0.00). Many members of the expert panel held key positions in teaching hospitals and pharmacy schools.

After selecting case studies to best illustrate student performances for the study, this expert panel developed a list of specific behaviors characterizing rating scale scores for specific performance levels. The objective was to simulate an authentic encounter between a preceptor and a student during APPE rotations. Third-year pharmacy students portrayed the APPE students in the video vignettes based on these scripts.

The expert panel watched and scored the final video vignettes online. Expert panel assessments varied widely, and panelists noted the difficulty of assessing the video portrayals that were based on scripts they actually wrote. Weeks, and in some cases months, passed between the last script edits and assessment of the videos. Familiarity with the performance criteria and the assessment instrument may have faded for some. On the other hand, one panel member commented on how the portrayals were more realistic than expected. After viewing the final video vignettes, one assessment expert mentioned that the portrayals of these specific behaviors in the final video vignettes were subtle and might be difficult for an audience to recognize. Future studies should consider using professionals with experience in script development and
video production, as well as trained actors. These steps would help future studies produce video vignettes that more effectively portray the intended student behaviors. With few exceptions, the diverse assessments among the panel suggest that many of the video vignettes did not effectively portray the student performances originally intended by the expert panel. This posed a serious measurement issue for the study. Video vignettes with higher assessment agreement among the expert panel would have set the stage for success in the next stage of the study. The use of video production professionals and trained actors may have increased the quality of student performances. Future studies will need to secure funding to allow re-filming of problem vignettes, since they are a crucial element of this kind of study.

5.2.2 Preceptor Panel

The principal researcher sought to recruit typical preceptors to watch and score the video vignettes and expected demographic characteristics similar to preceptors who participated in the 2008 AACP survey. 28 However, there were a number of demographic differences (Table 4-7). The preceptor panel had a greater proportion of female preceptors (p = 0.01), and they were younger (p < 0.001). The panel had fewer years of practice (p < 0.001) and fewer years precepting students (p < 0.001). However, the proportion of PharmD and advanced degrees was greater (p = 0.02) than PharmBS degrees. The proportion of preceptors who reported precepting students as their primary role was not significantly different. The members of this preceptor panel were younger, represented more women, and were more educated compared to the 2008 AACP survey. Clinical experience is typically measured in age or number of years of practice, and this panel does not represent the demographics of the typical preceptor described in the 2008 AACP survey.
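The demographic comparisons above (proportion of female preceptors, degree attainment, primary role) reduce to comparing a panel proportion against the corresponding AACP survey proportion. The dissertation does not state which statistical test produced these p-values; as an illustrative sketch only, using hypothetical counts, a two-proportion z-test could be computed as follows:

```python
from math import erf, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test: do the two group proportions differ?"""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value; standard normal CDF expressed via math.erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts, NOT the study's data:
# 34 of 42 panelists female vs. 52 of 100 survey respondents
z, p = two_proportion_z(34, 42, 52, 100)
```

For large samples this normal-approximation test and a 2x2 chi-square test give essentially the same p-value; the study may have used either, so the sketch is illustrative rather than a reconstruction of the actual analysis.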
There was no significant difference between public and private institutions, or in practicing in hospitals, clinics, or community settings, compared to the AACP survey. More panelists practiced in large towns (p = 0.02) rather than rural, small-town, or urban locations. Practice site characteristics are shown in Table 4-8. Panelists represented five states; most were from Florida (81%), followed by Texas (12%). These proportions reflect the direct marketing to preceptors in Florida by the principal researcher.

5.2.3 Reliability Results

The ICC estimates the degree of agreement among preceptor assessments. Shrout and Fleiss (1979) 149 conceptualized the ICC as the ratio of between-groups variance to total variance. The study used a two-way random effects model and reported the single measure reliability. Since the three performance levels (excellent, entry-level, and deficient) are hierarchical, and because this property is a violation of ICC assumptions, the study collapsed the scores into 2x2 tables for comparison. Competent students were compared to non-competent students, and excellent students were compared to all other competent students, as shown in Figure 3-4. Higher ICC estimates reflect increasing inter-rater reliability; perfect agreement gives an ICC result of one. Downing (2005) argues that inter-rater reliability is an essential component of validity evidence for all assessments using raters. 105

One of the central aims of this study is to establish benchmarks for comparison. However, the ICC is strongly dependent on the demographic variance of the population from which it is measured. The preceptors participating in this study were younger, less experienced, and more educated, and there were more women represented in this study than in the 2008 AACP preceptor survey. Therefore, this panel does not represent
the demographics of the typical preceptor, and the impact of this disparity on the validity of the results is uncertain. This problem may complicate any effort to generalize the results. An assessment instrument may be judged "reliable" or "unreliable," depending on the test population. Future studies should recruit participants whose demographic characteristics are generalizable to the greater preceptor community.

Inter-rater reliability results from this study should be comparable to measures of reliability in the clinical setting. Reported results are often low, and estimates of 0.80 or above are rarely achieved. 106 Studies of medical residents 159-162 have reported inter-rater reliability estimates ranging from 0.79-0.87, compared with 0.14-0.42 for medical students. 157, 159-162 Researchers explain that results are higher for residents since clinical faculty have greater opportunity to supervise residents due to longer rotation schedules. In addition, residents treat patients, and clinical faculty have a greater stake in supervising their performance. 173-175 Physical therapists have reported reliability estimates ranging from 0.50-0.87, 163, 176, 177 and physical therapy students treat patients. These factors are important considerations when evaluating reliability results for pharmacy students in the clinical setting. However, comparison with other performance assessment studies can be challenging since there are different research designs and different computational methods to estimate inter-rater reliability.

Haber and Avins (1994) 159 questioned whether the American Board of Internal Medicine (ABIM) resident evaluation form can assess clinical competency. Faculty evaluations of 110 first-year residents were analyzed in this multi-hospital study. The mean inter-rater agreement was a high 0.87; however, there was strong quantitative evidence of the halo effect. Despite the high inter-rater reliability, raters failed to
differentiate among the number of clinical care factors, and this revelation reduced the validity of the assessments. This is also an example of the problem wherein professional organizations develop assessment instruments and fail to take any steps to test for reliability and accuracy.

Kwolek and colleagues (1997) 160 studied surgical resident evaluations over a one-year period. The evaluation form contained 10 specific performance ratings and a global assessment. Inter-rater reliability of the overall performance rating was 0.82. Analysis gave strong evidence of the halo effect. The halo carries the impression of one performance into the assessment of other performances, resulting in a failure to discriminate among different competency skills. 153, 155, 156 Factor analysis indicated that faculty members were making a single global, undifferentiated judgment and that these ratings did not identify deficient performance skills. Preceptor comments in this study provided graphic examples of the halo effect. 153, 154

The literature describes three explanations for this effect. Linn and Gronlund (2000) 156 suggest that a rater's general impression of a student colors ratings of specific performances. Robbins (1989) 178 mentions that preceptor assessments may be influenced by their previous assessments. Saal, Downey, and Lahey (1980) 155 suggest there is a failure by the preceptor to discriminate among distinct facets of the performance. Preceptor comments in this study reported that assessments were influenced by previous assessments. The halo effect is a form of bias.

Noel and colleagues (1992) 68 evaluated the rating skills of 203 medical faculty, of which 96% were board certified and 74% had served as clinical evaluation exercise (CEX) evaluators in the last 5 years. Faculty watched and scored two video vignettes. Each of
the 50-minute CEX simulations portrayed medical residents taking a history, performing a physical examination, and counseling. Enough errors were included to merit a marginal rating. The results showed that half (50%) of the faculty rated each of the two simulations satisfactory or superior when both vignettes should have received a substandard rating.

Preceptor comments gave vivid accounts of rater inconsistency. This effect is present when raters inconsistently apply performance criteria compared to other raters. Longford (1994) 179 argues this type of rater error is a major factor in low estimates of reliability. Myford and Wolf (2004) 152 propose that rater inconsistency indicates a lack of understanding of the performance criteria. Prior to scoring the video vignettes, preceptors were given a twenty-minute introduction to assessment principles and the new performance criteria. A majority of preceptors in the study practiced in Florida and should have recognized the competency skills. However, the performance criteria were new to all participants, specifically the rating scale and checklist. There was a relatively short time to become familiar with the new performance criteria. These observations suggest that assessments are unreliable when preceptors do not have a clear understanding of the assessment instructions. 47, 180 Some of the participants in the previously described Noel (1992) 68 study watched a 15-minute instructional videotape on assessment; however, this failed to improve assessment quality.

The literature strongly suggests healthcare preceptors should improve assessment skills with training. 181-185 Future studies could provide more substantial training prior to rating simulations. The training might have mitigated some of the rater errors observed in this study. Training could focus on the judgment processes and on improving detection and perception of the clinical performance. A major goal of assessment training is to improve accuracy by reducing rater errors like the halo effect. Training will familiarize preceptors with the competency skills and performance criteria.

Preceptor comments in this study included several disagreements on aspects of drug therapy. These comments strongly suggest different standards of medical care between preceptors and with the expert panel. Findings show rater experience positively affects the quality of assessments, and raters who are judged as better professionals are better at rating the performance of others. 73 Findings in separate studies by Holmboe (2008) 71 and Chapman (1998) 84 strongly suggest that clinical faculty with little experience or substandard clinical skills have more idiosyncratic assessment scores, thereby increasing score variation. This is a major source of inter-rater inconsistency since assessments are not grounded in professional standards. 26, 27 Preceptors participating in this study were younger and less experienced; however, they were more educated compared to the 2008 AACP survey. 28 Findings suggest that this lack of experience may increase score variation; however, greater education would indicate increased clinical skills and decrease score variation. 67, 71, 73, 84, 182

The ICC combines two sources of rater disagreement. The first is rater inconsistency, which concerns whether preceptors understand the performance criteria in the same way. The second source of disagreement is rater bias: whether a preceptor's mean ratings are higher or lower compared to other preceptors. The ICC does not provide information that independently measures the contributions of rater inconsistency and bias. ICC results decrease in response to both lower correlation between raters and larger rater mean differences.
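The two ICC-lowering mechanisms just described, weaker correlation between raters and larger rater mean differences, can be seen directly in the Shrout and Fleiss (1979) two-way random-effects, single-measure formulation used by the study. The sketch below is illustrative only and uses made-up ratings, not the study's data:

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single measure (Shrout & Fleiss, 1979).

    `ratings` is a list of rows, one row per student (target) and one
    column per rater. Illustrative implementation with made-up data.
    """
    n, k = len(ratings), len(ratings[0])          # students, raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between students
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters (bias)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Perfect agreement between two raters -> ICC = 1.0
perfect = [[1, 1], [2, 2], [3, 3]]
# Same rank order, but rater 2 scores one point higher (pure bias) -> ICC falls
biased = [[1, 2], [2, 3], [3, 4]]
```

Running `icc_2_1(perfect)` gives 1.0, while `icc_2_1(biased)` falls to about 0.67 even though the two raters agree perfectly on rank order: the rater mean difference alone lowers the coefficient, which is exactly why the ICC cannot separate inconsistency from bias.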
The halo effect and rater inconsistency are rater effects. These effects are errors in measuring the constructs targeted by the assessment. 108, 186 Training has been shown to reduce rater effects and rater inconsistency. Rater effects are sources of construct-irrelevant variance. These errors are important threats to the validity of inferences made from the assessment results. 33

The sample size calculation used in this study assumed that ICC estimates would be equal to or greater than 0.5. Unfortunately, this assumption was not realized, and therefore the study could not measure the inter-rater reliability of individual competency skills for each student video. However, the non-randomized vignettes came closer to this target than the randomized vignettes. The results suggest the principal researcher underestimated the magnitude of the rater errors and the impact of the video vignettes on the ICC results. Bonnett (2002) 145 determined that ICC results are a function of sample size, number of ratings per student, significance level, and confidence interval width. Since significance levels and confidence interval width are often predetermined, higher ICC results are achievable with higher agreement among preceptors, larger numbers of preceptors, or a combination of both factors. Future studies may achieve their ICC targets by increasing the number of preceptors and improving preceptor training.

The preceptor panel assessed every video vignette with the new performance criteria. Performance levels were calculated and results were collapsed into two tables for comparison. ICC estimates were made for a competent vs. non-competent comparison and for an excellent vs. entry-level comparison. However, not every
vignette gave the study an opportunity to make a distinction between excellent and entry-level performances. When all raters assess a student as deficient (novice), there are no excellent or entry-level data points to evaluate, let alone any rationale to make this evaluation. The lower ICC point estimates may reflect that fewer data points were available. This circumstance demonstrates how the ratings-per-student tally influences ICC results.

5.2.4 Accuracy Results

The Fisher Exact test estimates accuracy by determining if typical preceptors assess performance in the same way as the expert panel beyond chance alone. The test gives the probability that assessments were unevenly distributed between the two panels under a hypergeometric function. 151 Low p-values indicate that the preceptor panel assessed the performances differently than the expert panel. The principal researcher encountered three central issues with interpreting accuracy results.

First, interpreting the accuracy results is related to the degree of agreement among the expert panel. The study needed examples of high quality assessment to make useful comparisons. However, the low degree of agreement undermined the use of the expert panel's assessment as an example of high quality assessments. Twelve out of twenty-seven competency skills illustrated in the video vignettes produced unanimous agreement or had only a single dissenting panelist. These twelve competency skills matched the intended performance level from the script. These skills were bookends, in other words, the extremes of the performance levels. All were either excellent or deficient (novice) performances. However, the Fisher Exact test indicated a significant difference between the expert and preceptor panels
for her performance of Skill D/E. The preceptor panel gave mixed results: one rated it excellent, seventeen (40%) rated it entry-level, and 24 (57%) rated it deficient (novice). These assessments are very different from those of the expert panel, all seven (100%) of whom assessed a deficient (novice) performance. Comments suggested rater inconsistency within the preceptor panel assessments.

Second, the Fisher Exact test assumes assessments are independent. 151 This means that the value of a given assessment does not affect the value of another assessment. However, there is strong evidence of bias in both the expert and preceptor panels. Bias in the form of the halo effect interferes with assessment independence and reduces the veracity of the assessments collected.

Third, the expert panel members and the preceptors participating in this study were younger and less experienced; however, they tended to be more educated compared to the 2008 AACP survey. 28 The impact of these demographics on assessment outcomes is difficult to measure. Findings suggest that lack of experience may increase assessment variation; however, greater education would indicate increased clinical skills and decrease variation. 67, 71, 73, 84, 182 Any increase in assessment variation in both panels would dilute differences, increase p-values, weaken the standard of high quality assessments, and reduce the veracity of the assessments collected.

The previously mentioned Noel (1992) 68 study underscores the distinction between reliability and accuracy, and the importance of measuring both. The study reported high inter-rater reliability estimates among a group of highly qualified clinicians with assessment
experience. However, these reliability results fail to demonstrate reasonable levels of accuracy and validity.

5.2.5 Rating Scale Comparison

Findings suggest that preceptors synthesize a number of student attributes in competency assessment and that these attributes are captured in a multifactorial rating scale. 81-83, 187 These findings underscore the rationale to evaluate this rating scale for pharmacy APPE assessment. The SUCCESS system has a different assessment strategy, in which preceptors assess student performances with a supervision-based item scale. The supervision needs are assessed for each competency skill, as shown in Figure 2-4. Participation in the three-hour-long program to collect assessment data for this study placed a heavy burden on busy professionals. However, it would be informative to compare differences between the two assessment systems. The principal researcher created a surrogate scale by collapsing the supervision domain of the multifactorial scale into a three-item rating scale. This supervision proxy scale approximates the SUCCESS rating scale.

The study compared assessment results of the multifactorial rating scale with the supervision proxy scale. The Fisher Exact tests showed significant differences for most of the competent vs. non-competent comparisons. The supervision proxy scale appears to be more lenient than the multifactorial rating scale. These results are consistent with other research findings that suggest increasing scale categories has the potential to convey more information about the quality of student performance and discriminate more accurately between students. 66 The measurement of discrimination is an essential factor for drawing meaningful
inferences. A scale with too few categories does not allow sufficient discrimination of student performances, whereas a scale with too many categories may be beyond the scale's ability to discriminate, increasing measurement errors. 188 Most of the excellent vs. entry-level comparisons were not significantly different, suggesting that the rating scale items representing excellent performances have similar degrees of discriminability between the two rating scales. Gardner (1960) 189 argues that validity is a function of the degree of discriminability inherent in the items being rated. Therefore, it becomes necessary to make the distinction between the maximum number of categories that a scale can discriminate and the number of categories that are meaningful in interpreting assessment scores.

The SUCCESS rating scale reflects a fundamentally different assessment strategy compared with the multifactorial rating scale. Rating supervision differs from rating other scale behaviors. Degree of supervision may be associated with behaviors already captured within the behavior domain of other rating scale items. Different rating scores for consistency, complexity, or efficiency may be linked to different supervision rating scores. The impact of multicollinearity on assessment results needs evaluation. On the other hand, the SUCCESS strategy relies solely on the degree of supervision. Limiting assessment to one domain of behavior may reduce the degree of discriminability. Whether one assessment strategy is superior to the other requires rigorous head-to-head testing.
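Both the accuracy analysis (Section 5.2.4) and the scale comparison above rely on the Fisher Exact test applied to 2x2 tables of collapsed assessments. As an illustration of the underlying calculation, using hypothetical counts rather than the study's tables, the two-sided p-value follows directly from the hypergeometric distribution:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def p_table(x):  # probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = p_table(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(p for p in (p_table(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical: 6 of 7 experts vs. 17 of 42 preceptors rating a skill competent
p = fisher_exact_2x2(6, 1, 17, 25)
```

For the classic balanced table [[3, 1], [1, 3]] this returns 34/70, roughly 0.486, matching the textbook value; small p-values arise only when the two panels' assessments are distributed very unevenly, which is why diluted differences between panels inflate p-values.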
5.3 Future Research

5.3.1 Content Validation

A robust assessment instrument is driven by competencies and an assessment process that make sense to its stakeholders. 190 A content validation study is the first essential step in accumulating evidence of validity and represents a link between the hypothetical construct of competency and measurable indicators. Specifically, a content validation study will evaluate and refine the domain specification and performance criteria that represent the standards of practice within the profession.

Healthcare organizations expend considerable time and resources creating competency statements, and the pharmacy profession is no exception. 41, 42 The CAPE Educational Outcomes reports set the standards for accreditation, guide pharmacy education, and establish competencies for graduating pharmacy students. CAPE published the first set of educational outcomes in 1994 191, 192 and followed with revisions in 1998 and 2004. 16, 165 Other pharmacy societies, including the American College of Clinical Pharmacy, American Society of Health-System Pharmacists, American Pharmacists Association, National Association of Chain Drug Stores, and National Community Pharmacists Association, have published lists of competency skills for the benefit of the profession. 193-196 These documents are long lists of general competencies that every pharmacist should possess upon graduation, and considerable effort went into their development. Nevertheless, these documents still form a hypothetical construct of competency. Various organizations within the profession readily describe general competencies; however, assessing the performance of a general competency is difficult, and the profession does not yet have standard guidelines to ensure valid and reliable assessments. Healthcare educators are nonetheless tasked with gathering evidence that their programs produce competent graduates. Therefore, educators prefer to use narrowly defined competency skills, which are observable and easier to assess. The drawback to this strategy is a long list of narrowly defined competency skills without a unifying set of principles to govern them. Professor ten Cate (2007) 170 argues these highly specified competency skills fail to reflect the original meaning of the general competency. This opinion is comparable to findings in this study. The Delphi panel recommended combining two narrowly defined competency skills together. This reconnected the newly defined competency skill with a relevant clinical activity. In another example, the Delphi panel recommended not using one of the competency skills since it lacks a connection to the general competency in the clinical setting.

The initial steps taken by this study for a single competency skill should be expanded to evaluate all thirteen general competencies outlined in the SUCCESS instrument. This researcher recommends that any future content validation study focus on the difficult task of linking performance criteria with general competencies. The objective is to develop an APPE assessment instrument that lays the foundation for predicting whether graduates can perform autonomously. 60 With adequate funding and support from pharmacy professional societies, a content validation study can be successfully completed.

The content validation study would benefit from a team member with a particular skill, and the following will explain the skill and benefit. Participants in a content
validation study for physical therapists appealed for a reduction in the number of clinical performance items. Clinical faculty in other professions have the same grievance. 170 The previous APTA assessment instrument contained 24 clinical performance skills. Redundant competencies were addressed, and the clinical performance skills were reduced to 18 items. These changes were well received by clinical faculty. However, no clinical performance behaviors were removed; rather, the performance skills were rewritten and consolidated. 29 Writing well-developed and widely accepted competency skills is a valuable art and usually requires training and experience. 197, 198 Future content validation studies would benefit from including a team member with these skills.

5.3.2 Video Simulation Strategy

Each video vignette in this study gave preceptors a single opportunity to observe the student performing a specific competency skill. The use of video simulations in healthcare education is typically limited to a single observation per performance. 68, 158, 159 Using a single observation per student limits the ability of the study to distinguish the impact of the quality of the case study and the quality of the video vignette on the assessment data collected. 199 Preceptors routinely make summative assessments based on multiple observations spanning the clinical rotation. Video vignettes illustrating multiple examples of a student performing the same competency skills with a mix of patients is one strategy. This presents multiple cross-sectional snapshots for assessment. An alternative strategy would simulate multiple interactions between a student and preceptor for a number of competency skills revolving around consecutive exchanges with the same patient. 200 This strategy gives a longitudinal perspective for assessment.
Either of these simulation strategies would increase the number of observations for assessment and more closely mirror the assessment challenges faced by preceptors. Paradoxically, there is little evidence in the body of healthcare literature to guide summative assessment of multiple observations. 182, 201 There is a need for research into assessment guidelines to help preceptors give a single summative assessment for an individual competency skill based on multiple observations. A standard-setting study to establish guidelines to summarize individual observations and establish a cut-off score for summative assessment is beyond the scope of this study. 202, 203

5.3.3 Video Simulation Production

After selecting case studies to best illustrate student performances for the study, the expert panel developed a list of specific behaviors characterizing rating scale scores for specific performance levels. However, after viewing the final video vignettes, several panelists mentioned that the portrayals of student behaviors in the final video vignettes were subtle and were difficult to recognize. With few exceptions, the widely varying assessments among the panel suggest that many of the video vignettes did not effectively portray the student performances originally intended by the expert panel. For effective illustration of important student behaviors, future studies should consider using professional video directors and scriptwriters.

The preceptor role in the video vignettes was scripted to ask the same questions in every vignette, to focus attention on the variation among student performances. However, the feedback revealed that many preceptors use the degree of prompting as a strong indicator of performance. The lack of this preceptor behavior may have affected
audience perception of performances and the assessments collected. Integrating this item into the scripts may improve the authenticity of the preceptor-student encounter.

Third-year pharmacy students portrayed the APPE students in the video vignettes. Fourth-year students were not available since they were busy completing APPE clerkships. However, the simulations would have benefited from student actors who have completed APPE assessments. APPE-experienced student actors would benefit from training similar to that given to simulated patients. 204 The budget of this proof-of-concept study prevented re-filming any video vignettes of questionable quality. Future studies need to budget time and funding to re-film and reassess problem videos. The expert panel assessments were the standard of high quality assessment for comparison. This is a crucial element in measuring reliability and accuracy. As described in this study, high quality video vignettes can be used to collect valuable assessment data from pharmacy preceptors.

5.3.4 Training

The marginal assessment skills among healthcare faculty reported in the literature are consistent with the findings of this study. Cross (2000) 47 argues that assessment scores may be unreliable when preceptors do not have a clear understanding of the assessment instructions. Recently, the definitions of competency and methods of assessment have changed radically. These changes may challenge how preceptors conceptualize and assess competency. 180 However, there is strong evidence indicating that assessment skills improve with training. 76, 181, 183-185, 205 Training programs are designed to familiarize preceptors with a clear understanding of how to identify competency skills and the performance criteria used to score performances. Training to
reduce the halo effect and rater inconsistency are prominent themes since they are common issues that degrade the quality of assessment results. To ensure that physical therapy preceptors and students have a working understanding of the physical therapy assessment instrument, the APTA requires successful completion of an online training program prior to clerkship rotations. This requirement is based on strong evidence from an APTA-funded study measuring training's positive influence on reliability and accuracy. Findings demonstrated preceptor assessments improved independent of previous assessment experience. 185 Pharmacy preceptors and students would benefit from a training program. Comments from study participants strongly suggested that video depictions could be an effective tool to illustrate the assessment instrument in the clinical context.

Brazeau and colleagues (2002) 206 describe a teaching Objective Structured Clinical Examination (OSCE) program. Medical faculty retooled an OSCE and used it to provide medical students and clinical faculty an interactive venue with clinical scenarios. Clinical faculty, guided by the assessment instrument, gave feedback as students progressed through the scenario. The researchers reported preceptors appreciated the opportunity to learn principles of assessment in a clinical context. Student responses were positive and valued the introduction to clinical performance assessment prior to clerkship rotations. Validated video vignettes and scripts depicting APPE student performance could lay the foundation for a strong pharmacy preceptor and student training program. Video vignettes can give preceptors an introduction to the assessment instrument. Validated scripts and supporting materials would provide a clinical scenario for teaching
students. This researcher recommends developing training material with this approach for clinical faculty and student training.

5.3.5 Rating Scale Analysis

The rating scale introduced in this study scored student performances on five domains of behavior. Preceptors were able to score from two to four rating points for each domain of behavior. Educators must decide how many points to use in a rating scale. 207-209 According to Preston and Colman (2000), 209 the literature does not collectively outline a simple answer to the matter. Findings suggest preceptors avoid rating students on either extreme of a rating scale and tend to rate students somewhere in the middle. 156 Iramaneerat and colleagues (2000) 156 explain that this range restriction has two detrimental effects. First, this rater effect reflects preceptors' reluctance to use the full range of the rating scale. Second, limiting rating variability limits reliable discrimination between competent and incompetent student performances. These detrimental effects may be mitigated with a rigorous rating scale development and testing process.

Findings suggest that test-retest and inter-item reliability increase as a function of the number of scale points. True score variance increases at a faster rate than error variance as the number of scale items grows. 35 This relationship contributes to the increasing reliability with increasing scale items. 35, 210 Findings show strong evidence that increasing the number of scale categories has the potential to convey more information about the quality of student performance and discriminate more accurately among students. 66 Gardner (1960) 189 argues that validity is a function of the amount of discriminability inherent in the items being rated. The measurement of discrimination is an essential
factor to draw meaningful inferences. A scale with too few categories does not allow sufficient discrimination of student performances, whereas a scale with too many categories may be beyond the scale's ability to discriminate student performances, increasing measurement errors. 188 According to Komorita and Graham (1965), 188 the ultimate criterion for adoption of a particular number of scale points is its effect on validity. When the response scale does not correlate with a criterion, the validity of the scale may not be affected despite an increase in reliability. 188 The comparison of the new multifactorial scale with the supervision proxy scale demonstrates the influence of the rating scale makeup on assessment results. There is strong evidence that too few or too many rating items can negatively affect reliability, validity, and interpretation of assessment scores. 73, 78 Findings strongly suggest preceptors synthesize a number of student attributes in competency assessment, and these attributes are captured in a multifactorial rating scale. 81-83, 187 These findings underscore the rationale to evaluate a multifactorial rating scale for pharmacy APPE assessment. Further evaluation of rating scale items is warranted.

5.3.6 Performance Levels

This study adapted the three performance levels of the APTA assessment instrument. Summative assessments fall into one of three performance levels, as described in Figure 4-2. The three levels are: excellent, entry-level, and deficient (novice). However, other healthcare assessment instruments have used five or more performance levels, and there is a call for the SUCCESS instrument to increase the number of performance levels. 211
Adams and colleagues (2008)212 evaluated the multifactorial rating scale and performance levels used with the APTA national assessment instrument. The multifactorial rating scale in the present study was adapted from the APTA rating scale. Their study analyzed data from seven graduating classes and reported strong evidence that preceptors were able to discriminate six levels of student performance. The rationale for adopting six performance levels was based on the different minimally acceptable performance levels among the participating physical therapy schools. These results suggest the multifactorial rating scale adopted in this study has the potential to discriminate six levels of student performance and merits investigation.

5.3.7 Beyond Graduation

The objective of an APPE assessment instrument is to predict whether students will perform autonomously in similar clinical situations.60 Evaluating the performance of graduates would help schools measure whether APPE assessment outcomes have any value in predicting future performance. Follow-up research would inform schools how assessment outcomes aid in teaching and learning. Follow-up studies would also provide the benchmark for setting meaningful performance standards.

5.3.8 National Validation Study

Schools of pharmacy have developed proprietary APPE assessment instruments or have worked within regional consortia. The development and maintenance of assessment instruments consume substantial resources. There is the cost to develop the assessment instrument itself; development requires expertise in performance assessment, the APPE learning environment, and computer programming. Preceptor training requires expertise in developing and distributing educational materials. Maintenance of a software application requires technical personnel, fees for internet
access if web based, and maintenance of computers. A national effort would reduce individual schools' costs by distributing development costs among the schools.

Without a valid and reliable national assessment instrument, the pharmacy profession is unable to ensure the competency of graduating students. The 2008 AACP president and the ACCP Educational Affairs Committee have called for a standard APPE assessment instrument.21, 22 Healthcare education experts argue the need for reliable and valid student assessments.111 Assessment instruments need to be defended with rigorous scientific methods, and evaluation needs to predict whether students are capable of performing in real-world clinical settings.112-114 Despite this widely held position, the pharmacy profession has yet to establish a national policy outlining acceptable validation criteria for APPE assessment instruments. Without rigorous scientific evaluation, there is no assurance that any of the proprietary APPE assessment instruments assess student performance in a meaningful way. Clearly, the public expects pharmacy schools to graduate competent practitioners, and it is only reasonable that pharmacy schools be able to demonstrate their graduates are competent. Sound assessment is a matter of public safety and trust. This study, in combination with the studies recommended in this section, outlines a sound validation method. This important effort warrants a national commitment.

5.4 Summary and Conclusions

APPEs account for 30% of the professional curriculum.28 Recommendations to increase APPEs from one year to two years have been proposed.213 This increase means APPEs could account for 50% of the curriculum. APPEs provide the only opportunity for students to refine clinical skills under the guidance of an experienced pharmacist. Assessment during APPE rotations is not the only assessment strategy
used to examine students. However, APPE assessments represent the only evaluation opportunity in the clinical setting. Students have a critical need for guidance.154, 214, 215 Findings from a study by Langendyk (2006)215 underscore this need: low-achieving medical students gave inaccurate self-assessments and peer reviews. Paradoxically, high-achieving students were harsher on their own performances than faculty were. Efforts to train self-assessment skills were not always successful. Schools need to produce competent graduates, and meaningful assessment relies on experienced practitioners.

The pharmacy profession is awash with never-ending lists of competencies. However, these lists describe hypothetical constructs of competency. Research question #1 in this study showed preceptors were able to evaluate and refine hypothetical constructs and their associated performance criteria. Evaluations were grounded in relevancy to the clinical setting. Patient-centered practices were evident from recommendations to integrate patient data and preferences into the competency statements. In keeping with this clinical perspective, preceptors requested training materials to show the competencies and performance criteria in clinical context.

The study demonstrated a systematic approach to illustrating student clinical performance with video vignettes. The principal researcher coordinated the input of an expert panel of preceptors to refine the pharmacy case studies, develop the format of a student reporting to a preceptor, and create the preceptor questions. The panel developed specific student behaviors indicative of specific scores on the new multifactorial rating scale. These behaviors were the building blocks used to construct student behavior indicative of
performance at different levels. The step-by-step process demonstrated in this study can be used to develop new scripts and video vignettes.

Typically, pharmacy students mentor under a single preceptor during APPE assessment. The principal researcher therefore needed a controlled environment to collect assessment data that allowed preceptors to assess the same student performance. This study demonstrated the use of video vignettes to address research questions #2 and #3, specifically to collect evidence of assessment reliability and accuracy. Findings from this study showed low inter-rater reliability. However, reliability results were higher for non-randomized compared to randomized simulations. Accuracy results showed preceptors more readily identified high and low student performances compared to average-performing students. Findings in this study demonstrated that preceptors were able to synthesize a number of student attributes with the new multifactorial rating scale and accepted the new checklists. The study demonstrated a sound method to collect assessment data for rigorous analyses.

The study noted the presence of the halo effect and rater inconsistency. Research has shown healthcare preceptors improve assessment skills with training.181-185 The study also noted differences in standards of care among the participating preceptors and the expert panel. Findings suggest clinical faculty with little experience or substandard clinical skills have more idiosyncratic assessment scores.71, 84 These examples describe a major source of rater inconsistency, since such assessments are not grounded in professional standards.26, 27 A study by Colthart and colleagues (2008)216 suggested poorly performing clinicians are the least able to self-assess accurately. This is consistent with studies suggesting that poorly performing clinicians are not necessarily
aware of their deficiencies, despite some having high self-confidence.214, 217, 218 This study demonstrated pharmacy preceptors have the same vulnerabilities rating student performances as other healthcare preceptors, indicating the need for training in performance assessment. Video vignettes similar to the simulations created in this study can be used to build a robust preceptor training and testing program.

Nevertheless, pharmacy APPE rotations appear to train and assess students. Students are exposed to a number of clinical environments and preceptors during their APPE rotations. Findings from this study show preceptors have the highest accuracy with extreme performances; that is, preceptors noticed students with the highest and lowest performance levels. The degree of inter-rater reliability is dependent on the population of preceptors, the number of student rating opportunities, and the instrument.39 The greater the variety of preceptors and the greater the number of student rating opportunities, the higher the inter-rater reliability. More time spent in APPE rotations increases the reliability of assessments. It is hoped there are enough rating opportunities from proficient preceptors to identify poorly performing students. Findings from Brown (2000)187 suggest preceptors may intervene with troubled students without regard to the assessment instrument. Lankshear (1990)219 describes barriers preceptors face just to give accurate assessments of poorly performing students. These are examples where preceptors are left to their own devices without the support of a robust assessment system. However, many preceptors simply pass students to the next rotation.85, 219 Pharmacy schools need to identify poorly performing students and to start the remediation process earlier rather than later.
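The claim above — that pooling more rating opportunities across preceptors raises inter-rater reliability — is conventionally projected with the Spearman-Brown prophecy formula. The sketch below is illustrative only; the single-rating reliability of 0.40 is a hypothetical value, not a result from this study.

```python
# Hedged sketch: Spearman-Brown projection of how the reliability of an
# averaged rating grows with the number of independent rating opportunities.

def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the mean of k ratings, given the
    reliability r_single of a single rating."""
    return (k * r_single) / (1 + (k - 1) * r_single)

# A hypothetical single-preceptor reliability of 0.40 rises quickly as
# independent rating opportunities are pooled:
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.40, k), 2))  # 0.40, 0.57, 0.73, 0.84
```

The formula assumes the pooled ratings are parallel and independent; correlated rater errors (e.g., a shared halo effect) would make the projection optimistic.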
There are applications for robust pharmacy assessment instruments beyond APPE clerkships. As pharmacists transform their role in healthcare and move from dispensing to clinical services, licensing examinations need to reflect this strategic change. Currently, the United States Medical Licensing Examination (USMLE) uses simulated patients in the licensing process.220 The objective of the USMLE Step II Clinical Skills examination is to assess clinical competency. Like the medical licensure examination, a pharmacy version would need a valid and reliable assessment instrument applicable to newly graduated students nationwide. The validation process outlined in this study would address this requirement.

Taking a page from the medical literature, portfolios are used to document professional development.221 Medical residents routinely use their portfolios as evidence of clinical skills in employment interviews. PharmD students could use results from a widely accepted instrument as evidence of outstanding clinical skills in employment interviews or applications for pharmacy residency programs. Employers and residency programs look for evidence of clinical skills in the selection process.

The residency and clerkship programs for most healthcare professions require a fixed time length. Many pharmacy residency programs range from one to two years, and medical residency programs are often longer. During this time, residents must complete the training objectives outlined in the program. These residency programs are required for board certification. Medical institutions and licensing authorities ponder the change to a competency-based curriculum.112, 170, 222-225 Emerging research suggests acquisition of clinical skills can be accelerated within a competency-based program compared to the current standard time-based approach.112, 223, 224, 226, 227 The foundation to adopt a
competency-based program is grounded in a valid and reliable assessment instrument. Comments from the Delphi panel indicated many of the rating scale items and performance level descriptions were appropriate for pharmacy residents. The past decade has seen the rapid growth of BPS pharmacy specialties to meet the growing need for highly trained clinical pharmacists. The BPS program would benefit from a valid and reliable assessment instrument to document the clinical competency of residents and evaluate program effectiveness.

Healthcare education experts argue the need for reliable and valid assessment instruments.111 In part, this study was inspired by a study by Dr. Hubbard and colleagues.228 In 1963, he examined a high-stakes examination for US medical students: a bedside oral examination rated by a number of experienced physicians. When a student failed this test, the student did not progress to graduation. Three years of results, with over 10,000 examination outcomes, showed weak inter-rater reliability. For decades, this test had seemed reasonable to highly trained physicians. However, until someone took the time to analyze the assessment data, no one realized the test did not assess student performance as expected. Nevertheless, it took almost ten years for the profession to accept that assessment instruments need scientific evaluation. The pharmacy profession has yet to establish a national policy outlining validation criteria for APPE instruments. Evidence supporting or challenging assessment instruments needs grounding in rigorous scientific methods. The central purpose of this study is to demonstrate a rigorous method to evaluate the validity and reliability of APPE
assessment programs. This study is a step in establishing a culture of continuous assessment.
APPENDIX A
SUCCESS COMPETENCIES

The SUCCESS instrument contains 96 skills statements categorized within 13 competencies.

1. Drug distribution systems
2. Disease State Knowledge
3. Drug therapy evaluation and development
4. Monitoring for Endpoints
5. Patient Case Presentations
6. Patient Interviews
7. Patient Education/Counseling
8. Drug Information
9. Formal Oral Presentations
10. Formal Written Presentations
11. Professional team interaction
12. Professionalism/Motivation
13. Cultural Sensitivity
APPENDIX B
DRUG THERAPY EVALUATION AND DEVELOPMENT

Figure B-1. Seven competency skills

Skill A: Synthesizes complete patient history and laboratory and physical exam data to identify problems

Excellent: Independently synthesizes complete patient history, laboratory, and physical exam data (collects this data if necessary) to identify most if not all problems.

Entry level: With preceptor guidance, synthesizes complete patient history, laboratory, and physical exam data (using incomplete data at times) to identify the most critical problems.

Deficient (Novice): Even with preceptor guidance, the student has difficulty synthesizing patient history, laboratory, and physical exam data (makes no effort to fill in the gaps in information) to identify problems.

Skill B: Identifies and prioritizes both actual and potential drug-related problems, stating rationale (*** critical skill)

Excellent: Independently identifies and prioritizes most if not all actual and potential drug-related problems, stating rationale for prioritization.

Entry level: With guidance from the preceptor, identifies and prioritizes the most critical actual and potential drug-related problems, stating rationale for prioritization when necessary.

Deficient (Novice): Even with preceptor guidance, the student has difficulty identifying and prioritizing both actual and potential drug-related problems and does not state rationale for prioritization. Assistance required to prevent errors.
Skill C: Identifies problems that require emergency medical attention (*** critical skill)

Excellent: Independently identifies any problems that require emergency medical attention and also identifies what steps should be taken to activate emergency procedures.

Entry level: With preceptor guidance, identifies problems that require emergency medical attention and also identifies who to contact to determine what steps should be taken to activate emergency procedures, with occasional assistance.

Deficient (Novice): Even with guidance from the preceptor, the student is not able to identify problems that require emergency medical attention and does not know who to contact to determine what steps should be taken to activate emergency procedures. Preceptor intervention required to prevent errors.

Skill D: Designs and evaluates treatment regimens for optimal outcomes using pharmacokinetic data and drug formulation data (*** critical skill)

Excellent: Independently designs and evaluates most if not all treatment regimens for optimal outcomes using pharmacokinetic data and drug formulation data.

Entry level: Designs and evaluates the most critical treatment regimens for optimal outcomes using pharmacokinetic data and drug formulation data.

Deficient (Novice): Even with guidance from the preceptor, the student is not able to design or evaluate regimens for optimal outcomes using pharmacokinetic data and drug formulation data. Preceptor intervention required to prevent errors.

Skill E: Designs and evaluates treatment regimens for optimal outcomes using disease states and previous or current drug therapy, as well as including psycho-social, ethical-legal, and financial data (*** critical skill)

Excellent: Independently designs and evaluates most if not all treatment regimens for optimal outcomes using disease states and previous or current drug therapy, including psycho-social, ethical-legal, and financial data, using documentation from a reliable source.
Entry level: Designs and evaluates the most critical treatment regimens for optimal outcomes using disease states and previous or current drug therapy, including psycho-social, ethical-legal, and financial data, using documentation from a reliable source. Requires some assistance from the preceptor to produce a more detailed analysis.

Deficient (Novice): Even with guidance from the preceptor, the student is not able to design and evaluate treatment regimens for optimal outcomes using disease states and previous or current drug therapy, including psycho-social, ethical-legal, and financial data. Fails to use documentation from a reliable source. Preceptor intervention required to prevent errors.
Skill F: Develops backup plans based on what problems are likely to occur from/with the primary plan.

Excellent: Independently develops backup plans based on what problems are likely to occur from/with the primary plan for most if not all drug therapy problems.

Entry level: Develops backup plans based on what problems are likely to occur from/with the primary plan for the most critical drug therapy problems. Requires some assistance for more detailed planning.

Deficient (Novice): Even with guidance from the preceptor, the student rarely develops backup plans based on what problems are likely to occur from/with the primary plan. Assistance required to prevent errors.

Skill G: Provides written documentation of the pharmaceutical care plan that is clear, complete, and concise.

Excellent: Independently provides written documentation of the pharmaceutical care plan that is clear, complete, and concise.

Entry level: Provides written documentation of the pharmaceutical care plan that is complete, but could be more concise and/or clear. Requires guidance to produce detailed documentation.

Deficient (Novice): Either provides no written documentation of the pharmaceutical care plan or provides documentation that is not complete. Preceptor intervention required to prevent errors.
APPENDIX C
IRB DOCUMENTS
APPENDIX D
DELPHI PANEL ROADMAP

Study Overview

Based on the Center for the Advancement of Pharmaceutical Education's (CAPE) competencies, the SUCCESS instrument is used to assess student performance during advanced pharmacy practice experiences (APPEs). SUCCESS is the result of collaboration among all four Florida schools of pharmacy; it addressed the burden of multiple assessment systems from different schools. The purpose of this study is to collect validity evidence for the content of the Drug Therapy Evaluation and Development competency used in the SUCCESS instrument. Content validity, an essential step in validation, represents the link between targeted constructs and measurable indicators. Competency in drug therapy is essential to the practice of pharmaceutical care, which is the mission of pharmacy education. Drug Therapy Evaluation and Development is one of thirteen competencies within SUCCESS.

The study has two phases, and you have been asked to participate in the first phase exclusively. In the first phase, a Delphi panel is charged with the critical examination of the operational definitions used to assess clinical performance. The panel will evaluate the performance criteria for relevancy and representativeness, as well as make recommendations, if warranted. In addition, the panel will also develop a glossary of standard terms. The Delphi method uses a series of survey rounds, along with anonymous feedback from the previous round, which allows an expert panel to quickly reach agreement on complex issues. It is estimated that each Delphi survey will require 45-60 minutes to complete and that the study will need 2 or 3 survey rounds to reach 80% agreement. The Delphi method has been widely used to generate expert consensus in healthcare education. The knowledge gained from this study will be useful for the validation of a national APPE assessment instrument and the development of national training materials for preceptors.
The Delphi process

The Delphi method uses a series of survey rounds with anonymous feedback. This allows an expert panel to freely share ideas and form a consensus quickly. In the first survey round, the panel will complete an on-line survey of
the existing and newly developed operational descriptions. Completed surveys are automatically returned to the researcher, who summarizes the responses and comments. The same survey is used for the second round but will contain a summary of the previous round. The summary is both quantitative and qualitative, providing the panel with relevant data that can be easily interpreted. The percent response to each question will be provided. A thematic analysis of the comments from the panel members will be categorized by key words and compiled into a summary. If agreement is not achieved by the end of the second round, a third and final round will be conducted. Panel agreement is defined at 80%. To reduce the burden on participants, items under discussion will be removed once the panel has reached agreement.

Survey Steps

You will receive an email with a link to the on-line survey. After reading the consent form, you will indicate your agreement to participate in the study as described. This is followed by the instructions page.

Step 1) You will be asked if you think the seven skills included in the Drug Therapy Evaluation and Development competency are relevant and comprise a complete description of the competency. You will have the opportunity to add, remove, or refine skill statements.

Relevancy: Whether the individual skill statement is necessary for assessing the Drug Therapy Evaluation and Development competency.

Comprehensiveness: Are there any aspects of pharmacy knowledge, attributes, or skills missing in any of the skill statements?

Step 2) You will be asked to evaluate the relevancy and representativeness of the performance criteria (e.g., checklist and rating scale) for each of the seven skills under the Drug Therapy Evaluation and Development competency.

Relevancy: Whether all the elements of the checklist and the rating scale behaviors are relevant for assessing the Drug Therapy Evaluation and Development competency.
Representativeness: Considering the universe of performance measures, are the elements of the performance criteria (e.g., checklist items and rating scale behaviors) representative of the skills, knowledge, and attitudes required to assess the Drug Therapy Evaluation and Development competency?

Critical skills: Some skills are labeled critical skills, and students must show competency in these. A student performing a critical skill at an unsatisfactory level could potentially be harmful to a patient or to the practice site. Student remediation may be necessary, and the preceptor would want to notify the APPE director.

What you will see:
Seven skill statements comprise the Drug Therapy Evaluation and Development competency. Each of the skills has a checklist and a rating scale. In addition, some skills are labeled critical. The checklist is a description of activities the student needs to perform to competently complete an individual skill. The rating scale describes student behaviors for a particular performance level (e.g., excellent, competent, and deficient). You will be asked to evaluate the behavior descriptions in terms of the following five performance dimensions.

o Supervision: refers to the level and extent of assistance.
o Quality: refers to the degree of knowledge and skill proficiency demonstrated.
o Complexity: refers to the number of elements that must be considered relative to the expected performance of an entry-level practitioner.
o Reliability: refers to the frequency of occurrences of desired behaviors related to the performance criterion.
o Efficiency: refers to the ability to perform in an effective and timely manner.

Some skills are labeled critical skills, and students must show competency in these. A student performing a critical skill at an unsatisfactory level could potentially be harmful to a patient or to the practice site. Student remediation may be necessary, and the preceptor would want to notify the APPE director.

Step 3) There are a few questions concerning a glossary of terms. The development of a standard glossary is intended to support consistent and clear interpretations of the performance criteria by preceptors.

Step 4) Finally, you will be asked for some basic demographic information. This will help the study team understand the composition of experts who participated in the Delphi panel.

Thank you for your participation, and I look forward to your input.

Charles Douglas, EMBA
PhD student, UF College of Pharmacy
firstname.lastname@example.org
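The 80% panel-agreement rule used throughout the Delphi roadmap can be sketched as a small script. This is an illustrative sketch only: the item names and response counts below are hypothetical (the 17-yes/4-revise split merely mirrors the kind of 21-member tally reported in Appendix H), not the study's data-processing code.

```python
# Hedged sketch of the Delphi endorsement rule: an item is endorsed when
# the most common response reaches 80% of the panel; otherwise it is
# carried to the next survey round. Data below are hypothetical.
from collections import Counter

AGREEMENT_THRESHOLD = 0.80

def panel_agreement(responses):
    """Fraction of panelists giving the most common response."""
    counts = Counter(responses)
    return counts.most_common(1)[0][1] / len(responses)

round_one = {
    "Skill A relevant": ["yes"] * 17 + ["revise"] * 4,   # 17/21 ~ 81%
    "Skill B relevant": ["yes"] * 12 + ["revise"] * 9,   # 12/21 ~ 57%
}

for item, responses in round_one.items():
    agree = panel_agreement(responses)
    status = "endorsed" if agree >= AGREEMENT_THRESHOLD else "carry to next round"
    print(f"{item}: {agree:.0%} -> {status}")
```

Under this rule the first hypothetical item would be endorsed and removed from later rounds, while the second would be re-surveyed with the summarized feedback.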
APPENDIX E
ANALYSIS PATH AND ASSESSMENT RUBRIC

Figure E-1. Delphi panel endorsement criteria. The Delphi panel evaluates the domain specification (what competency skills to assess?) and the performance criteria (how to assess competency skills, e.g., checklist, rating scale, and performance levels). Items reaching at least 80% panel agreement are endorsed; items below 80% agreement are not endorsed.

Figure E-2. Data collection and analysis pathway. Rating scale scores on the five dimensions (supervision, quality, complexity, reliability, efficiency) are mapped to performance levels per the assessment rubric (excellent, entry level, deficient/novice). Accuracy is estimated with the Fisher exact test (competent vs. not competent; excellent vs. entry level), and reliability is estimated with the intraclass correlation coefficient (ICC) for the same comparisons.
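The two statistics named in the analysis pathway — an intraclass correlation for reliability and the Fisher exact test for accuracy — can be sketched in a few lines. This is not the study's actual analysis code (the study does not specify which ICC form it used; a one-way random-effects ICC(1,1) is shown here as one plausible choice), and the example ratings and 2x2 table are made up for illustration.

```python
# Hedged sketch (assumed ICC form and hypothetical data): the reliability
# and accuracy estimates named in the Appendix E analysis pathway.
from math import comb

def icc_1_1(ratings):
    """One-way random-effects ICC(1,1); ratings[subject][rater]."""
    n = len(ratings)        # rated performances
    k = len(ratings[0])     # raters (preceptors)
    grand = sum(sum(row) for row in ratings) / (n * k)
    subj_means = [sum(row) / k for row in ratings]
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)       # between-subjects
    msw = sum((x - m) ** 2                                              # within-subjects
              for row, m in zip(ratings, subj_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    def p(x):  # hypergeometric probability of a table with top-left cell x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = p(a)
    return sum(p(x) for x in range(max(0, c1 - r2), min(r1, c1) + 1)
               if p(x) <= p_obs * (1 + 1e-9))

# Three hypothetical preceptors scoring four video performances:
ratings = [[5, 4, 5], [2, 2, 3], [4, 4, 4], [1, 2, 1]]
print(round(icc_1_1(ratings), 2))  # ICC(1,1) for these data: 0.90

# Hypothetical accuracy table: preceptor judgment (competent / not
# competent) against the scripted performance level of the vignette:
print(round(fisher_exact_2x2(8, 2, 1, 9), 4))
```

Higher ICC values indicate that preceptors rank the same performances consistently; the Fisher exact p-value tests whether preceptor judgments are associated with the scripted (true) performance level rather than chance.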
Rating Scale and Assessment Rubric

Supervision:
o Close supervision (constant monitoring)
o More than standard level of supervision (for entry level)
o Standard level of supervision (for entry level)
o Minimal supervision (beyond entry level)

Quality:
o Shows limited knowledge & skills
o Shows entry-level knowledge and skills
o Shows knowledge and skills beyond entry level

Complexity:
o Proficient with simple cases
o Proficient with complex cases (for entry level)

Consistency:
o Inconsistent or poor performance
o Occasional lapses
o Consistently proficient

Efficiency:
o Inefficient/slow
o Effective/timely (for entry level)
APPENDIX F
SKILL STATEMENTS AND CHECKLISTS

Skill statement A: Synthesizes complete patient history, laboratory, and physical exam data to identify problems

Checklist A:
1) Read the history and identify:
o Age, gender, and race of the patient
o Symptoms and signs of disease
o Past medical history
2) Read the physical examination and look for signs of disease by system (e.g., cardiovascular or respiratory disease). Read the laboratory results and look for:
o Laboratory results that reflect the function of the system affected or associated with the patient's complaint. Are the laboratory results normal or abnormal? What is the trend of the values (if previous laboratory results are available)?
o The rest of the laboratory results (not associated with the specific symptom or system affected). Are they normal or abnormal? (interpretation of results)

Skill statement B: Identifies and prioritizes both actual and potential drug-related problems, stating rationale.

Checklist B:
1) List the problems of this patient:
o Prioritize problems according to their impact on mortality and morbidity for this patient (e.g., treatment of hypertension has a greater impact in decreasing mortality and morbidity than treatment of eczema)
2) List the current medications of this patient:
o From the current medication list, look for unnecessary medications
o From the current medication list, look for medications that are contraindicated because they can aggravate some of the symptoms or diseases of the patient
o From the current medication list, look for medications that can cause side effects
Skill statement D/E: Designs and evaluates treatment regimens for optimal outcomes using pharmacokinetic and drug formulation data. Design considers psycho-social, ethical-legal, and financial factors.

Checklist D/E:
1) Review in detail the possible therapeutic options and whether any medication could negatively affect (have side effects on) the other affected systems in this specific patient.
2) Are there any better medications for this patient based on his/her:
o Age (some medications act differently in children than in adults)
o Gender (some medications are better according to gender)
o Ethnicity (some medications are better for some ethnicities)
o State of health (e.g., pregnancy or immediate post-surgical state)
o Affected systems (e.g., a medication that treats the primary problem may be contraindicated in renal patients because it is excreted in urine)
3) Are there any better medications for this patient based on his/her safety (e.g., antidepressant medications increase the risk of suicide in teenagers)?
APPENDIX G
PERFORMANCE CRITERIA GLOSSARY

Excellent performance: A student who requires minimal clinical supervision (beyond entry level) with simple to highly complex patients and is able to function in unfamiliar or ambiguous situations. At this level, the student willingly assumes a leadership role for managing more difficult cases and is able to serve as a resource for others. Actively contributes to the enhancement of the pharmacy with an expansive view of the profession.

Entry-level performance: A student who requires the expected degree of supervision for an entry-level pharmacist and shows entry-level knowledge and skills. Consults with others, and reasoning is consistently proficient in simple and complex cases. The student performs in a timely and efficient manner.

Deficient (Novice) performance: A student who requires more than entry-level clinical supervision even with simple patients. At this level, clinical performance is inconsistent and reflects little or no experience. Performance is slow and inefficient.

Complexity: The number of elements (e.g., simple or complex) that must be considered relative to the patient, task, and environment. As a student progresses, the complexity of cases managed is expected to increase, with fewer elements under direct control by the preceptor.

Consistency: The frequency of occurrences of desired behaviors (e.g., infrequently, routinely). As a student progresses, quality performance is expected to progress from infrequent to routine.

Efficiency: The ability to perform in an effective and timely manner. As a student progresses, performance should progress from a high expenditure of time and effort to efficient and timely performance.

Performance criterion: A description of the behaviors (e.g., checklist and rating scale) that define the expected performance of students. When criteria are taken in aggregate, they describe the expected performance of the graduate upon entry into the practice of pharmacy.

Quality: The degree of knowledge and skill proficiency demonstrated (e.g., limited skill, high skill).
As a student progresses through clinical education experiences, quality should range from demonstration of limited skill to a skilled or highly skille d performance.
Supervision: The level and extent of assistance required by the student to achieve entry-level performance. As a student progresses, the level of supervision needed is expected to progress from close supervision to being capable of independent performance. This may vary with the complexity of the patient or environment.
APPENDIX H
DELPHI PANEL (ROUND ONE) RESULTS

Table H 1. Responses for Skill A
Table H 2. Responses for Skill B
Table H 3. Responses for Skill C
Table H 4. Responses for Skill D
Table H 5. Responses for Skill E
Table H 6. Responses for Skill F
Table H 7. Responses for Skill G
Table H 8. Responses to question 8
Table H 9. Responses to question A1
Table H 10. Responses to question A2
Table H 11. Responses to question B1
                          Percent  Count
  Yes (all are relevant)  81%      17
  No (needs revisions)    19%      4
  No (not relevant)       0%       0
Table H 12. Responses to question B2
Table H 13. Responses to question C1
Table H 14. Responses to question C2
Table H 15. Responses to question D1
Table H 16. Responses to question D2
Table H 17. Responses to question E1
Table H 18. Responses to question E2
Table H 19. Responses to question F1
Table H 20. Responses to question F2
Table H 21. Responses to question G1
Table H 22. Responses to question G2
Table H 23. Responses to complexity question
Table H 24. Responses to reliability question
Table H 25. Responses to deficient performance question
Table H 26. Responses to efficiency question
Table H 27. Responses to entry level performance question
Table H 28. Responses to excellent performance question
Table H 29. Responses to performance criterion question
Table H 30. Responses to quality question
Table H 31. Responses to supervision question
APPENDIX I
DELPHI PANEL (ROUND TWO) RESULTS

Table I 1. Responses to question A1
Table I 2. Responses to question B1
Table I 3. Responses to question D1
Table I 4. Collapsed responses to question D1
Table I 5. Responses to question D2
Table I 6. Responses to question E1
Table I 7. Responses to question E2
Table I 8. Responses to question G1
Table I 9. Responses to question G2
Table I 10. Responses to complexity question
Table I 11. Responses to deficient performance question
Table I 12. Responses to excellent performance question
APPENDIX J
CASE STUDY SUMMARIES

Diabetes Case Study: Mrs. Davis is a 65-year-old retired nurse. She has a history of hypertension, occasional angina, and type 2 diabetes mellitus. On examination, her blood pressure is 135/80 mmHg, heart rate regular at 65 beats/minute, weight 150 lbs, height 5 feet 6 inches, BMI 23.5, and waist circumference 30 inches. Mrs. Davis is a non-smoker and consumes 2 to 3 small glasses of wine per week. She plays golf once a week and walks for one hour each day with her husband. Mrs. Davis reports that she is careful about the type and amount of food she eats and watches her intake by carbohydrate counting. Over the last six months, monitored random daily non-fasting blood glucose levels have been slowly increasing, and over the last two weeks they have been between 198 and 234 mg/dl. The ideal range for non-fasting blood glucose is 75-110 mg/dl. Six months ago, her glycated hemoglobin (HbA1c) measurement was 8.5%, and 2 weeks ago it was 9%. Three months ago, a timed overnight urine collection demonstrated microalbuminuria at 72 mg/day (the microalbuminuria range is 30-300 mg/day). She has no evidence of retinopathy or neuropathy. Her current medications are: aspirin 81 mg daily, perindopril 4 mg daily, metoprolol tartrate 25 mg twice daily, sublingual nitrate prn (on average once per month), simvastatin 20 mg daily, metformin 850 mg three times a day, and glyburide 2.5 mg twice daily with food. She has no known allergies.
Heart Failure Case Study: Mr. Johnson is a 72-year-old man with recently diagnosed heart failure. He has a history of ischemic heart disease and occasionally suffers from angina. His blood pressure readings consistently range between 132-142/86-94 mmHg. An echocardiogram eight weeks ago showed Mr. Johnson had a left ventricular ejection fraction of 30% and no valvular abnormalities. At that time, he was started on lisinopril 2.5 mg once daily and furosemide 40 mg once daily. His primary care physician has been gradually increasing the lisinopril dose, aiming for a dose that will better manage his hypertension, up to a maximal dose of 40 mg/day. Three weeks ago, his serum creatinine (SCr) and potassium (K) levels were normal, and his physician increased his lisinopril from 5 mg to 10 mg once daily. Mr. Johnson's Chem 7 results, drawn yesterday and two days ago, showed abnormal values: SCr 1.9 mg/dL and K 5.7 mEq/L. The normal SCr range for an adult is 0.7-1.3 mg/dL, and the normal K range for an adult is 3.8-4.9 mEq/L. All other Chem 7 results were normal. Mr. Johnson takes sublingual nitrate PRN and uses it once every 4 to 8 weeks. Mr. Johnson feels well and is asymptomatic. He has no other significant medical history. Mr. Johnson lives by himself and is active and independent. He is not overweight, does not drink alcohol, and gave up smoking last year.
Anticoagulation Case Study: Mr. Williams is a 65-year-old man with atrial fibrillation who has been on warfarin for the past 12 months, after he presented to the local emergency department with signs of a TIA. A head CT scan and trans-esophageal echocardiogram done at the time were normal. He has been well since. Mr. Williams reports that the most recent INR, measured this morning, was 4.6. Up until now, his INR results, which have been measured monthly, have been stable and in the range of 2.0-3.0. Mr. Williams also has hypertension and osteoarthritis. He had a left total hip replacement 6 months ago. His current medications are atenolol 50 mg once daily, ramipril 10 mg once daily, amiodarone 200 mg daily, and warfarin 6 mg at night. Mr. Williams's physician, who saw him today, requested consultation with your anticoagulation service and wants your recommendation on the next warfarin dose.
APPENDIX K
EXPERT PANEL SCRIPT TARGETS AND RESULTS

Table K 1. Diabetes performance targets from the script
Skill      Mary       Thomas       Susan
Skill A    Excellent  Entry Level  Deficient
Skill B    Excellent  Entry Level  Deficient
Skill D/E  Excellent  Entry Level  Deficient

Table K 2. Diabetes assessments by expert panel
             Excellent  Entry Level  Deficient
Mary
  Skill A    7 (100%)   0 (0%)       0 (0%)
  Skill B    7 (100%)   0 (0%)       0 (0%)
  Skill D/E  7 (100%)   0 (0%)       0 (0%)
Thomas
  Skill A    3 (43%)    4 (57%)      0 (0%)
  Skill B    3 (43%)    4 (57%)      0 (0%)
  Skill D/E  2 (29%)    5 (71%)      0 (0%)
Susan
  Skill A    0 (0%)     0 (0%)       7 (100%)
  Skill B    0 (0%)     0 (0%)       7 (100%)
  Skill D/E  0 (0%)     0 (0%)       7 (100%)

Table K 3. Heart Failure performance targets from the script
Skill      Patricia     Joseph     Dorothy
Skill A    Entry Level  Excellent  Deficient
Skill B    Entry Level  Deficient  Excellent
Skill D/E  Deficient    Excellent  Deficient

Table K 4. Heart Failure assessments by expert panel
             Excellent  Entry Level  Deficient
Patricia
  Skill A    1 (14%)    2 (29%)      4 (57%)
  Skill B    0 (0%)     3 (43%)      4 (57%)
  Skill D/E  0 (0%)     0 (0%)       7 (100%)
Joseph
  Skill A    2 (29%)    3 (43%)      2 (29%)
  Skill B    0 (0%)     2 (29%)      5 (71%)
  Skill D/E  2 (29%)    3 (43%)      2 (29%)
Dorothy
  Skill A    0 (0%)     1 (14%)      6 (86%)
  Skill B    2 (29%)    0 (0%)       5 (71%)
  Skill D/E  1 (14%)    1 (14%)      5 (71%)

Table K 5. Anticoagulation performance targets from the script
Skill      Linda        David        Barbara
Skill A    Excellent    Entry Level  Deficient
Skill B    Entry Level  Excellent    Deficient
Skill D/E  Excellent    Deficient    Excellent

Table K 6. Anticoagulation assessments by expert panel
             Excellent  Entry Level  Deficient
Linda
  Skill A    6 (86%)    0 (0%)       1 (14%)
  Skill B    3 (43%)    2 (29%)      2 (29%)
  Skill D/E  3 (43%)    2 (29%)      2 (29%)
David
  Skill A    1 (14%)    2 (29%)      4 (57%)
  Skill B    2 (29%)    4 (57%)      1 (14%)
  Skill D/E  0 (0%)     1 (14%)      6 (86%)
Barbara
  Skill A    0 (0%)     1 (14%)      6 (86%)
  Skill B    0 (0%)     0 (0%)       7 (100%)
  Skill D/E  0 (0%)     2 (29%)      5 (71%)
APPENDIX L
PRECEPTOR PANEL RESULTS

Table L 1. Diabetes assessments by preceptor panel
             Excellent  Entry Level  Deficient
Mary
  Skill A    27 (64%)   10 (24%)     5 (12%)
  Skill B    28 (67%)   9 (21%)      5 (12%)
  Skill D/E  29 (69%)   9 (21%)      4 (10%)
Thomas
  Skill A    9 (21%)    22 (52%)     11 (26%)
  Skill B    8 (19%)    23 (55%)     11 (26%)
  Skill D/E  8 (19%)    19 (45%)     15 (36%)
Susan
  Skill A    0 (0%)     1 (02%)      41 (98%)
  Skill B    0 (0%)     2 (05%)      40 (95%)
  Skill D/E  0 (0%)     0 (0%)       42 (100%)

Table L 2. Heart Failure assessments by preceptor panel
             Excellent  Entry Level  Deficient
Patricia
  Skill A    4 (10%)    15 (36%)     23 (55%)
  Skill B    4 (10%)    11 (26%)     27 (64%)
  Skill D/E  1 (02%)    17 (40%)     24 (57%)
Joseph
  Skill A    9 (21%)    24 (57%)     9 (21%)
  Skill B    6 (14%)    15 (36%)     21 (50%)
  Skill D/E  9 (21%)    21 (50%)     12 (29%)
Dorothy
  Skill A    2 (05%)    15 (36%)     25 (60%)
  Skill B    4 (10%)    15 (36%)     23 (55%)
  Skill D/E  1 (02%)    21 (50%)     20 (48%)
Table L 3. Anticoagulation assessments by preceptor panel
             Excellent  Entry Level  Deficient
Linda
  Skill A    25 (60%)   13 (31%)     4 (10%)
  Skill B    19 (45%)   18 (43%)     5 (12%)
  Skill D/E  17 (40%)   18 (43%)     7 (17%)
David
  Skill A    11 (26%)   18 (43%)     13 (31%)
  Skill B    11 (26%)   17 (40%)     14 (33%)
  Skill D/E  5 (12%)    8 (19%)      29 (69%)
Barbara
  Skill A    0 (0%)     8 (19%)      34 (81%)
  Skill B    2 (05%)    8 (19%)      32 (76%)
  Skill D/E  2 (05%)    13 (31%)     27 (64%)
APPENDIX M
PRECEPTOR RELIABILITY RESULTS

Table M 1. Reliability: competent vs. not competent
Skill / Case per Skill  ICC   Lower (95% CI)  Upper (95% CI)  p-value
Skill A                 0.37  0.20            0.69            0.00
  Diabetes              0.66  0.33            0.99            0.00
  Heart Failure         0.15  0.03            0.88            0.00
  Anticoagulation       0.46  0.17            0.97            0.00
Skill B                 0.31  0.16            0.63            0.00
  Diabetes              0.63  0.30            0.99            0.00
  Heart Failure         0.00  0.02            0.45            0.37
  Anticoagulation       0.37  0.12            0.96            0.00
Skill D/E               0.30  0.15            0.62            0.00
  Diabetes              0.67  0.34            0.99            0.00
  Heart Failure         0.07  0.01            0.78            0.01
  Anticoagulation       0.29  0.08            0.94            0.00
* Significant

Table M 2. Reliability: excellent vs. entry level
Skill / Case per Skill  ICC   Lower (95% CI)  Upper (95% CI)  p-value
Skill A                 0.24  0.11            0.55            0.00
  Diabetes              0.54  0.22            0.98            0.00
  Heart Failure         0.10  0.01            0.84            0.00
  Anticoagulation       0.26  0.07            0.93            0.00
Skill B                 0.19  0.09            0.49            0.00
  Diabetes              0.49  0.19            0.98            0.00
  Heart Failure         0.00  0.02            0.42            0.40
  Anticoagulation       0.24  0.06            0.93            0.00
Skill D/E               0.19  0.09            0.48            0.00
  Diabetes              0.52  0.21            0.98            0.00
  Heart Failure         0.02  0.01            0.61            0.12
  Anticoagulation       0.18  0.04            0.90            0.00
* Significant

Table M 3. Reliability of scale items: Skill A
Case / Rating Scale  ICC   Lower (95% CI)  Upper (95% CI)  p-value
Diabetes
  Supervision        0.65  0.33            0.99            0.00
  Quality            0.76  0.46            0.92            0.00
  Complexity         0.63  0.30            0.99            0.00
  Consistency        0.62  0.29            0.99            0.00
  Efficiency         0.67  0.34            0.98            0.00
Heart Failure
  Supervision        0.16  0.03            0.89            0.00
  Quality            0.12  0.02            0.86            0.00
  Complexity         0.11  0.02            0.85            0.00
  Consistency        0.10  0.02            0.84            0.00
  Efficiency         0.14  0.02            0.89            0.00
Anticoagulation
  Supervision        0.56  0.24            0.98            0.00
  Quality            0.55  0.23            0.98            0.00
  Complexity         0.45  0.16            0.97            0.00
  Consistency        0.53  0.22            0.98            0.00
  Efficiency         0.32  0.10            0.95            0.00
* Significant

Table M 4. Reliability of scale items: Skill B
Case / Rating Scale  ICC   Lower (95% CI)  Upper (95% CI)  p-value
Diabetes
  Supervision        0.64  0.32            0.99            0.00
  Quality            0.72  0.40            0.99            0.00
  Complexity         0.65  0.32            0.99            0.00
  Consistency        0.67  0.34            0.99            0.00
  Efficiency         0.75  0.44            0.99            0.00
Heart Failure
  Supervision        0.03  0.01            0.69            0.09
  Quality            0.03  0.01            0.68            0.09
  Complexity         0.00  0.01            0.47            0.32
  Consistency        0.04  0.00            0.71            0.03
  Efficiency         0.01  0.02            0.33            0.54
Anticoagulation
  Supervision        0.38  0.13            0.96            0.00
  Quality            0.34  0.11            0.96            0.00
  Complexity         0.36  0.11            0.96            0.00
  Consistency        0.39  0.13            0.96            0.00
  Efficiency         0.24  0.06            0.93            0.00
* Significant

Table M 5. Reliability of scale items: Skill D/E
Case / Rating Scale  ICC   Lower (95% CI)  Upper (95% CI)  p-value
Diabetes
  Supervision        0.72  0.39            0.99            0.00
  Quality            0.78  0.48            0.99            0.00
  Complexity         0.71  0.39            0.99            0.00
  Consistency        0.74  0.42            0.99            0.00
  Efficiency         0.83  0.55            0.99            0.00
Heart Failure
  Supervision        0.15  0.03            0.88            0.09
  Quality            0.24  0.06            0.93            0.09
  Complexity         0.10  0.01            0.83            0.32
  Consistency        0.19  0.04            0.91            0.03
  Efficiency         0.12  0.02            0.86            0.54
Anticoagulation
  Supervision        0.35  0.11            0.96            0.00
  Quality            0.32  0.10            0.95            0.00
  Complexity         0.19  0.04            0.91            0.00
  Consistency        0.28  0.08            0.94            0.00
  Efficiency         0.18  0.04            0.90            0.00
* Significant

Table M 6. Reliability of global assessment: competent vs. not competent
Skill / Case per Skill  ICC   Lower (95% CI)  Upper (95% CI)  p-value
Skill A
  Diabetes              0.78  0.48            0.99            0.00
  Heart Failure         0.25  0.07            0.93            0.00
  Anticoagulation       0.57  0.24            0.98            0.00
Skill B
  Diabetes              0.65  0.32            0.99            0.00
  Heart Failure         0.03  0.01            0.68            0.13
  Anticoagulation       0.35  0.11            0.96            0.00
Skill D/E
  Diabetes              0.77  0.47            0.99            0.00
  Heart Failure         0.26  0.07            0.94            0.00
  Anticoagulation       0.32  0.10            0.95            0.00
* Significant
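The intraclass correlation coefficients reported in Tables M 1 through M 6 can be recomputed from a subjects-by-raters matrix of scores using a standard two-way ANOVA decomposition. The sketch below is a minimal illustration, assuming the two-way random-effects, absolute-agreement, single-rater model (ICC(2,1) in the Shrout and Fleiss taxonomy); the appendix tables do not state which ICC form was used, and the function name `icc_2_1` is illustrative.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n_subjects x k_raters matrix with no missing cells.
    """
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand_mean = Y.mean()
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), error
    ss_rows = k * np.sum((Y.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((Y.mean(axis=0) - grand_mean) ** 2)
    ss_error = np.sum((Y - grand_mean) ** 2) - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)               # between-subjects mean square
    ms_c = ss_cols / (k - 1)               # between-raters mean square
    ms_e = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Perfect agreement between two raters yields an ICC of 1.0
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # -> 1.0
```

Under this model, an ICC near zero (e.g., the Heart Failure cases in Table M 1) indicates that disagreement among preceptors is large relative to the true differences between the scripted student performances.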
APPENDIX N
PRECEPTOR ACCURACY RESULTS

Table N 1. Accuracy: Skill A
Case study / Student  Competent vs. Not competent p-value  Excellent vs. Entry level p-value
Diabetes cases
  Mary        0.45    0.14
  Thomas      0.15    0.36
  Susan       0.86    N/A(1)
Heart Failure cases
  Patricia    0.62    0.56
  Joseph      0.50    0.46
  Dorothy     0.19    0.89
Anticoagulation cases
  Linda       0.55    0.10**
  David       0.18    0.69
  Barbara     0.62    0.18**
* Significant up to 10%; ** significant up to 30%
1) One variable is a constant and association not computed

Table N 2. Accuracy: Skill B
Case study / Student  Competent vs. Not competent p-value  Excellent vs. Entry level p-value
Diabetes cases
  Mary        0.45    0.18
  Thomas      0.15    0.32
  Susan       0.73    N/A(1)
Heart Failure cases
  Patricia    0.51    0.45
  Joseph      0.26    0.54
  Dorothy     0.44    0.07**
Anticoagulation cases
  Linda       0.26    0.55
  David       0.30    0.58
  Barbara     0.18    N/A(1)
* Significant up to 10%; ** significant up to 30%
1) One variable is a constant and association not computed
Table N 3. Accuracy: Skill D/E
Case study / Student  Competent vs. Not competent p-value  Excellent vs. Entry level p-value
Diabetes cases
  Mary        0.53    0.18
  Thomas      0.06*   0.67
  Susan       N/A(1)  N/A(1)
Heart Failure cases
  Patricia    0.03*   N/A(1)
  Joseph      0.66    0.51
  Dorothy     0.23    0.16
Anticoagulation cases
  Linda       0.38    0.50
  David       0.34    0.64
  Barbara     0.54    0.77
* Significant up to 10%; ** significant up to 30%
1) One variable is a constant and association not computed

Figure N 1. Skill A: Comparing expert and preceptor panels [bar chart of Excellent / Entry level / Deficient ratings: expert panel 31%, 20%, 49%; preceptor panel 30%, 48%, 22%]
Figure N 2. Skill B: Comparing expert and preceptor panels [bar chart of Excellent / Entry level / Deficient ratings: expert panel 26%, 23%, 48%; preceptor panel 27%, 44%, 29%]

Figure N 3. Skill D/E: Comparing expert and preceptor panels [bar chart of Excellent / Entry level / Deficient ratings: expert panel 23%, 22%, 52%; preceptor panel 23%, 48%, 30%]
APPENDIX O
SUPERVISION PROXY RATING SCALE

The principal researcher created a surrogate scale by collapsing the supervision domain of the multifactorial scale into a three-item rating scale. This supervision proxy scale is shown below:

1) Excellent performance
   Supervision:
   Close supervision (constant monitoring)
   More than standard level of supervision (for entry level)
   Standard level of supervision (for entry level)
   Minimal supervision (beyond entry level)

2) Entry level performance
   Supervision:
   Close supervision (constant monitoring)
   More than standard level of supervision (for entry level)
   Standard level of supervision (for entry level)
   Minimal supervision (beyond entry level)

3) Deficient performance
   Supervision:
   Close supervision (constant monitoring)
   More than standard level of supervision (for entry level)
   Standard level of supervision (for entry level)
   Minimal supervision (beyond entry level)
APPENDIX P
RATING SCALE COMPARISON

Table P 1. Skill A: Comparing rating scale with proxy scale
Case study / Student  Competent vs. Not competent p-value  Excellent vs. Entry level p-value
Diabetes cases
  Mary        0.10*   0.50
  Thomas      0.01*   0.24
  Susan       0.00*   N/A(1)
Heart Failure cases
  Patricia    0.00*   0.56
  Joseph      0.00*   0.35
  Dorothy     0.01*   0.64
Anticoagulation cases
  Linda       0.06*   0.56
  David       0.01*   0.58
  Barbara     0.01*   N/A(1)
* Significant up to 10%; ** significant up to 30%
1) One variable is a constant and association not computed

Table P 2. Skill B: Comparing rating scale with proxy scale
Case study / Student  Competent vs. Not competent p-value  Excellent vs. Entry level p-value
Diabetes cases
  Mary        0.22    0.58
  Thomas      0.07*   0.40
  Susan       0.09*   N/A(1)
Heart Failure cases
  Patricia    0.00*   0.28
  Joseph      0.01*   0.36
  Dorothy     0.01*   0.49
Anticoagulation cases
  Linda       0.22    0.30
  David       0.04*   0.59
  Barbara     0.02*   0.41
* Significant up to 10%; ** significant up to 30%
1) One variable is a constant and association not computed
Table P 3. Skill D/E: New rating scale vs. proxy supervision scale
Case study / Student  Competent vs. Not competent p-value  Excellent vs. Entry level p-value
Diabetes cases
  Mary        0.50    0.58
  Thomas      0.26    0.00*
  Susan       0.12    N/A(1)
Heart Failure cases
  Patricia    0.04*   0.27
  Joseph      0.15    0.46
  Dorothy     0.00*   0.37
Anticoagulation cases
  Linda       0.03*   0.42
  David       0.01*   0.22
  Barbara     0.01*   0.47
* Significant up to 10%; ** significant up to 30%
1) One variable is a constant and association not computed

Figure P 1. Skill A: Comparing rating scale with proxy scale
Figure P 2. Skill B: Comparing rating scale with proxy scale

Figure P 3. Skill D/E: New rating scale vs. proxy supervision scale
227 LIST OF REFERENCES 1. Manasse J, Henri R., Speedie MK. Pharmacists, Pharmaceuticals, and Policy Issues Shaping the Work Force in Pharmacy. Am J Health Syst Pharm. 2007;64(12):e30 e48. 2. IOM. To Err is Human: Building a Safer Health System Washington, DC: Institute of Medicine; 1999. 3. U.S. Bureau of Health Professions. Report on Health Professional Accessibility. Washington, DC 1996. 4. Knapp KK, Cultice JM. New pharmacist supply proj ections: lower separation rates and increased graduates boost supply estimates. J Am Pharm Assoc (2003). Jul Aug 2007;47(4):463 470. 5. Bureau of Health Professions. The Pharmacist Workforce: A Study of the Supply and Demand for Pharmacists. 2000. Publishe d Last Modified Date|. Accessed Dated Accessed|. 6. Leach DC. Building and Assessment Competence: The Potential for Evidence based Graduate Medical Education. Quality Management in Health Care. 2002;11(1):39. 7. Lenburg C. The Framework, Concepts and Metho ds of the Competency Outcomes and Performance Assessment (COPA) Model. Online Journal of Issues in Nursing. 1999;4(2). 8. Richardson WC, ed. Crossing the Quality Chasm: A New Health System for the 21st Century ; 2000. 9. Wennberg DE, Lucas FL, Birkmeyer JD, Bredenberg CE, Fisher ES. Variation in Carotid Endarterectomy Mortality in the Medicare Population: Trial Hospitals, Volume, and Patient Characteristics. JAMA. 1998;279(16):1278 1281. 10. O'Connor GT, Plume SK, Olmstead EM, et al. A regional intervention to improve the hospital mortality associated with coronary artery bypass graft surgery. The Northern New England Cardiovascular Disease Study Group. JAMA. March 20, 1996 1996;275(11):841 846. 11. Greiner AC, Knebel E, eds. Health Professions Education: A B ridge to Quality ; 2003. 12. Harden RM. Developments in outcome based education. Medical Teacher 2002: 117 120.
228 13. Commission to Implement Change in Pharmaceutical Education. What is the mission of pharmaceutical education? Background paper I. American Journal of Pharmecutical Educucation. 1993;57:374 376. 14. Commission to Implement Change in Pharmaceutical Education. Entry level education in pharmacy: Commitment to change Background Paper II. American Journal of Pharmaceutical Education. 1993; 57:366 374. 15. Byrd G. Can the profession of pharmacy serve as a model for health informationist professionals? Journal of Medical Library Association. 2002;90(1):68 75. 16. CAPE Advisory Panel on Educational Outcomes. CAPE Educational Outcomes. http://w ww.aacp.org/Docs/MainNavigation/Resources/6075_CAPE2004.pdf. Accessed November 7, 2008, 2008. 17. Zlatic T. Redefining a profession: assessment in pharmacy education. In: Palomba CA BT, ed. Assessing Student Competence in Accredited Disciplines: Pioneering Approaches to Assessment in Higher Education Sterling, VA: Stylus Publishing; 2001:49 70. 18. Schuwirth LWT, Southgate L, Page GG, et al. When enough is enough: a conceptual basis for fair and defensible practice performance assessment. MEDICAL EDUCATION 2002;36(10):925 930. 19. McAllister A. Performance in the Workplace Sydney: School of Communication Sciences and Disorders, The University of Sydney; 2005. 20. Rethans JJ, Norcini JJ, Baron Maldonado M, et al. The relationship between competence and performance: implications for assessing practice performance. 2002. 21. Raehl CL. AACP Pharmacy Education Assessment Services: Outcomes, Assessment, Accountability. Journal of Pharmaceutica l Education. 2008;72(1). 22. CAPE Advisory Panel on Educational Outcomes. Utilization of the Center for the Advancement of Pharmaceutical Education Educational Outcomes, Revised Version 2004: Report of the 2005 American College of Clinical Pharmacy Educati onal Affairs Committee. 2006. 23. Gonczi A. Competency based assessment in the professions in Australia. 
Assessment in Education: Principles, Policy & Practice. 1994;1(1):27. 24. Kaslow NJ. Competencies in professional psychology. American Psychologist. 20 04;59:774 781.
229 25. Marrelli T, & Hoge, 2005. Strategies for Developing Competency Models Journal Administration and Policy in Mental Health and Mental Health Services Research 2005;32(5 6). 26. Govaerts M, van der Vleuten C, Schuwirth L, Muijtjens A. Broad ening Perspectives on Clinical Performance Assessment: Rethinking the Nature of In training Assessment. Advances in Health Sciences Education. 2007/05/14/ 2007;12(2):239 260. 27. Clauser BE. Recurrent Issues and Recent Advances in Scoring Performance Asses sments. Applied Psychological Measurement. December 1, 2000 2000;24(4):310 324. 28. Skrabal MZ, Jones RM, Nemire RE, et al. National Survey of Volunteer Pharmacy Preceptors. American Journal of Pharmaceutical Education 2008; 72 (5) Article 112. 2008. 29. G andy JS. Personal communication with Director of Academic/Clinical Education Affairs at APTA; 2009. 30. Ried LD, Nemire, R., Doty, R., Brickler, M., Anderson, H., Frenzel Shepherd, E., Larose Pierre, M., and Dugan, D. An Automated Competency based Student Performance Assessment Program for Advanced Pharmacy Practice Experiential Programs. American Journal of Pharmaceutical Education. 2007;71(6). 31. Messick S. Standards of Validity and the Validity of Standards in Performance Asessment. Educational Measurem ent: Issues and Practice. 1995;14(4):5 8. 32. Messick S. Validity of performance assessments. In: Phillips GW, ed. Technical Issues in Large Scale Performance Assessment Washington: National Centre for Education Statistics.; 1996:pp. 1 18. 33. American Ed ucational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC 1999. 34. Haynes SN, Richard DCS, Kubany ES. Content Validity in Psycholog ical Assessment: A Functional Approach to Concepts and Methods. Psychological Assessment. 1995;7(3):238 247. 35. Nunnally JC. Psychometric Theory 1978: McGraw Hill Book Company; 1978. 36. Sireci SG. 
The construct of content validity. Social Indicators Res earch. Nov 1998;45(1 3):83 117. 37. Sireci SG. Gathering and analyzing content validity data. Educational Assessment. 1998;5(4):299 321.
230 38. Lynne S. Determining Content Validity of a Self Report Instrument for Adolescents Using a Heterogeneous Expert Panel. Nursing Research. 2007;56(5):361 366. 39. Streiner DL, Norman GR. Health Measurement Scales 2 ed. Oxford, England: Oxford University Pres s; 1995. 40. van der Vleuten CPM, Schuwirth LWT. Assessing professional competence: from methods to programmes. MEDICAL EDUCATION. 2005;39(3):309 317. 41. Cross V, Hicks, C and Barwell, F. Comparing the Importance of Clinical Competence Criteria across Spe cialties Impact on undergraduate assessment. Physiotherapy. 2001;87(7):351 367. 42. Carr D. QUESTIONS OF COMPETENCE. British Journal of Educational Studies. 1993;41(3):253 271. 43. Dugan BD. Enhancing Community Pharmacy Through Advanced Pharmacy Practice E xperiences. American Journal of Pharmaceutical Education. 2006;70(1):Article 21. 44. ACPE ACfPE. Accreditation standards and guidelines for the professional program in pharmacy leading to the doctor of pharmacy degree 2009. 45. Noyce P. Governance and the pharmaceutical workforce in England. Res Social Adm Pharm. Sep 2006;2(3):408 419. 46. Eraut M. Developing professional knowledge and competence London: Falmer Press; 1994. 47. Cross V, Hicks C, Barwell F. Exploring the Gap Between Evidence and Judgement: using video vignettes for practice based assessment of physiotherapy undergraduates. Assessment & Evaluation in Higher Education. 2001;26(3):189 212. 48. Hager P, Gonczi A. Ge neral issues about assessment of competence. Assessment & Evaluation in Higher Education. 1994/04// 1994;19(1):3. 49. Hager P, Gonczi A. What is competence? Medical Teacher. 1996/03// 1996;18(1):15. 50. Wolf A. Competence Based Assessment. Buckingham: Open University Press.; 1995. 51. Thorndike RL, Hagen E. Measurement and Evaluation in Psychology and Education 3 ed. New York: John Wiley and Sons, Inc; 1977. 52. Harris R, Guthrie H, Hobart B, Lundberg D. 
Competency Based Education and Training: Between a R ock and a Whirlpool Melbourne Macmillan Education; 1995.
231 53. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. Sep 1990;65(9 Suppl):S63 67. 54. Rethans JJ, Sturmans F, Drop R, van der Vleuten C. Assessment of the performance o f general practitioners by the use of standardized (simulated) patients. Br J Gen Pract. Mar 1991;41(344):97 99. 55. Miller DM, Linn RL. Validation of Performance Based Assessments. Applied Psychological Measurement. December 1, 2000 2000;24(4):367 378. 56 Andrich D. A framework relating outcomes based education and the taxonomy of educational objectives. Studies In Educational Evaluation. 2002;28(1):35 59. 57. Clark D. Learning Domains or Bloom's Taxonomy. 2007 http://www.nwlink.com/~donclark/hrd/bloom.ht ml. Published Last Modified Date|. Accessed Dated Accessed|. 58. Krathwohl DR. A Revision of Bloom's Taxonomy: An Overview. Theory Into Practice. 2002;41(4):212 218. 59. Epstein R, Hundert E. Defining and assessing professional competence. JAMA: Journal of the American Medical Association. 2002//09/2002 Jan 9 2002;287(2):226. 60. Kane MT. The Assessment of Professional Competence. Eval Health Prof. June 1, 1992 1992;15(2):163 182. 61. Worthen BR, Borg WR, White K. Measurement and evaluation in the schools Longman Longman 1993. 62. Hays RB, Davies HA, Beard JD, et al. Selecting performance assessment methods for experienced physicians. Med Educ. Oct 2002;36(10):910 917. 63. Boud D. Sustainable Assessment: Rethinking Assessment for the Learning Society. Stu dies in Continuing Education. 2000;22(2). 64. Duke M. Clinical evaluation -difficulties experienced by sessional clinical teachers of nursing: a qualitative study. Journal of Advanced Nursing. 1996;23(2):408 414. 65. Sax G. Princibles of Education and Phyc hological Measurement and Evaluation (3rd ed.) Belmont, CA: Wadsworth; 1997. 66. Bondy KN. Criterion Referenced Definitions for Rating Scales in Clinical Evaluation. Journal of Nursing Education. 1983;22(9):376 382. 67. Bondy KN. 
Clinical Evaluation of St udent Performance: The Effects of Criteria on Accuracy and Reliability. Research in Nursing & Health. 1984;7(1):25 33.
232 68. Noel GL, Herbers JE, Caplow MP, Cooper GS, Pangaro LN, Harvey J. How Well Do Internal Medicine Faculty Members Evaluate the Clinical Skills of Residents?. Annals of Internal Medicine. 1992;117(9):757. 69. Martin JA, Reznick RK, Rothman A, Tamblyn RM, Regehr G. Who should rate candidates in an objective structured clinical examination? Acad Med. Feb 1996;71(2):170 175. 70. Wray N, Friedl and J. Detection and correction of house staff error in physical diagnosis. JAMA. 1983;249:1035 1037. 71. Holmboe ES, Hawkins RE. Practical guide to the evaluation of clinical competence Philadelphia, PA: Mosby/Elsevier; 2008. 72. Norman G. Checklists vs. Ratings, the Illusion of Objectivity, the Demise of Skills and the Debasement of Evidence. Advances in Health Sciences Education. 2005;10(1):1 3. 73. Landy FJ, Farr JL. Performance rating. Psychological Bulletin. 1980;87(1):72 107. 74. Gomez Mejia LR. Evaluating Employee Performance. Journal of Organizational Behavior Management. 1988;9(2):155 172. 75. Kingstrom PO, Bass AR. A CRITICAL ANALYSIS OF STUDIES COMPARING BEHAVIORALLY ANCHORED RATING SCALES (BARS) AND OTHER RATING FORMATS. Pe rsonnel Psychology. Summer81 1981;34(2):263 289. 76. Fay CH, Latham GP. Effects of training and rating scales on rating errors. Personnel Psychology. 1982;35:105 117. 77. Wolfe EW, Gitomer DH. The Influence of Changes in Assessment Design on the Psychometr ic Quality of Scores. Applied Measurement in Education. 2001;14(1):91 107. 78. Finn RH. Effects of some variations in rating scale characteristics on the means and reliabilities of ratings. Educational and Psychological Measurement. 1972;32:255 265. 79. Be ndig AW. Reliability and number of rating scale categories. Journal of Applied Psychology. 1954;38:38 40. 80. Bendig AW. Reliability of short rating scales and the heterogeneity of the rated stimuli. Journal of Applied Psychology. 1954;38:167 170. 81. Jett e DU, Bertoni A, Coots R, Johnson H, McLaughlin C, Weisbach C. 
Clinical Instructors' Perceptions of Behaviors That Comprise Entry Level Clinical Performance in Physical Therapist Students: A Qualitative Study. PHYS THER. July 1, 2007 2007;87(7):833 843.
233 82 Cross V, Hicks C. What Do Clinical Educators Look for in Physiotherapy Students? Physiotherapy. 1997;83(5):249 260. 83. Alexander HA. Physiotherapy student clinical education: The influence of subjective judgements on observational. Assessment & Evaluation in Higher Education. 1996;21(4):357. 84. Chapman J. Agonising about assessment. In: Fish D, Coles C, eds. Developing Professional Judgement in Health Care: Learning through the critical appreciation of practice Oxford: Butterworth Heinemann; 1 998:157 181. 85. Lankshear AJ. The use of focus groups in a study of attitudes to student nurse assessment.. Journal of Advanced Nursing. 1993/12// 1993;18(12):1986 1989. 86. Landy FJ, Guion RM. Development of scales for the measurement of work motivation. Organizational Behavior and Human Performance. 1970;5(1):95 103. 87. Davis MH, Harden RM. Competency based assessment: making it a reality. Medical Teacher. 2003/11// 2003;25(6):565 568. 88. Wojtczak A. Medical education terminology. Medical Teacher 2002 : 357 357. 89. Gray. Global rating scales in residency education. Acad Med. 1996;71(1). 90. Halpern R, Lee MY, Boulter PR, Phillips RR. A synthesis of nine major reports on physicians competencies for the emerging practice environment. Academic Medicine. 2 001;76(6):606 615. 91. Downing SM. Validity: on the meaningful interpretation of assessment data. Medical Education. 2003;37(9):830 837. 92. Clauser BE, Margolis MJ, Swanson DB. Issues of Validity and Reliability for Assessments in Medical Education. In: H olmboe ES, Hawkins RE, eds. Practical guide to the evaluation of clinical competence ; 2008:10 23. 93. Beckman TJ, Ghosh AK, Cook DA, Erwin PJ, Mandrekar JN. How reliable are assessments of clinical teaching? A review of the published instruments. J Gen Int ern Med. Sep 2004;19(9):971 977. 94. Messick S. In: Linn RL, ed. Educational measurement Phoenix, Az: American Council on Education and Oryx Press; 1993:13 104. 95. Walsh WB, Bezt NE. 
Tests and Assessment 4 ed. Upper Saddle River, NJ: Prentice Hall; 200 0. 96. Ebel RL, Frisbie DA. Essentials of Educational Measurement Englewood Cliffs, N.J.: Prentice Hall 1991.
234 97. Guion RM. Content Validity -The Source of My Discontent. Applied Psychological Measurement. 1977;1(1):1 10. 98. Lawshe CH. A QUANTITATIVE APP ROACH TO CONTENT VALIDITY. Personnel Psychology. Winter75 1975;28(4):563 575. 99. Lynn M. Determination and quantification of content validity. Nursing Research. 1986;35(5). 100. Crocker L. Assessing Content Representativeness of Performance Assessment Exercises. Applied Measurement in Education. 1997;10(1):83 95. 101. Guion RM. CHANGING VIEW FOR PERSONNEL SELECTION RESEARCH. Personnel Psychology. Summer87 1987;40(2):199 213. 102 Stelly DJ, Goldstein HW. Application of Content Validation Methods to Broader Constructs. In: McPhail SM, ed. Alternative validation strategies developing new and leveraging existing validity evidence San Francisco: Jossey Bass; 2007. 103. Moss PA. Shif ting Conceptions of Validity in Educational Measurement Implications for Performance Assessment. Review of Educational Research. Fal 1992;62(3):229 258. 104. Lennon RT. Assumptions Underlying the Use of Content Validity. Educational and Psychological Mea surement. October 1, 1956 1956;16(3):294 304. 105. Downing SM. Threats to the validity of clinical teaching assessments: What about rater error? Medical Education. 2005;39(4):353 355. 106. Dauphinee WD. Assessing clinical performance: Where do we stand and what might we expect? The Journal of the American Medical Association. 1995;274(9):741 743. 107. McDowell I. The theoretical and technical foundations of health measurement 1996. 108. Me ssick S, ed. Validity. 3 ed. New York: American Council on Education.; 1989. Linn RL, ed. Educational measurement. 109. Schuwirth LW, van der Vleuten CP. The use of clinical simulations in assessment. Med Educ. Nov 2003;37 Suppl 1:65 71. 110. Winters J, Ha uck B, Riggs CJ, Clawson J, Collins J. Use of videotaping to assess competencies and course outcomes. J Nurs Educ. Oct 2003;42(10):472 476. 111. Crossley J, Humphris G, Jolly B. 
Assessing health professionals. Medical Education. 2002;36(9):800-804.
112. Carraccio C, Wolfsthal SD, Englander R, Ferentz K, Martin C. Shifting Paradigms: From Flexner to Competencies. Academic Medicine. 2002;77(5):361-367. 113. Schwabbauer M. But can they do it? Clinical competency assessment. Clinical Laboratory Science. 2000;13(1):47-52. 114. Whitcomb ME. Competency-based graduate medical education? Of course! But how should competency be assessed? Acad Med. May 2002;77(5):359-360. 115. SUCCESS. SUCCESS Web site. http://www.cop.ufl.edu/doty/success/help/. Accessed December 15, 2008. 116. Clayton MJ. Delphi: A Technique to Harness Expert Opinion for Critical Decision-Making Tasks in Education. Educational Psychology. 1997;17(4):373. 117. Keeney S, Hasson F, McKenna HP. A critical review of the Delphi technique as a research methodology for nursing. International Journal of Nursing Studies. 2001;38(2):195-200. 118. McKenna HP. The Delphi technique: a worthwhile research approach for nursing? Journal of Advanced Nursing. 1994;19(6):1221-1225. 119. Powell C. The Delphi technique: myths and realities. Journal of Advanced Nursing. 2003;41(4):376-382. 120. Williams PL, Webb C. The Delphi technique: a methodological discussion. Journal of Advanced Nursing. 1994;19(1):180-186. 121. Waltz CF, Strickland O, Lenz ER. Measurement in Nursing Research. Philadelphia: F.A. Davis Co; 1991. 122. Grant JS, Davis LL. Selection and use of content experts for instrument development. Research in Nursing & Health. 1997;20(3):269-274. 123. Dillman DA, Smyth JD, Christian LM. Internet, Mail, and Mixed-Mode Surveys: The Tailored Design Method. Hoboken, NJ: John Wiley and Sons, Inc.; 2009. 124. Linstone H, Turoff M. The Delphi Method: Techniques and Applications. Reading, MA: Addison-Wesley Publishing Company; 1975. 125. Campbell S, Hann M, Roland M, Quayle JA, Shekelle P. The Effect of Panel Membership and Feedback on Ratings in a Two-Round Delphi Survey: Results of a Randomized Controlled Trial. Medical Care.
1999;37(9):964-968. 126. de Villiers MR, de Villiers PJT, Kent AP. The Delphi Technique in Health Sciences Education Research. Medical Teacher. 2005;27(7):639-643.
127. Duffield C. The Delphi Technique: a comparison of results obtained using two expert panels. Int J Nurs Stud. Jun 1993;30(3):227-237. 128. Erffmeyer RC, Erffmeyer ES, Lane IM. The Delphi Technique: An Empirical Evaluation of the Optimal Number of Rounds. Group Organization Management. 1986;11(1-2):120-128. 129. Defloor T, Van Hecke A, Verhaeghe S, Gobert M, Darras E, Grypdonck M. The clinical nursing competences and their complexity in Belgian general hospitals. Journal of Advanced Nursing. 2006;56(6):669-678. 130. Forrest FC, Taylor MA, Postlethwaite K, Aspinall R. Use of a high-fidelity simulator to develop testing of the technical performance of novice anaesthetists. Br. J. Anaesth. March 2002;88(3):338-344. 131. Garfunkel LC, Sidelinger DE, Rezet B, Blaschke GS, Risko W. Achieving Consensus on Competency in Community Pediatrics. Pediatrics. April 2005;115(4):1167-1171. 132. Hobgood C, Riviello R, Jouriles N, Hamilton G. Assessment of communication and interpersonal skills competencies. Academic Emergency Medicine. 2002;9(11):1257-1269. 133. Irvine F. Exploring district nursing competencies in health promotion: the use of the Delphi technique. Journal of Clinical Nursing. 2005;14(8):965-975. 134. Lindsay P, Schull M, Bronskill S, Anderson G. The Development of Indicators to Measure the Quality of Clinical Care in Emergency Departments Following a Modified Delphi Approach. Academic Emergency Medicine. 2008;9(11). 135. Lofmark A, Thorell-Ekstrand I. An assessment form for clinical nursing education: a Delphi study. Journal of Advanced Nursing. 2004;48(3):291-298. 136. Pfleger D, McHattie L, Diack H, McCaig D, Stewart D. Developing consensus around the pharmaceutical public health competencies for community pharmacists in Scotland. Pharmacy World & Science. 2008;30(1):111-119. 137. Polivka BJ, Stanley SAR, Gordon D, Taulbee K, Kieffer G, McCorkle SM. Public Health Nursing Competencies for Public Health Surge Events.
Public Health Nursing. 2008;25(2):159-165. 138. SurveyMonkey [computer program]. Palo Alto, CA: SurveyMonkey.com, LLC; 2009. 139. Gibson F, Soanes L. The Development of Clinical Competencies for Use on a Paediatric Oncology Nursing Course Using a Nominal Group Technique. Journal of Clinical Nursing. 2000;9(3):459-469.
140. Walley T, Webb DJ. Developing a core curriculum in clinical pharmacology and therapeutics: a Delphi study. British Journal of Clinical Pharmacology. 1997;44(2):167-170. 141. Polit DF, Beck CT. The Content Validity Index: Are you sure you know what's being reported? Research in Nursing & Health. 2006;29(5):489-497. 142. Wynd CA, Schmidt B, Schaefer MA. Two Quantitative Approaches for Estimating Content Validity. West J Nurs Res. August 2003;25(5):508-518. 143. Beckstead JW. Content validity is naught. International Journal of Nursing Studies. 2009;46:1274-1283. 144. Doros G, Lew R. Design Based on Intra-Class Correlation Coefficients. American Journal of Biostatistics. 2010;1(1):1-8. 145. Bonett DG. Sample Size Requirements for Estimating Intraclass Correlations with Desired Precision. Statistics in Medicine. 2002;21(9):1331-1335. 146. Berk RA. Generalizability of Behavioral Observations: A Clarification of Interobserver Agreement and Interobserver Reliability. Am J Ment Defic. Mar 1979;83(5):460-472. 147. McGraw KO, Wong SP. Forming Inferences About Some Intraclass Correlation Coefficients. Psychological Methods. 1996;1(1):30-46. 148. Shoukri MM, Asyali MH, Walter SD. Issues of Cost and Efficiency in the Design of Reliability Studies. Biometrics. 2003;59(4):1107-1112. 149. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. Mar 1979;86(2):420-428. 150. Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data. Biometrics. 1977;33(1):159-174. 151. Agresti A. An Introduction to Categorical Data Analysis. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc.; 2007. 152. Myford CM, Wolfe EW. Detecting and measuring rater effects using many-facet Rasch measurement: Part II. J Appl Meas. 2004;5(2):189-227. 153. Iramaneerat C, Yudkowsky R. Rater Errors in a Clinical Skills Assessment of Medical Students. Eval Health Prof. September 2007;30(3):266-283. 154.
Motycka CA, Rose RL, Ried LD, Brazeau G. Self-Assessment in Pharmacy and Health Science Education and Professional Practice. American Journal of Pharmaceutical Education. 2010;74(5).
155. Saal FE, Downey RG, Lahey MA. Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin. 1980;88(2):413-428. 156. Linn RL, Gronlund NE. Measurement and Assessment in Teaching. 8th ed. Upper Saddle River, NJ: Prentice Hall; 2000. 157. Maxim BR, Dielman TE. Dimensionality, Internal Consistency and Interrater Reliability of Clinical Performance Ratings. Medical Education. 1987;21(2):130-137. 158. Daelmans HEM, van der Hem-Stokroos HH, Hoogenboom RJI, Scherpbier AJJA, Stehouwer CDA, van der Vleuten CPM. Global clinical performance rating, reliability and validity in an undergraduate clerkship. The Netherlands Journal of Medicine. 2005;63(7). 159. Haber R, Avins A. Do ratings on the American Board of Internal Medicine resident evaluation form detect differences in clinical competence? Journal of General Internal Medicine. 1994;9(3):140-145. 160. Kwolek CJ, Donnelly MB, Sloan DA, Birrell SN, Strodel WE, Schwartz RW. Ward Evaluations: Should They Be Abandoned? Journal of Surgical Research. 1997;69(1):1-6. 161. Davis JD. Comparison of Faculty, Peer, Self, and Nurse Assessment of Obstetrics and Gynecology Residents. Obstetrics & Gynecology. 2002;99(4):647-651. 162. Durning SJ, Cation LJ, Jackson JL. The reliability and validity of the American Board of Internal Medicine Monthly Evaluation Form. Acad Med. Nov 2003;78(11):1175-1182. 163. Roach K, Gandy J, Deusinger SS, et al. The Development and Testing of APTA Clinical Performance Instruments. Physical Therapy. 2002;82(4):329-353. 164. Meldrum D, Lydon AM, Loughnane M, et al. Assessment of undergraduate physiotherapist clinical performance: investigation of educator inter-rater reliability. Physiotherapy. 2008;94(3):212-219. 165. CAPE Advisory Panel on Educational Outcomes. Professional and General Abilities-Based Outcomes [PDF]. http://www.aacp.org/Docs/MainNavigation/ForDeans/5763_CAPEoutcomes.pdf. Accessed November 7, 2008. 166. APTA Task Force.
The development and testing of APTA clinical performance instruments. Physical Therapy. 2002;82(4):329-353. 167. Accreditation Council for Graduate Medical Education (ACGME). Outcome Project. http://www.acgme.org/outcome/project/proHome.asp; 2008.
168. Kramer G, Neumann L. Validation of the National Board Dental Hygiene Examination. Journal of Dental Hygiene. Summer 2007;81(3):63. 169. Huddle TS, Heudebert GR. Viewpoint: Taking Apart the Art: The Risk of Anatomizing Clinical Competence. Academic Medicine. 2007;82(6):536-541. 170. ten Cate O, Scheele F. Viewpoint: Competency-Based Postgraduate Training: Can We Bridge the Gap between Theory and Clinical Practice? Academic Medicine. 2007;82(6):542-547. 171. Elstad EA, Lutfey KE, Marceau LD, Campbell SM, von dem Knesebeck O, McKinlay JB. What do physicians gain (and lose) with experience? Social Science & Medicine. 2010;70(11):1728-1736. 172. Zayas T. Qualities of Effective Preceptors of Physician Assistant Students. Perspectives on Physician Assistant Education. 1999;10(1):7-11. 173. Remmen R, Denekens J, Scherpbier A, et al. An evaluation study of the didactic quality of clerkships. Medical Education. 2000;34(6):460-464. 174. Busari JO, Scherpbier AJJA, van der Vleuten CPM, Essed GGM. The perceptions of attending doctors of the role of residents as teachers of undergraduate clinical students. Medical Education. 2003;37(3):241-247. 175. Kachalia A, Studdert DM. Professional Liability Issues in Graduate Medical Education. JAMA. September 1, 2004;292(9):1051-1056. 176. Meldrum D, Lydon AM, Loughnane M, et al. Assessment of undergraduate physiotherapist clinical performance: investigation of educator inter-rater reliability. Physiotherapy. Sep 2008;94(3):212-219. 177. Loomis J. Evaluating clinical competence of physical therapy students. Part 1: the development of an instrument. Physiother Can. Mar-Apr 1985;37(2):83-89. 178. Robbins SP. Organizational Behavior. 4th ed. Englewood Cliffs, NJ: Prentice Hall; 1989. 179. Longford NT. Reliability of Essay Rating and Score Adjustment. Journal of Educational and Behavioral Statistics. 1994;19(3):171-200. 180. Bargagliotti T, Luttrell M, Lenburg C.
Reducing Threats to the Implementation of a Competency-Based Performance Assessment System. Online Journal of Issues in Nursing. 1999;4(2). 181. Lievens F, Sanchez JI. Can training improve the quality of inferences made by raters in competency modeling? A quasi-experiment. Journal of Applied Psychology. 2007;92(3):812-819.
182. Beck DE, O'Sullivan PS, Boh LE. Increasing the Accuracy of Observer Ratings by Enhancing Cognitive Processing Skills. American Journal of Pharmaceutical Education. Fall 1995;59(3):228-235. 183. Holmboe E, Hawkins R, Huot S. Effects of training in direct observation of medical residents' clinical competence: a randomized trial. Annals of Internal Medicine. Jun 1, 2004;140(11):874. 184. Lievens F. Assessor training strategies and their effects on accuracy, interrater reliability, and discriminant validity. Journal of Applied Psychology. 2001;86(2):255-264. 185. Vendrely A, Carter R. The influence of training on the rating of physical therapist student performance in the clinical setting. Journal of Allied Health. Spring 2004;33(1):62-69. 186. Scullen SE, Mount MK, Goff M. Understanding the latent structure of job performance ratings. Journal of Applied Psychology. 2000;85(6):956-970. 187. Brown N. What are the criteria that mentors use to make judgements on the clinical performance of student mental health nurses? An exploratory study of the formal written communication at the end of clinical nursing practice modules. Journal of Psychiatric & Mental Health Nursing. 2000;7(5):407-416. 188. Komorita SS, Graham WK. Number of Scale Points and the Reliability of Scales. Educational and Psychological Measurement. December 1965;25(4):987-995. 189. Gardner PL. Scales and Statistics. Review of Educational Research. January 1975;45(1):43-57. 190. Sousa AC, Wagner DP, Henry RC, Mavis BE. Better data for teachers, better data for learners, better patient care: college-wide assessment at Michigan State University's College of Human Medicine; 2011. 191. Commission to Implement Change in Pharmaceutical Education. What is the mission of pharmaceutical education? Background Paper I. American Journal of Pharmaceutical Education. 1994;57:374-376. 192. Commission to Implement Change in Pharmaceutical Education.
Entry-level education in pharmacy: Commitment to change. Background Paper II. American Journal of Pharmaceutical Education. 1994;57:366-374. 193. American Society of Health-System Pharmacists (ASHP). ASHP Health-System Pharmacy 2015 Initiative. Revised March 2008. American Society of Health-System Pharmacists; 2005. 194. American College of Clinical Pharmacy, Burke JM, Miller WA, et al. ACCP White Paper: Clinical Pharmacist Competencies. Pharmacotherapy. 2008;28(6).
195. ASHP. ASHP long-range vision for the pharmacy work force in hospitals and health systems. Am J Health Syst Pharm. 2007;64:1320-1330. 196. NACDS, NCPA, APhA. Project Destiny Executive Summary. National Association of Chain Drug Stores, National Community Pharmacists Association, American Pharmacists Association; 2008. 197. Kelly KA, Coyle JD, McAuley JW, Wallace LJ, Buerki RA, Frank SG. Evaluation, Assessment, and Outcomes: The AACP Institute Supplement. Writing PharmD Program-Level, Ability-Based Outcomes: Key Elements for Success. American Journal of Pharmaceutical Education. 2008;72(5): Article 98. 198. Anderson M, Cohen J, Hallock J, Kassebaum D, Turnbull JW; Medical School Objectives Writing Group. Learning objectives for medical student education: guidelines for medical schools. Report I of the Medical School Objectives Project. Academic Medicine; 1999. 199. Cook DA, Dupras DM, Beckman TJ, Thomas KG, Pankratz VS. Effect of Rater Training on Reliability and Accuracy of Mini-CEX Scores: A Randomized, Controlled Trial. J Gen Intern Med. 2009;29(1):74-79. 200. Linssen T, Van Dalen J, Rethans JJ. Simulating the longitudinal doctor-patient relationship: experiences of simulated patients in successive consultations. Medical Education. 2007;41(9):873-878. 201. McKinley DW, Boulet JR, Hambleton RK. A work-centered approach for setting passing scores on performance-based assessments. Eval Health Prof. Sep 2005;28(3):349-369. 202. Barman A. Standard Setting in Student Assessment: Is a Defensible Method Yet to Come? Ann Acad Med Singapore. 2008;37:957-963. 203. Ben-David MF. AMEE Guide No. 18: Standard setting in student assessment. Medical Teacher. 2000;22(2):120-130. 204. Wind LA, Van Dalen J, Muijtjens AMM, Rethans JJ. Assessing simulated patients in an educational setting: the MaSP (Maastricht Assessment of Simulated Patients). Medical Education. 2004;38(1):39-44. 205. Beck DE, Boh LE, O'Sullivan PS.
Evaluating Student Performance in the Experiential Setting with Confidence. American Journal of Pharmaceutical Education. 1995;59(3). 206. Brazeau C, Boyd L, Crosson J. Changing an Existing OSCE to a Teaching Tool: The Making of a Teaching OSCE. Academic Medicine. 2002;77(9).
207. Cicchetti DV, Showalter D, Tyrer PJ. The Effect of Number of Rating Scale Categories on Levels of Interrater Reliability: A Monte Carlo Investigation. Applied Psychological Measurement. 1985;9(1):31-36. 208. Cox EP. The Optimal Number of Response Alternatives for a Scale: A Review. Journal of Marketing Research. 1980;17(4):407-422. 209. Preston CC, Colman AM. Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica. 2000;104(1):1-15. 210. Symonds PM. On the Loss of Reliability in Ratings Due to Coarseness of the Scale. Journal of Experimental Psychology. 1924;7(6):456. 211. Ried LD. Professor and Chair, College of Pharmacy, University of South Florida; 2010. 212. Adams CL, Glavin K, Hutchins K, Lee T, Zimmerman C. An Evaluation of the Internal Reliability, Construct Validity, and Predictive Validity of the Physical Therapist Clinical Performance Instrument. Journal of Physical Therapy Education. 2008;22(2):42-50. 213. Jungnickel P, Kelley K, Marlowe K, Haines S, Hammer D. Addressing competencies for the future in the professional curriculum. Paper presented at: AACP Curricular Change Summit; 2009. 214. Eva KW, Cunnington JPW, Reiter HI, Keane DR, Norman GR. How Can I Know What I Don't Know? Poor Self-Assessment in a Well-Defined Domain. Advances in Health Sciences Education. 2004;9(3):211-224. 215. Langendyk V. Not knowing that they do not know: self-assessment accuracy of third-year medical students. Medical Education. 2006;40(2):173-179. 216. Colthart I, Bagnall G, Evans A, et al. The effectiveness of self-assessment on the identification of learner needs, learner activity, and impact on clinical practice: BEME Guide no. 10. Med Teach. 2008;30(2):124-145. 217. Davis DA, Mazmanian PE, Fordis M, Van Harrison R, Thorpe KE, Perrier L. Accuracy of Physician Self-assessment Compared With Observed Measures of Competence: A Systematic Review. JAMA.
September 6, 2006;296(9):1094-1102. 218. Miller A, Archer J. Impact of workplace-based assessment on doctors' education and performance: a systematic review. BMJ. 2010;341:c5064. 219. Lankshear AJ. Failure to fail: the teacher's dilemma. Nursing Standard. 1990;4(20).
220. USMLE. United States Medical Licensing Examination. http://www.usmle.org/; 2009. 221. Tochel C, Haig A, Hesketh A, et al. The effectiveness of portfolios for post-graduate assessment and education: BEME Guide No 12. Med Teach. Apr 2009;31(4):299-318. 222. Long DM. Competency-based residency training: the next advance in graduate medical education. Acad Med. Dec 2000;75(12):1178-1183. 223. Reznick RK, MacRae H. Teaching Surgical Skills: Changes in the Wind. New England Journal of Medicine. 2006;355(25):2664-2669. 224. Bhatti NI, Cummings CW. Competency in surgical residency training: defining and raising the bar. Acad Med. Jun 2007;82(6):569-573. 225. Meyers FJ, Weinberger SE, Fitzgibbons JP, Glassroth J, Duffy FD, Clayton CP. Redesigning residency training in internal medicine: the consensus report of the Alliance for Academic Internal Medicine Education Redesign Task Force. Acad Med. Dec 2007;82(12):1211-1219. 226. Brown AK, O'Connor PJ, Roberts TE, Wakefield RJ, Karim Z, Emery P. Ultrasonography for rheumatologists: the development of specific competency-based educational outcomes. Ann Rheum Dis. May 2006;65(5):629-636. 227. Dowson C, Hassell A, on behalf of the members of the Training Subcommittee of the West Midlands Rheumatology Services and Training Committee. Competence-based assessment of specialist registrars: evaluation of a new assessment of out-patient consultations. Rheumatology. April 2006;45(4):459-464. 228. Hubbard JP, Levitt EJ, Schumacher CF. An objective evaluation of clinical competence. N Engl J Med. 1963;272:1321-1328.
BIOGRAPHICAL SKETCH Charles Douglas was born and raised in Los Angeles, California. He holds an Executive Master of Business Administration from the Peter F. Drucker Graduate School of Management at Claremont Graduate University. He is the recipient of a 2007 American Association of Colleges of Pharmacy/Wal-Mart scholarship and the department's Graduate Teaching Assistant Award in 2008. He is interested in the evaluation of pharmacy clinical services and pharmacy administration.