
Design and Validation of a Virtual Human System for Interpersonal Skills Education

Permanent Link: http://ufdc.ufl.edu/UFE0022560/00001

Material Information

Title: Design and Validation of a Virtual Human System for Interpersonal Skills Education
Physical Description: 1 online resource (146 p.)
Language: english
Creator: Johnsen, Kyle
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: education, human, medical, virtual
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Training interpersonal skills requires availability, standardization, and diversity of practice interpersonal scenarios. To meet these requirements, we proposed a concept for an interpersonal simulator with a natural interface. We implemented a proof-of-concept, called the IPS. We demonstrated that the IPS elicited user performance that was indicative of real-world interpersonal skills, making the IPS a powerful tool for interpersonal skills education. To do this, we collaborated with healthcare educators to construct an interpersonal skills training experience using the IPS, called the VOSCE. The VOSCE simulated a patient encounter and provided feedback on user performance. By conducting pilot studies with the VOSCE, we assessed and improved the usability and acceptability of the IPS for interpersonal skills training. Next, we demonstrated the validity of the VOSCE. An experiment was conducted that compared user performance in the VOSCE to user performance in a similar real-life interaction. Results showed that performance was significantly correlated, implying that the VOSCE was a valid interpersonal skills evaluation tool. Finally, we demonstrated that changes to the natural interface of the IPS influenced user performance. A significant component of the IPS was the visual display. To provide insight into how changing the visual display component affected interaction with a virtual human, we designed two comparative user studies. The first study compared two immersive visual displays: a large-screen projection display and a more immersive head-mounted display. Results suggested that higher immersion may impair users' ability to accurately reflect upon their own performance. Participants in the head-mounted display condition were less accurate in evaluating their use of empathy. Following up on results from the first study, the second study compared two non-immersive visual displays: a plasma television and a smaller computer monitor. Results suggested that displays that enable life-size virtual humans enhance user performance. Participants in the plasma television condition were more engaged, empathetic, pleasant, and natural.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Kyle Johnsen.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Lok, Benjamin C.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0022560:00001




DESIGN AND VALIDATION OF A VIRTUAL HUMAN SYSTEM FOR INTERPERSONAL SKILLS EDUCATION

By

KYLE JOHN JOHNSEN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008


© 2008 Kyle John Johnsen


To my Mother and Father, my inspiration for success


ACKNOWLEDGMENTS

I would like to acknowledge Dr. Benjamin Lok for being my research advisor and supervisory committee chair, for his continued mentorship and friendship in my academic career, and for the exceptional privilege to work with him in his research. Dr. D. Scott Lind, Dr. Amy Stevens, Dr. Diane Beck, Dr. Juan Cendan, and Dr. Adeline Deladisma were critical to this work for their advice, and for providing the real-world application and end-users for our research. Thank you to Dr. Richard Ferdig, Dr. Paul Fishwick, Dr. Paul Gader, and Dr. Jorg Peters for being my supervisory committee members, and for their ideas and support in my research. Andrew Raij, Robert Dickerson, Cyrus Harrison, Aaron Kotranza, Brent Rossen, John Quarles, Harold Rodriguez, Xiyong Wang, and the rest of the past and current members of the Virtual Experiences Research Group were invaluable for their assistance in this research, and for always being there to discuss ideas. I appreciate the University of Florida and the Computer and Information Science and Engineering Department faculty and staff for providing the financial and equipment support for this work. I also thank the University of Florida Alumni Graduate Program for the honor of receiving their fellowship support during my first four years of research.

Most importantly, I would like to acknowledge my family: my mother Teresa Johnsen, my late father John Johnsen, my sisters Kelly Johnson and Jennifer Johnsen, and my fiancée Erin Doyle, for their love and support. You are the most important people in my life.

The system described in this work is an ongoing project to understand and use virtual humans for real-world benefit. While I am fully responsible for the design, development, and study of the system described in this work, many faculty members and students inside and outside of the University of Florida have contributed to the system and studies, and continue to develop the system framework.


TABLE OF CONTENTS

ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 9
LIST OF FIGURES ..... 10
ABSTRACT ..... 12

CHAPTER

1 INTRODUCTION ..... 14
  1.1 Driving Issues ..... 14
    1.1.1 Motivation: Enhance Interpersonal Skills Training ..... 14
    1.1.2 Challenges in Creating Effective Virtual Human Experiences ..... 16
  1.2 Thesis Statement ..... 18
  1.3 Overview of Approach ..... 18
    1.3.1 Interpersonal Simulation with a Natural Interface ..... 18
    1.3.2 Virtual Human Experiences Using the IPS: The VOSCE ..... 20
    1.3.3 Usability and Acceptability of the VOSCE ..... 22
    1.3.4 Validity of the VOSCE ..... 22
    1.3.5 Roles of Visual Display in Virtual Human Experiences ..... 23
  1.4 Innovations ..... 24

2 REVIEW OF LITERATURE ..... 26
  2.1 Social Responses to Virtual Humans ..... 26
    2.1.1 Theoretical Foundation: Social Responses to Computers ..... 26
    2.1.2 Using Virtual Humans to Elicit Social Responses ..... 27
    2.1.3 Impact of Virtual Human Factors on Social Responses to Virtual Humans ..... 29
  2.2 Virtual Human Experiences ..... 29
  2.3 Learning with Interpersonal Simulations ..... 31

3 DEVELOPING THE VIRTUAL HUMAN EXPERIENCE ..... 34
  3.1 Introduction ..... 35
  3.2 Simulating Patient-Doctor Interaction ..... 37
  3.3 Application: The VOSCE ..... 38
  3.4 System: The IPS ..... 40
    3.4.1 Natural Interface to Face-to-Face Interpersonal Simulations ..... 41
    3.4.2 Interface Implementation ..... 42
    3.4.3 Simulation Implementation ..... 43
  3.5 Study I ..... 49
    3.5.1 Design ..... 49
      3.5.1.1 Environment ..... 49
      3.5.1.2 Population ..... 51
      3.5.1.3 Procedure ..... 51
      3.5.1.4 Measures ..... 52
    3.5.2 Results ..... 53
      3.5.2.1 Task performance ..... 53
      3.5.2.2 Technology ..... 54
      3.5.2.3 Application ..... 55
      3.5.2.4 Debriefing ..... 56
  3.6 InterPersonal Simulator and VOSCE Changes and Improvements ..... 60
    3.6.1 Response Database Additions ..... 60
    3.6.2 Speech Understanding ..... 61
    3.6.3 Tracking ..... 61
    3.6.4 Virtual Objective Structured Clinical Examination Changes ..... 61
  3.7 Study II ..... 62
    3.7.1 Design ..... 62
    3.7.2 Study II Results ..... 62
      3.7.2.1 Task performance ..... 63
      3.7.2.2 Technology ..... 63
      3.7.2.3 Experience satisfaction ..... 64
      3.7.2.4 Debriefing ..... 65
  3.8 Chapter Summary ..... 67
  3.9 Conclusions ..... 68

4 VALIDITY OF THE VOSCE ..... 69
  4.1 Introduction ..... 69
  4.2 System Implementation ..... 71
  4.3 Study ..... 74
    4.3.1 Population and Environment ..... 75
    4.3.2 Procedure ..... 77
    4.3.3 Metrics ..... 78
  4.4 Results ..... 79
    4.4.1 Interview Skills Checklist ..... 79
    4.4.2 Patient Satisfaction ..... 84
  4.5 Limitations ..... 86
  4.6 Chapter Summary ..... 87
  4.7 Conclusions ..... 88

5 IMPACT OF DISPLAY SYSTEM ON USER PERFORMANCE ..... 89
  5.1 Introduction ..... 89
    5.1.1 Application Context ..... 90
    5.1.2 Motivation to Evaluate Visual Displays ..... 91
    5.1.3 Study Methodology ..... 92
  5.2 Related Work ..... 93
    5.2.1 Comparative Evaluations of Virtual Human Experiences ..... 93
    5.2.2 Impact of Display Systems on Presence and Copresence ..... 94
  5.3 Study I: Immersive Displays ..... 96
    5.3.1 Design ..... 98
      5.3.1.1 Application scenario and task ..... 99
      5.3.1.2 Population and environment ..... 99
      5.3.1.3 Measures ..... 100
      5.3.1.4 Procedure ..... 103
      5.3.1.5 Statistical methods ..... 104
    5.3.2 Results ..... 104
      5.3.2.1 Self-reported presence and copresence ..... 104
      5.3.2.2 Self-evaluation and patient-evaluation ..... 104
      5.3.2.3 Behavioral observations ..... 105
    5.3.3 Summary and Discussion ..... 106
  5.4 Study II: Non-Immersive Visual Displays ..... 107
    5.4.1 Differences between Study I and Study II ..... 108
    5.4.2 Design ..... 110
      5.4.2.1 Application scenario and task ..... 110
      5.4.2.2 Population and environment ..... 111
      5.4.2.3 Measures ..... 112
      5.4.2.4 Procedure ..... 114
    5.4.3 Results ..... 115
      5.4.3.1 Self-evaluation ..... 115
      5.4.3.2 Behavioral observations ..... 115
    5.4.4 Summary and Discussion ..... 118
  5.5 Limitations ..... 120
  5.6 Chapter Summary ..... 121
  5.7 Conclusions ..... 122

6 SUMMARY AND FUTURE DIRECTIONS ..... 123
  6.1 Review of Results ..... 123
  6.2 Future Directions ..... 124

APPENDIX

A SURVEYS ..... 126
  A.1 Technology Survey ..... 126
  A.2 Experience Satisfaction Survey ..... 127
  A.3 Interview Skills Checklist ..... 128

B RAW STUDY DATA ..... 132
  B.1 Study Data for Section 3.4 ..... 132
  B.2 Study Data for Section 3.5 ..... 133
  B.3 Study Data for Section 4.4 ..... 134
  B.4 Study Data for Section 5.3 ..... 136
  B.5 Study Data for Section 5.4 ..... 137

LIST OF REFERENCES ..... 138
BIOGRAPHICAL SKETCH ..... 146


LIST OF TABLES

4-1 The VOSCE/OSCE interview skills checklist ..... 75
5-1 Comparison of the displays used in the study ..... 98
5-2 Measures of copresence used in Study I ..... 101
5-3 Inter-coder reliability in judging critical moments (average measure intra-class correlation) ..... 115


LIST OF FIGURES

1-1 User gestures to a life-size virtual human and asks, "Does it hurt here?" The lighted spheres support user tracking ..... 19
1-2 Virtual humans in the first Virtual Objective Structured Clinical Examination (VOSCE) ..... 20
2-1 Example text and selection based interface for interpersonal skills education ..... 32
3-1 (Right) A patient complains of abdominal pain. (Left) The doctor must interview the patient to get relevant information for a diagnosis ..... 35
3-2 The (right) student is diagnosing (left) DIANA, a patient with acute abdominal pain, while (middle) VIC observes ..... 40
3-3 The IPS interface ..... 42
3-4 Data flow diagram of IPS data processing ..... 44
3-5 Student asks, "Does it hurt here?" ..... 45
3-6 User-perspective correct viewing frustum ..... 48
3-7 Examination room in the Harrell Center where the IPS was installed ..... 50
3-8 Results from the Technology survey on acceptability of the VOSCE for each item. Items were rated on the Likert scale: 1 (strongly disagree), 4 (neutral), 7 (strongly agree) ..... 55
4-1 Student interacts with the virtual human during the VOSCE ..... 70
4-2 The system ..... 72
4-3 Study design ..... 74
4-4 Students begin their interview sessions by knocking on the door and entering the room ..... 76
4-5 Significant (p<.005) correlation in the overall score for VOSCE and OSCE sessions ..... 80
4-6 Interview skills checklist broken up into three areas ..... 81
4-7 Scores for each subject area of the interview skills checklist are normalized between 0 and 1 ..... 82


4-8 Overall rank given to the patient by participants in the MaSP survey, separated by type of patient and study date ..... 84
4-9 Patient satisfaction, as determined by the rank given by the participant, is compared to the participant's overall score as determined by the evaluator ..... 85
5-1 User interacts with a close, life-size virtual human using either a fish-tank projection display (top) or head-mounted display (bottom) ..... 97
5-2 Virtual human displayed on a plasma television (top) and a monitor (bottom) ..... 109
5-3 Results for Moment 1 ("Why aren't I speaking to the doctor?") ..... 116
5-4 Results for Moment 2 ("Could this be cancer?") ..... 116


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DESIGN AND VALIDATION OF A VIRTUAL HUMAN SYSTEM FOR INTERPERSONAL SKILLS EDUCATION

By

Kyle John Johnsen

August 2008

Chair: Benjamin Lok
Major: Computer Engineering

Training interpersonal skills requires availability, standardization, and diversity of practice interpersonal scenarios. To meet these requirements, we proposed a concept for an interpersonal simulator with a natural interface. We implemented a proof-of-concept, called the IPS. We demonstrated that the IPS elicited user performance that was indicative of real-world interpersonal skills, making the IPS a powerful tool for interpersonal skills education.

To do this, we collaborated with healthcare educators to construct an interpersonal skills training experience using the IPS, called the VOSCE. The VOSCE simulated a patient encounter and provided feedback on user performance. By conducting pilot studies with the VOSCE, we assessed and improved the usability and acceptability of the IPS for interpersonal skills training.

Next, we demonstrated the validity of the VOSCE. An experiment was conducted that compared user performance in the VOSCE to user performance in a similar real-life interaction. Results showed that performance was significantly correlated, implying that the VOSCE was a valid interpersonal skills evaluation tool.

Finally, we demonstrated that changes to the natural interface of the IPS influenced user performance. A significant component of the IPS was the visual display. To provide insight into


how changing the visual display component affected interaction with a virtual human, we designed two comparative user studies. The first study compared two immersive visual displays: a large-screen projection display and a more immersive head-mounted display. Results suggested that higher immersion may impair users' ability to accurately reflect upon their own performance. Participants in the head-mounted display condition were less accurate in evaluating their use of empathy. Following up on results from the first study, the second study compared two non-immersive visual displays: a plasma television and a smaller computer monitor. Results suggested that displays that enable life-size virtual humans enhance user performance. Participants in the plasma television condition were more engaged, empathetic, pleasant, and natural.


CHAPTER 1
INTRODUCTION

Virtual humans, computer-generated characters that have human aesthetic qualities, are different from conventional interfaces. Virtual humans can communicate information to the user through social channels (e.g., speech, facial expressions, gestures, posture). Whereas many researchers are trying to understand how to leverage the communication power of virtual humans to enhance the effectiveness of existing computer applications (e.g., tutoring systems), our research takes a different approach. We research a novel class of virtual human application, a virtual human experience, where the user's task is the interaction with the virtual human. In this work, our goal was to enable virtual human experiences that had a realistic social impact on users, i.e., users gain experience for interactions with real human partners through interactions with the virtual human partners. This dissertation presents my contributions toward that goal with the following:

- The InterPersonal Simulator (IPS), a system designed with a natural interface to virtual human agents.
- The application of the IPS to support medical interpersonal skills education, one of the many application areas that would benefit from virtual human experiences.
- Identification of the power and limitations of the IPS through pilot studies designed to assess and improve usability and acceptability.
- The first validation for virtual human experiences to evaluate the extent to which virtual human experiences elicit real-world interpersonal skills.
- Understanding the impact of the visual display system on user performance.

1.1 Driving Issues

1.1.1 Motivation: Enhance Interpersonal Skills Training

Fields such as health care, the military, business, and psychotherapy have had difficulties with the education, training, and remediation of interpersonal skills [1], [2], [3], [4]. These


issues have been addressed with mock experiences (using actors), which have been largely successful [4], [5], [6]. However, there are significant logistical issues with respect to:

- Availability: Providing the quantity of needed practice sessions
- Diversity: Providing a rich set of training scenarios featuring physically, culturally, and racially diverse partners
- Feedback: Providing consistent, objective student ratings and suggestions for improvement
- Standardization: Providing the same experiences and opportunities for all students
- Cost-Effectiveness: Providing and maintaining facilities and personnel for actor-based training

Virtual human experiences are a significant tool to address these human-human training logistical issues. The power of virtual humans is that they are computational artifacts, enabling training opportunities that are not possible using real humans.

Availability and standardization of virtual human experiences could greatly exceed what could be provided with mock experiences. Virtual human experiences could be used at any time, could be exactly duplicated, and thus could provide the same experience at the same quality level to all students. Students could practice with the virtual human experience at their convenience as often as time allowed. Educators could ensure that students were provided the same educational opportunities as other students, independent of time and location, and could provide a consistent evaluation platform.

Feedback, an essential component of learning, could also be enhanced with virtual human experiences. Virtual human experiences could log the events that took place during an interaction. By analyzing the event log, virtual human experiences could provide users with a critique, offering praise and suggestions for improvement. Virtual human experiences could always provide this feedback, and do so in a consistent manner for all


students. In addition, the feedback could be at the expert level, allowing students to get quality feedback without requiring an expert observer to be present.

Diversity is perhaps the most compelling reason to develop virtual human experiences. Virtual human experiences could simulate scenarios that students would otherwise never encounter in training, encounter too infrequently, or that take place in a setting inappropriate for practice. Virtual humans could be modeled with a wide variety of human aesthetic characteristics, such as coloring of the skin and eyes, and physical deformities. Additionally, virtual humans could be programmed to demonstrate a variety of behavioral characteristics, such as eye movement disorders (e.g., lazy eye). Furthermore, virtual human experiences could simulate dangerous or exotic training environments (e.g., a battlefield).

Cost-effective training could be provided by virtual human experiences. This is largely because of the re-usability of the experiences. Creating a scenario (as opposed to actor training) is a one-time cost. In addition, the aesthetics and behavior of the virtual human are independent of each other, meaning that scenario changes and maintenance (e.g., changing aesthetic qualities such as skin tone or body type) are inexpensive.

We are not advocating the replacement of mock experiences with virtual human experiences. However, we do propose that virtual human experiences could effectively augment mock experiences. This approach would incorporate both the high-fidelity interaction possible in mock experiences and the logistical benefits of virtual human experiences.

1.1.2 Challenges in Creating Effective Virtual Human Experiences

To optimize the benefit of virtual human experiences, the goal has been to get users to take their virtual human partner seriously, i.e., to treat the virtual human as a real human.
However, using virtual humans as partners in interpersonal skills training has involved significant challenges in modeling realistic virtual human aesthetics, behavior, and interaction.


The aesthetics of some virtual human models have approached human-level realism; however, graphics hardware has not yet been able to render these models in real-time. In addition, the virtual humans have needed to be rendered onto displays that reduce the realism of the virtual human. These imperfect virtual humans may face the uncanny valley phenomenon from robotics [7]. As virtual humans approach human-level realism, they may become eerie and disturbing to users.

Generating realistic virtual human behavior has also been challenging. The state-of-the-art in artificial intelligence is not able to simulate normal adult human-level intelligent behavior [8]. Further complicating virtual human behavior, the driving mechanisms behind real human behavior are still largely unknown.

Providing realistic interaction capability to virtual humans has also been a difficult challenge. Virtual humans need to be able to sense the user. While sensing and recognition technology for all five human senses exists, these technologies are limited when compared to the fidelity of real human senses. To achieve higher fidelity, many technologies increase user encumbrance (e.g., headset microphone, active marker-based tracking system).

By intertwining realistic appearance, behavior, and interaction capabilities, a natural interface could be created between the user and the underlying virtual human simulation. Previous research had suggested that a natural interface would reduce the user's cognitive load, allowing the user to concentrate more on application goals and less on the computer interface [9], [10]. Natural interfaces had also been shown to allow training to transfer directly to the real experience [11], [12].

Natural interfaces for virtual human experiences present the unique issues discussed above, and therefore have challenges in terms of usability, acceptability, and validity. Thus, the specific


challenge we have addressed is to design and demonstrate a natural interface to a virtual human experience that has usability, acceptability, and validity.

1.2 Thesis Statement

We introduce the concept of an interpersonal simulator. Similar to a flight simulator, which simulates the experience of flying a plane, an interpersonal simulator simulates the experience of being with people.

Thesis statement: An interpersonal simulator with a natural interface to virtual human agents elicits performance that is predictive of the user's real-world interpersonal skills.

1.3 Overview of Approach

Our approach to demonstrating the truth of this thesis statement was the following. We first designed an interpersonal simulator with a natural interface, the InterPersonal Simulator (IPS). Next, we applied the IPS to interpersonal skills training in health care education. Using healthcare students as participants, pilot studies assessed and improved the usability and acceptability of the IPS for the application. Then a validation experiment tested the thesis. Finally, experiments on the visual display system demonstrated that changing the natural interface influenced user performance in interpersonal skills training.

1.3.1 Interpersonal Simulation with a Natural Interface

Beginning with the interaction requirements of interpersonal skills training, we designed a working prototype, the InterPersonal Simulator (IPS). The IPS was designed heuristically based on available, well-known interface, modeling, and simulation technologies (e.g., speech recognition, user tracking, virtual reality rendering, virtual human models and animation tools, and conversational agents).
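The composition of these off-the-shelf technologies can be pictured as a simple sense-think-act loop. The sketch below is a hypothetical stand-in (a stub recognizer, a canned dialog agent, and no real rendering), not the actual IPS architecture described in Chapter 3:

```python
# Hypothetical sketch of a sense-think-act loop composed from stand-in
# components; these classes are illustrative stubs, not the IPS modules
# described in Chapter 3.

class SpeechRecognizer:
    """Stub recognizer that replays a fixed list of utterances."""
    def __init__(self, utterances):
        self._pending = list(utterances)

    def listen(self):
        return self._pending.pop(0) if self._pending else None

class ConversationalAgent:
    """Stub dialog model with a canned question-answer table."""
    RESPONSES = {
        "does it hurt here": "Yes, it hurts when you press there.",
        "when did the pain start": "It started two days ago.",
    }

    def respond(self, utterance):
        key = utterance.lower().rstrip("?!. ")
        return self.RESPONSES.get(key, "I'm sorry, could you rephrase that?")

def interaction_loop(recognizer, agent, render):
    """Couple recognition, dialog, and rendering; return the transcript."""
    transcript = []
    while (heard := recognizer.listen()) is not None:
        reply = agent.respond(heard)
        render(reply)  # stand-in for rendering/animating the virtual human
        transcript.append((heard, reply))
    return transcript

log = interaction_loop(
    SpeechRecognizer(["Does it hurt here?", "When did the pain start?"]),
    ConversationalAgent(),
    render=lambda line: None,
)
```

The point of the sketch is only the coupling: recognized speech drives the agent, and the agent's reply drives the display, with no typed or menu-based input anywhere in the loop.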


Figure 1-1. User gestures to a life-size virtual human and asks, "Does it hurt here?" The lighted spheres support user tracking.

The design of the IPS was novel because the set of actions the user performed, and what the user perceived, was consistent with a similar real human experience. The interface intentionally did not support artificial means of interaction, such as a typing or command-based interface. The IPS enabled users to naturally speak and gesture to virtual human agents. Further, the interface was symmetrical. The virtual humans were aware of the presence of the user, and naturally spoke, gestured, and expressed emotions (verbally and non-verbally). Finally, the interaction was directed at, and came from, life-size virtual humans.

As seen in Figure 1-1, the user could interact naturally with a virtual human projected on a wall without the aid of specialized input devices. Only passive sensors (e.g., headset microphone, tracking markers), which did not require user attention, were used as input. Details of the IPS components are discussed in Chapter 3.


1.3.2 Virtual Human Experiences Using the IPS: The VOSCE

To serve as a test-bed for studying and improving the natural interface of the IPS, a virtual human experience was created using the IPS to train interpersonal skills in healthcare. This was because 1) training in interpersonal skills showed significant improvement in real-world patient care [13], [14], and 2) poor interpersonal skills were positively correlated with medical malpractice litigation [15]. As a result, it has been increasingly necessary to teach and evaluate medical students on these skills [16], [17].

Figure 1-2. Virtual humans in the first Virtual Objective Structured Clinical Examination (VOSCE). On the left is the virtual patient DIANA, who complains of abdominal pain. On the right is the virtual instructor VIC, who provides constructive, expert feedback to the student after the interaction.

Students initially learn how to interact effectively with patients through reading material and lectures. To reinforce this knowledge, students apply it in practical experiences. However,


providing real experiences for students is often impractical (e.g., for hundreds of students), prohibitive (e.g., unwilling patients), expensive (e.g., supervising doctors' time), unethical (e.g., cultural diversity training), or unavailable (e.g., uncommon patient cases). As discussed in Section 1.1, mock experiences address many of these issues, but not all. To enhance the training process, a virtual human experience was developed to augment mock experiences.

Based on its similarity to the mock experiences used for testing medical students' clinical skills, Objective Structured Clinical Examinations (OSCEs) [5], the virtual human experience was named the Virtual OSCE (VOSCE). The two virtual humans used in the first VOSCE scenario are shown in Figure 1-2. One virtual human played the role of a patient, while the other played the role of an instructor. The patient for the initial case was DIANA (Digital Animated Avatar). DIANA came into the clinic complaining of pain in her lower right side. In medical terminology, this case is known as Acute Abdominal Pain (AAP).

The goal for the medical student was to perform the initial stages of a patient interview. This consisted of determining the chief complaint (e.g., why the patient came to the clinic), gathering patient history information (e.g., social history, sexual history), and providing a preliminary diagnosis and treatment plan. Afterwards, the virtual human instructor, VIC (Virtual Interactive Character), provided constructive, expert feedback to the participant in the form of praise for appropriate actions and suggestions for improvement.

The AAP case was chosen because it was a common training scenario used with standardized patients. It also emphasized the patient interview portion of a clinical encounter (as opposed to the physical examination). Finally, the case could be simulated by a mock experience. This meant that a mock experience could be used as a standard of comparison.
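The interview structure just described (chief complaint, history gathering, preliminary diagnosis and plan) lends itself to a simple checklist representation for logging and feedback. The items below are illustrative only, not the actual VOSCE scoring checklist (Appendix A.3):

```python
# Hypothetical checklist for the Acute Abdominal Pain (AAP) interview;
# the actual VOSCE checklist (Appendix A.3) differs from these items.
AAP_CHECKLIST = {
    "chief complaint": ["elicited reason for visit"],
    "history": ["social history", "sexual history", "pain onset and location"],
    "wrap-up": ["preliminary diagnosis", "treatment plan"],
}

def checklist_score(covered):
    """Fraction of checklist items the student covered in the interview."""
    items = [item for group in AAP_CHECKLIST.values() for item in group]
    return sum(item in covered for item in items) / len(items)

score = checklist_score({"elicited reason for visit",
                         "social history",
                         "preliminary diagnosis"})
```

A structure like this is what lets an instructor agent such as VIC turn a raw event log into itemized praise and suggestions.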


1.3.3 Usability and Acceptability of the VOSCE

Usability and acceptability are important considerations for user interfaces, and are especially important for novel user interfaces and applications, such as the natural interface of the IPS and the VOSCE application [18]. To assess and improve the usability and acceptability of the IPS and VOSCE, our approach was end-user evaluation. Medical students were recruited as participants. In addition, the VOSCE was incorporated into a mock-experience training facility for study. This supported the ecological validity of the studies, and increased immersion.

Two studies were conducted where medical students interacted in the VOSCE, and afterwards reported their overall impressions of the experience. The studies revealed that the VOSCE was accepted by medical students as an interpersonal skills education platform. Specifically, speech interaction, life-size virtual humans, and the feedback from the virtual instructor were highlighted as important aspects of the IPS. Participants commented that the VOSCE was a significant addition to current interpersonal skills training tools. Finally, the VOSCE appeared able to differentiate between students with different experience levels, suggesting validity for evaluation. Further details on the usability and acceptability studies are given in Chapter 3.

1.3.4 Validity of the VOSCE

Validation was an important step towards integrating the VOSCE into healthcare education. Similar validation steps had been conducted for other medical simulators [19], [20], enabling widespread adoption of the simulators. For the VOSCE, validation meant demonstrating effectiveness as an environment for interpersonal skills training. An experiment was conducted to provide validation support for the VOSCE. The method of validation was to compare student performance in the VOSCE to same-student performance in a validated mock experience (an OSCE). In the OSCE, students were evaluated by expert


observers using a checklist of interpersonal skills (Appendix A.3). The same metric was used to evaluate students in the VOSCE. The validity of the VOSCE was the extent to which the students' scores correlated and had the same magnitude.

For the experiment, the virtual human experience was integrated directly into an OSCE practice session as part of a medical education course. Participant performance in the VOSCE was compared to participant performance in the OSCE. There was a significant correlation (r(31)=.49, p<.005) between the overall score in the VOSCE and the overall score in the OSCE. This meant that the interaction skills used with a virtual human translated to the interaction skills used with a real human. Students automatically exhibited skills learned for real human experiences in the virtual human experience. For educators, this meant that the VOSCE could be used to identify students who needed additional practice without taking up the expensive OSCE resource. In addition, it demonstrated that the natural interface of the IPS elicited performance predictive of users' real-world interpersonal skills.

1.3.5 Roles of Visual Display in Virtual Human Experiences

Given the usable, validated VOSCE, we wanted to understand how changes to the natural interface affected user performance. To test the relationship between interface and user performance, we evaluated the choice of visual display system. Alternative visual displays had been proposed for the IPS (e.g., head-mounted displays, large-screen televisions, and monitors); however, little empirical evidence existed to select one display over the other. In fact, understanding how visual displays impacted user perception, and ultimately cognition and behavior, had been identified as an important open challenge in interface research [21]. To evaluate the choice of display system for the VOSCE, we conducted two user studies.
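The validity analysis in Section 1.3.4 reduces to a Pearson correlation between paired same-student scores. A minimal pure-Python sketch, using hypothetical score pairs rather than the study data (the study itself reported r(31) = .49), is:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between paired scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical VOSCE and OSCE checklist scores for five students,
# for illustration only.
vosce = [0.60, 0.75, 0.55, 0.80, 0.70]
osce = [0.65, 0.80, 0.60, 0.85, 0.70]
r = pearson_r(vosce, osce)
```

In practice a statistics package would also report the p-value; the sketch only shows the coefficient that the validation argument rests on.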
The first study compared two immersive displays, a head-mounted display (HMD) and a fish-tank projection display (FTPD). We analyzed observer ratings of individual medical and pharmacy students


interviewing a virtual human patient. In addition, we analyzed students' self-evaluations. Results showed that participants in the FTPD condition were more critical of their own performance. They were also more accurate in their self-evaluation as compared to observed behavior. This suggests that the type of display is an important factor in user cognition about an experience. While more immersive than the FTPD, the HMD was also more surreal and unfamiliar. Thus, many participants did not reflect upon the experience as real.

The second study looked at non-immersive displays, a large plasma television (PTV) and a small computer monitor (MON). Results showed that participants in the PTV condition (life-size virtual human) had increased performance. They were more engaged, empathetic, pleasant, and natural than participants in the MON condition (small-scale virtual human). This suggests that life-size virtual humans are treated more like real humans. In addition, participants in the PTV condition were also more critical of their own performance. Combined with the similar effect observed in the immersion study, this is strong evidence that the type of display affects user cognition.

Overall, the results from both studies suggest that the visual display component of the natural interface is an important consideration for virtual human experiences used for training. Under different display configurations, users perform differently, they remember the experience differently, and thus they train differently.

1.4 Innovations

Our innovations were in designing, validating, and understanding an interpersonal simulator for interpersonal skills training. We took an innovative approach to designing the IPS, using a natural interface that featured realistic virtual humans, user tracking, large-screen displays, speech interaction, and gesture interaction. We applied the IPS to healthcare education, creating a virtual human experience called the VOSCE.
The VOSCE provided the first forum


for medical students to virtually practice and obtain feedback on their interpersonal skills with patients. By conducting usability and acceptability studies of the VOSCE with healthcare students, the VOSCE moved from being a prototype to a system that has been widely accepted and successfully used by over 400 healthcare students internationally. The VOSCE is, to our knowledge, the first virtual human experience to be used significantly by end-users and to provide benefit in the real world.

Validity has also been shown for the VOSCE as a test of interpersonal skills. By comparing same-student interactions in existing mock experiences (with trained actors) and virtual human experiences in the IPS, we found a significant correlation in students' evaluated interpersonal skills. An interaction with a virtual human in the IPS predicted students' real-world interpersonal skills. Building upon previous results that showed users have reactions to virtual humans similar to those from real people, this result showed that users have similar performance with virtual humans.

Finally, we provided insights into how user performance is influenced by changing the natural interface design. We compared alternatives to an important component of the interface, the visual display. We found that large-screen displays, which afforded life-size virtual humans, enhanced user self-reflection and performance relative to both head-mounted displays and small-screen displays.


CHAPTER 2
REVIEW OF LITERATURE

The focus of this chapter is on previous research relevant to the general goal of this body of work: affecting users with virtual humans. Other related work is given in Chapters 3-5, which deal with specific components of this dissertation.

The chapter is divided into three sections. The first section discusses the impact of virtual humans on users. It describes theoretical foundations and user studies pertaining to how and why users respond socially to virtual humans. The second section discusses existing applications-oriented virtual human experiences that are similar in spirit to the IPS. It compares and contrasts existing system architectures and objectives to the IPS. The last section discusses the pedagogy involved in interpersonal skills training with simulations. It fits the IPS into the existing pedagogical framework developed for interpersonal skills training in medicine and similar fields.

2.1 Social Responses to Virtual Humans

2.1.1 Theoretical Foundation: Social Responses to Computers

Nass et al. have presented multiple studies which suggested that social responses to computers are mindless [22], [23]. They claimed that mindless social responses to computers are pervasive, unaffected by computer expertise or other human factors. In user studies, they found that, despite conscious denial that the computer was in any way a social actor, controlled by a human, or associated with a programmer, users demonstrated social behavior, applied social stereotypes, and had social expectations of the computer. To elicit such mindless actions, only minimal social cues were necessary. For example, a study was presented that showed that participants applied gender stereotypes to a computer program. The program used a dominant female or male voice to evaluate the user. Consistent with gender stereotypes, participants found


the evaluation more compelling from the male-voiced computer. Further, users were entirely unaware of the application of stereotypes to computers.

Computer programs with minimal social cues elicited realistic social responses from users. Thus, it was suggested that users would have powerful social responses to virtual humans. This was because virtual humans had the full complement of verbal and non-verbal social cues. However, Nass et al. pointed out that, while perfect virtual humans were likely to elicit the strongest social responses, virtual human technology was not perfect [23]. Real-time, realistic-appearing virtual humans have been demonstrated [24]; however, simulations governing the virtual human's autonomous cognition and behavior were not yet (or may never be) realistic enough to fool a human judge [8]. In most computer applications, user expectations of human-like behavior are low, and hence users have unexpected subconscious reactions when the computer exhibits human-like behavior.

How do user expectations and reactions change when the user is interacting with a virtual human? This question establishes the foundation for human-virtual human interaction research. Researchers have sought to identify the factors that influence the type and power of social responses elicited by virtual humans. System factors (e.g., displays, tracking lag, input devices), virtual human factors (e.g., appearance, behavior), environment factors (e.g., setting, outside distractions), and human factors (e.g., gender, expertise) may all influence user responses to virtual humans. For the virtual human experience in this work, we were particularly interested in realistic responses: the responses that a similar real human experience would be expected to elicit.

2.1.2 Using Virtual Humans to Elicit Social Responses

Early work on human-virtual human interaction has shown that people have powerful, realistic responses to virtual humans.
This has been particularly evident in an emerging field


called virtual reality exposure therapy (VRET), where virtual reality has been used to augment real exposure therapy for psychological conditions [25]. Virtual humans are powerful additions to the virtual environments. Many researchers have demonstrated that social responses associated with the common phobias related to social anxiety (e.g., fear of public speaking) are elicited by virtual audiences [3], [26], [27], [28], [29]. Further, social anxiety could be manipulated by the type of audience [30]. Positive, friendly audiences reduced anxiety, and negative, unfriendly audiences increased anxiety.

Virtual humans could also elicit powerful responses even when they were not the focus of attention. In treating nicotine addiction, Bordnick et al. compared different virtual scenes to understand which scenes elicited the highest cravings [31]. It was found that cravings were highest when the virtual environment was a social situation with virtual humans. Smoking was often seen as an enjoyable social activity, and the addition of virtual humans provided the social triggers to increase cravings.

Outside of VRET, others have replicated real-world social phenomena with virtual humans. Using virtual humans, Zanbaka et al. replicated psychological studies on social inhibition [32]. Social inhibition occurred during complex tasks when a virtual human was watching the participant. Bailenson et al. replicated proxemics effects (distances maintained during social interactions) with virtual humans [33]. The distance maintained between the participants and the virtual humans matched that of real social interactions. Slater et al. replicated the Stanley Milgram obedience experiments using virtual humans [34]. In the experiment, participants administered increasingly powerful electrical shocks to a virtual human when the virtual human answered questions incorrectly.
Similar to Milgram's experiment, they found that participants often refused to give shocks, delayed in giving the shocks, and that being


able to see both the verbal and non-verbal response of the virtual human, as opposed to just the verbal response (text), made the user response more powerful.

2.1.3 Impact of Virtual Human Factors on Social Responses to Virtual Humans

Comparative experiments have found that social responses to virtual humans could be manipulated by adjusting virtual human factors, particularly gaze and appearance. Fukayama et al. used a two-state Markov model that controlled the amount, duration, and point of interest of a virtual agent's gaze [35]. They found that they could control the user's impression of the virtual agent by setting the parameters of the model to those found to create similar impressions between real humans. In related work, using a teleconferencing application, Garau et al. found that an avatar whose gaze was consistent with conversational flow was more beneficial than an avatar who randomly gazed at the user [36]. A later study by Garau et al. found a significant interaction effect on copresence (the phenomenon of feeling with another person) between visual realism and gaze in an immersive collaborative virtual environment. They suggested that it is important to match avatar visual realism to gaze behavior realism [37]. Bailenson et al. extended this result to agents, finding that self-report (copresence questionnaire), cognitive (memory of avatar), and behavioral (interpersonal distance) markers of copresence were highest when visual and behavioral realism were matched [33].

2.2 Virtual Human Experiences

Others have created research systems similar to the IPS. The systems feature virtual humans that have a realistic (as opposed to cartoonish) appearance and animation. In addition, the user is a primary input to the virtual human simulation. Actions taken by the user are expected to influence the virtual human, and elicit responses from the virtual human that are relevant to the user actions.
This is in contrast to the systems discussed in Section 2.1.3, where the user was mostly a passive observer, and the virtual human was following a pre-scripted


sequence. The added dimensions of interaction increased the potential applications of virtual humans.

Researchers at RTI International have developed Responsive Virtual Human Technology that has been applied to skills training for clinical examinations [38], informed consent interviews [39], police witness interviewing [40], and battlefield operations [41]. Users interacted with virtual humans displayed on a standard PC monitor. A speech interface supported natural verbal interaction with the virtual humans. While the systems did not afford natural non-verbal interaction, selection-based interfaces afforded artificial non-verbal interaction. In the clinical examination skills training application, users conducted the interview portion by talking, and then switched to the selection interface to perform the physical examination.

Immersive virtual human systems have been created by researchers at the University of Southern California Institute for Creative Technologies (ICT). An example is the Mission Rehearsal Exercise (MRE) system [1]. In the MRE system, users could hold conversations with virtual human soldiers, enemy combatants, and civilians displayed using immersive projection technology. Another example is Sergeant Blackwell, a demo system where a user can talk to a life-size virtual human standing in a doorway [42]. The Blackwell demo uses a see-through screen that makes the virtual human appear at life-size, floating in space.

The IPS is different from these systems in some important ways. First, the IPS affords both verbal and non-verbal communication with virtual humans, and provides tracking technology so the virtual human can be aware of the user's physical location (Chapter 3). In addition, the IPS has only natural interface features, explicitly avoiding any selection-based


interfaces. Furthermore, the IPS uses virtual reality rendering and displays. The display system is an important part of creating realistic interactions (Chapter 5).

Other virtual human systems have had feature sets similar to the IPS, but focus on a different type of dialog. The most common types are information seeking and storytelling. In information-seeking dialogs, the virtual human assists the user. Two notable systems are REA and MACK. The REA system communicates information about real estate [43]. The MACK system is an interactive information kiosk offering directions around a university campus [44].

Interactive storytelling systems also feature dialogs with virtual humans. Applications of storytelling systems include museum guides [45] and playing partners for autistic children [46]. In these applications, the virtual human typically dominates the dialog. The dialog consists of the virtual human telling a story with occasional input from the user. The user directs where the story leads.

These systems have supported speech and gesture interaction, and have used large-screen displays to create life-size virtual humans. However, the intent has not been to create a virtual human experience where the user treats the virtual human as a real human. Rather, interaction techniques from real-human dialogs are used to make interactions with the virtual human more successful.

2.3 Learning with Interpersonal Simulations

Pedagogical objectives are often specific to the particular field, although learning can be generalized into three domains: cognitive [47], affective [48], and psychomotor [49]. The cognitive domain is associated with the acquisition of knowledge and problem solving. For interpersonal skills, this was the knowledge of what should be communicated, and how it should be communicated. The affective domain is associated with managing feelings, attitudes, and developing values (e.g., medical ethics).
For interpersonal skills, this is showing respect for


others, managing impressions, and conveying understanding. Finally, the psychomotor domain is associated with motor-skills training. This means learning to integrate physical movement and perception. For interpersonal skills, this is detecting non-verbal behavior, and managing one's own non-verbal behavior.

Figure 2-1. Example text and selection based interface for interpersonal skills education.

Existing interpersonal skills training tools supported learning in the cognitive domain. They typically use selection-based input, with text, still-motion, or video-based responses [50], [51]. Figure 2-1 shows an example of a text and selection based interface. Using such an interface, students could learn decision-making and problem-solving skills.

The affective domain has received less attention from computer-based learning, as computers have not typically been intended to process or display emotions and motivations, although this is changing [52]. One of our goals was to be able to identify users who did not treat the virtual human with the respect typically given to a real human. The virtual human should naturally elicit this respect, engaging the user, and reciprocating the respect.


The psychomotor domain requires supporting natural, real-time interaction with the virtual human, because the best-known way to learn a psychomotor skill is through practice-feedback-practice-feedback, etc. [4]. Research in this area is active, with many systems supporting speech and gesture interaction [43], [53]. One of the needs of this research is to validate such methods [54].


CHAPTER 3
DEVELOPING THE VIRTUAL HUMAN EXPERIENCE

This chapter describes the development of the IPS system and pilot studies on usability and acceptability. This work was published in the proceedings of the IEEE Conference on Virtual Reality 2005 [55], and an extended version was published in the journal Presence: Teleoperators and Virtual Environments [56]. Significant details have been added relating to the system architecture.

Personal contributions: I developed the overall system architecture, which integrated the various components of the system, as well as the majority of the individual software components. The individual components I developed were the rendering, agent, and gesture systems. I also designed and conducted the user studies, and performed the data analysis.

Collaborators: Andrew Raij enabled VIC to give students feedback on their performance and assisted with user studies. Cyrus Harrison, Jonathan Jackson, and Min Shin developed the tracking algorithms used in this particular work. Robert Dickerson assisted with user studies and system testing. Medical collaborators Amy Stevens, D. Scott Lind, and Jonathan Hernandez provided background information on the medical scenario, and recruited medical students for user studies.

Relevance to thesis: The thesis statement begins, "An interpersonal simulator with a natural interface to virtual human agents..." Prior to this work, no virtual human experience had afforded the degree of naturalness we were proposing (unprompted speech, gestures, life-size aware virtual humans) for as complex an application (interpersonal skills training). It was uncertain how such a system could be created that end-users could use, and would accept as an interpersonal skills training tool.


3.1 Introduction

"Doctor, I have a pain in my side! Please make it go away!"

Training future healthcare practitioners to diagnose and handle this common patient complaint, called Acute Abdominal Pain (AAP), is a challenge for healthcare educators. Typically, interpersonal skills education consists of reading material and lectures, and then includes mock experiences (actors trained to represent specific conditions, see Figure 3-1). Later, an apprenticeship model is employed, where the student observes an expert and then gradually takes over. Current methods for learning interpersonal skills have repetition, experience diversity, quality control (keeping the experience consistent across students), and effectiveness concerns.

Figure 3-1. (Right) A patient complains of abdominal pain. (Left) The doctor must interview the patient to get relevant information for a diagnosis.


Researchers have been exploring the benefits of simulating real-world social situations with virtual humans. USC's CARTE group applied intelligent agents to pedagogical and training applications [57]. The ICT group created experiences to train military personnel in interpersonal leadership [1]. Similar in spirit to this work, RTI's Virtual Standardized Patient is a commercial virtual character system for training medical students [38]. Students interact with 3D virtual patients displayed on a monitor while using natural speech (with natural language processing) and a mouse-based selection interface.

We aimed to add to this research by providing our experiences in studying the following:

Could a more natural interface (such as incorporating life-size displays, head-tracked rendering, and gesture recognition) to virtual humans be created that was usable and acceptable by end-users for interpersonal skills training?

How did end-users act in a natural social experience with a virtual human?

To answer these questions, we designed and implemented a system, the InterPersonal Simulator (IPS), and a test-bed application, the Virtual Objective Structured Clinical Examination (VOSCE). The IPS enables students to interact with a life-size virtual human using natural speech and gestures. The VOSCE is a training experience where students interact with a virtual human patient. An observing virtual expert (also a life-size virtual human) provides scenario information and gives immediate feedback on the student's performance. We expected that the natural interface of the IPS would facilitate effective training, teaching, and testing of interpersonal skills. We discuss the design of the IPS for that purpose, and report quantitative and qualitative results from pilot studies with medical students. Results showed that students used the IPS effectively, and accepted the IPS as a tool for teaching and training.


3.2 Simulating Patient-Doctor Interaction

The IPS system and VOSCE application were derived from discussions with medical collaborators. We asked the question: how could life-size virtual humans provide benefit to medical education? Whereas many researchers were exploring virtual humans for medicine from a surgical or anatomical perspective, our medical collaborators saw the potential of virtual humans in simulating early patient-doctor encounters. The interaction between patients and doctors during early diagnosis was identified as an interpersonal scenario 1) which could be simulated despite the limits of current technology, 2) where practice was essential, 3) where a natural interface was important, and 4) which benefited from immediate feedback.

Early education of patient-doctor interpersonal skills involves overcoming the anxiety of interacting with patients, and learning efficiency and rigor in following the well-defined patient interview process. This process includes asking a set of core questions, listening intently to patient responses, and finally developing a list of possible diagnoses and a treatment plan. We believed that such a process could be supported by current technology. As technology improved, high-level components of patient-doctor interaction, such as grief counseling, could be trained.

Patient-doctor interaction requires substantial first-hand experience to become proficient. However, current approaches have provided insufficient opportunities for experience and a limited variety of scenarios. The result has been that many medical students do not have sufficient practice in interpersonal skills when they reach the real patient's bedside. Increasing practice with traditional methods (e.g., mock experiences) is costly (e.g., instructor time, hiring more actors), difficult (e.g., cultural diversity training), or impossible (e.g., simulating certain patient complaints, such as an eye movement disorder).
By augmenting interpersonal training


with simulated patient encounters, students would get more experience at reduced cost. As an added benefit, mock experiences could focus on higher-level interpersonal skills training. It has been found that practicing patient-doctor interaction in mock experiences results in improved patient-doctor interpersonal skills [58]. Closely simulating the mock-experience process would be essential. Thus, a natural interface with life-size virtual humans and natural interaction would be important.

An informal survey of a sample of third-year medical students attending the University of Florida College of Medicine highlighted that, although students felt they got adequate instruction in interpersonal skills, they received little feedback on their real-time performance. An interpersonal simulator could provide objective feedback and information during, or immediately after, the encounter with the virtual human patient.

3.3 Application: The VOSCE

We designed the VOSCE based on the Objective Structured Clinical Examination (OSCE). An OSCE uses a mock experience to teach, train, and evaluate health care students' clinical skills, including their interpersonal skills [5]. Students are required to treat the OSCE as a real patient encounter. They read the chart outside the patient's door, enter the room, spend 10 minutes on the patient interview, and then leave. During training and teaching sessions, the student can go back into the room to receive subjective feedback from the patient or teacher (e.g., "Did they do a good job?"). During evaluation sessions, experts (or in some cases the trained actors) evaluate students on their interpersonal skills [59].

The primary difference between the VOSCE and the OSCE was the virtual humans. DIANA (DIgital ANimated Avatar), a female-looking virtual human, played the role of the patient. VIC (Virtual Interactive Character), a male-looking virtual human, played the role of an


observing expert, and gave an objective evaluation of the student at the end of the interaction (see Figure 3-2).

VOSCE Scenario: The scenario involved a 19-year-old Caucasian female with a complaint of Acute Abdominal Pain. Acute abdominal pain (AAP) was defined as follows: "The term 'the acute abdomen' refers to the presence of an acute attack of abdominal pain that may occur suddenly or gradually over a period of several hours. The patient with this symptom complex may confront the surgeon, internist, pediatrician, and obstetrician, creating a problem in clinical diagnosis requiring an immediate or urgent decision regarding the etiology (cause) and method of treatment" [60].

AAP was chosen because it was one of the most common ailments encountered by doctors. It was also a basic scenario in patient-doctor interaction and interpersonal skills education. The process was as follows. The doctor begins the diagnostic process by asking the patient a series of questions about the pain (history of present illness, or HPI). At this stage, the doctor is trying to ascertain more information, such as the pain's location and character, symptoms exhibited, family history, current medication, and aggravation if certain motions are performed. Sample questions include "What brought you into the clinic today?", "How long have you had the pain?", and "On a scale from 1 to 10, please rate the pain." The patient's responses will guide the doctor down different routes of questioning. The doctor evaluates the patient's responses, gestures, and physical and auditory cues, such as winces of pain, weight, posture, difficulty in making instructed motions, or pointing to specific areas. Based on asking the appropriate questions and evaluating the answers, treatment options can vary from immediate surgery to observation.
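The response-driven routing of follow-up questions described above can be sketched as a small lookup structure. This is only an illustration of the idea, not the actual VOSCE scenario logic; all question identifiers and answer strings below are hypothetical.

```python
# Illustrative sketch of response-driven question routing in an HPI-style
# interview. Question IDs and answers are hypothetical, not taken from the
# actual VOSCE scenario content.

FOLLOW_UPS = {
    "chief_complaint": {
        "abdominal pain": ["pain_location", "pain_duration", "pain_scale"],
    },
    "pain_location": {
        "lower right": ["appendicitis_screen"],
        "lower left": ["gynecological_history"],
    },
}

def next_questions(question_id, patient_answer):
    """Return the follow-up question IDs triggered by the patient's answer."""
    return FOLLOW_UPS.get(question_id, {}).get(patient_answer, [])
```

For example, an answer of "lower right" to the hypothetical `pain_location` question routes the interview toward an appendicitis screen, while an unanticipated answer triggers no scripted follow-ups.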
Medical educators indicated that common mistakes were omitting questions (e.g., sexual history), incorrectly asking questions (e.g., pointing to the wrong quadrant of the patient's abdomen), or not evaluating and noting the non-verbal cues from the patient's gestures and


verbal responses (e.g., a recurring cough). This often leads to incorrect or partially correct diagnoses.

Figure 3-2. The (right) student is diagnosing (left) DIANA, a patient with acute abdominal pain, while (middle) VIC, the virtual instructor, observes. The colored headset is for head tracking.

3.4 System: The IPS

A patient-doctor encounter similar to the VOSCE had never before been simulated by a computer system. Most existing applications in medicine used standard input (e.g., typing, selection) and output (e.g., text, pictures, and video) modalities [2]. Despite trying to give these applications a narrative style, studies have demonstrated that these approaches are seen as artificial and restrictive by students and educators [61]. In addition, all existing simulated approaches involved the user interacting with the application on a standard desktop computer outside of the normal training environment. This limited the training benefit of the


applications. Existing applications supported the decision-making process, but did not support the first-hand experience of physically walking into an examination room and interviewing a patient.

3.4.1 Natural Interface to Face-to-Face Interpersonal Simulations

The goal of the IPS system design was to support this type of experience: to create a virtual human experience that was similar to a face-to-face encounter with another real person. The driving design principle was that the user should interact with the virtual human naturally. There would be a natural interface. Users would perceive a realistic-looking virtual human in the real environment and interact with the human through natural interaction modalities (e.g., speech, gesture). Underlying this natural interface would be a human agent simulation capable of generating symmetrical (i.e., speech, gesture), realistic, appropriate responses to user actions.

We focused first on the natural interface. The interface dictates how a user may interact, and what the user will perceive. Thus, the interface establishes the requirements of the simulation. By building and studying an interpersonal simulator with a natural interface, we could understand how users would respond to realistic interpersonal simulations and how we could improve those simulations.

The design of the IPS interface is shown in Figure 3-3. Cameras unobtrusively captured visual information, including user position and gestures. A headset microphone reliably captured audio information, i.e., user speech. Speakers and a large-screen projection display presented the audio and graphics of the virtual humans to the user. This approximated our ideal natural interface. For the scenario, interaction and perception were only limited by the capability of the underlying simulation to process the input and render realistic output.


Figure 3-3. The IPS interface.

3.4.2 Interface Implementation

A stereo camera system was deployed, which used web cameras (30 Hz, 640 x 480). They were placed in the corners of the room, facing the expected location of the user. The covered volume was large enough to observe the user's full body during typical interaction. The two cameras were connected to a desktop PC (Intel Pentium 4, 3.0 GHz, 2 GB of RAM, ATI Radeon 9800). The desktop PC performed 3D stereo reconstruction of the user's head and hand positions. Head and hand position were determined to be the two most important non-verbal features to track for the scenario.

A headset microphone was connected to a tablet PC to capture user speech. The tablet PC (Intel Celeron, 1 GHz, 512 MB RAM, NVIDIA GeForce4 Go 420 graphics card) served as a


patient chart, a digital notepad for students to take notes, and a platform for speech recognition processing. Because the tablet PC had a wireless network card, the user could start the interaction outside of the room. The tablet PC and tracking PC were connected via network to another desktop PC, which performed gesture recognition, character simulation, animation, and audio/video rendering. Finally, the virtual humans were projected onto a wall of the examination room using a projector (NEC LT260, 1024 x 768, 60 Hz, 2000 lumens). The projector enabled a high-resolution, large-screen display that provided enough screen space for the two life-size virtual humans. It allowed the user to sit at a comfortable, realistic distance away from the virtual humans, while providing a high enough resolution to observe facial features and body language.

3.4.3 Simulation Implementation

Given the natural interface, we developed a simulation for rendering the virtual humans and driving their behavior. The simulation architecture is shown in Figure 3-4. It can be divided into perception, cognition, and response systems. The perception system receives input from the audio and video capture devices. The input from the microphone is processed into phrases over time by the speech recognition component. The input from the cameras is processed into discrete tracked objects (e.g., head, hands). The gesture recognition component then further processes these signals over time into gestures. Given these inputs, and using a knowledge database of appropriate responses and inputs, the cognition system determines the best response. The response system generates the spoken utterances (synthesized speech) and animation of the virtual human model. Finally, the virtual human is rendered at life-size from the perspective of the user.
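The perception-cognition-response division described above can be sketched roughly as follows. This is not the actual IPS code: the class names, method names, and toy recognizers are hypothetical stand-ins (the real system used commercial speech recognition and camera-based tracking).

```python
# Hedged sketch of the three-stage simulation loop. All names here are
# hypothetical stand-ins for the actual IPS modules.

class Perception:
    """Turns raw audio/video input into an utterance and a gesture."""
    def sense(self, audio_text, hand_raised):
        utterance = audio_text.strip().lower()          # stand-in for speech recognition
        gesture = "handshake" if hand_raised else None  # stand-in for gesture recognition
        return utterance, gesture

class Cognition:
    """Maps (utterance, gesture) to the best response via a knowledge base."""
    def __init__(self, knowledge):
        self.knowledge = knowledge  # template text -> response text
    def decide(self, utterance, gesture):
        if gesture == "handshake":
            return "Nice to meet you."
        return self.knowledge.get(utterance, "I'm sorry, could you rephrase that?")

class Response:
    """Would drive speech synthesis and animation; here it just formats text."""
    def render(self, text):
        return f"DIANA: {text}"

def simulation_step(perception, cognition, response, audio_text, hand_raised=False):
    utterance, gesture = perception.sense(audio_text, hand_raised)
    return response.render(cognition.decide(utterance, gesture))
```

A single step with a one-entry knowledge base, e.g. `simulation_step(Perception(), Cognition({"how old are you": "I'm nineteen."}), Response(), "How old are you")`, would flow input through all three stages.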


Figure 3-4. Data flow diagram of IPS data processing. Data is in yellow, and systems are in white.

Perception: The goal of the perception unit was to process the hardware input into the necessary data for the cognition and response units.

For user tracking, collaborators from the University of North Carolina at Charlotte, Jonathan Jackson and Min Shin, developed a marker-based color-tracking algorithm that could utilize the web cameras. The markers were unobtrusive, consisting of wrapped cloth. The tracking algorithm is described fully in [62].

For speech recognition, we used a commercial program, Dragon NaturallySpeaking Professional 7.0 [63]. We chose this specific recognizer after subjectively comparing its performance to other popular recognizers (the Microsoft Speech Engine, and the open-source Sphinx engine). Dragon provided a continuous dictation mode, so users did not have to learn and use a


command interface to interact. User training took approximately ten minutes, and resulted in approximately ninety percent recognition accuracy in converting the user's speech into text. The text recognized by the speech recognition system was displayed to the user in the lower left-hand corner of the screen, as seen in Figure 3-5. This allowed users to recognize when the speech recognition system did not accurately recognize their voice (e.g., "howled are you" is a common misrecognition of "how old are you").

Figure 3-5. Student asks "Does it hurt here?"

In addition to recognizing speech input, we wanted to incorporate gesture recognition, because gestures are an important part of human-human interactions. Commercial body gesture recognition solutions were not available. Thus, we implemented our own scenario-specific gesture recognition system. The gesture recognition component used head and hand tracking data to detect two types of gestures: pointing and handshake. The handshake gesture allowed


users to follow typical greeting protocol. To shake hands with the virtual human, the user simply extended his hand (the hand that the tracking marker was fixed to) at the beginning of the interview. A pointing gesture was included as a substitute for a physical examination of the virtual human patient. By pointing to the virtual human, the user could probe various areas of the virtual human patient's body for pain by asking, "Does it hurt here?" or "Is this where it hurts?" (see Figure 3-5). When the agent detected a pointing reference, it queried the pointing gesture recognizer for the most recent, sustained pointing location (such as the lower left abdomen). The pointing direction was computed as the vector from the user's head position in the direction of the user's hand position. This vector has been used by others to accurately determine the intended pointing direction [64]. A small red dot appeared on the screen where the vector intersected the screen plane. Similar to the speech recognition text, the dot was intended to overcome technical difficulties with tracking inaccuracy, reducing user frustration.

Cognition: The goal of the cognition unit was to understand the natural spoken language and gestures of the user, and generate the appropriate response.

The agent tries to find the best response for the input user utterance and gesture. The responses (speech text and animation commands) are contained in a response database. Stored alongside each response in the database are template sentences and gestures that trigger the response. The agent matches the user utterance and gesture to the closest template, and then sends the response to the response unit. For example, for the response "hello", there are templates "hi", "hello", "hello Diana", "howdy", etc.

The template-matching approach allowed end-users to create and extend the response database. Initially the response database was sparse, consisting of only a few responses and


input templates designed by medical faculty. As end-users (medical students) tested the system, more responses and templates were added to the database. The current database for the AAP scenario has over 200 responses and 900 example inputs.

The algorithm for matching input to the closest template used a modified Levenshtein distance metric [65]. The cost for transforming the current utterance into a known speech template was the number of additions and subtractions, weighted by the importance of each word to the sentence. A heuristic automatically defines the importance of each word. The importance is set as the inverse frequency of that word in the English language, as defined in [66]. Less important words such as "the" and "a" have a high frequency, while key words such as "pain" or "location" have low frequencies. This heuristic enabled the system to choose an appropriate response for the majority (> 60%) of user input. While this accuracy initially seems low, it was seen as acceptable by medical educators because it forced students to formulate questions in a simple, easy-to-understand manner.

Response: The goal of the response unit was to render virtual humans with realistic appearance and animation.

For realistic virtual human models, we used the Haptek full-body characters package [67]. The Haptek package provided high-visual-fidelity, physics-based virtual humans. In addition, the package provided an animation system, and tools to create animations. An API allowed run-time access to the models being animated, allowing the characters to be integrated into the main application. Furthermore, many idle animations were included in the package, such as blinking, breathing, and other subtle body movements, that made the characters appear more realistic. Finally, the animation system supported setting the virtual human's gaze direction (always set to look at the user), and provided automatic lip-synching to synthesized speech or recorded speech.
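Returning to the cognition unit, the weighted matching cost described above can be sketched as a word-level edit distance in which insertions and deletions are priced by inverse word frequency. This is only a sketch of the technique, not the IPS implementation, and the small frequency table is a toy stand-in for the English word-frequency list cited in the text.

```python
# Sketch of inverse-frequency-weighted, word-level edit distance for
# template matching. Only additions and subtractions are costed, as in the
# text. WORD_FREQ is a toy stand-in for a real English frequency list.

WORD_FREQ = {"the": 0.05, "a": 0.04, "is": 0.03, "where": 0.002,
             "your": 0.01, "pain": 0.0005}

def weight(word):
    # Rare (important) words cost more to add or drop.
    return 1.0 / WORD_FREQ.get(word, 0.001)

def match_cost(utterance, template):
    """Word-level edit distance with rarity-weighted insertions/deletions."""
    u, t = utterance.lower().split(), template.lower().split()
    d = [[0.0] * (len(t) + 1) for _ in range(len(u) + 1)]
    for i in range(1, len(u) + 1):
        d[i][0] = d[i - 1][0] + weight(u[i - 1])
    for j in range(1, len(t) + 1):
        d[0][j] = d[0][j - 1] + weight(t[j - 1])
    for i in range(1, len(u) + 1):
        for j in range(1, len(t) + 1):
            if u[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j] + weight(u[i - 1]),
                              d[i][j - 1] + weight(t[j - 1]))
    return d[len(u)][len(t)]

def best_template(utterance, templates):
    """Pick the template with the lowest transformation cost."""
    return min(templates, key=lambda t: match_cost(utterance, t))
```

With these toy weights, dropping a frequent word like "the" is cheap, while dropping a key word like "pain" is expensive, so matching prefers templates that preserve the rare, meaning-carrying words.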


Speech synthesis was chosen over recorded speech for rapid prototyping, despite sounding noticeably artificial. The speech synthesis system was the Microsoft Text-to-Speech system, loaded with the AT&T Crystal Voice: Audrey UK English module. We chose the UK English module over the US English one because it subjectively sounded the most realistic of the available modules. Interestingly, to an observer from the UK, the UK English sounded less realistic. This highlights that user perception of VHs may be impacted not only by system factors (e.g., quality of the speech recognition), but also by human factors, such as age, race, gender, or background.

Figure 3-6. User-perspective-correct viewing frustum.

To create an immersive effect, the rendering system rendered the scene from the user's perspective. This made the virtual environment a natural extension of the room. In addition, this enabled the virtual humans to make positional inferences with respect to real objects without having to transform coordinates (e.g., where to look at the user). The effect is similar to looking


through a window. The window in this case is the display surface. This is a well-known technique used in many virtual reality displays [68], and the fixed-function graphics pipeline can be used to create the effect in hardware (for planar display surfaces). A graphics frustum is created as shown in Figure 3-6.

3.5 Study I

Our goal with the first pilot study was to determine whether the IPS was usable by end-users, and whether the VOSCE was accepted as a training, teaching, and evaluation tool for interpersonal skills education. We wanted to quantify the importance of various technological (e.g., life-size virtual humans) and logistical (e.g., installing the IPS in the training facility) aspects of the experience. Moreover, we wanted to quantify the students' feelings about the believability of the virtual human patient as a real patient.

3.5.1 Design

The design of the study was that medical students would interview DIANA and get feedback from VIC in the AAP VOSCE. Those students would then rate the system and experience. The VOSCE sessions would also be video recorded so that observers could analyze student behavior and system performance. In addition, students would provide their opinions and feelings about the experience. In this way, we would gain valuable qualitative and quantitative data to assess and improve the usability and acceptability of the IPS and VOSCE.

3.5.1.1 Environment

We installed the IPS at the Harrell Adult Development and Testing Center at Shands Hospital at the University of Florida. This testing site was where students interact with hired actors in mock experiences. This was expected to increase the likelihood of students taking the experience seriously. DIANA and VIC were in Examination Room #3. The examination room was a contextually relevant environment, seen in Figure 3-7. It contained the furniture (e.g.,


hospital bed, sink) and equipment (e.g., ophthalmoscope) typically found in a clinical environment. This was expected to enhance the immersion of the student in the experience. By leveraging the existing training environment, we prepared students to treat the experience and the virtual humans as they would a real experience. This also added ecological validity to any results, as the Harrell Center is the type of training center where the IPS is intended to be installed.

Figure 3-7. Examination room in the Harrell Center where the IPS was installed. The furniture was moved to the side of the room during the study.

There were also logistical benefits to this approach. Students were familiar with the environment, and could easily participate in the study, as it was located near their classrooms and study facilities. In addition, the Harrell Center was designed to conduct OSCEs. This meant that each exam room was monitored by closed-circuit video for direct observation and session recording.


3.5.1.2 Population

Seven participants were involved in the study. There were three male and three female medical students (three 3rd-years and three 4th-years) and one 3rd-year female physician assistant student. All had substantial prior experience with OSCEs (at least five; average: 10-20). Five had OSCEs with an AAP case, and six had experience with real patients with AAP. This indicated that the participants had significant prior experience in similar scenarios. While not the target experience level for the system, these students would be able to provide feedback on the appropriateness of the VOSCE and IPS better than less experienced students would.

3.5.1.3 Procedure

Pre-experience: After arriving at the Harrell Center, each participant was asked to sign an informed consent form and video release. Then, each participant created a voice profile for speech recognition, which took approximately ten minutes. We did not consider speech training to be a major logistical problem in the overall acceptability of the IPS, because each student needed to do this only once, and the saved profile could be used for a variety of scenarios.

The participant was then led to another room to fill out a background survey on prior exposure to mock experiences, abdominal pain scenarios, and medical examinations. Meanwhile, the IPS was started.

Experience: The participant was then brought to the examination room, where DIANA's patient information chart was displayed on the tablet PC in the door basket. The participant put on the headset microphone (attached to the tablet PC) and finger-ring tracking marker. They were allowed to review DIANA's information, and then were led into the room to meet DIANA and VIC. The participant was seated in a chair facing DIANA, and was then instructed how to take handwritten notes on the tablet. Before the experimenter left the room, the participant was shown how to use the handshake and pointing gestures. Other than the


gestures, the participant was not given any instructions other than to treat the interview as they would a real interview.

VIC began the experience by explaining some information regarding the scenario. He also reminded the participant how to perform the gestures, how to interrupt DIANA if she did not understand correctly, and how to end the session early (by saying, "I have completed" or "I'm done"). VIC then asked the participant to begin their examination of DIANA. After eight minutes, VIC interrupted the session and announced that two minutes remained. After ten minutes, VIC ended the session and asked the participant for their differential diagnosis. VIC then reported which, if any, of the eleven key questions were not asked by the participant. VIC then explained the correct differential diagnoses. This completed the experience, and the participant left the examination room.

From start to finish, DIANA and VIC were always present. We believed this was important for providing the impression that DIANA and VIC were real people, not transient computer artifacts.

Post-experience: After the examination, the participant filled out the surveys discussed in Section 3.5.1.4. Finally, an oral debriefing was conducted with the participants to collect their opinions on the experience and the technology.

3.5.1.4 Measures

Task performance evaluation: Experts evaluated each student objectively on the following criteria: 1) correct greeting etiquette (introduce self, shake hands, query for chief complaint), 2) the number of core AAP diagnosis questions asked that are needed to obtain the correct diagnosis, and 3) the differential diagnosis (the final evaluation). The purpose of evaluating task performance was to gauge the behavior of students in the experience relative to pedagogical goals. Were there questions that students did not ask the virtual patient? Did


students follow appropriate greeting protocol? Did students obtain enough information for a quality diagnosis?

Technology survey: This self-created survey was used to gauge the importance of technical components to the overall experience. Responses were on a 7-point Likert scale (1: strongly disagree, 4: neutral, 7: strongly agree). Items rated included the life-size scale of the virtual humans, and the use of speech as opposed to typing. The complete survey is given in Appendix A.1.

Experience satisfaction survey: This survey was adapted from a standard survey used to evaluate student satisfaction with the actors in mock experiences [69]. Answers were on a 5-point Likert scale (1: strongly disagree, 3: neutral, 5: strongly agree). Items rated included the quality of DIANA as a patient, and the quality of VIC's feedback. DIANA was also given a global rating, from 1 (lowest) to 10 (highest). The complete survey is given in Appendix A.2.

3.5.2 Results

The study was a pilot study, and hence did not compare students on any metrics. Given the small population size, we were looking for trends in student responses and task performance, and for qualitative feedback. This is an accepted method for initial evaluation of novel interfaces. These results should not be taken as proving the validity of the virtual human system for interpersonal skills education. Validity will be discussed in depth in Chapter 4. Raw data used for the analysis is given in Appendix B.1.

3.5.2.1 Task performance

Student task performance was consistent with what medical educators expected from the study group. Out of the eleven core diagnostic questions, the average number asked was 6.28 (7.0 is a passing grade in an OSCE). This showed that students automatically applied their training in the virtual human experience. Participants also had a problem consistent with their


level of experience. Students often forget to ask patients to elaborate on their symptoms, accepting the first answer. In this case, all participants forgot to ask DIANA to tell them more about the pain, a common but critical mistake in the diagnostic process.

For the final diagnosis, there were six possibilities that should be further explored. The most important were appendicitis and ectopic pregnancy, because these would require immediate surgery. The majority of participants (6/7) had at least one correct diagnosis, and only one participant included an incorrect diagnosis. This indicates that the scenario was somewhat easy, as students did not need to ask all of the core questions, yet were still able to arrive at an accurate diagnosis. Still, the students applied their training and were able to have a relatively involved conversation with a virtual human for 10 minutes. This showed that the natural interface approach could be used to simulate interpersonal scenarios.

3.5.2.2 Technology

Speech: We had generally positive responses towards the natural interface technology. All participants strongly agreed that speech interaction was important to the experience. Furthering this, the majority of students (4/7) disagreed that they would prefer to type their questions, even if typing improved response accuracy. The positive response to speech interaction came despite the fact that 40% of the time, DIANA did not properly understand students' questions and statements. It was encouraging that students recognized natural interaction as desirable despite the limitations of speech recognition and artificial intelligence.

Life-size virtual humans: Students also unanimously agreed that life-size virtual humans were important to the experience. Clearly, if the system did not have life-size virtual humans, it would be less expensive and more portable because ubiquitous monitor displays could be used.
Yet, students found the life-size virtual humans provided enough benefit to warrant their inclusion.


Gestures: Participants generally thought gestures were unimportant (4 neutral, 3 negative). This suggested that the scenario did not properly utilize gestures, or that the quality of gesture recognition needed improvement.

Rendering: Perspective-correct rendering was also viewed as unimportant by most participants (4 negative, 2 neutral, 1 positive). One reason could be that the scenario did not require much movement, and thus did not illustrate the value of such a technique. In addition, jitter in the optical tracking component caused noticeable jitter in the display, which students commented on in the debriefing.

3.5.2.3 Application

Figure 3-8. Results from the Technology survey on acceptability of the VOSCE for each item. Items were rated on the Likert scale: 1 (strongly disagree), 4 (neutral), 7 (strongly agree). Average ratings: Training 6.29, Teaching 6.29, Evaluation 4.86.

Overall: All participants agreed that the VOSCE would be a valuable tool in training and testing (see Figure 3-8). Confirming this, participants said they would use the system frequently


(3 weekly, 4 monthly). Feelings were mixed (5 positive, 2 negative) on the system as a skills evaluation tool. This was understandable, as system errors and fidelity compromises would be more critical if the system were used to evaluate performance. These positive responses suggested that the VOSCE was an important new tool in interpersonal skills education.

DIANA: Overall, DIANA was given an average global rating of 6.36 out of 10 for the interaction (authenticity, accuracy, and symptom display). For comparison, ratings for actors in OSCEs have averaged 7.47 [69]. On local ratings, participants agreed that DIANA appeared authentic, communicated how she felt during the session, and stimulated them to ask questions. However, many students (3 negative, 1 neutral, 3 positive) did not seem to know whether she was listening to them or not. This was likely a consequence of speech processing errors. The only generally negative rating was on DIANA's ability to answer questions in a natural manner (5 negative, 1 neutral, 1 positive). This suggested that the DIANA scenario was well implemented, but that work was needed to make DIANA respond naturally.

VIC: VIC was highly praised (5 positive, 1 neutral, 1 negative), which correlated with the participants' desire for feedback on their performance. His criticism was seen as constructive, as was the manner in which he conducted the experiment (giving feedback at the end, informing the student when time was running out). This suggested finding more ways to incorporate VIC. One technique is known as scaffolding in the education literature [70]. VIC's participation could be gradually reduced, until he was not needed to assist the student. For example, VIC could provide sensitive help and feedback on interruptions or non-verbal communication.

3.5.2.4 Debriefing

The debriefings yielded many interesting comments, constructive criticism, and positive feedback on various components of the system.


Instructor: VIC's presence enhanced the experience by providing helpful information during the session and feedback immediately after.

"I liked that [VIC] gave me feedback at the end and told me exactly which questions I forgot to ask."

Multiple participants suggested VIC could provide hints when asked, particularly for first and second year medical students who are not experienced enough to get through an AAP scenario. One participant said VIC should elaborate on his responses.

"[VIC] should say 'You forgot to ask about fever, which is important because fever is a sign of infection.'"

Speech: Speaking with DIANA and VIC enhanced the experience.

"I don't think of it as much as watching a computer screen as actually interacting with a person."

"You're actually talking to a patient; you're not typing in something and waiting for a response."

Speech comprehension: DIANA answering questions incorrectly or repeating previous answers detracted from the experience. This made it harder to get diagnosis-critical information.

"I wanted to be able to get an answer to those questions."

"I felt like she didn't always answer the question I asked her."

For example, one participant asked DIANA several times whether she had a history of gall bladder problems, but the response database for DIANA had no information about her gall bladder, and thus the system responded incorrectly. This visibly frustrated the student. The system should be able to detect when an important phrase, such as "gall bladder," is not matched in the sentence, and simply not respond, or respond with "I do not have an answer for that."

Participants reported learning how to ask DIANA questions to avoid improper responses. Often, medical professionals need to ask the same question of a patient multiple ways to learn the information they need to make a diagnosis.


"I think it's good for us, too, because if the patient doesn't understand the question, which is inevitably going to happen in real life, too, it forces you to think about other ways to ask questions."

Others noted that it was quite distracting to them, implying that students would benefit from an improved system.

"[I got] caught up with trying to think of a way to phrase [the question] rather than taking her history."

Scenario content: From the debriefing, we learned that DIANA had a tendency to offer too much information, which is not typical of most patients.

"Often times I actually thought she gave more detailed answers than real people."

Another participant joked that he did not need to interview DIANA after the initial complaint. Others made the point that some patients give up information more readily than others. One participant suggested simulating this variability by providing varying difficulty levels. DIANA, in her current state, might be considered an easy patient because she offers much information. Harder difficulty level patients would provide less information.

Gestures: Most participants felt the gestures were not very useful, and many did not use them.

"I think the whole shaking hand thing and pointing is not really that important."

Some said DIANA pointing to the right place on her abdomen indicated where her pain was, so they did not see a need for pointing at her. Some saw handshaking as a novelty, while others saw no value in it because DIANA was not real.

"[People] would not accept an image as someone they can shake their hands with."


Despite this, a few participants felt the handshake gesture enhanced the training aspect of the system.

"I think [handshaking] is important because that's one of the things that they try to make sure we do automatically every time we walk in the room."

The ability to give a physical exam via hand gestures may be an important feature to add. When we asked participants what additional gestures they would like to see in the system, the almost unanimous response was the ability to give a physical exam. In fact, one participant asked DIANA twice for permission to do the physical exam, asking her to lie down to continue the examination.

Location and availability: Students emphasized that having the system available at all hours of the day would be very valuable.

"A big benefit would be to be able to go in and do this at your leisure and practice with it. I would definitely use it."

The low cost and commodity hardware made this a viable option. Similar systems could be installed in training facilities, medical schools, and hospitals. Work would need to be done to make the system more robust for continuous use.

System audience: Almost all the study participants felt that the current system was most appropriate for 1st and 2nd year medical students. They explained that students currently do not get any exposure to mock experiences before early in their 2nd year, but the virtual patient system could open the door for students to practice interviewing patients earlier and more often. 1st and 2nd year students are typically very nervous when asking questions of patients.

"Such a system would provide good practice at the early stages when you're so nervous and you just need the idea of how it should go."

Ultimately, 3rd and 4th year students need to practice with real people.


"It's not as good for, say, a 3rd year, because you are now really supposed to be working with people that have different personalities."

The AAP scenario is a textbook case that is not challenging for 3rd and 4th year students, but would be for 1st and 2nd year students. The comment about different personalities was interesting because it suggested that DIANA had a bland personality. This may have been related to the use of synthetic speech, or to the participant (correctly) thinking that personality was a difficult feature to model.

3.6 InterPersonal Simulator and VOSCE Changes and Improvements

Given the feedback and survey results we received from the medical students, we improved the IPS and VOSCE. We also made changes to the VOSCE that reflected the changing objectives of the experience.

3.6.1 Response Database Additions

Based on analysis of the interviews performed by the Study I students in Dickerson et al. (2005), we greatly expanded the scope (what types of questions are asked) and depth (variations on each question) of the response database. We showed that over 60% of the speech input was mapped to the correct response. Of the failed responses, 51% were not in the database and 21% were variations on questions that were in the database. Remaining errors included those that the natural language processing algorithm was not designed to handle, such as summarization (when the doctor tries to repeat what the patient said to them) and empathetic statements. We incorporated the unforeseen questions asked by students in Study I, as well as the variations on existing questions, into the response database. By doing this, we believed that the system would be able to handle a higher percentage of user utterances correctly.


3.6.2 Speech Understanding

The natural language processing algorithm underwent minor adjustments. DIANA was made to respond less often to vague input (typically errors in speech recognition) by raising the cost threshold required to match a template in the database. While this meant that DIANA would respond less often, it also meant that when she did respond, she would be more likely to respond correctly. We believed that this would improve satisfaction with the speech input system.

3.6.3 Tracking

The passive, colored marker-based tracking system used in the study jittered significantly or lost tracking altogether. The markers were also encumbering because of the relatively large size required by the use of the inexpensive web cameras. We believe that this was directly correlated to the students' acceptance of user perspective-correct rendering and gesture recognition.

Most web cameras are sensitive to infrared (IR) light. We took advantage of this and moved to an active marker-based system using IR LEDs. As a result, the system was lighting independent (important because the user is often in the path of the projected image). In addition, it used a more efficient algorithm, reducing latency. Most importantly, it enabled the pointing gesture and perspective-correct rendering to have noticeably less jitter.

3.6.4 Virtual Objective Structured Clinical Examination Changes

In an upcoming series of studies, we aimed to compare medical students' performance between the VOSCE and OSCE. To this end, we removed VIC from the experience. While students indicated that VIC was important to the experience in the first study, an instructor is not typically present in a mock experience. We chose to remove this potential confounding factor for a formal comparison between the VOSCE and OSCE.
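The cost-threshold behavior described in Section 3.6.2 can be illustrated with a toy matcher. This is not the IPS algorithm; it is a minimal sketch assuming a bag-of-words cost over a hand-made template database, and the templates, threshold value, and response identifiers are all hypothetical.

```python
# Toy cost-threshold matcher. Each template maps a question phrasing to a
# response id; the cost is the fraction of template words missing from the
# utterance. Raising THRESHOLD makes the agent answer less often but more
# accurately, as described above. All data here is hypothetical.
TEMPLATES = {
    "where is the pain": "pain_location",
    "when was your last menstrual period": "lmp",
    "are you sexually active": "sexual_history",
}
THRESHOLD = 0.34  # maximum tolerated fraction of unmatched template words

def match(utterance):
    """Return the best response id, or None if the match is too costly."""
    words = set(utterance.lower().split())
    best_id, best_cost = None, 1.0
    for template, response_id in TEMPLATES.items():
        twords = set(template.split())
        cost = len(twords - words) / len(twords)
        if cost < best_cost:
            best_id, best_cost = response_id, cost
    # Below threshold: answer; otherwise stay silent rather than guess.
    return best_id if best_cost <= THRESHOLD else None

print(match("can you tell me where the pain is"))  # pain_location
print(match("do you have gall bladder problems"))  # None (no template fits)
```

The second query shows the gall-bladder failure mode discussed in the debriefing: with a strict threshold, the agent declines to answer rather than responding incorrectly.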


DIANA's responses were also shortened to make her a more difficult and realistic patient. This also meant that students would have more time to ask questions, because DIANA would not spend as much time talking.

3.7 Study II

3.7.1 Design

We conducted a follow-up study to identify the impact of the above changes and to obtain further feedback. For this study, the population was restricted to second year medical students. Ten students (all with nearly identical patient experience) participated in the study. In addition, a comparison was performed between synthetic speech and recorded speech. We randomly separated the study participants into two groups: one where DIANA used recorded speech, and another where DIANA used synthetic speech as in the first study. There were six men and four women. We felt that this was the target group for the VOSCE. Other than these changes, the study design, environment, procedure, and measures were as similar as possible to the first study.

3.7.2 Study II Results

We compared the results of Study II with those from Study I to see if there were any trends. Given the differences in the study populations, these trends could be attributed to the lower experience level of the participants in the second study, or to the improved technology. We also compared the group using recorded speech to the group using synthetic speech. Statistical significance was tested with Student's t-test (α = 0.05). However, results should be taken as trends, not conclusive evidence, given the small study size (N = 10). They were used to improve the experience and to understand what aspects of the virtual human experience warranted formal study. Raw data is given in Appendix B.2.
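The between-study comparisons that follow rely on Student's t-test at α = 0.05. As a rough illustration (standard library only), the sketch below computes a pooled-variance t statistic for two invented sets of 7-point Likert ratings chosen to reproduce the reported group means for the pointing-gesture item (3.0 in Study I, 4.2 in Study II); the data are hypothetical, not the study's raw data.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical Likert ratings consistent with the reported means
study1 = [3, 3, 2, 4, 3, 3, 3]           # N=7,  mean 3.0
study2 = [4, 5, 4, 4, 5, 4, 4, 4, 5, 3]  # N=10, mean 4.2

t = pooled_t(study1, study2)
# Two-tailed critical value for df = 15 at alpha = 0.05 is about 2.131
print(round(t, 2), t > 2.131)  # → 3.99 True
```

With real data the p-value would normally be read from the t distribution (e.g., via a statistics package) rather than compared against a hard-coded critical value.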


3.7.2.1 Task performance

The major result from the task performance evaluation was that task performance was significantly worse than in the first study. Out of the 11 core questions, the average number asked was 5 (7.0 is a passing grade). Only one out of ten participants received a passing grade. We believe the significant difference (p = .03) in the number of relevant questions asked between this study and the first study is a result of the differences in experience between the two groups. The lower experience level of the participants in the second study was also indicated by a much lower frequency of asking questions about sensitive topics. Study II participants forgot more often to ask if the patient was sexually active (p = .05) and to ask when DIANA's last menstrual period was (p = .01).

Surprisingly, all participants in Study II gave a correct diagnosis of appendicitis. However, the lack of other possible diagnoses, such as ectopic pregnancy, is another consequence of the experience difference between the two study groups. Students in the second study were too quick to diagnose the patient without enough information, and came up with the obvious answer without exploring other possibilities. Overall, this suggested that students in the first study were correct when indicating that the AAP scenario was more appropriate for first and second year students.

3.7.2.2 Technology

All rated items on the technology survey either did not change between studies, or improved. Those that improved were items where the technology changed, indicating that the system and experience changes were effective.

Gestures and rendering: The pointing gesture was seen as more important than last time (4.2 vs. 3.0, p = .05). There was also a trend towards head-tracked rendering being more important (4.4 vs. 3.2, p = .10). Two explanations for this improvement are the technology and scenario improvements. The handshake gesture was removed based on negative


user feedback from the first study. In addition, active IR markers made the head tracking and the pointing gesture noticeably more stable, and thus easier to use. Still, significant jitter was observed during the experience; this motivated further improvements to the tracking system (discussed in Chapter 4).

Speech: Students also felt that the accuracy of the speech interaction was improved in this study (4.9 vs. 3.9, p = .10). Adding more scope and depth to the database, as well as improving the speech understanding component, can explain the difference. While the speech recognition accuracy was not different (~90%), the patient responded correctly more often. In this study, 70% of the queries were responded to correctly, as opposed to 60% in the last study. This suggested that further improvements to the response database could improve the response accuracy.

3.7.2.3 Experience satisfaction

Students were once again positive about the virtual human experience. Similar to the technology ratings, students were as positive as or more positive about the experience in this study than in the first study. This suggested that our engineering was improving the usability and acceptability of the system.

Overall: Students again strongly agreed that the system was useful for training and teaching, and were again neutral on its use for evaluation. In addition, they also indicated a desire to use the system frequently. Five of the students indicated that they would use the system weekly and four said they would use it monthly (one student did not answer the question). This indicated that the VOSCE was accepted by students of varying experience levels.

DIANA: One goal of this study was to make DIANA respond more naturally. We did this through three system changes: script improvements, speech understanding improvements, and using recorded speech for some of the interactions instead of synthetic speech. Indeed, students


tended to rate DIANA's responses as more natural than in the previous study (5.5 vs. 4.6, p = .09). In addition, students felt DIANA now answered questions in a more natural manner (3.7 vs. 2.4, p = .02). The change to recorded speech appeared to have the largest impact on this improvement. Participants also tended to rate DIANA higher on the 1-to-10 global scale (7.2 vs. 6.4), closer to the 7.47 average for actors in mock experiences. While direct comparisons between real and virtual experiences are questionable [71], anecdotally, students seemed to treat DIANA as though she were a real patient. This was confirmed through the comments of numerous medical faculty members who observed sessions, live and recorded, of medical students with DIANA.

3.7.2.4 Debriefing

Pedagogical objectives: All students expressed that there was educational value in interviewing DIANA. Practicing the process of forming a diagnosis was the most valuable aspect of the system.

"Sometimes just having it come out of your mouth is useful. This is a way to do that without having to have somebody there."

Without any knowledge of the previous study where VIC provided feedback, at least one student specifically mentioned that the system would be better if feedback were provided.

"I felt like the diagnosis should be given at the very end. Right now we don't get that feedback with standardized patients (actors)."

This further confirms that feedback should be one of the critical components of the VOSCE system.

Gestures: Students still had many criticisms of the pointing gesture, but all tried to use it. Some felt that it was hard to point in the right direction, and that they were too far away


from DIANA to use it appropriately. Many revealed that they thought pointing was unnecessary for the experience.

Speech recognition: Students were impressed by DIANA's ability to answer most of the questions spoken to her.

"Understanding most of what I said was really neat."

Students in this study commented on the system's inability to handle compound sentences and on having to speak very clearly to the system.

"I often had to repeat what I said and try to enunciate a little clearer."

"The more complex my sentences were, the less likely they were to get it. I had to make really short non-compound sentences."

Our medical collaborators noted that the older students (in the first study) did not mention this because they knew that asking compound questions was bad in practice and that speaking clearly was critical. DIANA said nothing when she did not understand a question (she did this much more often than in the first study, to limit the amount of information she revealed by mistake). Most students felt this silence was unnatural and suggested she respond in some way to at least acknowledge that something was heard.

Scenario content: While some students felt that DIANA's posture and disposition were appropriate for the scenario, the overwhelming opinion was that they needed to change. This was not seen as negatively as in the first study.

"She should have doubled over, squirmed, hunched over, laying down. She didn't really act like she was in pain. She was talking with a regular voice; usually patients are hunched over; it's hard for them to speak."

"With standardized patients you know that they're not feeling too good because they're lying [down], and holding their stomach."


Students who tried DIANA with synthetic speech felt DIANA was not expressive enough.

"I knew it was artificial. I couldn't tell any feelings in her voice."

"No affect in her voice, which might be something important when dealing with a patient."

A different posture, improved voice acting, and improved animations should improve DIANA's realism (Chapter 4).

3.8 Chapter Summary

This work created and refined a virtual human system with a natural interface, the IPS. The IPS provided life-size virtual human agents who could understand natural user interaction (speech and gestures). This work also created and studied a virtual human experience, the VOSCE, for interpersonal skills training in medicine. The VOSCE enabled medical students to practice medical interview skills with a life-size virtual human patient.

Pilot evaluations of the IPS and VOSCE found that the natural interface of the IPS was important for the VOSCE. Specifically, life-size virtual humans and speech interaction were critical. In addition, the natural interface immediately allowed medical students to understand how to interact with the virtual human patient. Further, by listening to user feedback, we improved the virtual human's ability to respond accurately and appropriately to student actions. This resulted in an overall improvement of participant satisfaction with the system.

Students also felt that the VOSCE was an important tool for teaching and training. They indicated that, were the VOSCE available to them, they would practice with it on a weekly or monthly basis. The VOSCE enabled students to practice their skills in a realistic manner, and to practice in a way that would translate directly to the real experience.

The studies also detected an important property of the VOSCE. In the VOSCE, students who were better prepared (by virtue of being older and having more patient experience) had higher overall task performance. This provided support for the validity of the VOSCE as an


evaluation tool. The validity of the VOSCE is analyzed in depth in Chapter 4, where a larger group of study participants interacted in both an OSCE and a VOSCE.

3.9 Conclusions

The main finding of this work was that a natural interface (e.g., speech interaction, life-size virtual humans) to a virtual human experience was usable and accepted by end-users for interpersonal skills education. Pilot studies in medical education demonstrated that medical student participants found the natural interface critically important to the experience. In addition, participants responded favorably to the possible use of the virtual human experience for teaching and training. They indicated that they would use the experience frequently (weekly to monthly) as an additional training tool. Combined with overwhelming educator approval and support, these positive student responses provided the support necessary to move forward towards formal validation of the virtual human experience.


CHAPTER 4
VALIDITY OF THE VOSCE

This work explored the validity of the VOSCE in interpersonal skills education. It was published in the proceedings of the ACM Conference on Human Factors in Computing Systems (SIGCHI 2007) [72].

Personal contributions: Based on user feedback and observations from the usability and acceptability studies, I improved the IPS implementation and the AAP scenario. I also designed and conducted the validation experiment, and I performed all data analysis.

Collaborators: Amy Stevens and D. Scott Lind assisted with study design and participant recruitment, and evaluated student performance. Andrew Raij assisted with system development and with conducting user studies.

Relevance to thesis: The usability and acceptability studies prepared the IPS for comparison to a mock experience. This work was that comparison. It tested the thesis statement because mock experiences (OSCEs) were a valid tool for interpersonal skills evaluation. If the evaluated performance in the VOSCE corresponded to performance in the OSCE, then the VOSCE also had the power to evaluate interpersonal skills.

4.1 Introduction

Virtual human experiences could one day be ubiquitous in education. When using real humans is difficult, impossible, or dangerous, virtual humans may serve as substitutes. Before this can happen, it must be validated that when using virtual humans, the important educational objectives are met. An important objective in medical education is evaluating clinical examination interview skills. Experts evaluate medical students while the students perform Objective Structured Clinical Examinations (OSCEs) [73]. In an OSCE, a medical student conducts an interview with a hired actor called a standardized patient. The standardized patient


simulates a real patient. We created and refined a virtual human experience in which a student could perform a Virtual OSCE (VOSCE). In the VOSCE, a virtual human simulated a real patient. The results of a user study demonstrated that student performance when interviewing a virtual human in the VOSCE correlated to performance when interviewing a standardized patient in the OSCE. This work validated the VOSCE for testing clinical examination interview skills.

Figure 4-1. A student interacts with the virtual human during the VOSCE. Retroreflective tracking markers are placed at various locations to track head gaze, pointing, and body lean. The flash from the camera illuminates the markers.

In order for performance in the VOSCE and OSCE to be compared, students needed to be able to interact with the virtual human as they could with a standardized patient. The skill set needed to interact successfully must be similar. As a result, a natural and transparent interface with the virtual human is required. A student is shown during a VOSCE in Figure 4-1. The virtual human is projected on an examination room wall at life-size. The student interacts


with the virtual human as they would with a real human, using gestures and speech. The virtual human interacts with the student in the same way.

The metric used to evaluate a medical student's interview performance is a checklist of required interview skills. A student using good interview skills obtains all of the information required for an accurate diagnosis, does so efficiently and in the correct order, and follows proper patient-doctor etiquette. In pilot studies with the VOSCE, experts noted that it was evident which students possessed adequate interview skills and which did not. In the user study described in this work, an expert used an interview skills checklist to evaluate the interview skills of 2nd year medical students in both the VOSCE and OSCE.

We present data showing that student performance on the VOSCE, as evaluated by a medical expert, was significantly correlated with student performance on the OSCE. The study design and the correlation in student performance validated that the VOSCE could be used to evaluate medical students' interview skills in clinical examinations. Comparing the experience of virtual human interaction to real human interaction is the critical validation step towards using virtual humans for interpersonal skills education.

4.2 System Implementation

The system implementation for the study was an improved version of the InterPersonal Simulator (IPS), which is described in Chapter 3. The design of the system did not change, but individual hardware and software components were upgraded, such as tracking, speech recognition, and the microphone. The IPS implementation is shown in Figure 4-2. The emphasis was on natural, transparent interaction with the virtual human, as though the user were interacting with a real human. Users could speak and gesture as though they were interviewing a real human. It was important to have natural interaction for direct comparison with real humans.


Speech recognition was natural and unprompted. A wireless headset microphone was used for speech input, freeing the user from any wires. Speech recognition software, Dragon NaturallySpeaking 8, translated the speech into text [63]. The translated text was displayed to the user to reduce user frustration when the system did not recognize speech accurately.

The head, hand, and torso motion of the user were tracked optically using passive infrared markers and two commodity, infrared-sensitive video cameras. The cameras were equipped with infrared (IR) lights and an IR filter. This resulted in the IR markers being the only visible objects in the video stream, making marker segmentation more efficient and robust. The segmented marker positions in each video stream were combined to produce the three-dimensional position of each marker. Rigid clusters of markers and relative marker positions were used to register the marker positions to real-world objects.

Figure 4-2. The system
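Combining segmented marker positions from two calibrated cameras into a 3D position, as described above, is classically done with linear (DLT) triangulation. The sketch below, using NumPy, shows the idea with made-up camera matrices; the system's actual calibration and algorithm are not detailed here, so every number is hypothetical.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point seen by two calibrated cameras.

    P1, P2: 3x4 projection matrices; uv1, uv2: (u, v) pixel coordinates.
    """
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)       # least-squares solution of A X = 0
    X = Vt[-1]
    return X[:3] / X[3]               # back to inhomogeneous 3D coordinates

def project(P, X):
    """Project a 3D point into pixel coordinates with matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Made-up stereo rig: identical intrinsics, second camera shifted 20 cm on x
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

marker = np.array([0.1, -0.05, 2.0])  # a marker 2 m in front of the rig
recovered = triangulate(P1, P2, project(P1, marker), project(P2, marker))
print(np.allclose(recovered, marker, atol=1e-6))  # True (noise-free case)
```

With real, noisy marker centroids, the SVD solution minimizes an algebraic error, which is typically adequate at this baseline and accuracy level.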


The interaction was modeled as a question-answer session. The user asks a question, and then the system returns the appropriate speech and gesture. Each question-answer pair is defined in a database created by medical faculty and refined through end-user testing. In an analysis we conducted with the system, we found most errors were speech recognition errors, variations in phrasing (such as negation), and other difficult-to-handle English wording. We found this acceptable because most students adapt to the question-answer style, and are able to complete the interview and come up with a diagnosis. A more thorough description of the approach is given in Chapter 3.

For realistic-appearing virtual humans, we used Haptek Corporation's full-body virtual characters [67]. In addition to a realistic appearance, they have built-in automatic animations including lip syncing, eye blinking, head following, and breathing. A change for this study was that the voice for the virtual human was recorded audio of a standardized patient. The standardized patient was excellent at giving the impression of being in pain.

To support a life-size virtual human patient, we used a projector. A wall in the examination room served as the projection surface. As has been done with other immersive displays (e.g., CAVEs [68]), the image sent to the display was rendered so as to give the user an immersive viewpoint (Chapter 3). Although the effect did not support stereoscopy, it provided motion parallax depth cues. In informal testing, we found that these depth cues made it easier to judge the size and look direction of the virtual human.

In the spectrum of virtual reality, augmented reality, mixed reality, and true reality, this system falls under the mixed reality category. While we use virtual reality techniques for rendering and interaction, the examination room the virtual character resides in is an extension of the real examination room.
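The head-tracked, perspective-correct rendering described above is commonly implemented as an off-axis (asymmetric) view frustum computed from the tracked eye position relative to the screen plane. The sketch below assumes a deliberately simplified setup — a screen lying in the z = 0 plane and the eye at positive z — and is illustrative only, not the IPS renderer.

```python
# Off-axis frustum for a screen in the z = 0 plane, extending from
# (sx0, sy0) to (sx1, sy1) in meters, viewed by an eye at (ex, ey, ez),
# ez > 0. Returns glFrustum-style (left, right, bottom, top) extents at
# the near plane. Simplified setup, chosen for illustration.
def off_axis_frustum(ex, ey, ez, sx0, sy0, sx1, sy1, near=0.1):
    scale = near / ez  # similar triangles: shrink screen extents to near plane
    left = (sx0 - ex) * scale
    right = (sx1 - ex) * scale
    bottom = (sy0 - ey) * scale
    top = (sy1 - ey) * scale
    return left, right, bottom, top

# Eye centered on a 2 m x 1.5 m screen, 1 m away: a symmetric frustum
print(off_axis_frustum(0, 0, 1.0, -1.0, -0.75, 1.0, 0.75))
# Eye moved 0.5 m right: the frustum becomes asymmetric, producing the
# motion parallax cue mentioned above
print(off_axis_frustum(0.5, 0, 1.0, -1.0, -0.75, 1.0, 0.75))
```

As the tracked head moves, re-deriving the frustum each frame keeps the projected room geometrically consistent with the real room, which is what makes the wall read as an extension of the examination space.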
We believe this combination of real and virtual makes the VOSCE a


more immersive experience than if it were a virtual reality system in a computer science laboratory.

4.3 Study

The study used a within-subjects design depicted in Figure 4-3. Each participant performed both an Objective Structured Clinical Examination (OSCE) and a Virtual OSCE (VOSCE). The interview skills scores on the VOSCE were then compared to the interview skills scores on the OSCE. These scores were given by a medical expert using an interview skills checklist (see Appendix A.3). Each participant also filled out a patient satisfaction survey (Appendix A.2), which rated the quality of the patient.

Figure 4-3. Study design. Each student is randomly assigned to the solid or dashed path. The student evaluates both the standardized patient and the virtual human using the patient


satisfaction survey. An expert evaluates each student for both the VOSCE and the OSCE using the interview skills checklist.

4.3.1 Population and Environment

The Essentials of Patient Care (EPC) class at the University of Florida educates students on medical interview skills. As part of the class, students perform OSCEs and are graded on clinical performance. We integrated our study into the class during one of the OSCE evaluation sessions.

Table 4-1. The VOSCE/OSCE interview skills checklist

Information (number of yes answers on 12 questions): determines whether critical information was obtained from the patient. Example(s): Description of pain? Location of pain?

Process (number of yes answers on 13 questions): determines interview performance. Example(s): Is there a logical pattern? Performs medical history?

Quality (average of 9 questions, each rated 1 very poor, 2 poor, 3 good, 4 very good): determines interview quality. Example(s): Is empathetic? Displayed appropriate eye contact? Body lean?

Overall (1-3 poor, 4-6 adequate, 7-9 good): the actual score assigned for the entire interview. Example(s): What is the overall score for this interaction?

The participants were randomly selected from the fall 2005 EPC class. There were 33 participants, 17 female and 16 male. They were all second-year medical students and had similar OSCE experience. The scenario used for the VOSCE was a young Caucasian female with acute abdominal pain (AAP). We used two different scenarios for the OSCE: a middle-aged Caucasian female complaining of chronic diarrhea (CHD), and a young Caucasian male complaining of indigestion (IND). These scenarios are typical for medical student training because standardized patients can simulate them realistically.
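As an illustration of the scoring rules in Table 4-1, the following sketch aggregates hypothetical responses into the four category scores. The function name and response values are invented for illustration; only the scales come from the checklist.

```python
# Sketch of the Table 4-1 scoring scheme:
# Information and Process count "yes" answers; Quality averages 1-4 ratings;
# Overall is a single 1-9 rating banded into poor/adequate/good.

def score_checklist(information, process, quality, overall):
    bands = {range(1, 4): "poor", range(4, 7): "adequate", range(7, 10): "good"}
    band = next(label for r, label in bands.items() if overall in r)
    return {
        "information": sum(information),          # of 12 yes/no questions
        "process": sum(process),                  # of 13 yes/no questions
        "quality": sum(quality) / len(quality),   # mean of 9 ratings (1-4)
        "overall": (overall, band),
    }

# Invented responses for one hypothetical student:
scores = score_checklist(
    information=[True] * 8 + [False] * 4,
    process=[True] * 10 + [False] * 3,
    quality=[3, 3, 2, 4, 3, 2, 3, 3, 3],
    overall=5,
)
print(scores["quality"])   # mean of the nine 1-4 ratings
print(scores["overall"])   # -> (5, 'adequate')
```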


Figure 4-4. Students begin their interview sessions by knocking on the door and entering the room. The second student from the left, wearing the hat and microphone, is interviewing the virtual human. The rest are interviewing standardized patients.

The interactions took place at the Harrell Center at the University of Florida, the standard testing center where University of Florida students are trained and evaluated in OSCEs. This was important because it allowed the interviews to be conducted simultaneously and because the VOSCE is intended to be part of a testing environment. When participants know they are being tested, they tend to change their behavior [74]. The VOSCE took place inside a medical examination room, where clinical examinations will occur during the students' careers and where the OSCE takes place as well. During the study, participants were surrounded by other participants, their teachers, and their fellow students who were in the class but not part of the study. The effect of this on the participants was very noticeable. In contrast with earlier studies run on


weekends with volunteer participants, participants in this study acted more professionally, did not experiment as much with the system, and were more focused on the interview.

4.3.2 Procedure

Key to the ecological validity was that the VOSCE was one station of a live OSCE training session. Students needed to go through each station in strict time-ordered sequence. We had a maximum of 10 minutes that the participant could be in the examination room. To stay within this limited time, we removed a tutorial that taught the participant how to interact with the patient. While not having a practice session is uncommon in virtual reality research, a tutorial was largely unnecessary due to the natural interface. In addition, the participant background survey and speech training were performed a few days before the main study session.

In a typical OSCE scenario, a standardized patient awaits a medical student in each room of the testing center. During our study, however, Room 3 was occupied by the virtual human. All other rooms were occupied by the standardized patients. The students in the testing center were all part of the EPC course, but were not all part of the study. Only students who were seeing both the virtual human and a standardized patient presenting the IND or CHD scenario were included in the study.

All study participants were asked to sign an informed consent form and video release when they arrived at the testing facility. When it was their turn, one student stood outside each door and prepared for the interview by reading a chart describing the patient's vital statistics and chief complaint. The only difference for a student interviewing the virtual human was that they were outfitted with a hat used for tracking and a wireless microphone, as shown in Figure 4-4.

The session began with an audible signal saying, "You may now start the station". The student at each room then entered and performed the required clinical examination. They were allowed 10 minutes.
After 8 minutes, an audible signal warned them to finish within 2 minutes.


At the end of the session, an audible signal told the student to leave the room. Students were allowed to exit at any time during their interview. Once all students had completed their sessions, they were permitted to go back into the room. The patient then gave them feedback on their performance. The virtual human also has this capability. The participants then filled out the patient assessment questionnaire (discussed below) for each patient.

4.3.3 Metrics

Student performance: The standard way to evaluate student performance in OSCEs is a checklist. We used the interview skills assessment checklist. As seen in Table 4-1, 35 questions were divided into four separate areas: 9 questions on quality of interaction, 12 questions on amount of information gathered, 13 questions related to following proper interview process, and 1 overall score for the interaction. The overall score was what actually determined pass or fail. The rest of the checklist served to guide student performance improvements.

Typically, either the standardized patient is the evaluator right after the interview, or an expert performs the evaluation by watching video of the interview. Studies have shown that a standardized patient can be as accurate as, or even more accurate than, experts [59]. We used an expert because the virtual human did not have the capability to evaluate the full range of students' interview skills reliably. This is a focus of future research because, in real-world use, medical experts are expensive.

Patient satisfaction: We also wanted to evaluate the performance of the patients for comparison. For this purpose, we used the Maastricht Assessment of Simulated Patients (MaSP) survey (a modified version was used in previous studies). The MaSP uses a series of 5-point Likert-scale items administered to the medical student after an encounter with a standardized patient.
Similar to the interview skills checklist, the last question asked the medical student to rank the patient on a scale from one to ten. Survey items can be seen in Appendix A.2. We used


the MaSP to evaluate both the standardized patients and the virtual human patient. This enabled us to study the relationship between student performance and the quality of the patient. This was important because the patient was the main difference between the VOSCE and the OSCE.

4.4 Results

All participants completed the VOSCE, came up with a diagnosis, and were able to do so in the time allotted to them. This occurred concurrently with the OSCE in the normal training environment. While this was critical for direct comparison, it was also encouraging from a feasibility standpoint. It showed that the VOSCE could feasibly integrate into the existing infrastructure of clinical skills education. Raw data are given in Appendix B.3.

4.4.1 Interview Skills Checklist

We analyzed the results from the interview skills checklist by category, comparing group mean differences through an ANOVA, and correlation through regression analysis.

The last question on the checklist asked the evaluator to judge the overall quality of the interview on a scale from 1 (worst) to 9 (best). This is the grade that a student would get for the interview. A graph of same-student overall performance in the VOSCE and the OSCE is shown in Figure 4-5. There was a significant correlation (r(31)=.497, p<.005) in the overall rating of same-student VOSCE to OSCE interactions.

The worst students performed the worst in both the VOSCE and the OSCE, and the best students performed the best in both. That both the VOSCE and the OSCE ranked students consistently indicates the maturity of the system. The VOSCE is a significant new interpersonal skills tool for medical educators.
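For readers who want to reproduce the style of analysis, a minimal Pearson correlation and its associated t statistic (with n - 2 degrees of freedom, matching the r(31) notation for n = 33) can be computed as follows. The paired score lists here are invented, not the study data (which appear in Appendix B.3).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical paired overall scores (1-9) for five students:
vosce = [2, 3, 4, 4, 6]
osce  = [3, 3, 5, 4, 6]
r = pearson_r(vosce, osce)

# With the chapter's reported r = .497 and n = 33, t has 31 degrees of freedom:
print(round(t_statistic(0.497, 33), 2))  # -> 3.19
```

The t value of about 3.19 on 31 degrees of freedom is consistent with the p<.005 significance level reported above.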


Figure 4-5. Significant (p<.005) correlation in the overall score for VOSCE and OSCE sessions (regression line y = 0.4443x + 1.9298, R² = 0.2468). The dotted lines indicate the 95% confidence interval. Larger points indicate multiple students had those scores.

The VOSCE could be used right away by students needing additional practice. The study was run with early second-year students with equivalent training for clinical examinations. At this stage in their education, medical students are just beginning to practice their clinical examination skills. No student scored above a 6 for either the VOSCE or the OSCE, and only a few scored above a 4. Standardized patients could only be used during pre-organized training sessions. The VOSCE could be available anytime. Standardized patient training is still necessary, however, for training students for scenarios requiring complex behaviors that a virtual


human could not yet simulate. Virtual human patients are better suited for cases that standardized patients could not simulate (e.g., wounds, eye movement disorders).

Figure 4-6. Interview skills checklist broken up into three areas: A) Information (regression line y = 0.2962x + 4.1812, R² = 0.1184), B) Process (y = 0.1559x + 8.143, R² = 0.0343), and C) Quality (y = 0.3729x + 1.8112, R² = 0.1429). The Process and Information scores indicate the number of 'yes' responses for each area. The Quality score is an average of the 9 questions related to quality of the interview. The dotted line represents the 95% confidence interval. The larger points indicate multiple students had those scores.

While the overall score determines the student's grade for the interaction, the medical expert also evaluated the students in three subcategories: information, process, and quality. Figure 4-6 shows graphs of each student's performance in both the VOSCE and OSCE by category. The information and quality areas showed significant, although small, correlations.


A significant correlation was not observed for the process area. While this did not significantly factor into the overall score, it is clearly an area for improvement.

A multivariate ANOVA was conducted on the interview skills data for the four categories: information, process, quality, and overall score. There were two factors. The first factor was interaction type, VOSCE or OSCE. The second factor was gender, which has been shown to have a large effect in other studies with virtual characters [75]. There was not a multivariate effect found for gender or the interaction type × gender interaction. There was a significant (p<.001) multivariate effect found for interaction type. A univariate analysis showed a significant effect of interaction type on both the quality (p<.001) and the process (p<.01) areas. The mean scores for each area are compared in Figure 4-7.

Figure 4-7. Scores for each subject area of the interview skills checklist, normalized between 0 and 1 (**p<.01, *p<.05). [Bar chart comparing VOSCE and OSCE means for Overall, Quality**, Information, and Process*; values in figure order: 0.26, 0.38, 0.37, 0.51, 0.29, 0.53, 0.40, 0.64.]
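The exact normalization used for Figure 4-7 is not stated in the text. One plausible scheme, assumed here purely for illustration, maps each category's raw range from Table 4-1 onto [0, 1]:

```python
# Assumed per-category normalizations based on the checklist scales:
#   Information: 0-12 "yes" answers, Process: 0-13 "yes" answers,
#   Quality: 1-4 average rating, Overall: 1-9 score.
# These formulas are an assumption, not the dissertation's stated method.

def normalize(category, raw):
    ranges = {
        "information": (0, 12),
        "process": (0, 13),
        "quality": (1, 4),
        "overall": (1, 9),
    }
    lo, hi = ranges[category]
    return (raw - lo) / (hi - lo)

print(normalize("information", 6))  # -> 0.5
print(normalize("overall", 5))      # -> 0.5
```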


The overall score was not significantly different between the VOSCE and the OSCE. The amount of information obtained was also not different. Also, the correlation in these areas was significant. This means that, for testing overall skill and information gathering skill, the VOSCE could be used in place of the OSCE.

The quality (p<.01) and process (p<.05) scores showed significant differences between the VOSCE and OSCE. These are areas where the VOSCE suffers from technical limitations. The individual questions in the quality area of the interview skills checklist were mostly related to non-verbal behavior such as body lean, head nods, and eye contact. Maintaining appropriate non-verbal behavior might have been difficult when we required that the student wear a baseball cap for tracking and a microphone for speech input. In addition, their motion was hindered by occlusion of the projected image and of the tracking cameras' visual field. Expectations of the virtual human's ability to respond to appropriate non-verbal behavior may have also been lower. Regardless, there was still a significant correlation in quality (r(31)=.378, p<.05).

The process, quality, and information scores are supposed to be reflected in the overall score. The overall scores were not different, but there were significant differences in the process and quality scores. This is explained by analyzing the interview skills scores. While the interview skills checklist has a high degree of internal reliability (α = .78), the overall score had the highest correlation with the information score (r(31)=.77, p<.001).
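The internal reliability figure cited here is Cronbach's alpha. A minimal sketch of the computation over an item-response matrix follows; the response matrix is invented (two perfectly consistent items yield the maximum alpha of 1.0).

```python
def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(items):
    """items: one response list per subject, one column per checklist item."""
    k = len(items[0])                          # number of items
    item_vars = [variance([row[i] for row in items]) for i in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Three invented subjects answering two perfectly consistent items:
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # -> 1.0
```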


Figure 4-8. Overall rank given to the patient by participants in the MaSP survey, separated by type of patient and study date (VH = virtual human, SP = standardized patient). [Bar chart of mean ranks on a 1-10 scale for the 8/04, 12/04, 6/05, and 10/05 VH studies and the 2/05 and 10/05 SP studies; values in figure order: 7.22, 6.36, 4.88, 6.38, 9.50, 7.94.] The drop from previous studies to the current October 2005 study is significant at the p<.001 level.

4.4.2 Patient Satisfaction

While the interview skills checklist looks at how good the medical student is at being a doctor, the MaSP looks at how good the virtual human or standardized patient is at being a patient. Participants filled out the MaSP after their VOSCE and OSCE. The wrong scale was used for day 1 of the study. This error resulted in the MaSP data for 16 participants being excluded.

Our analysis of the MaSP results showed a large drop in participant satisfaction for both the standardized patient (p<.001) and virtual human (p<.05) groups relative to previous study results [55], [56], [76]. In fact, while previous studies have shown the virtual human to be close


to the national average for standardized patients (7.2), the mean score for the virtual human in this study is only 4.88 (σ = 1.83). This is illustrated in Figure 4-8.

Both the virtual human and the standardized patient ranks were significantly lower than we have observed in previous studies. We believe the main reason for this drop was volunteer bias. The study participants were selected randomly from a class that all second-year medical students are required to take. The participants were not paid volunteers. This was important because volunteers often are not representative of the true population and often are less critical of design flaws [74].

Figure 4-9. Patient satisfaction, as determined by the rank given by the participant, is compared to the participant's overall score as determined by the evaluator. [Scatter plot of patient satisfaction rank (1-10) against overall score (1-9) for VH and SP interactions.] Larger data points indicate multiple students had those scores.


Performance of the patient was compared to performance of the participant. No observable correlation was found between patient rank on the MaSP and overall score on the interview skills checklist, as seen in Figure 4-9. This was true for both the virtual human patient and the standardized patient. The data indicate that the quality of the virtual human agent at least did not limit performance. For example, a virtual human that could not respond to any input would have made completing the interview frustrating or impossible. We believe that the interface to a virtual human patient must be natural so that the student can demonstrate good interview skills and perform a complete interview. This will be discussed further in Chapter 5.

4.5 Limitations

A limitation of the study is observer bias. Only one subject matter expert reviewed the video. While this could explain some of the correlation, the students' interview skills are a more likely explanation. A solution to this problem would be to use multiple experts, or observers trained by experts, to review video and measure their reliability. Future studies have used multiple observers to control for bias.

Another limitation was that this study was run with students from just one medical education institution. There is no guarantee that the results will translate well to students of other medical schools. Additional medical schools and pharmacy schools have since begun testing the VOSCE as well.

Students were also beginning second-year students. More work needs to be done to show that, regardless of educational level, performance is correlated between the VOSCE and the OSCE. However, this population was the target experience level for the VOSCE. The training process can be more efficient, and possibly more effective (by virtue of increased practice), by using the VOSCE in early training, and later using OSCEs for more advanced skills training.


Lastly, the study could only determine the relationship between performance on the VOSCE and performance on the OSCE. It could not say how the VOSCE achieved this relationship. We believe that the natural interface enabled students to treat the virtual human patient as they would a real human patient. In addition, the comparison could not have taken place if, for example, students typed to the virtual human patient. Typing would have taken an unacceptable amount of time, and many non-verbal comparisons would not have been relevant (e.g., body lean).

4.6 Chapter Summary

The purpose of this work was to validate our virtual human experience for clinical examination interview skills testing. The validation tested whether performance during a practice interview with a virtual human was similar to performance during a practice interview with a real human.

Performance was determined by a checklist containing the required elements for a clinical examination interview. This checklist was filled out by a medical expert watching video of the interview. Participants conducted the interview portion of a clinical examination with a virtual human and then a standardized patient. The correlation in performance was then computed. We found a medium correlation (r(31)=.49, p<.005) in overall performance. This suggested that the VOSCE could be used to evaluate students' interpersonal skills.

In addition, we did not find a significant correlation between participant ratings of satisfaction with the patient and participants' evaluated performance, for either the virtual human or the standardized patient. However, participants were more critical of the performance of both the virtual human patient and the standardized patient than we had observed in past studies. This reflected the ecological validity of the study. In the testing environment, students were under


significantly more stress, they were not volunteers, and thus they were primarily concerned about the potential of the experience for evaluation.

4.7 Conclusions

Our main finding was that our natural interface approach to virtual human experiences for interpersonal skills evaluation was valid. Establishing validity has moved the IPS from a prototype to a significant new tool in interpersonal skills education. In addition, the correlation found between the virtual human experience and the real human experience significantly reduced concerns that the imperfect artificial intelligence of the system limited its usefulness. In contrast, the natural interface enabled the virtual human experience to be transparently incorporated into an existing interpersonal skills training framework. Without changing the required educational infrastructure, the virtual human experience provides significant educational and logistical benefits (e.g., standardized evaluation, partner diversity, objective feedback, lower costs).


CHAPTER 5
IMPACT OF DISPLAY SYSTEM ON USER PERFORMANCE

This work analyzed user performance under several visual display conditions. It was published in the proceedings of the IEEE Conference on Virtual Reality 2008 [77]. An extended version is currently in submission to Presence: Teleoperators and Virtual Environments [78].

Personal contributions: I performed all work incorporating the new display types into the IPS system. In addition, I designed and conducted the user studies. Finally, I performed all data analysis.

Collaborators: Adeline Deladisma and D. Scott Lind provided the educational background and student population for the first study. Diane Beck provided the educational background and student population for the second study.

Relevance to thesis: This work supports the thesis by demonstrating a causal relationship between the user interface design of the IPS and user performance. We found that changing the natural interface design also changed user performance (e.g., empathetic responses). Visual displays that did not afford life-size virtual humans, or that encumbered the user, inhibited user performance.

5.1 Introduction

Virtual human experiences have fundamentally different goals than traditional, spatially focused virtual environments. Hence, research needs to be conducted to understand the impact of traditional virtual environment interfaces. In addition, this would help to generalize and correlate results from social virtual environment studies that employ different interfaces. However, little research has been conducted in this area. Thus, we present our results from evaluating different visual display system conditions for our virtual human experience.


Many researchers studying social interactions in virtual environments have used close-up, life-size virtual humans [1], [33], [43], [79]. The theory is that users treat life-size virtual humans as real humans. We explored that theory from the standpoint of the visual display. We report results from two studies designed to evaluate the impact of several visual display systems on social performance in a virtual human experience.

5.1.1 Application Context

The application context for this work is the Virtual Objective Structured Clinical Examination (VOSCE). The VOSCE is used for patient interview training. Face-to-face patient interviews are an integral part of the health care process. The current standard of practice in medical and pharmacy education is to provide these experiences using either actual or actor patients. However, it is difficult to give students a wide range of patient experiences with actual or actor patients because of availability and feasibility. In the VOSCE, a virtual human simulates the patient, allowing for a greater variety of patient demographics and symptoms than students are currently able to experience using actual patients and actors. The cost of using actual or actor patients also limits how frequently students get this experience. The VOSCE has the potential of providing students more opportunities to practice, and therefore to improve, interpersonal skills such as questioning, listening, informing, and empathizing [16]. The emphasis on interpersonal communication skills, and a motivated user base, make the VOSCE an effective test-bed for studying the impact of interfaces on social interaction with virtual humans.

The underlying technology in the VOSCE is the InterPersonal Simulator (IPS). The goal of the IPS is to enable natural, transparent interactions with virtual humans.
A natural interface affords users the ability to communicate with a virtual human much as they would communicate with a real human in the same situation. Transparent interaction aims to reduce users' need to be aware of the technology in order to interact. Such interaction has been largely achieved


in the IPS. Students can interact with the virtual human using unprompted speech and gestures, and the virtual human speaks and gestures to the student. Only minimal user training (2 minutes of speech recognition training and a system tutorial) is required prior to a virtual human interaction.

5.1.2 Motivation to Evaluate Visual Displays

As Fred Brooks stated in his 1999 survey of virtual reality (VR), one of the major challenges facing VR is choosing which display best fits each application [21]. Many display systems have been used for virtual human experiences, including monitors, head-mounted displays (for both virtual and augmented reality), projection displays, and large-screen TVs. Each type of display has well-known advantages and disadvantages in terms of logistical requirements (e.g., cost, space, availability) and interaction affordances. For example, a tracked head-mounted display is often expensive, can lead to simulator sickness, and encumbers the user with a device; however, tracked head-mounted displays are physically immersive, allowing users to explore a virtual space naturally. In contrast, a small monitor display is cheap, widely available, and familiar, but often requires an interaction device to adjust the virtual viewing direction.

It is difficult to choose a visual display purely analytically. Empirical observations need to be included in the decision in order to fit application-specific requirements. Evaluations of visual display systems have shown impacts on task performance, presence, and behavior in virtual environments; however, these impacts have been in the context of spatial tasks (e.g., navigating a virtual environment, manipulating a spatial dataset) [80], [81], [82], [83]. Spatial tasks emphasize locomotion and object manipulation.

In contrast, the social tasks of virtual human experiences emphasize communication. Evidence exists that indicates social communication with embodied agents may be affected by the


visual display. Mehrabian showed that up to 55% of the information exchanged in face-to-face interactions might be visual cues (e.g., interpersonal distance, posture, eye/head movement, facial expressions) [84]. Display systems that affect these visual cues may affect social communication.

We believed that differences in displays, and resulting differences in the visual experience, would change the extent to which users felt that they were with another real human. This phenomenon of feeling with another person is known as copresence. An excellent survey of copresence literature, and the definition of copresence used in this work, is given in Bailenson et al. [85]. A strong feeling of copresence is vital to the pedagogical goals of the VOSCE. We measured copresence both directly (self-report) and indirectly in terms of its expected effects on behavior and self-reflection. Higher copresence would be indicated by caring and understanding responses towards the virtual human patient (e.g., empathy). In addition, higher copresence would make students evaluate their own behavior more critically, and the behavior of the virtual human less critically. This is because they would evaluate performance as they would with a real human, not a computer system.

5.1.3 Study Methodology

Two user studies explored the choice of visual display. Display type was a between-subjects independent factor in both studies. The first study was on immersion; the second study was on life-size virtual humans. The immersion study compared two immersive displays, a head-mounted display (HMD) and a fish-tank projection display (FTPD). The life-size study compared two non-immersive displays, non-immersive in the sense that the virtual human was rendered into the physical world with a monitor display rather than the user being immersed in


the virtual world. A large (42-inch), vertically oriented plasma TV monitor (PTV) was compared to a small (22-inch), standard computer monitor (MON).

In each study, we divided the participants into two groups by display condition. Each participant performed a test interview with the same virtual human patient, and rated their interview on various items (e.g., performance, satisfaction). We then compared groups along three dimensions expected to be influenced by copresence:

Behavior: Did one group treat the virtual human consistently more like a real human?

Self-reflection: Did one group rate their performance or the virtual human's performance higher?

Self-reported copresence: Did one group have a subjectively higher feeling that they were with the virtual human?

In the immersion study, we hypothesized that copresence would be different between the immersive displays; however, we did not know which would be higher. The FTPD presented an immersive experience without encumbering the user. The HMD encumbered the user with the device and wires, but created a more immersive experience and isolated the virtual human and the user in the virtual world.

In the life-size study, we hypothesized that the PTV would give users a higher feeling of copresence than the MON. This was because the PTV emphasized a life-size virtual human, and the MON emphasized a computer application.

5.2 Related Work

5.2.1 Comparative Evaluations of Virtual Human Experiences

The most well-studied aspects of virtual humans are gaze and appearance. Using a teleconferencing application, Garau et al. found that an avatar whose gaze was consistent with conversational flow was more beneficial than an avatar who randomly gazed at the user [36]. A later study by Garau et al. found a significant interaction effect between visual realism and gaze


in an immersive collaborative virtual environment, suggesting that it is important to match avatar visual realism to gaze behavior realism [37]. Bailenson et al. extended this result to agents, relating self-report (copresence questionnaire), cognitive (memory of avatar), and behavioral markers (interpersonal distance) [33]. Fukayama et al. used a two-state Markov model that controlled the amount, duration, and point of interest of a virtual agent's gaze [35]. They found that they could control the user's impression of the virtual agent by setting the parameters of the model to those found to create similar impressions between real humans.

Dickerson et al. explored the realism of the virtual human's voice [86]. They compared synthetic speech to recorded real speech. They found an effect on the impression formed by the user of the virtual human, but found no effect on user behavior. Impressions of virtual humans are also examined in [87]. They found that the posture of the virtual human affects the user's impression of the virtual human's mood, as well as the user's physiological state.

The IPS leveraged these results to maximize copresence. The Fukayama model of agent gaze behavior was used in the IPS, set to friendly parameters. In addition, the virtual humans breathed, blinked, gestured in a realistic manner, and had a realistic appearance and voice (recorded speech). The virtual human's facial expressions, posture, and animations were reflective of their mood (pain and fear for the current patient scenarios). This matched the appearance and behavioral realism.

5.2.2 Impact of Display Systems on Presence and Copresence

Presence and copresence are related constructs: presence is the extent to which a user feels in a virtual environment, and copresence is the extent to which a user feels with another person in a virtual environment. Presence is the better-understood construct, especially in terms of interface factors.
It is widely accepted that the visual display system is an important factor in determining presence in virtual environments. Presence and copresence are measured similarly,


often by administering questionnaires and by observing expected effects (e.g., more powerful emotional responses, better task performance, increased physiological activity, more realistic behavior).

Presence: The level of visual immersion afforded by the display is the display property most associated with presence [88]. Visual immersion is the degree to which the user's visual senses are involved in the virtual environment. Increasing the field-of-view, providing stereoscopy, and coupling the display to the user's head movements have all been found to increase presence [89], [90], [91]. More generally, immersive display systems such as HMDs and large-screen projection displays usually produce a higher sense of presence than monitor displays. This can be seen in the training (e.g., vehicle simulators) and therapy (e.g., virtual reality exposure therapy) fields, where presence (emotional and behavioral realism) plays a large role and where immersive displays are used almost exclusively over non-immersive displays.

Copresence: Increased immersion is also thought to increase copresence. Media psychologists studying the impact of large-screen televisions on viewers found that large-screen displays motivate people to evaluate images of other people more favorably [92]. For virtual environments, Slater et al. found that an immersive display amplified the effect of user anxiety when speaking to an audience of virtual humans [3]. Another experiment by Slater et al. replicated Stanley Milgram's obedience experiments [34]. Participants who could see and hear the virtual human's responses were more likely to have caring responses towards the virtual human subject being shocked than participants who could only hear the virtual human's responses. Zanbaka et al. did not find an effect of level of immersion on the social facilitation and inhibition effects of virtual humans; however, they point out that the lack of social impact may


have been a consequence of a physical difference between the displays (HMD and projector). The virtual human in their study observed participants from the side, and the low field-of-view of the HMD removed the virtual human from the participant's peripheral vision [32].

We aimed to add to existing copresence research in two ways. The first was the impact of the visual display system on training social skills. Existing research had focused on replicating psychological experiments and on virtual humans for exposure therapy. The second was the impact of the visual display system on richer, conversational tasks. Existing research had shown that users tend to respond to virtual humans realistically, despite the virtual humans often having limited interaction capability. The current work built upon these results by providing data on the impact of displays on tasks where users can have prolonged, bidirectional conversations with a virtual human.

5.3 Study I: Immersive Displays

In the immersion study, two immersive visual displays were compared for their influence on copresence: a non-stereo head-mounted display (HMD) and a fish-tank projection display (FTPD). These displays were chosen because both 1) enabled close-up, full-body, life-size virtual humans, 2) were common display types used in other immersive virtual human experiences, and 3) had similar graphical quality.

The FTPD is a type of large-screen display. The term "fish-tank" means that the displayed image is rendered from the perspective of the user's head position (tracked at 1 cm accuracy, 60 Hz). For the study, the image (2 m x 2.6 m) was displayed on one of the walls of the experimental space (ceiling-mounted, front-projected; see Figure 5-1 (top)). It enabled an immersive experience with close-up, life-size virtual humans without encumbering the user (with the exception of a hat worn for head tracking).
It had been used in previous studies and was well received by participants, who felt that life-size virtual humans were important to the experience.
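The fish-tank rendering described above amounts to recomputing an asymmetric (off-axis) view frustum from the tracked head position each frame, so that the image stays geometrically correct for the viewer's actual eye point. The dissertation does not give the IPS implementation; the sketch below is a minimal version under simplifying assumptions (screen centered at the origin in the z = 0 plane, head position expressed in the same coordinate frame, in meters):

```python
def offaxis_frustum(eye, screen_w, screen_h, near):
    """Asymmetric view frustum for a fixed screen centered at the
    origin in the z = 0 plane, with the viewer at eye = (x, y, z),
    z > 0. Returns (left, right, bottom, top) clipping extents at
    the near plane, suitable for a glFrustum-style projection call."""
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("viewer must be in front of the screen")
    s = near / ez  # project screen-plane extents onto the near plane
    left = (-screen_w / 2 - ex) * s
    right = (screen_w / 2 - ex) * s
    bottom = (-screen_h / 2 - ey) * s
    top = (screen_h / 2 - ey) * s
    return left, right, bottom, top
```

With the head centered in front of the 2.6 m x 2 m image, the frustum is symmetric; as the head moves, the frustum shears so that the rendered scene stays fixed relative to the physical wall, which is what makes life-size virtual humans appear stable in the room.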


Figure 5-1. User interacts with a close, life-size virtual human using either a fish-tank projection display (top) or a head-mounted display (bottom).

The HMD used was an eMagin Z800 3D Visor (see Figure 5-1 (bottom)). By tracking the orientation (internal sensor, 1 degree accuracy, 33 Hz) and position (tracked using the same system as the FTPD) of the HMD, the illusion was created that the user was in the virtual world and could look in any direction. The Z800 provided a reasonable field-of-view (40 degrees diagonal) and


resolution (800x600), giving it a similar visual quality to the FTPD. Table 5-1 shows a side-by-side comparison of the two displays.

Table 5-1. Comparison of the displays used in the study

Feature      | HMD (eMagin Z800 3D Visor)                        | FTPD (NEC LT260)
Cost         | $1500                                             | $2000
Resolution   | 800 x 600                                         | 1024 x 768 (800 x 600 for study)
Display area | 40-degree diagonal FOV, always in front of user   | 2.6 m x 2 m screen; user must face the physical screen
Immersion    | User is always immersed                           | User must face the screen to be immersed
Stereo       | Capable, but not used                             | Not capable
Avatar       | None                                              | User can see their own body
Encumbrance  | Tethered by wires; 225 g device on user's head    | User must wear a hat used for head tracking

5.3.1 Design

In a pilot study, four third-year medical students reported a slight qualitative preference for the HMD over the FTPD after experiencing the VOSCE with both. Reasons included a feeling of closeness to the virtual human patient in the HMD and a feeling of being inside the virtual examination room with the virtual human patient. However, both displays were usable and acceptable to students for the experience.

A user study was conducted to quantify differences in copresence between the displays. The study was designed with display system (HMD or FTPD) as the primary independent variable. However, this study was conducted alongside a study on racial diversity, comparing a dark-skinned virtual human patient to the same virtual human patient with lighter skin. The racial diversity study could have confounded this one, but its effects were controlled for by group


assignment. No significant interaction effects were observed between display and race on any measures. A between-subjects design was used. While a between-subjects design reduced statistical power, it also reduced potential threats to the validity of the results.

5.3.1.1 Application scenario and task

The scenario for the VOSCE was one commonly experienced by physicians: a patient had come into the clinic with some complaints, and the task for the student-physician was to interview the patient. During the interview, the student-physician's goals were to 1) gather enough information to make a preliminary diagnosis and form a treatment plan and 2) empathize with and address patient concerns. The pedagogical goal of the VOSCE was not to teach diagnostic skills, but to develop the student's ability to formulate questions, listen to patient responses, empathize with the patient, and provide the patient with relevant information.

Two virtual human patients were used in the study: DIANA (DIgital ANimated Avatar) and EDNA (Elderly DiaNA). DIANA had symptoms of acute abdominal pain, and EDNA had found a mass in her breast. The DIANA case had been used in previous studies for diagnostic evaluation and was used as a practice interview. The EDNA case was developed for this study; it emphasized empathetic communication skills.

5.3.1.2 Population and environment

Medical students were actively recruited by the experimenters from the student population of the Medical College of Georgia. Twenty-seven students volunteered to participate in the study. Twenty-three (13 male, 10 female) of the participants were in their third year of medical school, and four (2 male, 2 female) were in their first year of physician's assistant school. Both the third year of medical school and the first year of physician's assistant school focus on empathetic communication skills. Participants were compensated $20 (USD) for their time.


There was a significant difference in backgrounds between the HMD and FTPD groups in the number of years in medical school, because the four physician's assistant students were all randomly assigned to the FTPD group. However, no significant correlation was observed between number of years in medical school and any of the copresence measures. In addition, the results did not change in significance when the physician's assistant students were excluded from the analysis.

The study was run in the surgical skills laboratory at the Medical College of Georgia over the course of four days. This environment enhanced the difference in immersion between the HMD and FTPD. The HMD provided full immersion in the virtual examination room, while the FTPD immersed the user in the virtual environment only when the user looked at the display surface. In contrast to other studies, the real environment was not a real examination room; thus, the FTPD was significantly less immersive.

5.3.1.3 Measures

Two approaches are commonly used to measure copresence: directly, through self-reported surveys, and indirectly, by measuring the extent to which expected effects of copresence are observed through cognitive and behavioral metrics. This study used both approaches. Table 5-2 summarizes the measures used in the study.

Bailenson et al.'s copresence questionnaire was used [33]. The SUS presence survey obtained from [71] was used as a secondary measure. These questionnaires directly prompt the participant with statements such as "I felt like there was someone else in the room with me," answered on a 7-point Likert scale: 1 (strongly disagree), 4 (neutral), 7 (strongly agree).

The indirect measures of copresence were based on the expectation that copresence would influence related factors. These included the extent to which the participant exhibited empathetic


behavior with the virtual human, enjoyed the interaction with the virtual human, interacted realistically with the virtual human, and was critical of his or her own interaction with the virtual human. The measures used were related to the medical interview task.

Table 5-2. Measures of copresence used in Study I

Self-reported copresence: Participants directly rate their feeling of being with the other person (the virtual human). Purpose: compared against cognitive and behavioral markers of copresence.

Self-evaluation (pre-test): Participants rate their skill at important aspects of the medical interview (history taking, expressing empathy, etc.). Purpose: increases statistical power in between-subjects experiments.

Self-evaluation (post-test): Participants rate their interviews of the virtual human patient (expressing empathy, taking a thorough medical history). Purpose: used to compare how critical participants were of their own performance, which is expected to be linked to copresence.

Patient-evaluation: Participants rate the quality of the virtual human in terms of its ability to simulate a patient interaction. Purpose: a higher rating would be indicative of liking, which is associated with copresence.

Observation of empathetic response: Observers rate the empathetic quality of the participant's response to the virtual human asking, "Could this be cancer?" Purpose: more empathy indicates caring, and a belief that the virtual human understands emotional behavior.

Observation of social response: Observers rate the degree of realism in the participant's response to the virtual human sneezing. Purpose: immediate, unconscious responses such as saying "Bless you" indicate that participants instinctively react to the virtual human as a real human.

Self-evaluation: To measure how critical participants were of their own behavior during the interview, a pre-test/post-test self-evaluation survey designed by medical faculty was administered.
Participants had seen the survey before as part of coursework. The pre-test asked participants to rate their skill at various aspects of a patient interview (e.g., expressing empathy, rated from lowest to highest) prior to the interview. The post-test asked participants to evaluate their actual use of those skills during the interview.


Patient-evaluation: To evaluate how much participants enjoyed the interaction with the virtual human patient, the Maastricht Assessment of Simulated Patients (MaSP) [69] was used. The MaSP is a survey originally designed for students to evaluate the quality of actors trained to portray patients, but each item (e.g., "The patient answered questions in a realistic manner") is directly applicable to virtual human patients as well. Participants rated aspects of the virtual human's performance, such as whether the virtual human exhibited patient complaints realistically, and gave their overall opinion of the virtual human.

Observation of empathetic response: Emotional responses were indicated by the participant's response to an empathetic opportunity presented by EDNA. Seven minutes into each interview, EDNA said, "My sister had cancer. Could this be cancer?" Video observers (three novices, one expert) rated participants' responses to this empathetic opportunity on a scale from one (not empathetic at all) to seven (very empathetic). Two of the observers (novices) evaluated audio only, and the other two evaluated both audio and video. This was to make sure that knowing the display condition (observers could clearly see when the participant wore the HMD and could not observe facial expressions) did not bias the ratings. A participant who cares might respond, "I understand that you are afraid because you know cancer can be hereditary. We're going to run some tests and get you the best care possible."

Observation of social response: To quantify the realism of the participant's social behavior, a predefined moment, similar to the empathetic opportunity, was observed. EDNA sneezed two minutes after each participant began to interview her. The responses were coded as none, realistic, or unrealistic. Both verbal (e.g., "bless you") and non-verbal (e.g., head-jerk) responses were considered.
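With four observers coding the same moments, inter-rater agreement is normally checked before ratings are combined. The chapter does not report which agreement statistic was used; Cohen's kappa is one common choice for categorical codes such as the none/realistic/unrealistic sneeze coding, and a minimal version of it looks like this:

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two raters assigning
    categorical codes (e.g., 'none', 'realistic', 'unrealistic')
    to the same items, corrected for chance agreement."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("raters must code the same non-empty item list")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical code
    return (observed - expected) / (1.0 - expected)
```

A kappa of 1 indicates perfect agreement and 0 indicates chance-level agreement; values above roughly 0.6 are conventionally taken to mean the coding scheme is reliable enough to average across raters.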


5.3.1.4 Procedure Participants began the experiment by filli ng out background surveys used to collect demographics, video game-playing and prior patient interview experience. They also filled out the pre-test self-evaluation survey. Participants were then outfitted with a wire less microphone. Then participants performed a one-minute microphone volume adjustment. Part icipants in the HMD condition were then instructed to put on the HMD and shown how to adjust it. Part icipants in the FTPD condition were instructed to put on the tracked hat (the HMD was tracked in the HMD condition). A two-minute tutorial was then conducted by th e experimenter (same for all participants) where the participant was taught how to identify when speech recognition or speech understanding errors were preventing the patien t from understanding the participants speech. For example, when the participant asks a questio n, the recognized text appears above the virtual humans head. If this text was not what the pa rticipant spoke, they know there was an error in speech recognition and they should repeat the question. If the text is correct they know that the virtual human does not know how to respond to the question and they should rephrase the question. The participant was then give n an opportunity to practi ce interviewing DIANA until they felt comfortable with the system. Immediately upon completion of the pr actice interview, the participant filled out a paper survey, which aske d for a diagnosis and treatment plan for DIANA. The participant then re-entered the examina tion room and performed an unassisted patient interview of EDNA. A ten-minute time limit (the length of time given for standardized patient interactions) was enforced for this interview. After the interviews, the participant was escort ed back to the survey area to fill out the post-experience surveys. The pos t-experience surveys in cluded the MaSP, self-evaluation, self103

PAGE 104

reported presence, and self-reported copresence surveys. The st udent also provided a diagnosis and treatment plan for EDNA. 5.3.1.5 Statistical methods Data was analyzed using ANOVAs in SPSS w ith display condition as the independent factor. Covariables, such as speech understand ing accuracy (the percen tage of the virtual humans responses that were accurate), self-skill ratings, and demographics were included in the analyses when a significant correlation was obs erved with the dependent variable. A power analysis revealed that with th e group sizes (N=13, N=14), the anal ysis had the power to detect only large effects (delta = 1.1) at the acceptable 80% confidence. 5.3.2 Results 5.3.2.1 Self-reported presence and copresence The presence and copresence scores for each questionnaire were calculated by counting the number of high (6 and 7) ratings on each item The HMD group had a higher average score on presence (M=1.2, SD=2.0) than the FTPD gr oup (M=0.4, SD=0.9). The groups had similar scores on copresence, with the HMD group (M=0.8, SD=1.2) scoring slightly higher than the FTPD group (M=0.7, SD=0.7). Neither difference was statistically si gnificant, although the presence score was borderline (p=.10). Consis tent with others resu lts, the presence and copresence scores were significantly correlated (r(25)=.46, p<.05). 5.3.2.2 Self-evaluation and patient-evaluation A significant (p<.05) effect of display system was found in part icipants self-ra ting of their use of empathy after their interviews. Part icipants in the HMD group (M=5.15, SD=1.82) rated their use of empathy higher than participants in the FTPD group (M=3.6 4, SD=1.34). Relative to pre-test rating, the FTPD group scored themselves on average 3.7 (SD=1.3) points lower, while the HMD group scored themselves 2.5 (SD = 1.6) points lower. 104
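The questionnaire scoring described above (counting items rated 6 or 7) and the Pearson correlation between the resulting scores are simple to reproduce. The sketch below uses made-up ratings, since the raw study data are not given in the text:

```python
def high_rating_score(ratings, threshold=6):
    """Score a 7-point Likert questionnaire by counting the items
    rated at or above the threshold (here, the 6s and 7s)."""
    return sum(1 for r in ratings if r >= threshold)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Illustrative only: one participant's presence questionnaire has
# three items rated high (the 7 and the two 6s).
print(high_rating_score([7, 6, 4, 2, 6, 5]))  # prints 3
```

In the study itself, each participant yields one presence score and one copresence score, and correlating the two lists across all 27 participants is what produced the reported r(25)=.46.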


The HMD group had a higher average rating of the patient (M=6.3, SD=1.2) than the FTPD group (M=5.6, SD=1.4), although the difference was not statistically significant. We did find that the percentage of incorrect virtual human responses was a dominant variable in this rating (r(25) = -0.36, p=.07). However, incorporating this covariable into the display type comparison did not change significance.

5.3.2.3 Behavioral observations

Social responses: There was a trend (p=.09) in how participants responded (verbally or non-verbally) to EDNA sneezing. Of the 13 participants in the HMD group, 11 had no response, 1 had a delayed response, and 1 had an immediate response. Of the 14 participants in the FTPD group, 7 had no response, 4 had a delayed response, and 3 had an immediate response. This indicated a higher sense of copresence in the FTPD group, because those participants instinctively treated the virtual human as a real person.

Empathetic responses: In contrast to the general lack of responses to EDNA's sneeze, most participants (10 of 13 in the HMD group, 9 of 14 in the FTPD group) responded to the empathetic opportunity. Analysis of the four observers' ratings of the empathetic opportunity did not find a significant difference between the HMD (M=3.25, SD=1.80) and FTPD (M=3.44, SD=1.91) groups. The difference between the audio-only observers and the audio + video observers was not significant.

Relationship between self-rating of empathy and empathetic response: The HMD group had a significantly higher self-evaluation score than the FTPD group. In contrast, the observers rated the FTPD group's empathetic responses slightly higher. Correlating each participant's self-rating with the observers' ratings, we found that the FTPD group had a higher correlation (r(12) = 0.46, p=.08) than the HMD group (r(11) = 0.00).
Using only the expert's ratings, the difference was more prominent between the FTPD group (r(12)=0.59, p<.05) and the


HMD group (r(11)=0.00). The difference between the correlations was borderline significant (z = 1.54, p=.06). This implied that the FTPD group was able to self-reflect on their interviews more accurately than the HMD group. The ability to self-assess one's competence accurately is an important skill, and any element that affects this ability could have an impact on an individual's self-management of their learning [93], [94], [95].

5.3.3 Summary and Discussion

In this work, we sought to identify the impact of two immersive visual display systems, a large-screen display (FTPD) and a non-stereo head-mounted display (HMD), on copresence with conversational virtual human agents. A between-subjects comparison was performed for a medical interview task. Copresence was directly measured by self-report questionnaires and indirectly by observing expected effects on task performance metrics.

From a global perspective, there was no clear influence of display type on copresence. In contrast to the pilot study results, which suggested that the HMD group would have higher copresence, the indirect measures (self-evaluation, behavioral observation) suggested that the FTPD group had higher copresence. The FTPD group was more critical of their own behavior. Furthermore, the FTPD group was more accurate in their self-evaluation of their empathetic behavior. The majority of participants in the HMD group also ignored social behavior from the virtual human, indicating that they were not paying attention to the virtual human except when asking questions.

The implication is that a difference between the displays caused this difference in self-perception and attention. The two most likely candidates are the level of immersion and the difference in encumbrance between the displays. The latter is more likely, as participants tended to focus on the virtual human during the interview. They did not leverage the 360-degree immersion of the HMD.
As a result, the immersion level was relatively similar between the two displays. The


HMD encumbered participants with the weight and wires of the device. It raised their awareness that they were not interacting with another real person, but instead with a computer system. Supporting this, many of the FTPD participants negatively rated their own performance rather than the virtual human's performance. The HMD group was less critical of themselves despite exhibiting similar empathetic behavior with EDNA. Although participants' behavior was not significantly changed, their cognition was different. Interacting through the HMD emphasized the computer interface; interacting through the FTPD emphasized the virtual human.

5.4 Study II: Non-Immersive Visual Displays

The life-size study compared two non-immersive displays for the IPS: a 42-inch plasma TV (PTV) and a 22-inch LCD monitor (MON). When using an immersive display is not practical (e.g., the FTPD required a large wall, and the HMD does not allow easy interaction with real objects), a non-immersive display has been shown to be an effective alternative in spatially focused virtual environments [96]. For the purposes of this work, the term non-immersive describes a display where the virtual world is not the dominant visual reality for the user, i.e., it does not occupy the majority of the user's visual field.

The study was designed to evaluate two displays for presenting the virtual human in a non-immersive manner, similar to the prior study on immersive displays. Again, the metric for evaluation was copresence. The hypothesis was that presenting a life-size virtual human on the PTV would enhance copresence relative to presenting a small-scale virtual human on the MON. Figure 5-2 shows the two display conditions.

The PTV condition (see Figure 5-2 (top)) was designed to create the illusion of a person seated across a desk from the user, de-emphasizing the display. By orienting the PTV vertically, the upper body of the virtual human (torso, arms, and head) was closely framed.
The PTV was then placed in a standard desk chair behind a desk. A picture was taken of the area


directly behind the PTV and used as a texture for the background of the virtual environment. Finally, fish-tank rendering was also enabled for this display, using the same optical tracking system as in the immersion study. The emphasis was on designing an experience that was visually similar to a real-world meeting.

The MON condition (see Figure 5-2 (bottom)) was designed to look similar to a traditional computer interface. The MON was placed on the desk, with the virtual human displayed as sitting behind a virtual desk. Further, a keyboard and mouse were placed in front of the monitor (although these were non-functional). The same picture was used for the virtual environment's background texture. In contrast to the PTV condition, this experience was designed to be similar to a video game or a teleconference.

The resolutions of the PTV and monitor were set at 1024x768, the maximum interpolated resolution supported by the plasma TV (native resolution 720x480). The plasma was oriented vertically, so the horizontal resolution was 480 pixels on the plasma and 1024 pixels on the monitor. The higher resolution of the monitor was not anticipated to be an important factor, because the virtual human was represented by a similar number of physical pixels on both displays.

5.4.1 Differences between Study I and Study II

In addition to the types of displays compared, the life-size study differed from the immersion study in several significant ways. For the life-size study, the VOSCE was used to teach pharmacy students to interview patients. The change to the pharmacy domain required a different application context (one focused on problems with medications) and, consequently, the development of a new virtual human patient. These changes were not expected to influence the impact of the displays on copresence, because the primary purpose of the task, practicing interpersonal communication skills, did not change.
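The life-size framing argument behind the PTV condition can be checked with simple geometry. The sketch below is illustrative only; the panels' aspect ratios (16:9 for the plasma, 4:3 for the monitor) are assumptions, not stated in the text:

```python
def display_size_inches(diagonal, aspect_w, aspect_h):
    """Physical width and height of a display, in inches, from its
    diagonal size and aspect ratio."""
    diag_units = (aspect_w ** 2 + aspect_h ** 2) ** 0.5
    scale = diagonal / diag_units
    return aspect_w * scale, aspect_h * scale

# Assumed 16:9 42-inch plasma: ~36.6 in x ~20.6 in. Rotated vertically,
# it offers ~36.6 in (~93 cm) of height -- roughly enough to frame a
# seated adult's torso and head at life size.
ptv_w, ptv_h = display_size_inches(42, 16, 9)

# Assumed 4:3 22-inch monitor: 17.6 in x 13.2 in, so the same framing
# shows the virtual human at well under half of life size.
mon_w, mon_h = display_size_inches(22, 4, 3)
```

This back-of-the-envelope check supports the design choice above: rotating the large panel is what makes a life-size upper body fit, while the monitor necessarily presents a miniature.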


Figure 5-2. Virtual human displayed on a plasma television (top) and a monitor (bottom). Real pill bottles enhance the experience. A video camera records the experience for behavioral coding.

One change that was expected to increase the power of the experiment to identify differences in copresence was a change to the simulation. The previous study employed an autonomous agent to generate virtual human responses to user input. For this study, however, a Wizard-of-Oz (WOz) operator was used to simulate a human-quality agent. The speech recognition and understanding system was still used, to assist the WOz. Participants were led to


believe that they were interacting with an autonomous agent by performing speech recognition training. Additionally, they were shown a tutorial video that taught them how to recognize when the patient did not recognize their speech accurately. The wizard could initiate scripted responses and animations. To find the appropriate response quickly, the wizard had four options:

- Manually search through the entire list of responses (~200, indexed by topic)
- Use a search dialog to search the responses
- Use one of twenty quick responses (e.g., "yes," "no," "right")
- Choose the response generated by the spoken language system

During the study, the system performed efficiently. The typical delay was less than two seconds from the end of the participant's utterance. The only difficulty arose when an appropriate response was not available (~5% of the time), in which case the wizard would have the virtual human say, "I don't have an answer for that."

5.4.2 Design

Similar to the immersion study, the life-size study was designed with display system (PTV or MON) as the independent, between-subjects variable. A larger (N=39) population sample was used in response to the large variances observed in the immersion study data. The same statistical methods were used to analyze the results. This study had the power to detect smaller effects (d=.91) at the 80% power level than the previous immersion study.

5.4.2.1 Application scenario and task

In the pharmacy profession, the pharmacist's role is expanding from focusing only on dispensing the correct product to also assuring that the patient understands how to manage his or her own medication therapy [97]. These newer responsibilities require that the pharmacist be competent in communicating with patients and identifying medication therapy-related problems. Specifically, students are taught how to inquire about prescription and


nonprescription medications, use of illegal drugs, the presence of side effects, drug allergies, and adherence to the medication regimen. This interaction with the patient can occur in a community pharmacy, an ambulatory clinic, or a hospital setting.

The goal of pharmacy educators in teaching the patient interview task is the same as that of medical educators: teaching interpersonal communication skills. The best way to learn these skills is through practice, and the virtual human experience offers a way to increase the amount and diversity of student practice opportunities.

Working with pharmacy educators, a scenario was developed similar to the peptic ulcer disease case found in [98]. The virtual human simulates a 35-year-old Caucasian male patient, VIC (Virtual Interactive Character). VIC has had symptoms of abdominal pain for the past month, and his pain is getting worse.

This case was designed to emphasize empathetic communication skills. VIC was worried about the cause of his pain (possibly cancer) and the repercussions of the pain (needing to quit his job). The pharmacist should empathize with VIC while performing a thorough medical and medication history.

5.4.2.2 Population and environment

The virtual human patient was integrated as part of a clinical assessment exercise. As part of their degree coursework, students completed 12 clinical practice assessments (CPAs) in the first year. These CPAs involve a series of 8 to 10 practice tasks (e.g., taking a medical history). During the CPAs, students encounter standardized patients (hired actors) who are trained to simulate a patient. The typical training environment is a community pharmacy or an ambulatory clinic environment. The student sits in a chair across a desk from the patient and performs the designated task, typically a patient interview. Students were not paid for their participation.


The program for this study was the University of Florida College of Pharmacy's doctorate program for working professionals. The program included a three-year curriculum that enables practicing pharmacists who have a bachelor's degree in pharmacy to attain the Doctor of Pharmacy degree, now the sole degree for entry into the profession. The environment for this study was as described above: in rooms 1-3, students encountered standardized patients, while in room 4 one of the CPAs was with the virtual human patient, VIC. Using non-immersive displays utilized the surrounding real environment effectively and easily enabled the virtual human to be incorporated into the natural testing environment.

Thirty-nine students (12 men, 27 women) participated in the study. One of the interesting aspects of working with the doctorate program was that the students were practicing pharmacists. As a result, the participants were older than what one might typically find in copresence studies. The average age of the participants was 41.23 years (min=26, max=65, SD=8.649). In addition, the population was culturally diverse (11 Asian/Pacific Islander, 11 African/Caribbean American, 2 Hispanic, and 15 Caucasian), and many participants had accents when speaking English. This was another reason for utilizing a WOz: English speech recognition software often does not work well for users who speak English with strong accents.

5.4.2.3 Measures

The self-reported metrics (presence, copresence, self-evaluation, patient-evaluation) used in the immersion study were not used in the life-size study. The only survey metric included was a self-evaluation of empathy (pre and post). A significant difference in self-evaluated empathy had been observed in the immersion study, suggesting that it might be a sensitive metric of copresence for the task.

The study primarily focused on behavioral markers of copresence. The reason for this was limited time with the participant.
The experien ce took place in a real-wor ld testing situation, 112


which was time-limited. Thus, all measures and the interaction needed to take place during a 20-minute interval. Behavioral measures were analyzed post-experiment, and did not require additional time. In addition, behavioral measures were the most important from an educational standpoint: educators evaluate students' interpersonal communication skills based on their behavior.

Similar to the previous study, moments were used which would highlight the degree of copresence being experienced by the participant. Using moments, rather than ratings of the entire interview, enabled behavior to be assessed quickly and reliably. Two moments were used:

Moment 1 (M1): The first moment occurred when the pharmacist entered the room and introduced herself as the pharmacist. The patient asked (rudely), "Why aren't I speaking to the doctor?" After getting an answer, the patient further stated, "I think I'd like to speak to the doctor!" This type of moment is a common occurrence for a pharmacist working in a clinical setting. Pharmacists are trained to respond pleasantly, to explain the purpose of the interview, and to comfort the patient.

Moment 2 (M2): Around seven minutes into the interview, the patient related to the participant, "my dad died of cancer," and asked, "could this be cancer?" This moment was designed to evoke sadness and an empathetic response from the participant. Pharmacists are trained to handle this situation delicately and professionally, expressing sympathy for the patient's loss and empathizing by reassuring the patient that the medical team will do everything they can to find out what is wrong.

The method for recording behavior was improved from the previous study. Participants' interactions with the virtual human were recorded from the front using a video camera. In the previous study, video coders found it difficult to see the participant's face.
Facial expressions alone (independent of other non-verbal cues) may encode up to 33% of the information in a conversation [84]. As seen in Figure 5-2, the camera was placed directly next to the display. Participants were informed that they were being recorded, and signed an informed consent and waiver to this effect prior to the interaction.

Five video coders independently coded all 39 videos. A video coding form was designed, which focused on four aspects: engagement, empathy, liking, and realism. Participants in the


high copresence condition (PTV) were expected to be more engaged, empathetic, pleasant, and natural than participants in the low copresence condition (MON). Natural was reverse-coded as robotic. Coders evaluated each aspect on a 7-point Likert scale from 1 (not at all) to 4 (neutral) to 7 (very).

5.4.2.4 Procedure

The experiment was pipelined into three stages. The first stage consisted of the background survey, speech recognition training, and a tutorial video. The tutorial video prepared the participant for the interview by briefly explaining the speech recognition system. While speech recognition did not actually drive the interaction (although it did assist the WOz), participants were given the impression that they were interacting with a fully autonomous virtual human.

In the second stage, the participant performed the interview. Participants were instructed to take less than 15 minutes (the time given for standard clinical practice assessments) for the interview. Most participants used the full time, spending an average of 13.4 minutes (SD = 4.1 minutes) with the virtual human. The experimenter acting as the WOz was hidden in a far corner of the room behind a large desk. The experimenter was only able to hear the participant, which closely simulated a human-quality spoken language understanding system.

In the last stage, the participant was asked to rate their use of empathy, and was debriefed by the experimenter. During this debriefing, the participant was asked if they were aware that a WOz was controlling the virtual human. While a few participants did believe there was someone controlling the virtual human, most participants were convinced that the virtual human was autonomous. As the IPS was designed to be completely autonomous, it was important to hide the WOz. Bailenson et al. found that users who perceived a virtual human to be an avatar


(controlled by a human) liked the virtual human more than users who perceived the virtual human to be an agent [33].

5.4.3 Results

5.4.3.1 Self-evaluation

Similar to the immersion study, a significant effect of display type was observed on the participants' rating of their use of empathy (p < .05). Participants in the PTV condition (M=6.10, SD=1.55) rated their use of empathy lower than participants in the MON condition (M=7.05, SD=1.31). Contrary to the immersion study, participants did not significantly lower their self-ratings of empathy from pre to post experience, suggesting that participants perceived that they were able to empathize with the patient.

5.4.3.2 Behavioral observations

The reliability between the five coders was assessed for each item on the behavior coding rating form. Table 5-3 shows the intra-class correlation for each coding item. For each coding item, coders' scores were averaged for analysis. The average correlation was medium (0.614), and a factor analysis of the averaged data showed a single significant (eigenvalue > 1.0) factor for each moment, which we propose is copresence.

Table 5-3. Inter-coder reliability in judging critical moments (average-measure intra-class correlation)

Item        CM1    CM2
Engaged     0.614  0.654
Empathetic  0.503  0.806
Pleasant    0.743  0.373
Robotic     0.684  0.536

The behavioral data were analyzed using a multivariate ANOVA for each critical moment survey. There was a significant multivariate effect of display type for both critical moments (M1: Wilks' Λ = .59, p = .02; M2: Λ = .612, p = .01). As seen in both Figure 5-3 and Figure 5-4, in the PTV condition, participants were more engaged, empathetic, pleasant, and natural.
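For readers who wish to check these statistics against the raw data in Appendix B, the two quantities used above can be computed as follows. This is an illustrative sketch, not the analysis software used in the study: `icc_3k` computes the average-measure, consistency-type intra-class correlation reported in Table 5-3, and `wilks_lambda` computes Wilks' lambda for a one-way MANOVA. The function names and the example data are our own.

```python
# Illustrative re-implementations of the statistics used in this section
# (not the study's actual analysis code).

def icc_3k(ratings):
    """Average-measure, consistency-type intra-class correlation, ICC(3,k).

    ratings: list of rows, one per subject, each with k rater scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    subj_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_rater = n * sum((m - grand) ** 2 for m in rater_means)
    ms_subj = ss_subj / (n - 1)
    ms_err = (ss_total - ss_subj - ss_rater) / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / ms_subj

def det(m):
    """Determinant by cofactor expansion (adequate for small p x p matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def wilks_lambda(groups):
    """Wilks' lambda = det(E) / det(E + H) for a one-way MANOVA.

    groups: list of groups, each a list of p-dimensional observations.
    """
    p = len(groups[0][0])
    pooled = [x for g in groups for x in g]
    grand = [sum(x[a] for x in pooled) / len(pooled) for a in range(p)]
    H = [[0.0] * p for _ in range(p)]  # between-group SSCP matrix
    E = [[0.0] * p for _ in range(p)]  # within-group (error) SSCP matrix
    for g in groups:
        mean = [sum(x[a] for x in g) / len(g) for a in range(p)]
        for a in range(p):
            for b in range(p):
                H[a][b] += len(g) * (mean[a] - grand[a]) * (mean[b] - grand[b])
                E[a][b] += sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in g)
    eh = [[E[a][b] + H[a][b] for b in range(p)] for a in range(p)]
    return det(E) / det(eh)
```

Perfectly consistent raters yield an ICC of 1.0, and a Wilks' lambda near 0 indicates group means that differ strongly relative to within-group variation, while a value near 1 indicates little separation between conditions.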


Figure 5-3. Results for Moment 1 ("Why aren't I speaking to the doctor?"). The dashed lines are +/- one standard deviation. * (p<.05) ** (p<.01)

Figure 5-4. Results for Moment 2 ("Could this be cancer?"). The dashed lines are +/- one standard deviation. * (p<.05) ** (p<.01) *** (p<.001)

The raw magnitude of engagement is also worth noting. Participants in both the MON and PTV conditions were found to be engaged, but there was a main effect of display type in both M1 (p=.03) and M2 (p=.03), showing participants in the PTV condition were significantly more


engaged. The reason for the high level of engagement in both conditions was likely the use of the WOz, and the isolation of the participant and the virtual human. During the interview, the participant was left alone in an office environment. With minimal breaks in copresence occurring from technology, such as errors in speech recognition and outside influences (e.g., a person holding an HMD cable), participants were engaged throughout the experience.

In addition to the behavioral coding measures, some qualitative observations were made that reflect the differences between the display conditions. One of the memorable moments during the study occurred during M1 of participant 33's interview (PTV). Participant 33 was so engaged by the experience that he believed that the virtual human wanted him to go get the doctor. He asked the virtual human to wait while he got help, exited the room, and asked the experimenter what to do. At this point, the experimenter told him that he could not actually get the doctor, and that he needed to handle the complaint. He returned, and adequately handled the complaint. After this point, he was truly engaged by the experience. Afterwards, he told the experimenters that he did not know if VIC's complaint was an error or part of the scenario. When he found out that it was part of the scenario, he was more impressed by the system and more willing to believe VIC's responses.

Participant behavior was also remarkable in M2. M2 was an emotionally charged moment for many participants, and as seen in Figure 5-4, participants in the PTV condition were rated as more empathetic and pleasant than participants in the MON condition. Emotional responses were observed in the PTV condition, such as those by participant 16. Participant 16 had the most genuine, emotionally charged reaction the authors have ever seen in a human-virtual human interaction.


Participant 16 was clearly saddened to hear that the virtual human's father died of cancer. She paused before responding to VIC, and answered slowly and carefully. Her eyes appeared to water and she choked up. After the interview, she told an experimenter that she almost cried. She then re-entered the room and exclaimed to VIC, "Loser!" Abuse of robots is a known phenomenon [99], and appears to apply to virtual humans as well. The example given suggests that virtual humans that evoke strong emotions from users may be more subject to robot or virtual human abuse.

Overall, the observed responses of participants in the PTV group were more powerful and natural than in the MON group. In addition to supporting the hypothesis of higher copresence in the PTV group, the results have implications for the use of virtual human experiences in pharmacy education. A PTV or similar display may assist educators in understanding how students would react to similar situations in the real world.

5.4.4 Summary and Discussion

In this study, two non-immersive display conditions were compared for their influence on copresence. The first condition was a life-size virtual human displayed on a plasma TV. The second condition was a small-scale virtual human displayed on a standard monitor. The hypothesis was that the PTV condition would engender a higher sense of copresence than the MON condition. The hypothesis was tested in the context of a virtual human experience that allowed pharmacy students to practice patient interviews. At moments during the interview, the virtual human patient challenged the participant with emotionally charged questions. Answers to these questions were observed by video coders, who rated the extent to which participants seemed engaged, empathetic, pleasant, and natural.

Analysis of study participants during the moments showed significant effects of display condition on cognition and behavior. Participants in the PTV condition were significantly more


engaged, empathetic, pleasant, and natural. They were also more critical of themselves. They treated the virtual human as they would treat another person sitting across a desk from them. They paid attention to the virtual human and formed strong impressions of both the virtual human and themselves. In the MON condition, participants treated the virtual human as they would treat a computer. They appeared disconnected from the social interaction, and focused more on the general task (taking a medication history) than on the virtual human. This suggests that the life-size virtual human displayed on the PTV engendered a higher sense of copresence than the small-scale virtual human displayed on the MON.

The survey data indicate that participants in the MON condition felt that they used empathy more appropriately than participants in the PTV condition did. This was the opposite of how behavioral coders rated participants' use of empathy. On both critical moments, participants in the PTV condition were rated as more empathetic than participants in the MON condition. A similar effect was observed in the immersion study, where the FTPD participants were more accurate in their self-evaluation of empathy. In addition to supporting the claim from the immersion study that the FTPD condition gave rise to a higher sense of copresence, this result shows that self-evaluation of empathetic behavior is a sensitive metric of copresence.

Overall, the hypothesis of this study was supported. For an interview task, the PTV condition created a higher sense of copresence than the MON condition. The results have implications for designers of virtual human experiences. Designers should be aware of the limitations of small monitor displays. Small-monitor-based virtual human experiences are easily accessible; however, interpersonal communication skills go beyond simple procedures. They involve complex social and emotional behavior.
Users will not be as emotionally impacted by a virtual human experience on a small monitor as they would be by a real, life-size person. As a


result, small-monitor-based virtual human experiences may be limited for the evaluation and training of interpersonal communication skills.

5.5 Limitations

Ideally, both studies would have been combined into a 2 x 2 between-subjects user study with participants from the same population. While this was the original plan, practical limitations on participant recruitment, along with required statistical power, prevented a large enough population for a combined study. A positive aspect of this approach was that lessons learned from the immersion study could be applied to the life-size study. Also, having both medical and pharmacy students involved in the studies enabled the results to be generalized to both populations and the greater healthcare population.

Another limitation is that a WOz was used in the life-size study. Few virtual human experiences would use this approach in real-world training. To understand how the greater interaction fidelity afforded by the WOz influenced copresence, the study was repeated with a new group of pharmacy students (N=35) to evaluate the MON and PTV conditions in a completely autonomous (no WOz) version of the IPS. The system was entirely responsible for choosing VIC's responses.

Results suggested that an autonomous agent is less likely to elicit a significant feeling of copresence than a WOz agent is. The groups did not have significant differences in self-evaluation or behavioral ratings (engaged, empathetic, pleasant, and natural). Between the studies, behavioral ratings were only significantly reduced in the PTV condition. This suggested that the MON does limit copresence; the PTV would offer increased opportunity for copresence as the quality of the simulation improved.

It is important to note that the WOz was utilized in the immersion study because of speech recognition problems with the population group. Indeed, this was the case during the second


pharmacy study. The older, culturally diverse population was noticeably less satisfied with the performance of the IPS than population groups in previous studies. Future studies in the pharmacy domain will likely use early pharmacy students rather than working professional students.

5.6 Chapter Summary

Virtual environments incorporating virtual human agents are increasingly being used for interpersonal skills training and therapy. To give users a sense of copresence with virtual humans, some effective virtual human experiences have used visual displays that provide immersion and life-size scale. The purpose of this work was to understand the extent to which user copresence is influenced by visual display systems in a virtual human experience. Two between-subjects experiments found significant differences on variables expected to be related to copresence.

The primary findings in the first study were that participants in the FTPD condition were more critical of their own behavior, and were more accurate in their self-evaluation relative to observer ratings, than participants in the HMD condition were. In contrast, observed behavior was similar between the two conditions. Differences between pre and post experience self-evaluations indicate that participants in the HMD condition were not as affected by the interaction. The results highlighted the differences between the spatial tasks common in most virtual environments and the social tasks of virtual human experiences. Increasing immersion is often a trade-off with user encumbrance. Whereas increased immersion may be important in spatial tasks, the close-quarter human-virtual human conversations did not leverage this increased immersion. Without encumbering the user, a large projection screen offered enough immersion for users to have strong feelings of copresence with the virtual human, to exhibit realistic social behavior, and to evaluate their behavior accurately. By encumbering the user and emphasizing


the display hardware in the HMD condition, copresence was diminished, making the overall virtual human experience less effective.

Results from the life-size study indicated that life-size scale virtual humans had a larger impact on copresence than immersion. In the life-size study, user self-evaluation and behavior were strongly influenced by the display condition. Participants in the PTV condition (life-size virtual human) were more engaged, empathetic, pleasant, and natural than participants in the MON condition (small-scale virtual human). Participants in the PTV condition were also more critical of their own behavior. Similar to the HMD condition in the immersion study, participants in the MON condition tended to ignore the virtual human, to treat the virtual human as a computer, and ultimately to be less affected by the experience.

5.7 Conclusions

The main finding of this work was that changes to the natural interface of our virtual human experience affected its benefit for interpersonal skills education. Specifically, results from two user studies suggested that the visual display system plays a critical role in user behavior during the virtual human experience, and in reflection about the virtual human experience. We suggest using natural display systems that place less emphasis on the display hardware, and more emphasis on life-size scale virtual humans. Participants using a natural display system in our study treated, and reflected upon, the virtual human experience more like a real experience. Further, we suggest that the benefits of improved artificial intelligence may only be apparent under natural display systems. Participant performance was higher with human-level intelligence (Wizard-of-Oz technique) only in the natural display interface condition. The implication is that, to realize improvements in virtual human simulation, a natural interface should be used.


CHAPTER 6
SUMMARY AND FUTURE DIRECTIONS

We have developed a system for interpersonal skills education. Users practice, and are evaluated on, interpersonal skills through interaction with life-size virtual humans through a natural interface.

6.1 Review of Results

In this dissertation, we claimed that: An interpersonal simulator with a natural interface to virtual human agents elicits performance that is predictive of the user's real world interpersonal skills.

We first demonstrated an interpersonal simulator with a natural interface, the IPS. The natural interface featured speech and gesture recognition to capture natural interaction, and life-size virtual human agents. The virtual human agents were modeled and animated with state-of-the-art commercial tools, and demonstrated verbal and non-verbal behaviors with the user.

We showed how the IPS could be applied to medical interview training. Medical interview training emphasized interpersonal skills. To train interpersonal skills, we developed an application called the VOSCE. In the VOSCE, students interviewed a life-size virtual human patient inside a real examination room. The virtual human patient was capable of responding accurately to the majority (> 60%) of student interaction.

The VOSCE and IPS were evaluated through a series of pilot user studies. The results of these studies showed that the IPS and VOSCE were usable, and accepted by students as real-world patient interview training tools. Students highlighted natural interface components (speech, life-size virtual humans) as important to the experience. The studies also showed that students with more experience interviewing real patients performed better with the virtual human patient.


We formally tested our claim with a validation study. The validation study showed that users' evaluated interpersonal skills were significantly correlated between the VOSCE and its real-world, validated counterpart, the OSCE. The VOSCE had a similar predictive power of users' real-world interpersonal skills.

After the validation, we studied an important component of the natural interface: the visual display system. The purpose of this work was to 1) evaluate different visual display alternatives to the existing large-screen display, and 2) show that changes to the natural interface impacted user behavior. We conducted a series of studies. The first study found that a more immersive head-mounted display caused users to be less critical of their own behavior, and more critical of the IPS system. The second study found that a less immersive small monitor display had the same effect. A large-screen display, with an associated life-size virtual human, caused users to be substantially more engaged, empathetic, pleasant, and natural with the virtual human.

Given these results, we find that the natural interface design was critical to eliciting real-world interpersonal skills. Naturally embedding life-size virtual humans into the real world enabled users to think about, and treat, those virtual humans as they would real people.

6.2 Future Directions

Understanding the natural interface has provided the motivation to improve the virtual human simulation, in particular the virtual human perception and cognition components. Currently, the perception and cognition systems are limited implementations. By recognizing a more complete set of non-verbal behavior, the virtual human could respond more accurately. The virtual human can only respond to the information it gathers, and currently this excludes important non-verbal features such as facial expressions, eye gaze, and voice tone.
The challenge is to integrate this tracking in a robust, efficient, and accurate manner without encumbering or


limiting the user. One idea we are working on is that, while current sensing technologies (i.e., cameras, microphones) are not as capable as their human equivalents (vision, hearing), the virtual human system may be able to compensate by having more sensors (e.g., 10 cameras or microphones). This approach would also leverage increasingly parallel processing architectures.

Having better perception alone is not enough to increase virtual human simulation performance. The virtual human must also give plausible responses to user input. Our approach has been to create a database of responses and input templates for matching. From Wizard-of-Oz studies, we know that this approach could be capable of responding accurately to over 95% of user utterances, given human-level intelligence in choosing the response. A current effort is to expand the number of input templates for each response. The method for doing this is to provide a web-based interface to the IPS. Using this interface, hundreds of health-professions students are using the IPS, and adding appropriate input templates and responses. To leverage this expanded database in the virtual human experience, we need to understand how typed input is different from spoken input.

Finally, we have only provided a natural interface for two human senses, visual and aural. While there is little demand for some senses, such as taste, in interpersonal simulation, smell and touch interfaces have been proposed which could improve and expand the use of interpersonal simulation. Some work has already been started in this area by adding a mannequin with touch sensors to the IPS. Initial experiments show that the touch interface is usable and accepted by end-users [100]. More work needs to be conducted in this area to make the touch interface bidirectional (i.e., the virtual human can touch the user), and to allow the touch interface to move with the virtual human.
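The template-matching approach to response selection described above can be sketched in code. The scoring scheme (simple keyword overlap), the template contents, and all names below are illustrative assumptions, not the actual IPS implementation:

```python
# Sketch of template-based response selection (illustrative assumptions;
# not the actual IPS matching algorithm or template format).

FALLBACK = "I'm sorry, I don't understand. Could you rephrase that?"

# Each input template pairs a set of trigger keywords with a response.
TEMPLATES = [
    ({"pain", "start", "when"}, "The pain started yesterday around 8 pm."),
    ({"nauseated", "nausea", "sick"}, "Yes, I have been feeling nauseated."),
    ({"father", "dad", "family"}, "My dad died of cancer. Could this be cancer?"),
]

def choose_response(utterance, templates=TEMPLATES):
    """Return the response whose keyword template best overlaps the utterance."""
    words = set(utterance.lower().replace("?", " ").replace(".", " ").split())
    best_score, best_response = 0, FALLBACK
    for keywords, response in templates:
        score = len(words & keywords)  # count of matched trigger keywords
        if score > best_score:
            best_score, best_response = score, response
    return best_response
```

Under this scheme, expanding coverage amounts to adding keyword sets (input templates) for each response, which is what the web-based interface described above is meant to crowd-source from students.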


APPENDIX A
SURVEYS

A.1 Technology Survey

The following was important to the experience (1 strongly disagree, 4 neutral, 7 strongly agree):
- Interacting with DIANA using speech
- Interacting with DIANA using gestures, e.g., shaking her hand and pointing at a location on her body
- Having the scene move when I moved my head
- Seeing DIANA and VIC at life-size
- Having the system in the Harrell Center
- Taking notes and receiving information on the TabletPC

I felt that (1 strongly disagree, 4 neutral, 7 strongly agree):
- The characters spoke realistically
- The accuracy of the system's speech recognition was sufficient to complete the task
- I would rather type my questions if it improved the system's accuracy in understanding my intent

For communication skills (1 strongly disagree, 4 neutral, 7 strongly agree):
- The system is a valuable training tool
- The system is a valuable teaching tool
- The system is a valuable evaluation tool

If this system were installed in a room at Shands (available 24/7) (Daily, Weekly, Monthly, Never):
- I would use it


A.2 Experience Satisfaction Survey

All items are rated on a scale from 1 (strongly disagree) to 3 (neutral) to 5 (strongly agree) unless otherwise indicated.

Virtual Patient
- VP appears authentic
- VP is challenging/testing the student
- VP simulates physical complaints unrealistically
- VP answers questions in a natural manner
- VP appears to withhold info unnecessarily
- VP appearance fits the role
- VP stimulates the student to ask questions
- I can judge from the reactions of the VP whether he/she listens to the student
- VP communicates how she/he felt during the session
- What mark (1 lowest - 10 highest) would you give the VP for this interaction?

Instructor
- VIC gives constructive criticism
- I liked that the instructor interrupted during the interaction
- I would like feedback from the instructor only at the end of the session
- Overall I found this a worthwhile educational learning experience
- I would use this as a practice tool


A.3 Interview Skills Checklist

Information (Check Yes or No)
1. The pain started yesterday around 8pm
2. The pain started in the middle of my abdomen and moved to my lower right side
3. The pain is sharp and stabbing
4. I am nauseated
5. I vomited one time around 11pm last night
6. I do not feel like eating anything
7. I had one soft, brown stool yesterday around 6pm. I do not have diarrhea
8. I am sexually active
9. I had my period around a week ago
10. I do not have any vaginal discharge
11. My temperature has been around 100
12. I do not have burning, urgency or frequency with urination

Quality (Select one) (1 (lowest quality) - 4 (highest quality))
1. Attentiveness
(1) Did not seem to really be paying attention or listening; interrupted without apology or explanation
(2) Attention drifted at times; asked a question that had already been answered without apology
(3) Appeared to be paying attention
(4) Appeared to be paying attention and responded to verbal or non-verbal cues
2. Eye Contact
(1) Little or no eye contact
(2) Some eye contact


(3) Appropriate eye contact at most times
(4) Appropriate eye contact at all times
3. Body Lean
(1) Little or no forward body lean
(2) Some forward body lean
(3) Appropriate forward lean at most times
(4) Appropriate forward lean at all times
4. Head Nod
(1) Little or no head nodding
(2) Some head nodding
(3) Appropriate head nodding at most times
(4) Appropriate head nodding at all times
5. Level of Immersion
(1) Did not appear to be immersed at any time
(2) Appeared to be immersed some of the time
(3) Appeared to be immersed most of the time
(4) Appeared to be immersed at all times
6. Level of Anxiety
(1) Little or no anxiety
(2) Had some anxiety during the interview
(3) Appeared anxious at most times
(4) Appeared anxious at all times
7. Attitude
(1) Made judgemental comments or criticized patient; OR talked down to patient
(2) Made 1-2 comments with inappropriate affect


(3) No judgemental comments; talked to patient as an equal
(4) No judgemental comments; talked to patient as an equal; offered praise/encouragement when opportunity arose
8. Empathy and Support
(1) Offered no empathetic comments; no encouragement or support (did not state intention to help)
(2) Offered only brief supportive or empathetic comment, and only in response to a distinct emotional statement by patient. Comments may seem prospective or forced
(3) Offered empathetic or supportive comments OR stated intention to help
(4) Offered empathetic or supportive comments OR stated intention to help; despite limited time, seemed to be on way to establishing a caring relationship
9. Clarity of Questions
(1) Frequent unclear questions: patient has difficulty in understanding what was being asked
(2) Some unclear questions: patient has difficulty once or twice understanding what was being asked
(3) Clear questions
(4) Clear questions

Process (Check Yes or No)
1. The student greets the patient
2. The student introduces self and role
3. The student explores the patient's concerns re problem (chief complaint)
4. The student progresses using transitional statements (moving from HPI to PMH, FMH, SH, ROS, etc.)
5. The student allows adequate time for the patient to respond to questions
6. The student uses verbal facilitators (tell me more, please continue)
7. The student ends the interview (summary statement, checks accuracy, asks for patient questions, thanks patient)


8. The student conducts the interview in a logical and orderly fashion and was responsive to patient verbal and nonverbal comments
9. Did the student ask about pertinent social history
10. Did the student ask about pertinent family medical history
11. Did the student ask about pertinent past medical history
12. Did the student ask about pertinent history of present illness
13. Did the student ask about pertinent review of systems

Overall rating of the interaction: 1, 2, 3 (Unsatisfactory); 4, 5, 6 (Satisfactory); 7, 8, 9 (Superior)


APPENDIX B
RAW STUDY DATA

B.1 Study Data for Section 3.4

Each item is followed by one rating per participant (7 participants):
Interacting with DIANA using speech: 7 7 7 7 7 6 6
Interacting with DIANA using gestures: 4 4 4 4 1 3 1
Having the scene move when I moved my head: 2 4 7 4 1 4 1
Seeing DIANA and VIC at life-size: 7 4 6 7 7 7
Having the system in the Harrell Center: 7 6 7 4 4 7 6
Taking notes and receiving information on the TabletPC: 5 5 2 4 7 3 5
The characters spoke realistically: 5 5 1 6 5 5 5
The accuracy of the system's speech recognition was sufficient to complete the task: 7 3 7 2 2 3 3
I would rather type my questions if it improved the system's accuracy in understanding my intent: 2 2 1 6 2 7 7
The system is a valuable training tool: 7 6 7 7 6 5 6
The system is a valuable teaching tool: 7 6 7 7 6 5 6
The system is a valuable evaluation tool: 7 6 5 2 3 5 6
VP appears authentic: 4 4 5 4 4 4 4
VP is challenging/testing the student: 4 2 5 4 4 4 4
VP simulates physical complaints unrealistically: 2 3 4 2 1 2 4
VP answers questions in a natural manner: 2 3 2 2 4 2 2
VP appears to withhold info unnecessarily: 3 1 1 1 2 2 4
VP appearance fits the role: 3 4 3 4 5 4 5
VP stimulates the student to ask questions: 4 4 5 5 2 4 5
I can judge from the reactions of the VP whether he/she listens to the student: 4 3 4 5 1 2 2
VP communicates how she/he felt during the session: 4 5 5 5 3 3 5
What mark (1-10) would you give the VP for this interaction?: 6 7 7.5 7 6 5 6
I would use this as a practice tool: 4 5 5 5 5 2 4


B.2 Study Data for Section 3.5

Each item is followed by one rating per participant (10 participants):
Interacting with DIANA using speech: 7 6 7 6 7 7 7 7 7 5
Interacting with DIANA using gestures: 7 4 4 4 4 2 5 5 4 3
Having the scene move when I moved my head: 4 4 5 5 7 4 6 2 3 4
Seeing DIANA and VIC at life-size: 6 4 7 6 7 7 6 3 7 5
Having the system in the Harrell Center: 5 6 7 3 6 7 5 1 7 1
Taking notes and receiving information on the TabletPC: 7 6 1 3 6 7 3 4 7 3
The characters spoke realistically: 6 5 6 6 3 7 5 5 5 7
The accuracy of the system's speech recognition was sufficient to complete the task: 5 6 5 2 5 5 6 5 5 5
I would rather type my questions if it improved the system's accuracy in understanding my intent: 3 6 1 6 3 2 5 6 3 4
The system is a valuable training tool: 6 7 7 4 5 6 5 6 7 6
The system is a valuable teaching tool: 6 7 7 3 5 6 5 6 7 4
The system is a valuable evaluation tool: 5 7 7 5 4 2 5 4 2 1
VP appears authentic: 4 5 4 4 4 4 4 4 4 5
VP is challenging/testing the student: 2 5 4 4 4 4 4 3 4 4
VP simulates physical complaints unrealistically: 2 2 2 4 3 4 2 2 2 2
VP answers questions in a natural manner: 4 4 4 2 2 5 4 5 2 5
VP appears to withhold info unnecessarily: 2 1 2 3 2 1 3 4 2 1
VP appearance fits the role: 5 3 4 5 5 4 5 4 2 3
VP stimulates the student to ask questions: 4 4 4 4 4 5 3 2 5 2
I can judge from the reactions of the VP whether he/she listens to the student: 4 4 4 3 4 4 5 1 3 3
VP communicates how she/he felt during the session: 5 3 4 2 4 5 4 4 4 4
What mark (1-10) would you give the VP for this interaction?: 8 8 9 6 8 7 8 7 4
I would use this as a practice tool: 5 5 5 3 4 4 4 4 5 4


B.3 Study Data for Section 4.4

Scores for the 33 participants in the VOSCE (virtual patient encounter) and the OSCE (standardized patient encounter). A dash (-) marks a Patient Rating missing from the source data.

VOSCE

Participant   Overall   Information   Process   Quality       Patient Rating
              (1-9)     (0-12)        (0-13)    (1-4)         (1-10)
 1            1         3             6         2.375         6
 2            4         8             7         1.875         -
 3            4         7             7         1.75          -
 4            6         8             11        2.375         7
 5            5         8             10        2             -
 6            4         8             9         2.5           -
 7            4         6             6         2.375         4
 8            1         3             6         2.125         2
 9            2         5             3         1.875         -
10            3         6             5         2             -
11            3         6             10        2.875         6
12            3         3             10        2.25          3
13            2         5             7         1.75          8
14            3         2             6         2.125         3
15            3         6             8         2.375         -
16            1         3             6         1.875         -
17            2         3             8         1.625         -
18            6         8             11        3             2
19            2         6             6         1.375         -
20            2         3             7         2.375         -
21            5         6             11        2.5           -
22            1         3             4         1.5           -
23            4         8             13        2.375         5
24            3         5             5         2             4
25            4         7             8         2             7
26            4         7             10        2.25          7
27            3         4             5         1.875         -
28            2         3             8         2.125         6
29            2         5             6         1.75          -
30            3         4             10        2.75          4
31            2         6             5         2             5
32            4         7             9         2.125         4
33            4         7             9         2.125         -

OSCE

Participant   Overall   Information   Process   Quality       Patient Rating
              (1-9)     (0-12)        (0-13)    (1-4)         (1-10)
 1            1         3             7         2.5           8
 2            4         6             10        2.25          -
 3            3         5             8         2.375         -
 4            6         9             11        3.25          9
 5            5         9             11        2.75          -
 6            3         6             7         2.375         -
 7            5         8             12        3             8
 8            4         6             11        2.428571429   9
 9            3         5             9         2.5           -
10            5         8             11        2.625         -
11            3         6             8         2.5           7
12            5         7             13        2.625         7
13            4         9             13        2.75          8
14            4         5             11        2.625         7
15            2         4             8         2.375         -
16            3         5             8         3.375         -
17            3         6             8         2.125         -
18            4         7             10        3.125         7
19            2         6             7         2.25          -
20            3         7             9         3.428571429   -
21            4         5             11        2.75          -
22            3         4             11        2.25          -
23            3         5             10        3.125         8
24            3         5             10        3             9
25            5         8             10        2.25          7
26            2         5             9         2.25          8
27            2         4             6         2.75          -
28            2         5             5         2.142857143   8
29            3         4             9         2.25          -
30            3         4             10        2.625         8
31            1         4             6         2.25          8
32            4         7             11        2.714285714   9
33            2         4             8         2.375         -
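The dissertation reports that VOSCE and OSCE performance were significantly correlated. As an illustrative re-analysis (not the dissertation's own analysis code), the Overall scores transcribed from the two B.3 tables above can be correlated with a short plain-Python script; the `pearson` helper is our own sketch:

```python
import math

# Overall scores (1-9) for participants 1-33, transcribed from the
# B.3 tables: VOSCE (virtual encounter) and OSCE (standardized patient).
vosce = [1, 4, 4, 6, 5, 4, 4, 1, 2, 3, 3, 3, 2, 3, 3, 1, 2, 6, 2, 2, 5,
         1, 4, 3, 4, 4, 3, 2, 2, 3, 2, 4, 4]
osce = [1, 4, 3, 6, 5, 3, 5, 4, 3, 5, 3, 5, 4, 4, 2, 3, 3, 4, 2, 3, 4,
        3, 3, 3, 5, 2, 2, 2, 3, 3, 1, 4, 2]

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson(vosce, osce)
print(f"Pearson r between VOSCE and OSCE overall scores: {r:.3f}")
```

This reproduces only the simplest comparison (a product-moment correlation of the Overall columns); the study's actual statistics may differ in method and in which subscores were compared.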


B.4 Study Data for Section 5.3

Data for the 27 participants in the Section 5.3 study. A dash (-) marks a value missing from the source data (participant 6 lists only five values; they are shown in column order).

Participant   Self-Rating   Patient   Observed       Observed      Presence   Copresence
              Empathy       Rating    Empathy (AV)   Empathy (A)   Count      Count
 1            4             5         1              1             0          1
 2            4             5         5.5            5             0          1
 3            3             6         2              2.33          0          0
 4            8             7         1              1             1          1
 5            3             6         4              3.67          0          2
 6            3             8         1              2             1          -
 7            5             4         6.75           6             0          0
 8            2             8         1              1             0          0
 9            5             5         5.75           6.33          0          2
10            6             6         5.75           6.67          1          0
11            7             8         4              4.33          0          1
12            3             7         2              2.67          5          2
13            4             4         4.5            5             1          0
14            4             7         3.75           5.33          0          3
15            6             6         5.75           6.67          0          0
16            2             4         4              4.67          0          0
17            4             6         3.5            3.33          0          0
18            1             5         2.75           3             0          0
19            4             6         1              1.67          1          0
20            4             6         5.75           5.67          3          1
21            6             7         2              2.33          0          0
22            8             7         1.75           1.67          6          3
23            4             7         3.5            3.67          0          1
24            5             4         4.75           4.67          0          1
25            6             7         4.25           4             2          0
26            3             6         1              1.33          0          0
27            4             3         2.5            3.33          0          0
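Section 5.3 concerns how accurately participants judged their own empathy relative to observer ratings. As an illustrative sketch (our own helper, not the dissertation's analysis code), the per-participant gap between the Self-Rating and Observed Empathy (AV) columns transcribed from the B.4 data above can be computed directly:

```python
# Self-rated empathy and observed empathy (audio-video condition),
# transcribed from the B.4 data for participants 1-27.
self_rating = [4, 4, 3, 8, 3, 3, 5, 2, 5, 6, 7, 3, 4, 4, 6, 2, 4, 1, 4,
               4, 6, 8, 4, 5, 6, 3, 4]
observed_av = [1, 5.5, 2, 1, 4, 1, 6.75, 1, 5.75, 5.75, 4, 2, 4.5, 3.75,
               5.75, 4, 3.5, 2.75, 1, 5.75, 2, 1.75, 3.5, 4.75, 4.25, 1, 2.5]

# Absolute self-assessment gap per participant, then the group mean.
gaps = [abs(s - o) for s, o in zip(self_rating, observed_av)]
mean_gap = sum(gaps) / len(gaps)
print(f"Mean |self - observed| empathy gap: {mean_gap:.2f}")
```

A large gap for a participant (e.g. the self-rating of 8 against an observed 1) is the kind of self-assessment inaccuracy the study examines; the study's own accuracy measure may be defined differently.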


B.5 Study Data for Section 5.4

Self-rated empathy and the M1/M2 ratings of Engaged, Empathetic, Pleasant, and Robotic for the 39 participants in the Section 5.4 study. In the source, a single dot (.) marks an entire missing block of four ratings; here each missing cell is shown as its own dot. Legend: Self = Self-Rating Empathy; Eng = Engaged, Emp = Empathetic, Pls = Pleasant, Rob = Robotic.

Participant   Self   M1 Eng   M1 Emp   M1 Pls   M1 Rob   M2 Eng   M2 Emp   M2 Pls   M2 Rob
 1            7      5.4      3.6      5        1.4      4.25     3        5        3.5
 2            6      .        .        .        .        5        4        4.2      3
 3            7      .        .        .        .        .        .        .        .
 4            7      .        .        .        .        .        .        .        .
 5            7      .        .        .        .        5.2      4.4      2.4      4.2
 6            6      5        4.25     2.75     2.5      .        .        .        .
 7            6      2.6      1.75     2        4.6      3        3.5      3.5      5
 8            6      4.8      4.8      5.2      1.8      5.6      5.2      5        1.8
 9            4      5.8      4.6      5.8      1.2      6        2        5        2
10            3      5.2      3.2      5.6      1.8      5.5      6        5.5      1.5
11            6      5.8      3.6      4.8      2.6      5.8      4        4.4      2.6
12            8      .        .        .        .        3.6      1.8      3        4
13            9      .        .        .        .        .        .        .        .
14            5      5.6      4.6      3.8      1.6      4.75     2.75     3.75     2.25
15            7      4.8      4.2      4.8      1.8      5        4.8      4.4      2.8
16            5      6        4        4.4      1.2      6.75     7        6.75     1
17            9      4.6      3.4      3.6      2.8      5.4      5.8      4.6      3.2
18            9      4.8      3.4      4.2      2.4      5        4.6      4        3.4
19            4      3.4      2.2      3.8      4.4      .        .        .        .
20            7      5.6      2.8      3.6      2        4.2      1.8      3.4      2.6
21            8      4.6      3.2      3.6      3        5.6      6.2      4.8      2.8
22            7      4.8      3.4      4.4      3.8      5.2      4.2      4        3.2
23            6      .        .        .        .        5        5.2      4        2.2
24            8      .        .        .        .        3.25     2        2        3.75
25            4      5.4      3.8      4.8      1.6      5.8      5.6      4.8      1.4
26            7      5.4      2.4      2.8      2.2      4.8      4.8      4        2
27            6      5.4      3.8      4.4      2.4      5        4.6      4.2      2
28            6      5.4      3.2      3.8      1.8      6.4      6.4      5.2      1
29            8      3.5      3        3        1.5      6.2      6        5        1.6
30            9      3.75     2.75     4.25     4        4.8      5.4      3.4      4.2
31            6      4.75     3.75     4.5      1.25     4.4      3.4      3.4      2.75
32            6      4.8      3.8      2.6      2.8      3.8      3.6      3.6      2.6
33            7      5        3.6      3.2      1.4      6.2      6.4      5.2      1.8
34            7      4.8      3.2      3.2      3.4      4.67     4        3.67     5.67
35            8      6        4.2      3.8      1.6      6.2      6.4      5.2      1.2
36            4      5        5        3        1        3.2      3.6      3.2      3.4
37            8      3        2.8      2.8      3.6      4.8      2.4      3.4      2.8
38            7      4.4      2.8      2.6      3.6      4.6      4.2      3.8      3.4
39            6      6.5      4.5      5.25     2.5      6        4.8      4.4      2.2
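Because many participants are missing one or both blocks of ratings, any summary of the B.5 data must handle missing values. As an illustrative sketch (our own helper, not the dissertation's analysis code), the "Empathetic" ratings transcribed from the B.5 data above can be compared across the two conditions, restricted to participants rated in both:

```python
# M1 and M2 "Empathetic" ratings transcribed from the B.5 data;
# None marks values shown as '.' (missing) in the source.
m1_emp = [3.6, None, None, None, None, 4.25, 1.75, 4.8, 4.6, 3.2, 3.6,
          None, None, 4.6, 4.2, 4, 3.4, 3.4, 2.2, 2.8, 3.2, 3.4, None,
          None, 3.8, 2.4, 3.8, 3.2, 3, 2.75, 3.75, 3.8, 3.6, 3.2, 4.2,
          5, 2.8, 2.8, 4.5]
m2_emp = [3, 4, None, None, 4.4, None, 3.5, 5.2, 2, 6, 4, 1.8, None,
          2.75, 4.8, 7, 5.8, 4.6, None, 1.8, 6.2, 4.2, 5.2, 2, 5.6, 4.8,
          4.6, 6.4, 6, 5.4, 3.4, 3.6, 6.4, 4, 6.4, 3.6, 2.4, 4.2, 4.8]

# Paired comparison over participants with ratings in both conditions.
pairs = [(a, b) for a, b in zip(m1_emp, m2_emp)
         if a is not None and b is not None]
mean_m1 = sum(a for a, _ in pairs) / len(pairs)
mean_m2 = sum(b for _, b in pairs) / len(pairs)
print(f"n = {len(pairs)}, mean M1 = {mean_m1:.2f}, mean M2 = {mean_m2:.2f}")
```

This listwise-deletion approach is only one way to handle the missing blocks; the study's own statistics may treat them differently.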



BIOGRAPHICAL SKETCH

Kyle Johnsen was born in Port Jefferson, New York, in 1981, to John and Teresa Johnsen. In 1999 he received a full scholarship to attend the University of Florida. Following in the footsteps of his late father, a civil engineer, Kyle pursued a career in engineering. Upon graduating with honors from the undergraduate computer engineering program, Kyle accepted a four-year fellowship to pursue a Ph.D. at the University of Florida. As Dr. Benjamin Lok's first Ph.D. student, Kyle assisted in developing a novel virtual reality research program to study virtual humans. In his first year, Kyle led a development team of master's and undergraduate students to build the first virtual human system for interpersonal skills education. This work earned Kyle and his research team significant international recognition. In five years as a Ph.D. student at the University of Florida, Kyle has authored over 16 journal and conference articles in computer science and medicine. He has been recognized for his publication record with undergraduates by the Howard Hughes Medical Institute, and served as a panelist at IEEE Virtual Reality 2007. Kyle is continuing his research as an assistant professor at the University of Georgia.